Thursday, August 2, 2012

London Bridge is falling down...

OMG! For the last few days, one of the LUNs on my PVE cluster has been falling down after the nightly backup!

Symptoms:

root@cl02-n02:~# pvscan
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 21478965248: Input/output error
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 21479022592: Input/output error
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdb: Checksum error
  PV /dev/sdk    VG NAS01LUN9VG0   lvm2 [2.00 TiB / 2.00 TiB free]
  PV /dev/sdj    VG NAS01LUN8VG0   lvm2 [2.00 TiB / 2.00 TiB free]
  PV /dev/sdi    VG NAS01LUN7VG0   lvm2 [511.98 GiB / 511.98 GiB free]
  PV /dev/sdh    VG NAS01LUN6VG0   lvm2 [511.98 GiB / 511.98 GiB free]
  PV /dev/sdg    VG NAS01LUN5VG0   lvm2 [255.99 GiB / 159.99 GiB free]
  PV /dev/sdf    VG NAS01LUN4VG0   lvm2 [255.99 GiB / 157.99 GiB free]
  PV /dev/sde    VG NAS01LUN3VG0   lvm2 [127.99 GiB / 29.99 GiB free]
  PV /dev/sdd    VG NAS01LUN2VG0   lvm2 [127.99 GiB / 95.99 GiB free]
  PV /dev/sdc    VG NAS01LUN1VG1   lvm2 [127.99 GiB / 26.99 GiB free]
  PV /dev/sda2   VG pve            lvm2 [297.59 GiB / 16.00 GiB free]
  PV /dev/sdb                      lvm2 [128.00 GiB]
  Total: 11 [6.29 TiB] / in use: 10 [6.17 TiB] / in no VG: 1 [128.00 GiB]
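
All those read errors point at the vzsnap LV that vzdump creates for its LVM-snapshot backups, so it looks like the nightly backup left a broken snapshot behind. Here is a quick way to check for (and, if confirmed, drop) such a leftover snapshot; the LV name is taken from the output above, the rest is a generic LVM sketch, so double-check before removing anything:

# list all LVs in the affected VG, including snapshots, and check device-mapper state
lvs -a NAS01LUN0VG0
dmsetup status | grep vzsnap
# only if the snapshot is confirmed stale/invalid:
# lvremove NAS01LUN0VG0/vzsnap-cl02-n02-0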


Friday, June 22, 2012

SMS notification from "The Dude" monitoring system

At my server farm I currently use a great (and free) monitoring system called "The Dude". I think it is wonderful software because it is easy to use, easy to set up, works perfectly on a Linux host under Wine, and there is even a package (npk) for Mikrotik RouterOS.
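
For reference, getting it onto a Linux host really is just Wine plus the Windows installer; a rough sketch on Debian/Ubuntu (the installer filename below is only a placeholder for whatever version you download from MikroTik):

# install Wine, then run The Dude installer under it
apt-get install wine
wine dude-install.exe    # placeholder name; use the actual installer you downloaded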



Tuesday, April 17, 2012

New Big Picture

Finally I assembled my HA storage cluster, and it passed my tests! Here it is:

Friday, April 13, 2012

Disappointment

So, finally I have to agree with El Di Pablo that GlusterFS is Not Ready For HA SAN Storage, because my HA storage with the last config could not pass my failover test.

It went better than the first time: after rebooting the first node, the iSCSI target was still available and the VM kept writing data. But once the first node was back up and I rebooted the second node, the iSCSI target was lost, and in the end all the data on the storage was corrupted. OMG!

Well, now I have decided to leave GlusterFS and use the classical solution: Ubuntu + DRBD (master/slave) + IET + Heartbeat. Yes, I know about the possible performance problems, but for now I think it is the only suitable free HA solution for production use. A rough sketch of the DRBD resource I have in mind is below.
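
Just to make the idea concrete, here is a minimal sketch of a DRBD master/slave resource, reusing the two storage nodes and the 172.16.0.x storage network from my GlusterFS setup; the resource name, hostnames, port and backing disk are my assumptions, not a tested config:

# /etc/drbd.d/r0.res -- hypothetical resource; adjust names, disks and ports
resource r0 {
  protocol C;                        # fully synchronous replication
  on nas01-node01 {
    device    /dev/drbd0;
    disk      /dev/sdb1;             # assumed backing partition
    address   172.16.0.1:7788;
    meta-disk internal;
  }
  on nas01-node02 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   172.16.0.2:7788;
    meta-disk internal;
  }
}

IET would then export /dev/drbd0 from whichever node Heartbeat promotes to primary.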

Thursday, April 5, 2012

Test#2

Today I deployed the updated configuration to the nodes of my HA storage. I created a virtual machine and started installing Ubuntu on it. During the installation, I rebooted the active storage node. As a result, the PVE node got this:

root@cl02-n01:~# pvscan
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 34359672832: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 34359730176: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error
  PV /dev/sdb    VG NASLUN0VG0   lvm2 [4.00 TiB / 3.97 TiB free]
  PV /dev/sda2   VG pve          lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]


And the VM hung up.

So, after the rebooted node came back online, I rebooted the second one. The PVE node then showed:

root@cl02-n01:~# pvscan
  Found duplicate PV 4hKm0l1uebbcn5s3eV3ZUT6nb6exKjOD: using /dev/sdc not /dev/sdb
  PV /dev/sdc    VG NASLUN0VG0   lvm2 [4.00 TiB / 3.97 TiB free]
  PV /dev/sda2   VG pve          lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]

It seems /dev/NASLUN0VG0/vm-100-disk-1 is available again, but the VM is still hung anyway. The PVE GUI says about my VM:

Status: running
CPU usage: 50% of 2 CPUs (I use 2 CPUs in the config)
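
By the way, the "Found duplicate PV" line means LVM is suddenly seeing the same PV UUID through two device nodes at once (here /dev/sdb and /dev/sdc, most likely the old and the new iSCSI session for the same LUN). If the stale path refuses to go away, it can be hidden from LVM with a device filter; a minimal sketch, assuming /dev/sdb really is the dead path:

# /etc/lvm/lvm.conf, devices section -- reject the stale device node, accept the rest
filter = [ "r|^/dev/sdb$|", "a|.*|" ]

After that, re-running pvscan should show only one path for the VG.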



PS.
On the active storage node, dmesg shows:


...

[ 3984.639799] iscsi_trgt: scsi_cmnd_start(972) e000080 25
[ 3984.665551] iscsi_trgt: cmnd_skip_pdu(475) e000080 1c 25 0
[ 3984.690853] iscsi_trgt: scsi_cmnd_start(972) f000080 25
[ 3984.716664] iscsi_trgt: cmnd_skip_pdu(475) f000080 1c 25 0
[ 3984.741993] iscsi_trgt: scsi_cmnd_start(972) 10000080 25
[ 3984.767753] iscsi_trgt: cmnd_skip_pdu(475) 10000080 1c 25 0
[ 3984.793068] iscsi_trgt: scsi_cmnd_start(972) 11000080 1a
[ 3984.818886] iscsi_trgt: cmnd_skip_pdu(475) 11000080 1c 1a 0
[ 3984.844352] iscsi_trgt: scsi_cmnd_start(972) 12000080 1a
[ 3984.870156] iscsi_trgt: cmnd_skip_pdu(475) 12000080 1c 1a 0
[ 3984.895546] iscsi_trgt: scsi_cmnd_start(972) 13000080 1a
[ 3984.921360] iscsi_trgt: cmnd_skip_pdu(475) 13000080 1c 1a 0
[ 3984.947039] iscsi_trgt: scsi_cmnd_start(972) 14000080 1a
...
and these messages keep repeating, more and more of them.



Wednesday, April 4, 2012

Let's change the GlusterFS config!

So, the first test showed that rebooting the storage cluster nodes with the current configuration causes data corruption: the virtual machine image ends up corrupted. I think the problem is in the GlusterFS synchronization and self-heal mechanism. My other thought is that the cause of the damage may be the use of "thin provisioning"; I'll check that option (see the sketch below) if the new configuration does not work correctly.
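
For the record, testing the thin-provisioning theory would just mean replacing the sparse backing file of the LUN with a fully preallocated one; a rough sketch, where the path and size are made-up examples:

# sparse (thin-provisioned) image: blocks are allocated only when written
dd if=/dev/zero of=/data/lun0.img bs=1M count=0 seek=131072    # ~128 GiB logical size

# fully preallocated image: every block is written up front
dd if=/dev/zero of=/data/lun0.img bs=1M count=131072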

Now I will change the setup of my GlusterFS server and client, using the AFR translator:

Server config on NAS01-NODE01 (172.16.0.1):

##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on node1
volume gfs-ds
  type storage/posix
  option directory /data
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on node2
volume gfs-node2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.2         # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node2-ds  # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 172.16.0.*,127.0.0.1
  option auth.ip.gfs.allow 172.16.0.*
end-volume


Server config on NAS01-NODE02 (172.16.0.2):

##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on node2
volume gfs-ds
  type storage/posix
  option directory /data
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on node1
volume gfs-node1-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.1         # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node1-ds  # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 172.16.0.*,127.0.0.1
  option auth.ip.gfs.allow 172.16.0.*
end-volume
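
With both server volfiles in place, each node just starts glusterfsd against its own file; a quick usage sketch (the volfile path is simply wherever you saved the config above):

# start the GlusterFS server daemon with the volfile above
glusterfsd -f /etc/glusterfs/glusterfsd.vol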


Client config on both nodes:

#############################################
##  GlusterFS Client Volume Specification  ##
#############################################

# the exported volume to mount                    # required!
volume cluster
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.1                   # use 172.16.0.2 on node2
  option remote-subvolume gfs                     # exported volume
  option transport-timeout 10                     # value in seconds, should be relatively low
end-volume

# performance block for cluster                   # optional!
volume writeback
  type performance/write-behind
  option aggregate-size 131072
  subvolumes cluster
end-volume

# performance block for cluster                   # optional!
volume readahead
  type performance/read-ahead
  option page-size 65536
  option page-count 16
  subvolumes writeback
end-volume
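
And to close the loop: each storage node mounts this client volume, and IET exports an image file from it as the iSCSI LUN for Proxmox. A minimal sketch, where the volfile path, mount point, image name and IQN are all just my example names:

# mount the replicated GlusterFS volume using the client volfile above
glusterfs -f /etc/glusterfs/glusterfs.vol /mnt/gfs

# /etc/iet/ietd.conf -- export an image file from that mount as LUN 0
Target iqn.2012-04.local.nas01:gfs.lun0
    Lun 0 Path=/mnt/gfs/lun0.img,Type=fileio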