Thursday, August 2, 2012

London Bridge is falling down...

OMG! For the last few days, one of the LUNs on my PVE cluster has been falling down after the nightly backup!

Symptoms:

root@cl02-n02:~# pvscan
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 21478965248: Input/output error
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 21479022592: Input/output error
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 4096: Input/output error
  /dev/sdb: Checksum error
  PV /dev/sdk    VG NAS01LUN9VG0   lvm2 [2.00 TiB / 2.00 TiB free]
  PV /dev/sdj    VG NAS01LUN8VG0   lvm2 [2.00 TiB / 2.00 TiB free]
  PV /dev/sdi    VG NAS01LUN7VG0   lvm2 [511.98 GiB / 511.98 GiB free]
  PV /dev/sdh    VG NAS01LUN6VG0   lvm2 [511.98 GiB / 511.98 GiB free]
  PV /dev/sdg    VG NAS01LUN5VG0   lvm2 [255.99 GiB / 159.99 GiB free]
  PV /dev/sdf    VG NAS01LUN4VG0   lvm2 [255.99 GiB / 157.99 GiB free]
  PV /dev/sde    VG NAS01LUN3VG0   lvm2 [127.99 GiB / 29.99 GiB free]
  PV /dev/sdd    VG NAS01LUN2VG0   lvm2 [127.99 GiB / 95.99 GiB free]
  PV /dev/sdc    VG NAS01LUN1VG1   lvm2 [127.99 GiB / 26.99 GiB free]
  PV /dev/sda2   VG pve            lvm2 [297.59 GiB / 16.00 GiB free]
  PV /dev/sdb                      lvm2 [128.00 GiB]
  Total: 11 [6.29 TiB] / in use: 10 [6.17 TiB] / in no VG: 1 [128.00 GiB]
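
All those read errors point at the vzsnap LV that vzdump creates for its LVM-snapshot backups, so it looks like the nightly backup left a broken snapshot behind. Here is a quick way to check for (and, if confirmed, drop) such a leftover snapshot; the LV name is taken from the output above, the rest is a generic LVM sketch, so double-check before removing anything:

# list all LVs in the affected VG, including snapshots, and check device-mapper state
lvs -a NAS01LUN0VG0
dmsetup status | grep vzsnap
# only if the snapshot is confirmed stale/invalid:
# lvremove NAS01LUN0VG0/vzsnap-cl02-n02-0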


Friday, June 22, 2012

SMS notification from "The Dude" monitoring system

At my server farm I currently use a great (and free) monitoring system called "The Dude". I think it is wonderful software because it is easy to use, easy to set up, works perfectly on a Linux host under Wine, and there is even a package (npk) for Mikrotik RouterOS.
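
For reference, getting it onto a Linux host really is just Wine plus the Windows installer; a rough sketch on Debian/Ubuntu (the installer filename below is only a placeholder for whatever version you download from MikroTik):

# install Wine, then run The Dude installer under it
apt-get install wine
wine dude-install.exe    # placeholder name; use the actual installer you downloaded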



Tuesday, April 17, 2012

New Big Picture

Finally I assembled my HA storage cluster, and it passed my tests! Here it is:

Friday, April 13, 2012

Disappointment

So, finally I have to agree with El Di Pablo that GlusterFS is Not Ready For HA SAN Storage, because my HA storage with the last config could not pass my failover test.

It went better than the first time: after rebooting the first node, the iSCSI target was still available and the VM kept writing data. But once the first node was back up and I rebooted the second node, the iSCSI target was lost, and in the end all the data on the storage was corrupted. OMG!

Well, now I have decided to leave GlusterFS and use the classical solution: Ubuntu + DRBD (master/slave) + IET + Heartbeat. Yes, I know about the possible performance problems, but for now I think it is the only suitable free HA solution for production use. A rough sketch of the DRBD resource I have in mind is below.
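
Just to make the idea concrete, here is a minimal sketch of a DRBD master/slave resource, reusing the two storage nodes and the 172.16.0.x storage network from my GlusterFS setup; the resource name, hostnames, port and backing disk are my assumptions, not a tested config:

# /etc/drbd.d/r0.res -- hypothetical resource; adjust names, disks and ports
resource r0 {
  protocol C;                        # fully synchronous replication
  on nas01-node01 {
    device    /dev/drbd0;
    disk      /dev/sdb1;             # assumed backing partition
    address   172.16.0.1:7788;
    meta-disk internal;
  }
  on nas01-node02 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   172.16.0.2:7788;
    meta-disk internal;
  }
}

IET would then export /dev/drbd0 from whichever node Heartbeat promotes to primary.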

Thursday, April 5, 2012

Test#2

Today I deployed the updated configuration to the nodes of my HA storage. I created a virtual machine and started installing Ubuntu on it. During the installation, I rebooted the active storage node. As a result, the PVE node got this:

root@cl02-n01:~# pvscan
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 34359672832: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 34359730176: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error
  PV /dev/sdb    VG NASLUN0VG0   lvm2 [4.00 TiB / 3.97 TiB free]
  PV /dev/sda2   VG pve          lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]


And the VM hung up.

So, after the rebooted node came back online, I rebooted the second one. The PVE node then showed:

root@cl02-n01:~# pvscan
  Found duplicate PV 4hKm0l1uebbcn5s3eV3ZUT6nb6exKjOD: using /dev/sdc not /dev/sdb
  PV /dev/sdc    VG NASLUN0VG0   lvm2 [4.00 TiB / 3.97 TiB free]
  PV /dev/sda2   VG pve          lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]

It seems /dev/NASLUN0VG0/vm-100-disk-1 is available again, but the VM is still hung anyway. The PVE GUI says about my VM:

Status: running
CPU usage: 50% of 2 CPUs (I use 2 CPUs in the config)
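
By the way, the "Found duplicate PV" line means LVM is suddenly seeing the same PV UUID through two device nodes at once (here /dev/sdb and /dev/sdc, most likely the old and the new iSCSI session for the same LUN). If the stale path refuses to go away, it can be hidden from LVM with a device filter; a minimal sketch, assuming /dev/sdb really is the dead path:

# /etc/lvm/lvm.conf, devices section -- reject the stale device node, accept the rest
filter = [ "r|^/dev/sdb$|", "a|.*|" ]

After that, re-running pvscan should show only one path for the VG.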



PS.
On the active storage node, dmesg shows:


...

[ 3984.639799] iscsi_trgt: scsi_cmnd_start(972) e000080 25
[ 3984.665551] iscsi_trgt: cmnd_skip_pdu(475) e000080 1c 25 0
[ 3984.690853] iscsi_trgt: scsi_cmnd_start(972) f000080 25
[ 3984.716664] iscsi_trgt: cmnd_skip_pdu(475) f000080 1c 25 0
[ 3984.741993] iscsi_trgt: scsi_cmnd_start(972) 10000080 25
[ 3984.767753] iscsi_trgt: cmnd_skip_pdu(475) 10000080 1c 25 0
[ 3984.793068] iscsi_trgt: scsi_cmnd_start(972) 11000080 1a
[ 3984.818886] iscsi_trgt: cmnd_skip_pdu(475) 11000080 1c 1a 0
[ 3984.844352] iscsi_trgt: scsi_cmnd_start(972) 12000080 1a
[ 3984.870156] iscsi_trgt: cmnd_skip_pdu(475) 12000080 1c 1a 0
[ 3984.895546] iscsi_trgt: scsi_cmnd_start(972) 13000080 1a
[ 3984.921360] iscsi_trgt: cmnd_skip_pdu(475) 13000080 1c 1a 0
[ 3984.947039] iscsi_trgt: scsi_cmnd_start(972) 14000080 1a
...
and these messages keep repeating, more and more of them.



Wednesday, April 4, 2012

Let's change the GlusterFS config!

So, the first test showed that rebooting the storage cluster nodes with the current configuration causes data corruption: the virtual machine image ends up corrupted. I think the problem is in the GlusterFS synchronization and self-heal mechanism. My other thought is that the cause of the damage may be the use of "thin provisioning"; I'll check that option (see the sketch below) if the new configuration does not work correctly.
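
For the record, testing the thin-provisioning theory would just mean replacing the sparse backing file of the LUN with a fully preallocated one; a rough sketch, where the path and size are made-up examples:

# sparse (thin-provisioned) image: blocks are allocated only when written
dd if=/dev/zero of=/data/lun0.img bs=1M count=0 seek=131072    # ~128 GiB logical size

# fully preallocated image: every block is written up front
dd if=/dev/zero of=/data/lun0.img bs=1M count=131072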

Now I will change the setup of my GlusterFS server and client, using the AFR translator:

Server config on NAS01-NODE01 (172.16.0.1):

##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on node1
volume gfs-ds
  type storage/posix
  option directory /data
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on node2
volume gfs-node2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.2         # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node2-ds  # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 172.16.0.*,127.0.0.1
  option auth.ip.gfs.allow 172.16.0.*
end-volume


Server config on NAS01-NODE02 (172.16.0.2):

##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on node2
volume gfs-ds
  type storage/posix
  option directory /data
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on node1
volume gfs-node1-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.1         # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node1-ds  # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 172.16.0.*,127.0.0.1
  option auth.ip.gfs.allow 172.16.0.*
end-volume
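
With both server volfiles in place, each node just starts glusterfsd against its own file; a quick usage sketch (the volfile path is simply wherever you saved the config above):

# start the GlusterFS server daemon with the volfile above
glusterfsd -f /etc/glusterfs/glusterfsd.vol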


Client config on both nodes:

#############################################
##  GlusterFS Client Volume Specification  ##
#############################################

# the exported volume to mount                    # required!
volume cluster
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.1                   # use 172.16.0.2 on node2
  option remote-subvolume gfs                     # exported volume
  option transport-timeout 10                     # value in seconds, should be relatively low
end-volume

# performance block for cluster                   # optional!
volume writeback
  type performance/write-behind
  option aggregate-size 131072
  subvolumes cluster
end-volume

# performance block for cluster                   # optional!
volume readahead
  type performance/read-ahead
  option page-size 65536
  option page-count 16
  subvolumes writeback
end-volume
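
And to close the loop: each storage node mounts this client volume, and IET exports an image file from it as the iSCSI LUN for Proxmox. A minimal sketch, where the volfile path, mount point, image name and IQN are all just my example names:

# mount the replicated GlusterFS volume using the client volfile above
glusterfs -f /etc/glusterfs/glusterfs.vol /mnt/gfs

# /etc/iet/ietd.conf -- export an image file from that mount as LUN 0
Target iqn.2012-04.local.nas01:gfs.lun0
    Lun 0 Path=/mnt/gfs/lun0.img,Type=fileio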