Thursday, August 2, 2012

London's bridge is falling down...

OMG! For the last few days, one of the LUNs in my PVE cluster has been falling down after the nightly backup!

Symptoms:

root@cl02-n02:~# pvscan
/dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 21478965248: Input/output error
/dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 21479022592: Input/output error
/dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 0: Input/output error
/dev/NAS01LUN0VG0/vzsnap-cl02-n02-0: read failed after 0 of 4096 at 4096: Input/output error
/dev/sdb: Checksum error
PV /dev/sdk VG NAS01LUN9VG0 lvm2 [2.00 TiB / 2.00 TiB free]
PV /dev/sdj VG NAS01LUN8VG0 lvm2 [2.00 TiB / 2.00 TiB free]
PV /dev/sdi VG NAS01LUN7VG0 lvm2 [511.98 GiB / 511.98 GiB free]
PV /dev/sdh VG NAS01LUN6VG0 lvm2 [511.98 GiB / 511.98 GiB free]
PV /dev/sdg VG NAS01LUN5VG0 lvm2 [255.99 GiB / 159.99 GiB free]
PV /dev/sdf VG NAS01LUN4VG0 lvm2 [255.99 GiB / 157.99 GiB free]
PV /dev/sde VG NAS01LUN3VG0 lvm2 [127.99 GiB / 29.99 GiB free]
PV /dev/sdd VG NAS01LUN2VG0 lvm2 [127.99 GiB / 95.99 GiB free]
PV /dev/sdc VG NAS01LUN1VG1 lvm2 [127.99 GiB / 26.99 GiB free]
PV /dev/sda2 VG pve lvm2 [297.59 GiB / 16.00 GiB free]
PV /dev/sdb lvm2 [128.00 GiB]
Total: 11 [6.29 TiB] / in use: 10 [6.17 TiB] / in no VG: 1 [128.00 GiB]
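
My guess (not yet verified) is that the failing device is a stale vzdump snapshot LV left behind by the nightly backup, since all the errors point at vzsnap-cl02-n02-0. A minimal cleanup sketch under that assumption, using the names from the output above:

# assumption: the vzsnap-* LV is a leftover vzdump snapshot from the night backup
lvs NAS01LUN0VG0                                  # list LVs in the affected VG
lvdisplay /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0     # inspect the suspect snapshot
lvremove /dev/NAS01LUN0VG0/vzsnap-cl02-n02-0      # drop it (asks for confirmation)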


Friday, June 22, 2012

SMS notification from "The Dude" monitoring system

At my server farm I currently use an excellent (and free) monitoring system called "The Dude". I think it is wonderful software: it is easy to use, easy to set up, works perfectly on a Linux host under Wine, and there is even a package (npk) for Mikrotik RouterOS.


Tuesday, April 17, 2012

New Big Picture

Finally I have assembled my HA storage cluster, and it passed my tests! Here it is:

Friday, April 13, 2012

Disappointment

So, finally I must agree with El Di Pablo that GlusterFS is not ready for HA SAN storage, because my HA storage with the latest config could not pass my failover test.

It went better than the first time: after rebooting the first node, the iSCSI target was still available and the VM kept writing data. But once the first node was back up and I rebooted the second node, the iSCSI target was lost, and in the end all data on the storage was corrupted. OMG!

Well, now I have decided to leave GlusterFS and use the classical solution: Ubuntu + DRBD (master/slave) + IET + Heartbeat. Yes, I know about the possible performance problems, but for now I think it is the only suitable HA solution (available free of charge) for production use.
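
I have not written the new configs yet, but here is a minimal sketch of the DRBD resource I have in mind (hostnames, backing disks and ports are placeholders, not the final config):

# /etc/drbd.d/iscsi0.res -- master/slave resource, later exported via IET
resource iscsi0 {
  protocol C;                        # fully synchronous replication
  on nas01-node01 {
    device    /dev/drbd0;
    disk      /dev/sdb1;             # backing device (placeholder)
    address   172.16.0.1:7789;       # storage network
    meta-disk internal;
  }
  on nas01-node02 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   172.16.0.2:7789;
    meta-disk internal;
  }
}

IET would then export /dev/drbd0 from whichever node is primary, and Heartbeat would move the service IP and the target between the nodes on failover.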

Thursday, April 5, 2012

Test#2

Today I deployed the updated configuration to the nodes of my HA storage, created a virtual machine and started installing Ubuntu on it. During the installation, I rebooted the active storage node. As a result, the PM node reported:

root@cl02-n01:~# pvscan
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 34359672832: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 34359730176: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NASLUN0VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error
  PV /dev/sdb    VG NASLUN0VG0   lvm2 [4.00 TiB / 3.97 TiB free]
  PV /dev/sda2   VG pve          lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]


And the VM hung.

So, after the rebooted node came back online, I rebooted the second one. The PM node then reported:

root@cl02-n01:~# pvscan
Found duplicate PV 4hKm0l1uebbcn5s3eV3ZUT6nb6exKjOD: using /dev/sdc not /dev/sdb
PV /dev/sdc VG NASLUN0VG0 lvm2 [4.00 TiB / 3.97 TiB free]
PV /dev/sda2 VG pve lvm2 [297.59 GiB / 16.00 GiB free]
Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0 ]

It seems /dev/NASLUN0VG0/vm-100-disk-1 is available again, but the VM is still hung. The PM GUI shows for my VM:

Status: running
CPU usage: 50% of 2 CPUs (I use 2 CPUs in the config)



PS.
On the active storage node, dmesg shows:


...

[ 3984.639799] iscsi_trgt: scsi_cmnd_start(972) e000080 25
[ 3984.665551] iscsi_trgt: cmnd_skip_pdu(475) e000080 1c 25 0
[ 3984.690853] iscsi_trgt: scsi_cmnd_start(972) f000080 25
[ 3984.716664] iscsi_trgt: cmnd_skip_pdu(475) f000080 1c 25 0
[ 3984.741993] iscsi_trgt: scsi_cmnd_start(972) 10000080 25
[ 3984.767753] iscsi_trgt: cmnd_skip_pdu(475) 10000080 1c 25 0
[ 3984.793068] iscsi_trgt: scsi_cmnd_start(972) 11000080 1a
[ 3984.818886] iscsi_trgt: cmnd_skip_pdu(475) 11000080 1c 1a 0
[ 3984.844352] iscsi_trgt: scsi_cmnd_start(972) 12000080 1a
[ 3984.870156] iscsi_trgt: cmnd_skip_pdu(475) 12000080 1c 1a 0
[ 3984.895546] iscsi_trgt: scsi_cmnd_start(972) 13000080 1a
[ 3984.921360] iscsi_trgt: cmnd_skip_pdu(475) 13000080 1c 1a 0
[ 3984.947039] iscsi_trgt: scsi_cmnd_start(972) 14000080 1a
...
and these messages keep coming, more and more.



Wednesday, April 4, 2012

Let's change the GlusterFS config!

The first test showed that rebooting the storage cluster nodes with the current configuration causes data corruption: the virtual machine image ends up corrupted. I think the problem is in the GlusterFS synchronization and self-heal mechanism. My other thought is that the cause of the damage may be the use of "thin provisioning". I'll check that option if the new configuration does not work correctly.

Now I will change the setup of my GlusterFS server and client to use the AFR translator:

Server config on NAS01-NODE01 (172.16.0.1):

##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on node1
volume gfs-ds
  type storage/posix
  option directory /data
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on node2
volume gfs-node2-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.2         # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node2-ds  # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 172.16.0.*,127.0.0.1
  option auth.ip.gfs.allow 172.16.0.*
end-volume


Server config on NAS01-NODE02 (172.16.0.2):

##############################################
###  GlusterFS Server Volume Specification  ##
##############################################

# dataspace on node2
volume gfs-ds
  type storage/posix
  option directory /data
end-volume

# posix locks
volume gfs-ds-locks
  type features/posix-locks
  subvolumes gfs-ds
end-volume

# dataspace on node1
volume gfs-node1-ds
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.1         # storage network
  option remote-subvolume gfs-ds-locks
  option transport-timeout 10           # value in seconds; it should be set relatively low
end-volume

# automatic file replication translator for dataspace
volume gfs-ds-afr
  type cluster/afr
  subvolumes gfs-ds-locks gfs-node1-ds  # local and remote dataspaces
end-volume

# the actual exported volume
volume gfs
  type performance/io-threads
  option thread-count 8
  option cache-size 64MB
  subvolumes gfs-ds-afr
end-volume

# finally, the server declaration
volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes gfs
  # storage network access only
  option auth.ip.gfs-ds-locks.allow 172.16.0.*,127.0.0.1
  option auth.ip.gfs.allow 172.16.0.*
end-volume


Client config on both nodes:

#############################################
##  GlusterFS Client Volume Specification  ##
#############################################

# the exported volume to mount                    # required!
volume cluster
  type protocol/client
  option transport-type tcp/client
  option remote-host 172.16.0.1                   # use 172.16.0.2 on node2!
  option remote-subvolume gfs                     # exported volume
  option transport-timeout 10                     # value in seconds, should be relatively low
end-volume

# performance block for cluster                   # optional!
volume writeback
  type performance/write-behind
  option aggregate-size 131072
  subvolumes cluster
end-volume

# performance block for cluster                   # optional!
volume readahead
  type performance/read-ahead
  option page-size 65536
  option page-count 16
  subvolumes writeback
end-volume
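
With these volfiles in place, the client volume gets mounted on each node roughly like this (the mount point and volfile path are my choice, adjust as needed):

# mount the exported "gfs" volume using the client volfile
mkdir -p /mnt/gluster
glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/gluster
# or the fstab equivalent:
# /etc/glusterfs/glusterfs-client.vol  /mnt/gluster  glusterfs  defaults  0  0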

Monday, April 2, 2012

Meanwhile, Cluster Storage

Interesting dmesg:

[ 3773.406686] iscsi_trgt: Abort Task (01) issued on tid:1 lun:1 by sid:844424967684608 (Function Complete)
[ 3801.645553] iscsi_trgt: cmnd_rx_start(1863) 1 4b000010 -7
[ 3801.645861] iscsi_trgt: cmnd_skip_pdu(475) 4b000010 1 28 0
[ 3863.275926] iscsi_trgt: Abort Task (01) issued on tid:1 lun:1 by sid:844424967684608 (Function Complete)
[ 3863.276207] iscsi_trgt: Abort Task (01) issued on tid:1 lun:1 by sid:844424967684608 (Function Complete)
[ 3873.259744] iscsi_trgt: Abort Task (01) issued on tid:1 lun:1 by sid:844424967684608 (Function Complete)
[ 3873.260209] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:1 by sid:844424967684608 (Function Complete)
[ 3883.243507] iscsi_trgt: Abort Task (01) issued on tid:1 lun:1 by sid:844424967684608 (Function Complete)
[ 3883.243788] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 3893.227235] iscsi_trgt: Abort Task (01) issued on tid:1 lun:1 by sid:844424967684608 (Function Complete)
[ 4019.005333] iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4019.005688] iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4028.989010] iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4028.989375] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4038.972785] iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4038.973091] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4048.956574] iscsi_trgt: Abort Task (01) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4090.888634] iscsi_trgt: Abort Task (01) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4090.888985] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4090.889242] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4090.889575] iscsi_trgt: scsi_cmnd_start(972) 16000010 0
[ 4090.897476] iscsi_trgt: cmnd_skip_pdu(475) 16000010 1c 0 0
[ 4100.873321] iscsi_trgt: Abort Task (01) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4100.873629] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4100.873892] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4142.810375] iscsi_trgt: Abort Task (01) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4142.810768] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4142.811135] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4142.811485] iscsi_trgt: scsi_cmnd_start(972) 56000010 0
[ 4142.819205] iscsi_trgt: cmnd_skip_pdu(475) 56000010 1c 0 0
[ 4152.795191] iscsi_trgt: Abort Task (01) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4152.795542] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4152.795799] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4194.738037] iscsi_trgt: Abort Task (01) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4194.738300] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4194.738583] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4194.738789] iscsi_trgt: scsi_cmnd_start(972) 68000010 0
[ 4194.746329] iscsi_trgt: cmnd_skip_pdu(475) 68000010 1c 0 0
[ 4204.721919] iscsi_trgt: Abort Task (01) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4204.722125] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4204.722380] iscsi_trgt: Target Warm Reset (06) issued on tid:1 lun:0 by sid:844424967684608 (Function Complete)
[ 4245.655427] iscsi_trgt: nop_out_start(907) ignore this request 69000010
[ 4245.663146] iscsi_trgt: cmnd_rx_start(1863) 0 69000010 -7
[ 4245.670986] iscsi_trgt: cmnd_skip_pdu(475) 69000010 0 0 0
[ 4246.653815] iscsi_trgt: Abort Task (01) issued on tid:1 lun:2 by sid:844424967684608 (Unknown LUN)
[ 4246.653981] iscsi_trgt: cmnd_rx_start(1863) 2 4b000010 -7
[ 4246.661635] iscsi_trgt: cmnd_skip_pdu(475) 4b000010 2 0 0

Addendum to Test#1

Finally, I ran:
root@cl02-n01:~# qm start 100
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221159936: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221217280: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221159936: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221217280: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221159936: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221217280: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error
  Volume group "NAS01LUN1VG0" not found
can't activate LV 'NAS01-LUN1:vm-100-disk-1':   Skipping volume group NAS01LUN1VG0


Hehe, now the "Content" view in the PM GUI is empty for both NAS01-PVELUNS and NAS01-LUN1.

I have successfully broken everything.


Test#1: Check HA for Storage cluster

For this test I created one VM and installed Mikrotik RouterOS 5.5 on it. MT simply because I had the ISO at hand :)


Preparing for testing

The cluster is ready to use.
Now I need to add my "HA-cluster-Storage" as storage for my virtual machines (GUI steps below; a rough CLI equivalent is sketched right after them):

Go to "Datacenter"-"Storage"-""Add" - iSCSITarget
Fill
 ID:NAS01-PVELUNS
 Portal : 172.16.70.2
 Target: iqn.2012-03.nas01:iscsi.PVELUNS
 Uncheck "Use LUNs directly"

And then, "Add" - LVM group
Fill
 ID:NAS01-LUN1
 Base storage: NAS01-PVELUNS 
 Base volume: CH 00 ID 0 LUN 1
 Volume group: NAS01LUN1VG0
 Check "Shared"

Ok, now we can see:

Now let's look at node1:

root@cl02-n01:~# pvdisplay
  Found duplicate PV MiXXJdMcRElPXQPEtzc6pPFAQLhQn0lC: using /dev/sde not /dev/sdd
  --- Physical volume ---
  PV Name               /dev/sde
  VG Name               NAS01LUN1VG0
  PV Size               4.00 TiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              1048575
  Free PE               1047807
  Allocated PE          768
  PV UUID               MiXXJd-McRE-lPXQ-PEtz-c6pP-FAQL-hQn0lC

  --- Physical volume ---
  PV Name               /dev/sda2
  VG Name               pve
  PV Size               297.59 GiB / not usable 0
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              76183
  Free PE               4095
  Allocated PE          72088
  PV UUID               mMvpci-6zko-K3uS-VOG1-GO4Z-Nsdj-t9WbtS


What does "Found duplicate PV MiXXJdMcRElPXQPEtzc6pPFAQLhQn0lC: using /dev/sde not /dev/sdd" mean? And why does it appear?

I will think about it...
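
My first guess (still to be checked): /dev/sdd and /dev/sde are the same LUN seen twice by this node, so LVM finds the same PV UUID on two block devices and warns that it will use only one of them. If that is really the case, the noise could be silenced with a device filter in /etc/lvm/lvm.conf; a sketch using the device names from the output above:

# /etc/lvm/lvm.conf (sketch) -- hide the duplicate path from LVM scanning
devices {
    filter = [ "r|^/dev/sdd$|", "a|.*|" ]   # reject /dev/sdd, accept everything else
}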

Cluster Tips: Keep the clocks on all nodes synchronized

Yes, I had not thought of that.
There is a simple solution for my first problem: just set the correct time on all nodes.
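
On Debian-based nodes that boils down to something like this (the ntp package is my assumption, any NTP client will do):

# install and verify NTP on every cluster node
apt-get install -y ntp     # starts ntpd with the default pool servers
ntpq -p                    # check that peers are reachable and offsets are small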

So, for now the cluster is looking good!

Saturday, March 31, 2012

Creating PM 2.0 Cluster

So, today I installed my fresh new Proxmox Virtual Environment.

The first problem occurred with the GUI.
I asked about it on the forum: http://forum.proxmox.com/threads/8949-GUI-login-problem
Thinking....

Proxmox VE 2.0 final release !!!

That's perfect!
11 hours ago the Proxmox VE 2.0 final release came out!!!


First of all, I will make a USB stick for installation on my four ready-to-use nodes.
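
A sketch of how I plan to write the image, assuming the installer ISO can be written raw to the stick (the ISO filename and the /dev/sdX target are placeholders; picking the wrong device will wipe it, and if it does not boot this way a tool like UNetbootin is the fallback):

# write the Proxmox VE 2.0 installer ISO to a USB stick (filename and device are placeholders)
dd if=proxmox-ve_2.0.iso of=/dev/sdX bs=1M
sync                       # make sure all blocks are flushed before unplugging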

Friday, March 30, 2012

Project: HA Storage cluster for Proxmox 2.0

Hi all!


For the last few months I have been waiting for the stable release of Proxmox 2.0. I hope it will happen soon. And for my new cluster I have been searching for a suitable "rock-solid" (hehe, "dream-come-true") HA solution.

Thanks to a post by El Di Pablo (http://www.bauer-power.net/2011/08/roll-your-own-fail-over-san-cluster.html), I tried to construct such a cluster and test it.

So here it is:


But later El Di Pablo told a horrible story about the failure of his cluster:
http://www.bauer-power.net/2012/03/glusterfs-is-not-ready-for-san-storage.html

Anyway, I want to check this out for myself and turn this project into a really useful and easy-to-use solution.

In this blog, I will try to document what I have done and what is happening with my cluster.