Monday, April 2, 2012

Test #1: Checking HA for the storage cluster

For this test I created one VM and installed MikroTik 5.5 on it. Why MikroTik? Just because I had the ISO at hand :)




So, our VM is running. Now let's check what happens when one node in the storage cluster is rebooted. On the first NAS node I typed "reboot", and the node started rebooting. On the second node, heartbeat brought up a new subinterface with 172.16.70.2 and started the iSCSI target service. The NAS still replies to ping.
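To see how long the storage address is actually unreachable during a failover like this, a small probe loop can be left running on a Proxmox node while the NAS node reboots. This is only a sketch: `probe_loop` is a hypothetical helper name, and it assumes 172.16.70.2 is the address the iSCSI initiators connect to. Keep in mind that a ping reply only shows the IP moved over, not that the iSCSI target is serving I/O.

```shell
#!/bin/sh
# probe_loop COUNT INTERVAL CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times, INTERVAL seconds apart,
# and prints "up" or "down" for each attempt based on CMD's exit status.
probe_loop() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "up"
        else
            echo "down"
        fi
        i=$((i + 1))
        # Do not sleep after the last attempt.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}

# Example (assumed address): one ping per second for a minute.
# probe_loop 60 1 ping -c 1 -W 1 172.16.70.2
```

Counting the "down" lines gives a rough estimate of the failover window in seconds.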

The VM keeps working.
In the Proxmox GUI I open the storage "NAS01-PVELUNS on node cl02-n01". The GUI hangs.

On the console of cl02-n01 I run:


root@cl02-n01:~# pvscan


The command hangs, but after about 10-15 seconds it prints:


  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221159936: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221217280: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error
  PV /dev/sdd    VG NAS01LUN1VG0   lvm2 [4.00 TiB / 4.00 TiB free]
  PV /dev/sda2   VG pve            lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]


I repeat the command:

root@cl02-n01:~# pvscan
  PV /dev/sdd    VG NAS01LUN1VG0   lvm2 [4.00 TiB / 4.00 TiB free]
  PV /dev/sda2   VG pve            lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]


Hehe! The previous "Found duplicate PV bla-bla-bla" warning has disappeared.

Very interesting.
OK, let's restart our VM: I did "shutdown" and "start", and all went fine. The VM started successfully.

The first NAS node has already rebooted and is online again. Now I try rebooting the second NAS node.
I type "reboot", and the node goes down for reboot...

Now I check the GUI: no hang.
Then the console:


root@cl02-n01:~# pvscan
  Found duplicate PV MiXXJdMcRElPXQPEtzc6pPFAQLhQn0lC: using /dev/sde not /dev/sdd
  PV /dev/sde    VG NAS01LUN1VG0   lvm2 [4.00 TiB / 4.00 TiB free]
  PV /dev/sda2   VG pve            lvm2 [297.59 GiB / 16.00 GiB free]
  Total: 2 [4.29 TiB] / in use: 2 [4.29 TiB] / in no VG: 0 [0   ]


OMG! "Duplicate" again. :(
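A side note on that warning: LVM prints it when the same PV UUID is visible through more than one block device, here /dev/sdd and /dev/sde, which would fit the same LUN being seen through two iSCSI sessions. To keep an eye on it between test runs, the warning can be pulled out of the pvscan output with a small filter. This is a minimal sketch; `list_duplicate_pvs` is a hypothetical helper name, and it assumes the message format shown in the output above.

```shell
#!/bin/sh
# list_duplicate_pvs (hypothetical helper): reads pvscan output on stdin
# and prints "UUID USED_DEVICE IGNORED_DEVICE" for every
# "Found duplicate PV" line; all other lines are dropped.
list_duplicate_pvs() {
    sed -n 's/^ *Found duplicate PV \([^:]*\): using \([^ ]*\) not \([^ ]*\).*$/\1 \2 \3/p'
}

# Example: merge stderr into stdout so the warning is caught either way.
# pvscan 2>&1 | list_duplicate_pvs
```

An empty result means pvscan reported no duplicates in that run.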
But worse, something happened to the VM:

root@cl02-n01:~# qm stop 100
trying to aquire lock... OK
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221159936: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 3221217280: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 0: Input/output error
  /dev/NAS01LUN1VG0/vm-100-disk-1: read failed after 0 of 4096 at 4096: Input/output error


It seems the data on /dev/NAS01LUN1VG0/vm-100-disk-1 was corrupted.
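Before blaming the data itself, it may be worth checking whether the volume can be read at all: "read failed after 0 of 4096 at 0" means the read returned an error, which can also happen when the path to the LUN is gone. A rough check, assuming plain `dd` against the LV path from the output above (`can_read_first_block` is a hypothetical helper name):

```shell
#!/bin/sh
# can_read_first_block DEVICE (hypothetical helper)
# Tries to read the first 4096-byte block of DEVICE and discards it.
# Exit status 0 means the read worked, non-zero means it failed.
can_read_first_block() {
    dd if="$1" of=/dev/null bs=4096 count=1 2>/dev/null
}

# Example with the LV from this test:
# if can_read_first_block /dev/NAS01LUN1VG0/vm-100-disk-1; then
#     echo "first block readable"
# else
#     echo "read failed"
# fi
```

A successful read does not prove the disk contents are intact; it only separates "device unreachable" from "device readable".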

Result: I need to check my NAS.
