After a lot of:
– ata1.00: device reported invalid CHS sector 0
– ata1.00: failed command: WRITE FPDMA QUEUED
and many more RAID failure messages in dmesg, a RAID1 stuck in a degraded state that refused to come back to life
made me a bit nervous: “shit, broken hard drives” ….
Googling this problem, one suggestion came back on every bug list and blog:
— check the SATA cables —
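Before tearing things apart, it does not hurt to let the drives speak for themselves. Here is a rough sketch using smartmontools (the device names /dev/sdb and /dev/sdc are just my setup, adjust to yours); a rising UDMA_CRC_Error_Count in particular points at the cable or connector rather than the disk itself:

sudo smartctl -H /dev/sdb                                          # overall SMART health verdict
sudo smartctl -A /dev/sdb | grep -i -E 'reallocated|pending|crc'   # attributes worth watching
sudo smartctl -H /dev/sdc
sudo smartctl -A /dev/sdc | grep -i -E 'reallocated|pending|crc'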
So I opened my NAS case, pulled out all the SATA cables and reconnected them, and took the two hard drives out of the hot-swap bays and put them back in.
I started the system and immediately shut down every service (apache, netatalk, smb, icecast…), roughly as sketched below.
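For the record, on my Debian-ish NAS that came down to something like this; the exact service names (apache2, netatalk, smbd, icecast2) are assumptions and may differ on your distribution:

sudo service apache2 stop     # web server
sudo service netatalk stop    # AFP file sharing
sudo service smbd stop        # Samba
sudo service icecast2 stop    # streaming

The idea is simply to keep other I/O off the disks so the resync does not have to compete with it.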
A cat /proc/mdstat told me the RAID was rebuilding… it should take about 620 minutes, so I went to bed. The first thing I did after waking up this morning was check dmesg.
The last lines were a joy to read:
knilluz@nas1:~$ dmesg | tail -n5
[37529.928085] md: md0: recovery done.
[37530.134508] RAID1 conf printout:
[37530.134517]  --- wd:2 rd:2
[37530.134526]  disk 0, wo:0, o:1, dev:sdb1
[37530.134532]  disk 1, wo:0, o:1, dev:sdc1
(that took about 625 minutes)
knilluz@nas1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc1[1] sdb1[0]
      1953513408 blocks [2/2] [UU]

unused devices:
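For anyone else babysitting a rebuild, two standard commands give a bit more detail than rereading dmesg; nothing here is specific to my setup beyond the /dev/md0 name:

watch -n 30 cat /proc/mdstat      # live view of the rebuild progress and finish estimate
sudo mdadm --detail /dev/md0      # array state, rebuild percentage, and per-disk status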
One line from all those messages I will keep in mind:
“I never believed in anything other than the cheapest SATA cables, but for me the problems went away after using a thicker more expensive SATA cable with firm braced connectors.”