Solution: broken RAID1, first check the SATA cables.


After a lot of:
– ata1.00: device reported invalid CHS sector 0
– ata1.00: failed command: WRITE FPDMA QUEUED
and many more RAID failure messages in dmesg, plus a RAID1 stuck in degraded state that would not come back to life, I got a bit nervous: “shit, broken hard drives”…
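
For anyone seeing the same messages: before assuming dead drives, check what the array itself reports. Roughly something like this (a sketch; the array here is /dev/md0):

# overall state: a degraded RAID1 shows [2/1] and [U_] or [_U]
cat /proc/mdstat

# per-member detail: which disk dropped out and its current state
sudo mdadm --detail /dev/md0

# recent kernel messages about the SATA link and the array
dmesg | tail -n 50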

Googling this problem, one suggestion came back on every bug list or blog:

— check the sata cables —

So I opened my NAS case, pulled out all the SATA cables and reconnected them, took the 2 hard drives out of the hot-swap bays and put them back in.

I started the system and immediately shut down every service (apache, netatalk, smb, icecast…).
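
On a Debian-style box that comes down to roughly this (a sketch; the exact service names, apache2/netatalk/smbd/icecast2, depend on your install):

# keep the array as idle as possible while it resyncs
sudo service apache2 stop
sudo service netatalk stop
sudo service smbd stop
sudo service icecast2 stop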

A cat /proc/mdstat told me the RAID was rebuilding… it would take about 620 minutes, so I went to bed. The first thing I did after waking up this morning was check dmesg.
The last lines were a joy to see:

knilluz@nas1:~$ dmesg | tail -n5
[37529.928085] md: md0: recovery done.
[37530.134508] RAID1 conf printout:
[37530.134517] --- wd:2 rd:2
[37530.134526] disk 0, wo:0, o:1, dev:sdb1
[37530.134532] disk 1, wo:0, o:1, dev:sdc1

(that took about 625 minutes)
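
If you'd rather keep an eye on a resync like this than sleep through it, something like the following works (a sketch; the speed limit is an optional tweak and the value is just an example):

# refresh the resync progress every minute
watch -n 60 cat /proc/mdstat

# optionally raise the minimum resync speed (in KB/s per device)
echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_min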

knilluz@nas1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdc1[1] sdb1[0]
1953513408 blocks [2/2] [UU]
unused devices: <none>

One line from all those messages I will keep in mind:
“I never believed in anything other than the cheapest SATA cables, but for me the problems went away after using a thicker more expensive SATA cable with firm braced connectors.”
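
One way to check for cable trouble without even opening the case is the SMART interface CRC counter. Roughly (a sketch, assuming smartmontools is installed and the disks are /dev/sdb and /dev/sdc like here):

# attribute 199 (UDMA_CRC_Error_Count) counts interface CRC errors;
# if it keeps climbing, suspect the cable or connector rather than the disk
sudo smartctl -A /dev/sdb | grep -i crc
sudo smartctl -A /dev/sdc | grep -i crc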
