software RAID Detecting, querying and testing
November 15th, 2009
Comments off
It’s always a must for /var/log/messages to fill screens with tons of error messages, no matter what happened. But, when it’s about a disk crash, huge lots of kernel errors are reported. Some nasty examples, for the masochists,
kernel: scsi0 channel 0 : resetting for second half of retries. kernel: SCSI bus is being reset for host 0 channel 0. kernel: scsi0: Sending Bus Device Reset CCB #2666 to Target 0 kernel: scsi0: Bus Device Reset CCB #2666 to Target 0 Completed kernel: scsi : aborting command due to timeout : pid 2649, scsi0, channel 0, id 0, lun 0 Write (6) 18 33 11 24 00 kernel: scsi0: Aborting CCB #2669 to Target 0 kernel: SCSI host 0 channel 0 reset (pid 2644) timed out - trying harder kernel: SCSI bus is being reset for host 0 channel 0. kernel: scsi0: CCB #2669 to Target 0 Aborted kernel: scsi0: Resetting BusLogic BT-958 due to Target 0 kernel: scsi0: *** BusLogic BT-958 Initialized Successfully ***
Most often, disk failures look like these,
kernel: sidisk I/O error: dev 08:01, sector 1590410 kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002
or these
kernel: hde: read_intr: error=0x10 { SectorIdNotFound }, CHS=31563/14/35, sector=0
kernel: hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
And, as expected, the classic /proc/mdstat look will also reveal problems,
Personalities : [linear] [raid0] [raid1] [translucent] read_ahead not set md7 : active raid1 sdc9[0] sdd5[8] 32000 blocks [2/1] [U_]
Later on this section we will learn how to monitor RAID with mdadm so we can receive alert reports about disk failures. Now it’s time to learn more about /proc/mdstat interpretation.














