
Software RAID: detecting, querying and testing

November 15th, 2009 admin

More about software RAID…

/var/log/messages tends to fill screens with tons of error messages no matter what happens, but a disk crash produces a particularly large flood of kernel errors. Some nasty examples, for the masochists:

   kernel: scsi0 channel 0 : resetting for second half of retries.
   kernel: SCSI bus is being reset for host 0 channel 0.
   kernel: scsi0: Sending Bus Device Reset CCB #2666 to Target 0
   kernel: scsi0: Bus Device Reset CCB #2666 to Target 0 Completed
   kernel: scsi : aborting command due to timeout : pid 2649, scsi0, channel 0, id 0, lun 0 Write (6) 18 33 11 24 00
   kernel: scsi0: Aborting CCB #2669 to Target 0
   kernel: SCSI host 0 channel 0 reset (pid 2644) timed out - trying harder
   kernel: SCSI bus is being reset for host 0 channel 0.
   kernel: scsi0: CCB #2669 to Target 0 Aborted
   kernel: scsi0: Resetting BusLogic BT-958 due to Target 0
   kernel: scsi0: *** BusLogic BT-958 Initialized Successfully ***

Most often, disk failures look like these:

   kernel: sidisk I/O error: dev 08:01, sector 1590410
   kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 28000002

or like these:

   kernel: hde: read_intr: error=0x10 { SectorIdNotFound }, CHS=31563/14/35, sector=0
   kernel: hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
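Messages like the ones above can also be picked out by a script. The following is a minimal Python sketch, not a tool from this article: the pattern list DISK_ERROR_PATTERNS and the helpers scan_log and sd_name are hypothetical names, and the patterns only cover the sample messages shown here. It assumes the old-style `dev MM:mm` pair is printed as two hexadecimal digits each, and decodes block major 8 (SCSI disks) using the standard layout of 16 minors per disk, so `dev 08:01` becomes sda1.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical signatures taken only from the sample kernel messages above;
# real kernels and drivers log many other variants.
DISK_ERROR_PATTERNS = [
    re.compile(r"I/O error"),
    re.compile(r"SCSI disk error"),
    re.compile(r"read_intr: (error|status)="),
    re.compile(r"aborting command due to timeout"),
    re.compile(r"SCSI bus is being reset"),
]

# Old-style "dev MM:mm" device reference, assumed to be two hex digits each.
DEV_RE = re.compile(r"dev ([0-9a-fA-F]{2}):([0-9a-fA-F]{2})")


def sd_name(major: int, minor: int) -> Optional[str]:
    """Map a block major 8 (SCSI disk) major:minor pair to its sdXN name.

    Each disk gets 16 minors: minor // 16 selects sda..sdp and
    minor % 16 the partition, where 0 means the whole disk.
    Any other major returns None.
    """
    if major != 8:
        return None
    disk = "sd" + chr(ord("a") + minor // 16)
    part = minor % 16
    return disk if part == 0 else f"{disk}{part}"


def scan_log(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (line, device) for each kernel line matching a disk-error pattern.

    device is the decoded sdXN name when the line carries a major 8
    "dev MM:mm" reference, otherwise None.
    """
    for line in lines:
        if "kernel:" not in line:
            continue
        if any(p.search(line) for p in DISK_ERROR_PATTERNS):
            m = DEV_RE.search(line)
            device = sd_name(int(m.group(1), 16), int(m.group(2), 16)) if m else None
            yield line.strip(), device
```

On a live system you would pass an open log file, for example `scan_log(open("/var/log/messages", errors="replace"))`, which normally requires root privileges. Some distributions write kernel messages to /var/log/syslog instead.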

As expected, the classic look at /proc/mdstat will also reveal problems. Here md7 is a two-disk RAID-1 with only one member working ([2/1] [U_]):

   Personalities : [linear] [raid0] [raid1] [translucent]
   read_ahead not set
   md7 : active raid1 sdc9[0] sdd5[8] 32000 blocks [2/1] [U_]

Later in this section we will learn how to monitor RAID with mdadm so that we receive alert reports about disk failures. Now it's time to learn more about interpreting /proc/mdstat.
