Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add

JonathanM · January 12, 2009

Update on clearing three new 1.5 TB Seagates. In my original post, I had just unzipped a fresh install of 4.4.2, installed the SMART libraries, and kicked off the script. It precleared two of the three disks successfully. I then pulled the USB stick, added and enabled the newest bubbaraid, and booted bubbaraid. The disk assignments changed, so I suspect I may have given you the status of the wrong drive. Looking at the current syslog, it seems the disk that failed the preclear the first time through may be bad. When I originally ran the script, the drives were sdb, sdc, and sdd; now they are sda, sdb, and sdc. sda seems to be sick.
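Because the sdX letters moved between boots, it can help to pin each letter to a specific physical drive before reading any log. Below is a minimal bash sketch, assuming the system has udev-style symlinks under /dev/disk/by-id (the preclear output later in this thread mentions creating the /dev/disk/by* entries). The function name map_by_id is made up for illustration. It prints each sdX device next to its by-id name, which normally embeds the model and serial number.

```shell
#!/bin/bash
# Hypothetical helper: list "sdX by-id-name" pairs so a drive can be
# identified by model/serial even when its sdX letter changes.
# Partition links (names ending in -partN) are skipped.
map_by_id() {
  local dir=${1:-/dev/disk/by-id} link target
  for link in "$dir"/*; do
    [ -L "$link" ] || continue
    case $link in *-part[0-9]*) continue ;; esac
    target=$(readlink -f "$link")
    echo "$(basename "$target") $(basename "$link")"
  done | sort
}
```

Running `map_by_id` before and after a reboot shows which serial number sits behind sda in each boot.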

Jan 12 13:20:25 Tower preclear_disk-start[6674]: 1 Raw_Read_Error_Rate 0x000f 117 100 006 Pre-fail Always - 235560959
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 7
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 220
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 7 Seek_Error_Rate 0x000f 100 253 030 Pre-fail Always - 220180
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 67
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 7
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 184 Unknown_Attribute 0x0032 100 100 099 Old_age Always - 0
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 187 Reported_Uncorrect 0x0032 041 041 000 Old_age Always - 59
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 188 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 4295032833
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 189 High_Fly_Writes 0x003a 076 076 000 Old_age Always - 24
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 190 Airflow_Temperature_Cel 0x0022 067 065 045 Old_age Always - 33 (Lifetime Min/Max 28/33)
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 195 Hardware_ECC_Recovered 0x001a 044 044 000 Old_age Always - 235560959
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 18
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 18
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: SMART Error Log Version: 1
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ATA Error Count: 65 (device log contains only the most recent five errors)
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CR = Command Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: FR = Features Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: SC = Sector Count Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: SN = Sector Number Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CL = Cylinder Low Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CH = Cylinder High Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: DH = Device/Head Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: DC = Device Command Register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ER = Error register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ST = Status register [HEX]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Powered_Up_Time is measured from power on, and printed as
Jan 12 13:20:25 Tower preclear_disk-start[6674]: DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
Jan 12 13:20:25 Tower preclear_disk-start[6674]: SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Error 65 occurred at disk power-on lifetime: 7 hours (0 days + 7 hours)
Jan 12 13:20:25 Tower preclear_disk-start[6674]: When the command that caused the error occurred, the device was active or idle.
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: After command completion occurred, registers were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ER ST SC SN CL CH DH
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- --
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 04 71 04 81 87 80 e0 Device Fault; Error: ABRT
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Commands leading to the command that caused the error were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- -- -- ---------------- --------------------
Jan 12 13:20:25 Tower preclear_disk-start[6674]: a1 00 00 00 00 00 a0 00 07:00:26.454 IDENTIFY PACKET DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ec 00 00 00 00 00 a0 00 07:00:26.442 IDENTIFY DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 04 07:00:26.243 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 ff 07:00:25.911 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: a1 00 00 00 00 00 a0 00 07:00:20.959 IDENTIFY PACKET DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Error 64 occurred at disk power-on lifetime: 7 hours (0 days + 7 hours)
Jan 12 13:20:25 Tower preclear_disk-start[6674]: When the command that caused the error occurred, the device was active or idle.
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: After command completion occurred, registers were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ER ST SC SN CL CH DH
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- --
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 04 71 04 81 87 80 e0
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Commands leading to the command that caused the error were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- -- -- ---------------- --------------------
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ec 00 00 00 00 00 a0 00 07:00:26.442 IDENTIFY DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 04 07:00:26.243 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 ff 07:00:25.911 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: a1 00 00 00 00 00 a0 00 07:00:20.959 IDENTIFY PACKET DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ec 00 00 00 00 00 a0 00 07:00:20.936 IDENTIFY DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Error 63 occurred at disk power-on lifetime: 7 hours (0 days + 7 hours)
Jan 12 13:20:25 Tower preclear_disk-start[6674]: When the command that caused the error occurred, the device was active or idle.
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: After command completion occurred, registers were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ER ST SC SN CL CH DH
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- --
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 04 71 04 81 87 80 e0 Device Fault; Error: ABRT
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Commands leading to the command that caused the error were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- -- -- ---------------- --------------------
Jan 12 13:20:25 Tower preclear_disk-start[6674]: a1 00 00 00 00 00 a0 00 07:00:20.959 IDENTIFY PACKET DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ec 00 00 00 00 00 a0 00 07:00:20.936 IDENTIFY DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 04 07:00:20.729 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 ff 07:00:20.395 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: a1 00 00 00 00 00 a0 00 07:00:15.534 IDENTIFY PACKET DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Error 62 occurred at disk power-on lifetime: 7 hours (0 days + 7 hours)
Jan 12 13:20:25 Tower preclear_disk-start[6674]: When the command that caused the error occurred, the device was active or idle.
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: After command completion occurred, registers were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ER ST SC SN CL CH DH
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- --
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 04 71 04 81 87 80 e0
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Commands leading to the command that caused the error were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- -- -- ---------------- --------------------
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ec 00 00 00 00 00 a0 00 07:00:20.936 IDENTIFY DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 04 07:00:20.729 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 ff 07:00:20.395 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: a1 00 00 00 00 00 a0 00 07:00:15.534 IDENTIFY PACKET DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ec 00 00 00 00 00 a0 00 07:00:15.421 IDENTIFY DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Error 61 occurred at disk power-on lifetime: 7 hours (0 days + 7 hours)
Jan 12 13:20:25 Tower preclear_disk-start[6674]: When the command that caused the error occurred, the device was active or idle.
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: After command completion occurred, registers were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ER ST SC SN CL CH DH
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- --
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 04 71 04 81 87 80 e0 Device Fault; Error: ABRT
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Commands leading to the command that caused the error were:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
Jan 12 13:20:25 Tower preclear_disk-start[6674]: -- -- -- -- -- -- -- -- ---------------- --------------------
Jan 12 13:20:25 Tower preclear_disk-start[6674]: a1 00 00 00 00 00 a0 00 07:00:15.534 IDENTIFY PACKET DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: ec 00 00 00 00 00 a0 00 07:00:15.421 IDENTIFY DEVICE
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 04 07:00:15.213 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 00 00 00 00 00 00 00 ff 07:00:14.888 NOP [Abort queued commands]
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 35 00 00 ff ff ff ef 00 06:59:14.132 WRITE DMA EXT
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: SMART Self-test log structure revision number 1
Jan 12 13:20:25 Tower preclear_disk-start[6674]: No self-tests have been logged. [To run self-tests, use: smartctl -t]
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 13:20:25 Tower preclear_disk-start[6674]: SMART Selective self-test log data structure revision number 1
Jan 12 13:20:25 Tower preclear_disk-start[6674]: SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 1 0 0 Not_testing
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 2 0 0 Not_testing
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 3 0 0 Not_testing
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 4 0 0 Not_testing
Jan 12 13:20:25 Tower preclear_disk-start[6674]: 5 0 0 Not_testing
Jan 12 13:20:25 Tower preclear_disk-start[6674]: Selective self-test flags (0x0):
Jan 12 13:20:25 Tower preclear_disk-start[6674]: After scanning selected spans, do NOT read-scan remainder of disk.
Jan 12 13:20:25 Tower preclear_disk-start[6674]: If Selective self-test is pending on power-up, resume after 0 minute delay.
Jan 12 13:20:25 Tower preclear_disk-start[6674]:
Jan 12 15:34:25 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 12 15:34:25 Tower kernel: ata4.00: BMDMA stat 0x64
Jan 12 15:34:25 Tower kernel: ata4.00: cmd 25/00:00:00:8d:58/00:01:5d:00:00/e0 tag 0 dma 131072 in
Jan 12 15:34:25 Tower kernel: res 51/40:00:a5:8d:58/40:00:5d:00:00/00 Emask 0x9 (media error)
Jan 12 15:34:25 Tower kernel: ata4.00: status: { DRDY ERR }
Jan 12 15:34:25 Tower kernel: ata4.00: error: { UNC }
Jan 12 15:34:25 Tower kernel: ata4.00: configured for UDMA/133
Jan 12 15:34:25 Tower kernel: ata4.01: configured for UDMA/133
Jan 12 15:34:25 Tower kernel: ata4: EH complete
Jan 12 15:34:28 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 12 15:34:28 Tower kernel: ata4.00: BMDMA stat 0x64
Jan 12 15:34:28 Tower kernel: ata4.00: cmd 25/00:00:00:8d:58/00:01:5d:00:00/e0 tag 0 dma 131072 in
Jan 12 15:34:28 Tower kernel: res 51/40:00:a5:8d:58/40:00:5d:00:00/00 Emask 0x9 (media error)
Jan 12 15:34:28 Tower kernel: ata4.00: status: { DRDY ERR }
Jan 12 15:34:28 Tower kernel: ata4.00: error: { UNC }
Jan 12 15:34:28 Tower kernel: ata4.00: configured for UDMA/133
Jan 12 15:34:28 Tower kernel: ata4.01: configured for UDMA/133
Jan 12 15:34:28 Tower kernel: ata4: EH complete
Jan 12 15:34:31 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 12 15:34:31 Tower kernel: ata4.00: BMDMA stat 0x64
Jan 12 15:34:31 Tower kernel: ata4.00: cmd 25/00:00:00:8d:58/00:01:5d:00:00/e0 tag 0 dma 131072 in
Jan 12 15:34:31 Tower kernel: res 51/40:00:a5:8d:58/40:00:5d:00:00/00 Emask 0x9 (media error)
Jan 12 15:34:31 Tower kernel: ata4.00: status: { DRDY ERR }
Jan 12 15:34:31 Tower kernel: ata4.00: error: { UNC }
Jan 12 15:34:31 Tower kernel: ata4.00: configured for UDMA/133
Jan 12 15:34:31 Tower kernel: ata4.01: configured for UDMA/133
Jan 12 15:34:31 Tower kernel: ata4: EH complete
Jan 12 15:34:34 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 12 15:34:34 Tower kernel: ata4.00: BMDMA stat 0x64
Jan 12 15:34:34 Tower kernel: ata4.00: cmd 25/00:00:00:8d:58/00:01:5d:00:00/e0 tag 0 dma 131072 in
Jan 12 15:34:34 Tower kernel: res 51/40:00:a5:8d:58/40:00:5d:00:00/00 Emask 0x9 (media error)
Jan 12 15:34:34 Tower kernel: ata4.00: status: { DRDY ERR }
Jan 12 15:34:34 Tower kernel: ata4.00: error: { UNC }
Jan 12 15:34:34 Tower kernel: ata4.00: configured for UDMA/133
Jan 12 15:34:34 Tower kernel: ata4.01: configured for UDMA/133
Jan 12 15:34:34 Tower kernel: ata4: EH complete
Jan 12 15:34:37 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 12 15:34:37 Tower kernel: ata4.00: BMDMA stat 0x64
Jan 12 15:34:37 Tower kernel: ata4.00: cmd 25/00:00:00:8d:58/00:01:5d:00:00/e0 tag 0 dma 131072 in
Jan 12 15:34:37 Tower kernel: res 51/40:00:a5:8d:58/40:00:5d:00:00/00 Emask 0x9 (media error)
Jan 12 15:34:37 Tower kernel: ata4.00: status: { DRDY ERR }
Jan 12 15:34:37 Tower kernel: ata4.00: error: { UNC }
Jan 12 15:34:37 Tower kernel: ata4.00: configured for UDMA/133
Jan 12 15:34:37 Tower kernel: ata4.01: configured for UDMA/133
Jan 12 15:34:37 Tower kernel: ata4: EH complete
Jan 12 15:34:40 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jan 12 15:34:40 Tower kernel: ata4.00: BMDMA stat 0x64
Jan 12 15:34:40 Tower kernel: ata4.00: cmd 25/00:00:00:8d:58/00:01:5d:00:00/e0 tag 0 dma 131072 in
Jan 12 15:34:40 Tower kernel: res 51/40:00:a5:8d:58/40:00:5d:00:00/00 Emask 0x9 (media error)
Jan 12 15:34:40 Tower kernel: ata4.00: status: { DRDY ERR }
Jan 12 15:34:40 Tower kernel: ata4.00: error: { UNC }
Jan 12 15:34:40 Tower kernel: ata4.00: configured for UDMA/133
Jan 12 15:34:40 Tower kernel: ata4.01: configured for UDMA/133
Jan 12 15:34:40 Tower kernel: sd 4:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
Jan 12 15:34:40 Tower kernel: sd 4:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
Jan 12 15:34:40 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Jan 12 15:34:40 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jan 12 15:34:40 Tower kernel: 5d 58 8d a5
Jan 12 15:34:40 Tower kernel: sd 4:0:0:0: [sda] ASC=0x11 ASCQ=0x4
Jan 12 15:34:40 Tower kernel: end_request: I/O error, dev sda, sector 1566084517
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760564
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760565
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760566
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760567
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760568
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760569
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760570
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760571
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760572
Jan 12 15:34:40 Tower kernel: Buffer I/O error on device sda, logical block 195760573
Jan 12 15:34:40 Tower kernel: ata4: EH complete
Jan 12 15:34:40 Tower kernel: sd 4:0:0:0: [sda] 2930277168 512-byte hardware sectors (1500302 MB)
Jan 12 15:34:40 Tower kernel: sd 4:0:0:0: [sda] Write Protect is off
Jan 12 15:34:40 Tower kernel: sd 4:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jan 12 15:34:40 Tower kernel: sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 12 15:34:42 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

I assume the errors logged at 7 hours of power-on time are from when the preclear failed the first time.
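The numbers that point at a sick disk in the log above can be pulled out mechanically. A minimal bash sketch follows; both function names are made up. smart_summary prints the raw values of a few attributes that usually track bad media. It assumes the standard 10-column smartctl attribute layout, so RAW_VALUE sits eight fields after the attribute name even with a syslog prefix in front. decode_lba turns the last four bytes of the kernel's descriptor sense data into a sector number. For this drive the arithmetic agrees with the log: 0x5d588da5 is sector 1566084517, and dividing by 8 gives 195760564, the first "Buffer I/O error" block, which is consistent with 4 KiB buffer blocks.

```shell
#!/bin/bash
# Print "NAME RAW_VALUE" for attributes that usually track bad media.
# Assumes the standard smartctl attribute layout:
#   ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# so RAW_VALUE is 8 fields after the name, even with a syslog prefix in front.
smart_summary() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "Reallocated_Sector_Ct" || $i == "Reported_Uncorrect" ||
          $i == "Current_Pending_Sector" || $i == "Offline_Uncorrectable") {
        print $i, $(i + 8)
      }
    }
  }'
}

# Turn the LBA bytes at the end of the kernel's descriptor sense data
# (e.g. "5d 58 8d a5") into a decimal sector number, plus the block number
# for 4 KiB buffers (8 sectors of 512 bytes each).
decode_lba() {
  local hex sector
  hex=$(printf '%s' "$*" | tr -d ' ')
  sector=$(( 16#$hex ))
  echo "sector $sector, 4KiB block $(( sector / 8 ))"
}
```

Fed the attribute lines from the syslog above, smart_summary reports 220 reallocated sectors, 59 reported-uncorrectable errors, 18 pending sectors, and 18 offline-uncorrectable sectors; `decode_lba 5d 58 8d a5` prints `sector 1566084517, 4KiB block 195760564`.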

batfink · February 1, 2009

I'm guessing the message I'm getting isn't normal!?!? I can't seem to upload a screenshot for some reason, so I've typed it below (telnet also locked up, so I can't copy/paste).

I'm having a hell of a job getting a PCI SATA card (Sil3114) to work; see the thread here:

http://lime-technology.com/forum/index.php?topic=3179.15

I thought I would try this pre-clear script to bypass the unRAID clearing, but alas, no such luck!

It means nothing to me, so I'd be grateful for any help. Cheers.

Pre clear DONE

Step 1 of 10 DONE

blah blah blah.....

Step 2 of 10 - Copying zeros to remainder of disk to clear it

**** This will take a while... you can follow progress below:

Elapsed Time: 4:33:52

238432+0 records in

238432+0 records out

500028145664 bytes (500GB) copied, 8550.05 s, 58.5 MB/s

dd: writing '/dev/sdf': No space left on device
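The dd figures typed above are at least self-consistent, which suggests the transcription is right. A small bash sketch (the function name is hypothetical) checks records times block size against the byte count and recomputes the rate. The 2097152-byte (2 MiB) block size is inferred from the numbers themselves, not taken from the script.

```shell
#!/bin/bash
# Cross-check a dd summary: records x block size should equal the byte
# count, and bytes / seconds should give the reported rate.
check_dd_summary() {
  local records=$1 bs=$2 bytes=$3 seconds=$4
  if (( records * bs == bytes )); then
    echo "records x bs matches: $bytes bytes"
  else
    echo "mismatch: $(( records * bs )) != $bytes"
  fi
  # dd reports MB as 10^6 bytes; awk does the floating-point division.
  awk -v b="$bytes" -v s="$seconds" 'BEGIN { printf "%.1f MB/s\n", b / s / 1000000 }'
}
```

`check_dd_summary 238432 2097152 500028145664 8550.05` prints a match and 58.5 MB/s. On its own this says nothing about whether the "No space left on device" line is expected; dd typically prints that when a write runs past the end of the device.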

oconnellc · February 5, 2009

I ran this for a single cycle against a WD 1TB drive. I got up this morning and the display seems 'frozen'. No errors or anything, but it isn't doing anything. While this was running, I also ran a single cycle on a WD 750GB drive, which appears to have completed successfully. Here is the display for the frozen drive:

===========================================================================

= unRAID server Pre-Clear disk /dev/sda

= cycle 1 of 1

= Disk Pre-Clear-Read completed DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward. DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE

= Step 5 of 10 - Clearing MBR code area DONE

= Step 6 of 10 - Setting MBR signature bytes DONE

= Step 7 of 10 - Setting partition 1 to precleared state DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries DONE

= Step 10 of 10 - Testing if the clear has been successful. DONE

= Post-Read in progress: 88% complete.

( 888,330,240,000 of 1,000,204,886,016 bytes read )

Elapsed Time: 9:17:36

I'm not sure whether this 'happens' sometimes and I should let it continue, or whether something has gone horribly wrong with the drive. Any ideas?

Thanks,

Chris

Edit: FWIW, I just ran 'top', and "preclear" seems to still be doing something, even though it isn't updating the display:

top - 07:16:06 up 10:47, 2 users, load average: 1.00, 1.00, 1.00

Tasks: 64 total, 2 running, 62 sleeping, 0 stopped, 0 zombie

Cpu(s): 50.0%us, 0.0%sy, 0.0%ni, 50.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 4113156k total, 172132k used, 3941024k free, 80k buffers

Swap: 0k total, 0k used, 0k free, 153560k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

27981 root 20 0 2708 872 328 R 100 0.0 27:54.22 preclear_disk.s

1 root 20 0 776 304 264 S 0 0.0 0:01.57 init

SSD · February 5, 2009

Quoting oconnellc: "I ran this for a single cycle against a WD 1TB drive. Got up this morning and the display seems 'frozen'. No errors or anything, but it isn't doing anything ..."

I know that Joe L. is pretty scarce at the moment; he will likely not be back on the forum until after Valentine's Day. He would be the best person to answer this question, but I will tell you what I would do.

1 - Capture a syslog from another telnet session and post it. If nasty things are happening, they'd likely show up in that log.

2 - Run a smartctl report on that drive from another telnet session and post it. If the drive is failing, there will likely be some sign of it in the SMART report.

3 - Go to the server and listen. If you hear pops or clicks that sound bad, I'd stop the process. Otherwise, I'd probably let it keep going and give it ample time to complete. I'm not sure whether something could be stopping the screen from updating while the script is still working behind the scenes.
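To check whether the drive is still being read while the screen is frozen, one option is to sample the kernel's per-disk counters twice from another telnet session. A minimal bash sketch, assuming Linux's /proc/diskstats layout (field 3 is the device name, field 6 sectors read, field 10 sectors written, in 512-byte units); the function names are made up. In the top output above, preclear_disk.s is at 100% CPU with 0.0% wa, so if the read counter is also flat, the script is probably spinning rather than reading. Treat that as a hint, not a diagnosis.

```shell
#!/bin/bash
# Print "<sectors_read> <sectors_written>" for one device.
# The stats file defaults to /proc/diskstats; a second argument overrides it.
disk_sectors() {
  local dev=$1 stats=${2:-/proc/diskstats}
  awk -v d="$dev" '$3 == d { print $6, $10 }' "$stats"
}

# Sample the counters twice, INTERVAL seconds apart (default 10),
# and report how much data moved in between.
disk_activity() {
  local dev=$1 interval=${2:-10}
  local r1 w1 r2 w2
  if ! read -r r1 w1 < <(disk_sectors "$dev"); then
    echo "device $dev not found in /proc/diskstats" >&2
    return 1
  fi
  sleep "$interval"
  read -r r2 w2 < <(disk_sectors "$dev")
  echo "$dev: read $(( (r2 - r1) * 512 )) bytes, wrote $(( (w2 - w1) * 512 )) bytes in ${interval}s"
}
```

For example, `disk_activity sda 30` reports how many bytes sda read and wrote over 30 seconds; a post-read in progress should show a steadily non-zero read figure.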

After looking at the syslog and smartctl report, I may have other suggestions.

oconnellc · February 6, 2009

Well, I can't send you either of the logs. I went over to the drive to 'listen' and didn't hear anything. The sides are not attached, so I thought I would slide one off and double-check the cables. I had to move the case a few inches, and in doing so I hit the power button and shut it down.

I had gone through the syslog and didn't see anything that looked scary, but that doesn't mean anything... So I restarted the box and, for grins, went to the tower management page, added the two disks back to the array, and started it up. Surprise: the management page told me that the drive that had 'frozen' was all right and formatted, while the drive that had finished said 'unformatted'. No idea how that happens...

So, I'm running a smartctl report on both disks now. I have to go to work, so I'll post the logs when I get home and then maybe rerun the preclear for a single cycle.

Thanks for the help.

BTW, I'm pretty new to all this command-line stuff, so here is the syntax I used for the smartctl report. There are a lot of options, so I'm not sure which is correct.

root@Tower:~# smartctl --test=long /dev/sda

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".

Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 221 minutes for test to complete.

Test will complete after Thu Feb 5 11:44:39 2009

Use smartctl -X to abort test.

Where will smartctl write the report?

Chris

SSD · February 6, 2009

What you did was NOT run a SMART report; you started a long SMART self-test. That's not a bad thing, but it's not what I requested.

Look at the troubleshooting page again (see my sig for a link) and read the section on smartctl more carefully; it will answer your question about how to get the report. Kicking off the long test does not produce a report. To see the results, you need to run the report separately.
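The distinction above maps onto two different smartctl invocations: `-t long` only starts the extended self-test and returns immediately, while `-a` prints the full report (attributes, error log, and the self-test log, where the long test's result appears once it finishes). A minimal bash sketch that saves the report to a file for posting; the function name and the default file name are hypothetical.

```shell
#!/bin/bash
# smartctl -t long /dev/sdX      starts the extended self-test (no report)
# smartctl -l selftest /dev/sdX  shows only the self-test log, once it is done
# smartctl -a /dev/sdX           prints the full report
save_smart_report() {
  local dev=$1
  local out=${2:-smart_$(basename "$dev").txt}
  local rc=0
  if ! command -v smartctl >/dev/null 2>&1; then
    echo "smartctl not found" >&2
    return 1
  fi
  # smartctl's exit status is a bit mask that can be non-zero even when the
  # report printed fine (e.g. errors recorded in the drive's logs), so keep
  # the file and just show the status.
  smartctl -a "$dev" > "$out" || rc=$?
  echo "report saved to $out (smartctl exit status $rc)"
}
```

For example, `save_smart_report /dev/sda` writes smart_sda.txt, which can then be attached or pasted into a post.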

I am a little fuzzy on the current state of your array. Can you post a screenshot of your main page?

oconnellc · February 6, 2009

OK, I think I'm in a better state now. I have run the preclear on both disks and it completed. The 750GB disk finished with no issues. The 1TB preclear output looks like this:

===========================================================================

= unRAID server Pre-Clear disk /dev/sda

= cycle 1 of 1

= Disk Pre-Clear-Read completed DONE

= Step 1 of 10 - Copying zeros to first 2048k bytes DONE

= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE

= Step 3 of 10 - Disk is now cleared from MBR onward. DONE

= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4 DONE

= Step 5 of 10 - Clearing MBR code area DONE

= Step 6 of 10 - Setting MBR signature bytes DONE

= Step 7 of 10 - Setting partition 1 to precleared state DONE

= Step 8 of 10 - Notifying kernel we changed the partitioning DONE

= Step 9 of 10 - Creating the /dev/disk/by* entries DONE

= Step 10 of 10 - Testing if the clear has been successful. DONE

= Disk Post-Clear-Read completed DONE

Elapsed Time: 9:43:07

============================================================================

==

== Disk /dev/sda has been successfully precleared

==

============================================================================

S.M.A.R.T. error count differences detected after pre-clear

note, some 'raw' values may change, but not be an indication of a problem

19,20c19,20

< Offline data collection status: (0x82) Offline data collection activity

< was completed without error.

---

> Offline data collection status: (0x84) Offline data collection activity

> was suspended by an interrupting command from host.

============================================================================

So, here is a SMART report for that disk (smart.txt). I don't know whether "Offline data collection activity was suspended" is a big deal, so I ran another SMART test and then got another report (smart2.txt).
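Whether 0x84 matters can be read off the status byte itself. A minimal bash sketch, assuming the layout that smartctl's messages in these reports reflect: bit 7 set means automatic offline collection is enabled, and the low seven bits are the status code. The name offline_status is made up. By that reading, 0x82 is "completed without error", 0x84 "suspended by an interrupting command from host", and 0x85 "aborted by an interrupting command from host". The drive's capability list in smart.txt includes "Suspend Offline collection upon new command", so the host's own reads during the preclear are a plausible reason for the suspension rather than a fault.

```shell
#!/bin/bash
# Decode an "Offline data collection status" byte such as 0x84.
offline_status() {
  local byte=$(( $1 ))
  local auto=$(( (byte >> 7) & 1 ))   # bit 7: auto offline collection enabled
  local code=$(( byte & 0x7f ))       # low 7 bits: status code
  local msg
  case $code in
    0) msg="never started" ;;
    2) msg="completed without error" ;;
    3) msg="in progress" ;;
    4) msg="suspended by an interrupting command from host" ;;
    5) msg="aborted by an interrupting command from host" ;;
    6) msg="aborted by the device with a fatal error" ;;
    *) msg="reserved or vendor-specific (code $code)" ;;
  esac
  echo "auto offline collection: $([ "$auto" -eq 1 ] && echo enabled || echo disabled); status: $msg"
}
```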

I appreciate you taking the time to look at this.

oconnellc · February 6, 2009

I got an error trying to upload the files, so here are the actual contents. The only difference I can see between the two is the last section, with the info about the self-test log structure.
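Rather than comparing the two reports by eye, diff can list every line that changed; the "19,20c19,20" block in the preclear output above is in the same normal diff format. A minimal bash sketch with a made-up function name and the file names from this post as defaults. It drops the "Local Time is:" line, which differs on every run. Note that diff exits with status 1 when the files differ.

```shell
#!/bin/bash
# Compare two saved smartctl reports, ignoring the timestamp line.
compare_smart_reports() {
  local a=${1:-smart.txt} b=${2:-smart2.txt}
  diff <(grep -v '^Local Time is:' "$a") <(grep -v '^Local Time is:' "$b")
}
```

On the two reports here, the "Offline data collection status" lines (0x84 in smart.txt, 0x85 in smart2.txt) should show up in the output, along with anything that changed in the self-test log.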

smart.txt

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model: WDC WD1001FALS-00J7B0

Serial Number: WD-WMATV0509648

Firmware Version: 05.00K05

User Capacity: 1,000,204,886,016 bytes

Device is: Not in smartctl database [for details use: -P showall]

ATA Version is: 8

ATA Standard is: Exact ATA specification draft version not indicated

Local Time is: Fri Feb 6 07:15:23 2009 GMT+6

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status: (0x84) Offline data collection activity

was suspended by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (19200) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: ( 2) minutes.

Extended self-test routine

recommended polling time: ( 221) minutes.

Conveyance self-test routine

recommended polling time: ( 5) minutes.

SCT capabilities: (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0

3 Spin_Up_Time 0x0027 232 229 021 Pre-fail Always - 8375

4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 17

5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0

7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0

9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 47

10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0

11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0

12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 10

192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 4

193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 17

194 Temperature_Celsius 0x0022 122 107 000 Old_age Always - 28

196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0

197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0

198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0

199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0

200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

# 1 Extended offline Completed without error 00% 27 -

# 2 Extended offline Aborted by host 90% 24 -

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 0 Not_testing

2 0 0 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

and smart2.txt

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD1001FALS-00J7B0
Serial Number:    WD-WMATV0509648
Firmware Version: 05.00K05
User Capacity:    1,000,204,886,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Feb 6 07:31:31 2009 GMT+6
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (19200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 221) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   232   229   021    Pre-fail  Always       -       8375
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       47
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       4
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       17
194 Temperature_Celsius     0x0022   122   107   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        47         -
# 2  Extended offline    Completed without error       00%        27         -
# 3  Extended offline    Aborted by host               90%        24         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Thanks again.

oconnellc · February 6, 2009

Oh, and as for the status of the array. Well, I can't upload anything, so I'll describe it to you. I reassigned the devices to a spot in the array (same place they were in before), then I went to the main page and started the array. Both disks are 'green' and have lots of reads/writes. For the 'Free' number, the parity disk (sda in all my previous posts, the 1TB WD drive) just has a dash "-". The other disk (a WD 750 GB drive) says 'Unformatted'. I don't understand that at all... Since both completed the preclear, shouldn't they both be 'Unformatted'?

SSD · February 6, 2009

After pre-clear, the drives should be unformatted.

Just so you understand, preclear fills the drive with binary zeros and writes a special signature to it which tells unRAID that the drive has been prepared in this way. unRAID TRUSTS that signature and does not, itself, fill the drive with zeros (the normal behavior). If the drive is not really clear yet has the special signature, then parity will be incorrect the second the drive is added to the array. I don't know whether the signature is written at the beginning or the end of the preclear process (hopefully at the end); otherwise a partially complete preclear could set you up for many sync errors.
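Why unRAID can skip zeroing a precleared drive, and why the signature must be trustworthy, both follow from how single parity works. The Python sketch below is a toy model, not unRAID's code: it assumes parity is a byte-wise XOR across the data disks, ignores the signature format entirely, and uses hypothetical helper names (`parity`, `parity_is_valid`).

```python
from functools import reduce
from typing import List


def parity(disks: List[bytes]) -> bytes:
    """Toy single parity: byte-wise XOR across equally sized data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def parity_is_valid(disks: List[bytes], parity_disk: bytes) -> bool:
    """True if the stored parity matches parity recomputed from the data disks."""
    return parity(disks) == parity_disk


data = [bytes([0x12, 0x34, 0x56, 0x78]), bytes([0x9A, 0xBC, 0xDE, 0xF0])]
p = parity(data)

# A precleared disk is all zeros, so XOR-ing it in leaves parity unchanged:
# it can be added without touching the parity disk.
zeroed = bytes(4)
print(parity_is_valid(data + [zeroed], p))  # True

# A disk that carries the "precleared" signature but still holds stray data
# is added the same way, and parity is wrong from that moment on.
not_really_clear = bytes([0x00, 0x01, 0x00, 0x00])
print(parity_is_valid(data + [not_really_clear], p))  # False
```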

The SMART reports you posted do not indicate any drive problems that I can see. Some of your long tests were aborted by the host with 90% of the test still remaining, i.e. only about 10% of the way through. Not sure why. It was previously suggested that unRAID spin-down timers might somehow cause the test to be canceled. That could be responsible, as could a reboot I suppose. Long tests can take several hours to complete.
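When reading the self-test logs above, note that "Remaining" is the share of the test still left when the entry was logged, so "Aborted by host" at 90% means the test had done only about a tenth of its work. A small Python sketch (the helper `percent_done` is hypothetical and assumes the whitespace-separated row layout shown in the logs) converts a row into a completion percentage:

```python
def percent_done(log_row: str) -> int:
    """Estimate how far a self-test got from one smartctl self-test log row.

    The 'Remaining' column is the share of the test still left to run, so
    the first token ending in '%' is subtracted from 100.
    """
    remaining = next(tok for tok in log_row.split() if tok.endswith("%"))
    return 100 - int(remaining.rstrip("%"))


print(percent_done("# 2  Extended offline    Aborted by host               90%        24         -"))  # 10
print(percent_done("# 1  Extended offline    Completed without error       00%        27         -"))  # 100
```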

I am still confused as to the state of your array. Is everything good? A screenshot would have been helpful, but if you are seeing all green drives and the two drives that you successfully precleared as unformatted, that is normal. Go ahead and format them. The array should be good to go. If you are having problems, please describe them in more detail.

If you think all is good, I would recommend running a parity check on the array overnight tonight. If it completes with zero sync errors and zero drive errors, your array is healthy.

oconnellc · February 6, 2009

Thanks for taking a look. Just to make sure I understand you: 'Unformatted' is alright. However, only the data disk showed that. The parity disk didn't say 'Unformatted'; it just had a "-" character. Maybe parity is "different" and that is what it is supposed to say? I didn't click the button to have unRAID format any unformatted drives. Assuming that "-" is not a problem, I will do that and then start a parity check.

Sorry I didn't upload the screen shot, but there is a problem with the forum now and it reports an error when trying to upload a file.

Thanks again,

Chris

SSD · February 6, 2009

Here is a good place to upload pictures for posting. I use it frequently: http://imageshack.us/ Once you have the URL of the picture (grab the direct link to the picture), paste the link into your message, highlight it, and then press the "Insert image" button. Preview your message and the picture should show up inline. (Make sure you save the screenshot as a .JPG before uploading so that it isn't huge!)

Preclear does two things: it burns in disks AND it prepares them so they can be quickly added as data disks. The burn-in part is a good thing to do, even if there is no need to prepare them.

In your situation, there was no need to prepare either of them. A parity disk NEVER needs to be prepared. And a data disk does not need to be prepared if you don't have valid parity (if you are adding a parity disk, you don't have valid parity).

The parity disk does not have a "format". It is just a parity calculation. The "-" is fine.

Has the array started? If so, it should be building parity.

oconnellc · February 6, 2009

Thanks. I didn't realize that the parity disk didn't have a filesystem. In retrospect, it makes sense. I just started the array; I didn't have it rebuild parity. I've learned a lot from your last two posts, and I'll try to learn more before asking questions next time to avoid wasting your time. Given what I have learned, I will probably start preclear runs on both disks with a count of 5 or 10 just to put them through their paces a bit before starting things up. This is a brand new, first-time build for me, so I have been a bit paranoid before starting out. Given how long a single cycle takes for me, I should have an operational unRAID server by Monday or so.

Thanks again for all of your help.

Unrelated... what do you use to turn a screen grab into an image? My only thought was to hit "Alt + PrtScn" and paste it into Word. If you know of a tool that would let me save it directly as an image, that would help.

Thanks again.

SSD · February 6, 2009

MS Paint - part of Windows forever - under Start -> All Programs -> Accessories.

If parity didn't build you have a serious problem. You zeroed out your parity disk! The only way for parity to be valid is if it built AFTER you did your preclear.

I'd start a parity check. If you start getting sync errors, cancel it and post back for instructions. If it is not getting sync errors then parity did build. It is still a good idea to run a full parity check after any parity build!
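The danger of a precleared parity disk can be shown with the same kind of toy model (XOR parity, byte positions standing in for sectors, hypothetical helper names; this is not how unRAID's parity check is implemented). Once the parity disk is overwritten with zeros, a check reports a mismatch at every position where the data disks do not happen to XOR to zero, and only a fresh parity build clears them:

```python
from functools import reduce
from typing import List


def parity(disks: List[bytes]) -> bytes:
    """Toy single parity: byte-wise XOR across equally sized data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def sync_errors(disks: List[bytes], parity_disk: bytes) -> int:
    """Count positions where the stored parity disagrees with recomputed parity."""
    return sum(1 for want, have in zip(parity(disks), parity_disk) if want != have)


data = [b"unRAID", b"parity"]
built = parity(data)
print(sync_errors(data, built))  # 0

# Preclearing the parity disk afterwards overwrites it with zeros.
zeroed = bytes(len(built))
print(sync_errors(data, zeroed))  # 6 for this data

# Only a parity build done AFTER the preclear makes it valid again.
print(sync_errors(data, parity(data)))  # 0
```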

JarDo · February 6, 2009

I had a preclear operation running on two WD 1TB drives at the same time. The preclear seemed to finish just fine on both drives, but each indicated that the SMART Raw_Read_Error_Rate values changed after the preclear. What I find interesting is that even though these were two distinct drives, the reported changes were exactly the same.

Disk #1

===========================================================================
=                unRAID server Pre-Clear disk /dev/sdh
=                       cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  15:52:25
============================================================================
==
== Disk /dev/sdh has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
54c54
<   1 Raw_Read_Error_Rate     0x000f   100   253   051    Pre-fail  Always       -       0
---
>   1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
============================================================================

Disk #2

===========================================================================
=                unRAID server Pre-Clear disk /dev/sda
=                       cycle 1 of 1
= Disk Pre-Clear-Read completed                                 DONE
= Step 1 of 10 - Copying zeros to first 2048k bytes             DONE
= Step 2 of 10 - Copying zeros to remainder of disk to clear it DONE
= Step 3 of 10 - Disk is now cleared from MBR onward.           DONE
= Step 4 of 10 - Clearing MBR bytes for partition 2,3 & 4       DONE
= Step 5 of 10 - Clearing MBR code area                         DONE
= Step 6 of 10 - Setting MBR signature bytes                    DONE
= Step 7 of 10 - Setting partition 1 to precleared state        DONE
= Step 8 of 10 - Notifying kernel we changed the partitioning   DONE
= Step 9 of 10 - Creating the /dev/disk/by* entries             DONE
= Step 10 of 10 - Testing if the clear has been successful.     DONE
= Disk Post-Clear-Read completed                                DONE
Elapsed Time:  15:48:04
============================================================================
==
== Disk /dev/sda has been successfully precleared
==
============================================================================
S.M.A.R.T. error count differences detected after pre-clear
note, some 'raw' values may change, but not be an indication of a problem
19,20c19,20
< Offline data collection status:  (0x80)       Offline data collection activity
<                                       was never started.
---
> Offline data collection status:  (0x84)       Offline data collection activity
>                                       was suspended by an interrupting command from host.
54c54
<   1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
---
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
============================================================================
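The two diffs above are easier to judge once the columns are separated. A small Python sketch follows; it assumes smartctl's ten-column attribute-table layout as printed above, and `Attr`, `parse_attr` and `changed_fields` are hypothetical names. For attribute 1, only the normalized VALUE and WORST columns moved; RAW_VALUE is 0 before and after on both drives, and both VALUE readings stay above the THRESH of 051.

```python
from typing import Dict, NamedTuple, Tuple


class Attr(NamedTuple):
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: str


def parse_attr(row: str) -> Attr:
    """Parse one row of smartctl's attribute table.

    Assumes the ten whitespace-separated columns shown above; the last
    column (RAW_VALUE) may itself contain spaces, hence maxsplit=9.
    """
    f = row.split(maxsplit=9)
    return Attr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), f[9])


def changed_fields(before: str, after: str) -> Dict[str, Tuple]:
    """Map each field that differs to its (before, after) pair."""
    a, b = parse_attr(before), parse_attr(after)
    return {k: (getattr(a, k), getattr(b, k))
            for k in Attr._fields if getattr(a, k) != getattr(b, k)}


before = "  1 Raw_Read_Error_Rate     0x000f   100   253   051    Pre-fail  Always       -       0"
after = "  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0"
print(changed_fields(before, after))
# {'value': (100, 200), 'worst': (253, 200)}
```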

SSD · February 6, 2009

I'm not overly concerned with those attributes. I think that is all fine.

I'm just wondering if you somehow did an "end around" on unRAID and confused it into thinking your parity was valid and didn't need to be rebuilt.

Did you run the parity check in the way I requested in my prior post?

oconnellc · February 7, 2009

I can't remember if I had to stop parity manually or if it didn't run when I restarted the array. In any case, I kicked it off and it ran just fine. So, now I'm just burning it in with 'preclear'.

Judging by the preclear script's progress updates, I'm reading from each disk at about 100MB/sec while running preclear on both disks at the same time. That seems pretty fast, but I don't know whether this is necessarily a good test.
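To turn a per-disk read speed into a time per cycle, a back-of-the-envelope calculation is enough. The sketch below assumes a constant transfer rate (real drives slow down toward the inner tracks) and assumes each preclear cycle makes three full passes over the disk, matching the pre-read, zero-write and post-read stages in the preclear output posted above; the function names are hypothetical.

```python
def pass_hours(capacity_bytes: int, mb_per_s: float, passes: int = 1) -> float:
    """Hours for `passes` full sequential passes at a constant rate (1 MB = 10**6 bytes)."""
    return passes * capacity_bytes / (mb_per_s * 1_000_000) / 3600


def average_mb_per_s(capacity_bytes: int, passes: int, elapsed: str) -> float:
    """Average rate implied by an 'Elapsed Time: HH:MM:SS' line."""
    h, m, s = (int(x) for x in elapsed.split(":"))
    return passes * capacity_bytes / (h * 3600 + m * 60 + s) / 1_000_000


# Assumption: a "1TB" drive with the user capacity smartctl reports for the
# WD1001FALS above; JarDo's drives may differ slightly.
CAPACITY = 1_000_204_886_016

print(round(pass_hours(CAPACITY, 100), 2))  # 2.78 hours for one pass at 100 MB/s
# Assumption: one cycle = pre-read + zero-write + post-read, i.e. 3 passes.
print(round(average_mb_per_s(CAPACITY, 3, "15:52:25"), 1))  # 52.5
```

On those assumptions a steady 100 MB/s is under three hours per pass, while JarDo's 15:52:25 cycle on /dev/sdh averages only about half that rate, so the speed seen during one stage does not necessarily hold for the whole cycle.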

Chris

RockDawg · February 12, 2009

I recently purchased a Seagate 750G drive and ran preclear on it, once with a single pass and again with a double pass. I was adding it to my unRAID tower as a replacement for a smaller existing drive. After the preclear I powered down the server, removed the smaller drive and installed the new "precleared" Seagate. When I powered up I went under Devices and assigned the new drive to the disk number (disk9) of the old (now missing) drive. When I went back to the main page it showed a blue dot next to the new drive and presented me with the option to "upgrade disk and rebuild data" (or something like that) or "restore". I chose "upgrade disk" and expected it to then format the drive, but it never said that it did. It just proceeded to start the rebuild. Why didn't it say anything about formatting the drive?

SSD · February 12, 2009

Although the preclear script might have done a nice burn-in test of the drive, I don't think it benefits you on a drive rebuild (unless it prevents clearing of the rest of the disk, which I am doubtful about).

If you are doing a rebuild I don't think there is any prompting or options to format. It just does it. When you start the array with that blue disk, the status of array says something about that it is going to rebuild the disk. That's all you get!

RobJ · February 12, 2009

There is no need to format a disk before a drive rebuild, and in fact if it had formatted it first, the 'format' would then be overwritten by the rebuild.

All a format is, is the creation of a brand new file system on the drive, including the hidden file system structures, and an empty root directory, but no files and folders. In a sense, a drive rebuild is almost the same thing, in that it copies a complete file system with all of its hidden structures to the drive, but it *also* copies all of the files and folders. Put another way, a drive rebuild copies the 'format' of the previous drive.

RockDawg · February 13, 2009

Thanks for the feedback, guys. I thought I remembered unRAID formatting the drive even before an upgrade, but it's been so long since I've done one that I must be mistaken. It makes sense that a rebuild would copy over the file system, thereby 'formatting' the drive. I never thought of it like that before.

I did the preclear primarily to test the drive (nice utility Joe L.!).

Joe L. · February 14, 2009

Although the preclear script might have done a nice burn-in test of the drive, I don't think it benefits you on a drive rebuild (unless it prevents clearing of the rest of the disk, which I am doubtful about).

The preclear script does nothing that will save time when rebuilding an existing drive. The array downtime it saves is only when adding an additional drive to an existing array where parity is already configured and calculated.

As you said, the preclear script will let you have some confidence the drive you will be rebuilding onto is working properly.

If you are doing a rebuild I don't think there is any prompting or options to format. It just does it. When you start the array with that blue disk, the status of array says something about that it is going to rebuild the disk. That's all you get!

True, a drive that is "rebuilt" onto does not get cleared, or formatted. It gets the original drive's contents (and formatting).

Joe L.

SSD · February 14, 2009

Welcome back, JoeL!

If you are doing a drive rebuild, say from a 250G drive to a 1T drive, does unRAID have to clear the 750G portion of the drive? I think the answer is yes, otherwise parity would need to be recomputed for the remainder of the disk. If it does have to clear it, are you sure that a precleared disk will not skip that step? (I don't think it is that smart, but as I posted a few posts back in this thread, there is at least an opportunity for it to save some effort in this case.)

Joe L. · February 14, 2009

Welcome back, JoeL!

If you are doing a drive rebuild, say from a 250G drive to a 1T drive, does unRAID have to clear the 750G portion of the drive?

The first step unRAID performs in upgrading a drive is to resize the file-system on the in-memory version of the drive being replaced. From that point onward, while the old drive's contents are being rebuilt onto its replacement, you have access to the entire new drive up to its physical limits. It is entirely possible to write to ANY block on the drive being upgraded while it is being rebuilt onto the replacement.

For that reason, it is possible that the 750Gig portion of the 1T drive being used as a replacement is not all zeros as you might expect.

I think the answer is yes, otherwise parity would need to be recomputed for the remainder of the disk.

You are not computing parity when replacing an existing drive, you are instead computing the "data" using the existing parity and remaining data drives.
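The difference between computing parity and computing data can be sketched with the same XOR model (a toy illustration with hypothetical helper names, not unRAID's implementation). It assumes that, for parity purposes, a smaller disk counts as zeros beyond its end, and that the replacement drive's existing contents are simply overwritten, so a preclear never enters the calculation:

```python
from functools import reduce
from typing import List


def xor_all(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(parity_disk: bytes, surviving: List[bytes]) -> bytes:
    """Recover a missing data disk as parity XOR every surviving data disk.

    Nothing is read from the replacement drive, so whether it was
    precleared makes no difference to the result.
    """
    return xor_all([parity_disk] + surviving)


# Toy array: the old, smaller disk counts as zeros beyond its end.
old_disk = b"OLD" + bytes(3)
other = b"keepme"
parity_disk = xor_all([old_disk, other])

rebuilt = reconstruct(parity_disk, [other])
print(rebuilt == old_disk)  # True

# A write into the expanded region while the disk is emulated updates parity,
# so the rebuilt upper region is not all zeros.
emulated = bytearray(old_disk)
emulated[4] = ord("A")
parity_disk = xor_all([bytes(emulated), other])
print(reconstruct(parity_disk, [other])[3:])  # b'\x00A\x00'
```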

If it does have to clear it, are you sure that a precleared disk will not skip that step? (I don't think it is that smart, but as I posted a few posts back in this thread, there is at least an opportunity for it to save some effort in this case.)

As I already said, the opportunity to save a bit of time when re-constructing a drive might not be there, as the expanded file-system is already in place.

When replacing/upgrading an existing drive, the pre-clear script only makes it easier to burn in the new drive and identify a defective one before you use it to replace the existing drive. There is no logic in unRAID to make the rebuild go faster if the upper portion of the disk was already filled with zeros. (And I'm not sure it could regardless, since that portion of the drive has already been "formatted" to its final expanded size, and the corresponding bits on the physical drive need to be brought into sync.)

Joe L.

prostuff1 · February 16, 2009

Just finished a preclear on a 1.5TB Seagate with CC1H firmware. Here is the output from the preclear, if anyone could give some input on the output that would be great!

Feb 16 09:42:32 Tower preclear_disk-finish[12233]: smartctl version 5.38 [i486-slackware-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Home page is http://smartmontools.sourceforge.net/
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: === START OF INFORMATION SECTION ===
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Device Model:     ST31500341AS
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Serial Number:    9VS1392C
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Firmware Version: CC1H
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: User Capacity:    1,500,301,910,016 bytes
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Device is:        Not in smartctl database [for details use: -P showall]
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: ATA Version is:   8
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: ATA Standard is:  ATA-8-ACS revision 4
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Local Time is:    Mon Feb 16 09:42:32 2009 EST
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART support is: Available - device has SMART capability.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART support is: Enabled
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: === START OF READ SMART DATA SECTION ===
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART overall-health self-assessment test result: PASSED
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: General SMART Values:
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Offline data collection status:  (0x82) Offline data collection activity
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         was completed without error.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Auto Offline Data Collection: Enabled.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Self-test execution status:      (   0) The previous self-test routine completed
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         without error or no self-test has ever
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         been run.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Total time to complete Offline
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: data collection:                 ( 617) seconds.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Offline data collection
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: capabilities:                    (0x7b) SMART execute Offline immediate.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Auto Offline data collection on/off support.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Suspend Offline collection upon new
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         command.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Offline surface scan supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Self-test supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Conveyance Self-test supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Selective Self-test supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART capabilities:            (0x0003) Saves SMART data before entering
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         power-saving mode.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         Supports SMART auto save timer.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Error logging capability:        (0x01) Error logging supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         General Purpose Logging supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Short self-test routine
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: recommended polling time:        (   1) minutes.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Extended self-test routine
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: recommended polling time:        ( 255) minutes.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Conveyance self-test routine
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: recommended polling time:        (   2) minutes.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SCT capabilities:              (0x103f) SCT Status supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         SCT Feature Control supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:                                         SCT Data Table supported.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART Attributes Data Structure revision number: 10
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Vendor Specific SMART Attributes with Thresholds:
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   1 Raw_Read_Error_Rate     0x000f   120   100   006    Pre-fail  Always       -       1754804
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       186952
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       38
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       3
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 189 High_Fly_Writes         0x003a   094   094   000    Old_age   Always       -       6
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 190 Airflow_Temperature_Cel 0x0022   067   066   045    Old_age   Always       -       33 (Lifetime Min/Max 22/34)
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 195 Hardware_ECC_Recovered  0x001a   055   052   000    Old_age   Always       -       1754804
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       211892211548198
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       474140439
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline      -       1964868157
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART Error Log Version: 1
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: No Errors Logged
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART Self-test log structure revision number 1
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: No self-tests have been logged.  [To run self-tests, use: smartctl -t]
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: SMART Selective self-test log data structure revision number 1
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:     1        0        0  Not_testing
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:     2        0        0  Not_testing
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:     3        0        0  Not_testing
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:     4        0        0  Not_testing
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:     5        0        0  Not_testing
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: Selective self-test flags (0x0):
Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   After scanning selected spans, do NOT read-scan remainder of disk.
Feb 16 09:42:32 Tower preclear_disk-finish[12233]: If Selective self-test is pending on power-up, resume after 0 minute delay.
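A report like this can be screened mechanically. The rough Python sketch below assumes the attribute-table layout shown above (with or without the syslog prefix); the helper `review` and the `WATCH` set are hypothetical. It applies two checks: whether any normalized VALUE has dropped to its nonzero THRESH (the usual SMART meaning of a failed attribute), and whether the sector-count attributes 5, 197 and 198 report a nonzero raw count. It deliberately ignores other raw values such as Raw_Read_Error_Rate, whose meaning varies by vendor.

```python
from typing import List

# IDs 5, 197, 198: Reallocated_Sector_Ct, Current_Pending_Sector,
# Offline_Uncorrectable. Flagging any nonzero raw count here is a common
# rule of thumb, not a smartctl or unRAID rule.
WATCH = {5, 197, 198}


def review(rows: List[str]) -> List[str]:
    """Return notes for attribute rows that deserve a closer look."""
    notes = []
    for row in rows:
        row = row.split("]: ", 1)[-1]      # drop a syslog prefix if present
        f = row.split(maxsplit=9)          # RAW_VALUE may contain spaces
        attr_id, name = int(f[0]), f[1]
        value, thresh, raw = int(f[3]), int(f[5]), f[9]
        # A normalized VALUE at or below a nonzero THRESH means the attribute has failed.
        if thresh > 0 and value <= thresh:
            notes.append(f"{name}: value {value} <= threshold {thresh}")
        if attr_id in WATCH and raw.split()[0] != "0":
            notes.append(f"{name}: raw count {raw}")
    return notes


rows = [
    "Feb 16 09:42:32 Tower preclear_disk-finish[12233]:   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0",
    "Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0",
    "Feb 16 09:42:32 Tower preclear_disk-finish[12233]: 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0",
]
print(review(rows))  # []
# JonathanM's sick drive from earlier in the thread:
print(review(["5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 220"]))
# ['Reallocated_Sector_Ct: raw count 220']
```

Applied to every attribute row in prostuff1's report, neither check fires.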
