LFletcher Posted November 22, 2017

Hi,

Disk 7 has gone into an error state with a nice big red cross next to it. I followed the steps in this section of the troubleshooting guide, https://wiki.lime-technology.com/Troubleshooting#What_do_I_do_if_I_get_a_red_X_next_to_a_hard_disk.3F and have the diagnostics from before and after the reboot (see attached).

From looking at the info in the syslog, this is when the issue occurred:

Nov 22 18:23:31 unraid kernel: sd 1:0:12:0: task abort: SUCCESS scmd(ffff8807e28d1080)
Nov 22 18:23:31 unraid kernel: sd 1:0:12:0: [sdn] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Nov 22 18:23:31 unraid kernel: sd 1:0:12:0: [sdn] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 a4 92 1b 98 00 00 00 08 00 00
Nov 22 18:23:31 unraid kernel: blk_update_request: I/O error, dev sdn, sector 2761038744
Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=2761038680
Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=5824529848
Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=5824529856
Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=5824529864

Looking at the SMART info for disk 7, I can see that the Reallocated_Sector_Ct isn't great.

5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 480

I ran a quick SMART test on the drive after the reboot and it appeared to get stuck at 90%. The sector count increased to:

5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 536

So at this stage I assume I better RMA the drive back to Seagate as if it's not dead yet it will be soon?

I have 2 questions with regards to replacing the drive, as this will be the first time I've had to do it with unRAID and I don't want to do anything stupid and lose any data.

I have no idea if anything was copying specifically to this drive at the time of the failure, but I was moving stuff off my cache into the main array. How would I know if any of the data I was copying at the time has become corrupt - or more to the point, how does unRAID deal with a write failure? The re-enable a drive section (https://wiki.lime-technology.com/Troubleshooting#Re-enable_the_drive) seems to indicate the data went to an emulated drive, so it should be OK and I won't have to hunt for the file(s) which may now be corrupt - is that assumption correct?

Reading the Replace a drive section (https://wiki.lime-technology.com/Replacing_a_Data_Drive), is the following procedure correct for replacing the bad drive?

1. Stop the array
2. Unassign the old drive if still assigned (to unassign, set it to No Device)
3. Power down
4. [ Optional ] Pull the old drive (you may want to leave it installed for Preclearing or testing)
5. Install the new drive
6. Power on
7. Assign the new drive in the slot of the old drive
8. Go to the Main -> Array Operation section
9. Put a check in the Yes, I'm sure checkbox (next to the information indicating the drive will be rebuilt), and click the Start button

Does the checkbox mentioned in step 9 appear once you have unassigned the old drive and reassigned the new drive? I ask because this option is currently available with the rebooted and stopped array.

Thanks for any help, it's very much appreciated.

unraid-diagnostics-20171122-2037.zip
unraid-diagnostics-20171122-2209.zip
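For anyone reading the syslog excerpt in the post above, the CDB bytes can be decoded to cross-check the sector numbers. This is a minimal sketch, assuming the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2 to 9, a 4-byte transfer length in bytes 10 to 13). The function name `decode_read16` is made up for illustration and is not part of any unRAID tool.

```python
# Decode the 16-byte SCSI CDB quoted in the syslog excerpt.
# Assumption: the standard READ(16) layout (opcode, flags, 8-byte LBA,
# 4-byte transfer length, group number, control).

CDB_HEX = "88 00 00 00 00 00 a4 92 1b 98 00 00 00 08 00 00"


def decode_read16(cdb_hex: str) -> dict:
    """Return the opcode, starting LBA and block count of a READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are ignored
    if len(cdb) != 16:
        raise ValueError("READ(16) CDBs are exactly 16 bytes long")
    if cdb[0] != 0x88:
        raise ValueError("not a READ(16) command")
    return {
        "opcode": cdb[0],
        "lba": int.from_bytes(cdb[2:10], "big"),      # first sector requested
        "blocks": int.from_bytes(cdb[10:14], "big"),  # number of sectors
    }


info = decode_read16(CDB_HEX)
print(info["lba"], info["blocks"])  # 2761038744 8
```

The decoded LBA matches the `blk_update_request` sector for sdn, and it is 64 higher than the first `md: disk7 read error` sector. That difference would be consistent with the data partition starting 64 sectors into the device, but that is an inference from the numbers and not something the log states.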
JorgeB Posted November 22, 2017

LFletcher said: So at this stage I assume I better RMA the drive back to Seagate as if it's not dead yet it will be soon?

I would replace it ASAP.

LFletcher said: I have no idea if anything was copying specifically to this drive at the time of the failure, but I was moving stuff off my cache into the main array. How would I know if any of the data I was copying at the time has become corrupt - or more to the point, how does unRAID deal with a write failure?

Judging by the way it happened, it started with a read error. When that happens, unRAID tries to write the data back to those sectors, using parity plus all the other data disks to calculate what should be there. So it looks like it didn't happen while writing new files to that disk. If it did, you'd need to have checksums or be using btrfs to check for corruption. unRAID does its best to start writing to the emulated disk without losing anything, but depending on how a disk fails, corruption is possible in the file being written at the moment it switches to the emulated disk.

LFletcher said: is the following procedure correct for replacing the bad drive?

Yes, but you can skip step 2.
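To illustrate the "parity plus all other data disks" calculation mentioned in the reply above, here is a toy sketch. It assumes single parity computed as a bytewise XOR across the data disks and works on small in-memory blocks. The function names are hypothetical and this is not unRAID's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte strings."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(block) != len(blocks[0]) for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for one stripe: the XOR of every data disk's block."""
    return xor_blocks(list(data_blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recompute the one missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three pretend data disks, one block each.
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = compute_parity(disks)

# Pretend the middle disk failed: rebuild its block from the rest.
rebuilt = rebuild_missing([disks[0], disks[2]], parity)
print(rebuilt == disks[1])  # True
```

The same property is what lets a failed disk be emulated: any one missing block in a stripe can be recomputed as long as every other disk and parity can still be read.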
LFletcher Posted November 23, 2017

Thanks for the response.

So in summary some or no data on disk 7 may or may not now be corrupt once I restore it back to a new disk? And I should also check the new data which was copying off the cache when the issue occurred, just in case.

Are there any tools which would assist with checking the files? In the past I've used mediainfo, as that won't show container info if the file is corrupt. I assume I could create a disk share once the restore is complete and just scan that?
JorgeB Posted November 23, 2017

LFletcher said: So in summary some or no data on disk 7 may or may not now be corrupt once I restore it back to a new disk?

Most likely there is no corrupt data, but without checksums or a btrfs filesystem there is no way of knowing for sure.

LFletcher said: Are there any tools which would assist with checking the files?

For xfs disks you can use the Dynamix File Integrity plugin.
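As a rough illustration of what "having checksums" means in the reply above, the sketch below records a SHA-256 hash per file and later reports files whose contents no longer match. It is a generic example using only the Python standard library. It is not how the Dynamix File Integrity plugin works internally, and the function names and file names are invented for the demo.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_mismatches(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose hash has changed."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad


def demo() -> list:
    """Build a manifest in a temp directory, alter one file, then verify."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.bin").write_bytes(b"unchanged")
        (root / "b.bin").write_bytes(b"original")
        manifest = build_manifest(root)
        (root / "b.bin").write_bytes(b"corrupted")  # simulate corruption
        return find_mismatches(root, manifest)


print(demo())  # ['b.bin']
```

A manifest like this only helps if it was created before the disk problem, which is why the reply says there is no way of knowing for sure without existing checksums or a checksumming filesystem.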