inh Posted November 22, 2014

This is one of my oldest drives, a WD green from ~2011 if I recall correctly. It hasn't red-balled or anything but it does store my most important information.

TEST for WDC_WD30EZRX-00MMMB0_WD-WCAWZ1116457 on 201411212005

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ1116457
LU WWN Device Id: 5 0014ee 2b0c7e823
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Nov 21 20:05:49 2014 HST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)
    Offline data collection activity was completed without error.
    Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 114)
    The previous self-test completed having the read element of the test failed.
Total time to complete Offline data collection: (49800) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time:      (   2) minutes.
Extended self-test routine recommended polling time:   ( 478) minutes.
Conveyance self-test routine recommended polling time: (   5) minutes.
SCT capabilities: (0x3035)
    SCT Status supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   146   144   021    Pre-fail  Always       -       9658
  4 Start_Stop_Count        0x0032   093   093   000    Old_age   Always       -       7024
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22266
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       202
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       47
193 Load_Cycle_Count        0x0032   149   149   000    Old_age   Always       -       155215
194 Temperature_Celsius     0x0022   120   108   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       20%        22259        273017280
# 2  Extended offline    Interrupted (host reset)      90%        22253        -
# 3  Short offline       Completed: read failure       10%        22253        273017280

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
WeeboTech Posted November 22, 2014

rsync your data off the drive, or rebuild it with unRAID onto another drive. The pending sector or offline uncorrectable sector is deadly to a rebuild. If you have another drive fail, this drive may prevent rebuilding the other drive.

Even the firmware can't get past the sector. I give that a higher level of urgency, since even with retries it cannot get past it, especially as you say that "it does store my most important information."

The only way to reallocate the sector is to write to it. However, the math to do that eludes me and probably a great many people here. So generally what people do is move the data and preclear the drive.

Me, I would move the data, or rebuild onto another drive, then run this drive through 5 passes of badblocks in write/read mode with different patterns (before I've validated the moved data). I've had success with that, and it has also proven that some drives were not worthy of my data.

It's times like this that I'm usually happy to have boxes of spare drives.
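The math mentioned above for locating the failing sector can be sketched roughly as follows. This is a minimal sketch, not a tested recovery procedure. The LBA 273017280 and the 512/4096-byte sector sizes come from the SMART report in the first post; the partition start (sector 64) and the 4096-byte filesystem block size are assumptions made purely for illustration, and the real values would have to be read from the partition table and the filesystem.

```python
# Values taken from the SMART report in the first post.
FAILING_LBA = 273017280      # LBA_of_first_error from the self-test log
LOGICAL_SECTOR = 512         # bytes per logical sector
PHYSICAL_SECTOR = 4096       # bytes per physical sector

# ASSUMPTIONS for illustration only; check the real partition table and filesystem.
PARTITION_START_LBA = 64     # hypothetical first sector of the data partition
FS_BLOCK_SIZE = 4096         # hypothetical filesystem block size in bytes


def physical_sector_span(lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Return the first and last LBA that share a physical sector with `lba`."""
    per_physical = physical // logical
    first = (lba // per_physical) * per_physical
    return first, first + per_physical - 1


def byte_offset(lba, logical=LOGICAL_SECTOR):
    """Byte offset of `lba` measured from the start of the whole disk."""
    return lba * logical


def filesystem_block(lba, partition_start=PARTITION_START_LBA,
                     logical=LOGICAL_SECTOR, fs_block=FS_BLOCK_SIZE):
    """Filesystem block number (inside the partition) that contains `lba`."""
    return (lba - partition_start) * logical // fs_block


if __name__ == "__main__":
    print("physical sector covers LBAs:", physical_sector_span(FAILING_LBA))
    print("byte offset on disk:", byte_offset(FAILING_LBA))
    print("filesystem block (under the assumptions above):",
          filesystem_block(FAILING_LBA))
```

Because the drive reports 4096-byte physical sectors, the eight logical sectors in that span live or die together, which is one reason a full-disk write pass (preclear or badblocks) is the simpler route than targeting a single LBA by hand.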
dgaschk Posted November 23, 2014

See here: http://lime-technology.com/wiki/index.php/Troubleshooting#Resolving_a_Pending_Sector
Try this: http://daemon-notes.com/articles/system/smartmontools/current-pending
Joe L. Posted November 23, 2014

I do not see a sector "pending re-allocation" in that smart report. I see one that was detected offline... not exactly the same thing.

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1

Advice is still valid. If you do not have another copy of your most important files, make a copy elsewhere. unRAID is NOT a backup, it is a way to recover from a single hard-disk failure.

Joe L.
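The distinction drawn above between attributes 197 and 198 can be checked mechanically. The following is a minimal sketch that assumes the usual ten-column layout of the smartctl attribute table shown in this thread; the function names are hypothetical, and the set of "watched" IDs is just a choice made for this example.

```python
# The three attribute lines quoted in the post above, copied from the SMART report.
REPORT_LINES = """\
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 1
"""


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for each attribute line in `text`."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute line starts with a numeric ID and has at least 10 fields:
        # ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw = int(fields[9])
        except ValueError:
            continue  # skip raw values that are not plain integers
        attrs[int(fields[0])] = (fields[1], raw)
    return attrs


def nonzero_sector_counters(attrs, watched=(5, 196, 197, 198)):
    """Return the watched sector-related attributes whose raw value is not zero."""
    return {i: attrs[i] for i in watched if i in attrs and attrs[i][1] != 0}


if __name__ == "__main__":
    print(nonzero_sector_counters(parse_attributes(REPORT_LINES)))
```

On the lines quoted above, only Offline_Uncorrectable (198) has a nonzero raw value; Current_Pending_Sector (197) is zero, which matches the reading given in the post.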
WeeboTech Posted November 23, 2014

My mistake about the columns in a quick response. However, the read failure below is deadly to a rebuild. unRAID may or may not kick the drive out of the array, depending on the timeout and the value returned.

Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       20%        22259        273017280

Chances are you'll be able to rsync most of the data and a few files will get a read error. This just happened to me recently when I tried to access a file that turned out to have pending sectors.

Here's something you can attempt within the drive's firmware. I've never executed this test, so this will be new territory in recovery.
http://daemon-notes.com/articles/system/smartmontools/offline-uncorrectable

Frankly, I would try to rsync my most important data to a spare drive first (but that's me). Once you have that offline spare backup tucked away somewhere, you can try other recovery methods, i.e. the SMART offline test, a rebuild of the drive onto a new drive, or whatever other recovery procedure you want to attempt.
inh Posted November 27, 2014

I used rsync to pull the files off and there were a ton that couldn't be copied due to errors. Instinct tells me I should image the drive before it gets worse, but I don't have anything big enough to hold the image, so an rsync backup will have to do. Should I start with reiserfsck to try and recover, or ddrescue?
SSD Posted November 27, 2014

A read failure from the drive (while operating as part of the array) would trigger unRAID to reconstruct the bad block using all the other disks in the array and write it back to the bad disk. In theory, that should force a sector remap. So even a parity check should force this correction to occur.
inh Posted November 27, 2014

I probably should have mentioned that this drive failed sometime during or after reconstructing data on another drive that had also failed, so parity is of no help here.
WeeboTech Posted November 27, 2014

ddrescue requires a drive of equal size to be of use in this recovery. If you don't have one, or do not want to acquire one, reiserfsck is the only choice.
inh Posted November 27, 2014

I do have a disk I could use if I really need to. Would you recommend ddrescue over reiserfsck?
WeeboTech Posted November 27, 2014

"I do have a disk I could use if I really need to, would you recommend ddrescue over reiserfsck?"

ddrescue alone is not going to fix the problem. ddrescue copies the disk to another disk. It makes many attempts, and it LOGS which sectors are bad. You can then use that log to attempt another copy in reverse, or to retry the copies.

Once you have copied the whole disk, you can attempt the reiserfsck on the copied disk. This leaves the original disk in the most untouched state possible, so you can try the repair on the temporary copy without destroying your only copy.

It takes a long time and a bunch of command-line work, but I was able to retrieve all but 1 sector of a failed disk. If you plan to go that route, search the board and Google for how to use ddrescue; I can't remember the details of how I used it.

It all depends on how precious the data is to you. You can attempt the reiserfsck on the current disk and hope for the best. From what we've seen, reiserfs has been quite resilient and recovers a lot.
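The log of bad sectors described above can be summarized with a few lines of code. The sketch below assumes the plain-text mapfile layout that GNU ddrescue documents (comment lines starting with "#", one status line, then rows of position, size, and a status character, where "-" marks a bad-sector block). The sample contents are hypothetical and invented for illustration; they do not come from the drive in this thread, and the function name is made up for this example.

```python
# HYPOTHETICAL sample data, invented for illustration only.
SAMPLE_MAPFILE = """\
# Mapfile. Sample for illustration only.
# current_pos  current_status
0x00120000     +
#      pos        size  status
0x00000000  0x00117000  +
0x00117000  0x00000200  -
0x00117200  0x00008E00  +
"""


def bad_ranges(mapfile_text, sector_size=512):
    """Return (first_sector, sector_count) for every block marked '-' (bad)."""
    rows = []
    for line in mapfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rows.append(line.split())

    result = []
    # The first non-comment row is the status line (current position), not a block.
    for fields in rows[1:]:
        if len(fields) < 3:
            continue
        pos = int(fields[0], 0)    # base 0 accepts the 0x-prefixed hex values
        size = int(fields[1], 0)
        if fields[2] == "-":
            count = -(-size // sector_size)  # round up to whole sectors
            result.append((pos // sector_size, count))
    return result


if __name__ == "__main__":
    for first, count in bad_ranges(SAMPLE_MAPFILE):
        print(f"bad: {count} sector(s) starting at sector {first}")
```

A summary like this only tells you where the unreadable areas are on the copy; deciding which files they belong to is a separate step done on the filesystem after reiserfsck has run on the copied disk.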