disk errors



There is no problem with 35 reallocated sectors in themselves. If they are reallocated and all of the other sectors are 100% solid, no problem. But the truth is, once sectors start to go, we've seen a trend that this is the beginning of disk failure. So whether it is 35, 60, 2, or 1029 doesn't much matter. If the number is changing, that is bad. Think of reallocated sectors as soccer players with broken legs sitting on the bench. They are out of the game. The question is whether your team is so geriatric that others are soon going to break their legs!

 

Pending reallocations mean that a spot has been flagged as potentially bad, and that a reallocation may happen in the future. If one or more of those spots on the disk have truly failed, then getting their data back is going to be impossible. That is bad. You might think of pending reallocations as limping soccer players on the field. Do they have broken legs or just a mild cramp? A player on the field with a broken leg is very bad! As with real reallocations, you still have to ask if your team is really fit to play this game!

 

Rule of thumb for me - if you can run 3 consecutive parity checks and your reallocated sectors hold steady, you can probably trust your disk. At least for a while. I've had a VERY few that have 1 or 2 reallocated sectors almost since birth, and they work great for years. But if I see even 1 crop up on an older drive, I have always seen it be the beginning of the end.

 

Pending reallocations sometimes reallocate or clear with parity checks, but if not you can swap in a fresh disk and then try preclearing the old one. That usually causes the pending sectors to either reallocate or be marked good. I had a very old disk that had like 10 pending reallocations that would not clear. And stayed that way. I think it was a bug in the SMART BIOS. Anyway, it worked great for a long time. What you are really looking for is movement in these attributes, not a hard number that tells you how bad they are.
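
To make the "look for movement, not a hard number" idea concrete, here is a minimal Python sketch. It assumes you have already collected the raw SMART values yourself (for example, one snapshot after each parity check) as plain dicts keyed by attribute ID. The names `WATCHED`, `attribute_movement` and `holds_steady` are made up for this illustration, and the attribute labels are the common ones, which may differ by manufacturer.

```python
# Attribute IDs to watch. Labels are the commonly seen ones (an assumption;
# vendors do not all report the same set).
WATCHED = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}


def attribute_movement(previous, current, watched=WATCHED):
    """Return {name: (old, new)} for watched attributes whose raw value changed."""
    moved = {}
    for attr_id, name in watched.items():
        old = previous.get(attr_id)
        new = current.get(attr_id)
        if old is None or new is None:
            continue  # this drive does not report the attribute
        if new != old:
            moved[name] = (old, new)
    return moved


def holds_steady(snapshots, watched=WATCHED):
    """True if no watched attribute changed between consecutive snapshots."""
    pairs = zip(snapshots, snapshots[1:])
    return all(not attribute_movement(a, b, watched) for a, b in pairs)
```

With three snapshots taken after three consecutive parity checks, `holds_steady(snapshots)` coming back `True` corresponds to the rule of thumb above. It says nothing about what the drive will do next month.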


Definitely agree -- even if the numbers were reversed [e.g. 250 reallocated sectors vs. 35 pending sectors] I'd still be far more concerned with the drive with pending sectors. Reallocated sectors are a normal feature of all modern disks -- that's what the spare sectors are for. You should, however, as Brian noted, monitor the count, and if it's continuously increasing then I'd replace the disk. Pending sectors are more problematic -- essentially they mean that a sector has been flagged as bad, but can't be reallocated because the disk can't get a good read of the sector to reallocate the data. So I'd definitely replace any disk with pending sectors, regardless of the count.

 

 

1 hour ago, LinuxGuyGary said:

What about attribute 187, Reported Uncorrect?

   

The problem is that some attributes are not reported by all manufacturers, and they can have slightly different meanings, again depending on the manufacturer. Anything that indicates that a sector is unreadable and/or the drive is having major problems reading from a sector should be a big yellow caution flag. As I understand it, a pending sector will be remapped to a sector in a pool of reserved sectors the NEXT time that it is written to. Until then, it remains in use...

 

Having written that, I wonder if you have a real question, or if it was a rhetorical remark.

46 minutes ago, LinuxGuyGary said:

My real question is: should Reported Uncorrect be looked at like Pending Sectors or more like Reallocated Sectors, e.g. one being pretty normal and the other implying a problem that's not getting corrected by the drive's firmware as expected?

Here is the explanation of the attribute from the Wikipedia entry for SMART:

 

" 187  Reported Uncorrectable Errors       The count of errors that could not be recovered using hardware ECC (see attribute 195)."

 

From what I am reading there, I would put it in the same category as #197 Current Pending Sectors. My WD, Hitachi, and Seagate drives do not even report the 187 attribute. So they are using another number to report this condition, and I suspect that it is #197. As you can see from this, different manufacturers may well treat the same symptom with a different attribute number. That is why the whole issue of interpreting the meaning of the numbers is so difficult!
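
If you want to check which of these attributes a given drive reports at all, a rough Python sketch along these lines may help. It assumes the text comes from the attribute table that `smartctl -A` prints (ten whitespace-separated columns, with the raw value last). The helper names `parse_smart_table` and `missing_attributes` are invented for this example, and the default list of IDs is only a suggestion.

```python
def parse_smart_table(text):
    """Parse a smartctl-style attribute table into {attribute_id: raw_value}.

    Assumes each data row starts with the numeric ID and has the raw value
    as its 10th column. Rows that do not fit are skipped.
    """
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header, blank line, or something unexpected
        try:
            attrs[int(parts[0])] = int(parts[9])
        except ValueError:
            continue  # raw value is not a plain integer
    return attrs


def missing_attributes(attrs, wanted=(5, 187, 197, 198)):
    """Return the wanted attribute IDs that the drive did not report."""
    return [attr_id for attr_id in wanted if attr_id not in attrs]
```

A drive like the ones described above would come back with 187 in the missing list, which is a reminder to judge it on 5, 197 and 198 instead of treating the absent attribute as zero.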

 

Remember, unRAID is much less tolerant of read failures than most operating systems are. It needs to be able to read EVERY sector on a disk (during a data rebuild on a replacement disk), whereas other OSes only need to be able to read the ones that have data written to them. That is why many of us are reluctant to leave a disk in the array that shows any signs of instability! You should realize that if unRAID can't read a sector on a disk, it can recreate that sector on the fly by using parity to rebuild the data, if it can read the sector on all of the remaining disks. So you never see that the read on that sector completely failed until you actually have to be able to read it...
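
Here is a toy Python sketch of why every remaining disk has to be readable. It assumes simple single XOR parity and treats a sector as a short byte string. The function names are hypothetical, and this is an illustration of the principle only, not unRAID's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_sectors):
    """Parity sector for the same sector number across all data disks."""
    return xor_blocks(data_sectors)


def rebuild_sector(surviving_sectors, parity_sector):
    """Recreate the one missing sector from parity plus ALL surviving sectors."""
    return xor_blocks(list(surviving_sectors) + [parity_sector])
```

The rebuild only works if every sector passed to `rebuild_sector` was read correctly. One unreadable sector on a second disk means there is nothing valid to feed in for that position, so the missing data cannot be recovered.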

Edited by Frank1940

Agree that the SMART attributes are unfortunately not uniform among manufacturers (the meaning of individual attributes is very standard; but different manufacturers report different subsets of the available attributes) ... and this can make it a real pain to really understand what's going on.

 

One thing you can TRY if you have a disk with non-zero "current pending sectors" or "reported uncorrectable" errors is to rebuild this disk onto itself -- i.e. Stop the array; unassign the disk;  Start the array so it shows as missing; Stop the array and re-assign the disk; then Start the array and let it do a rebuild.   This will automatically cause those sectors marked as pending or uncorrectable to be reallocated -- which should change the count for those attributes to zero.

 

Note that it's "safer"  to do this with a new disk -- since if anything goes awry you still have the original disk to recover the data (or at least most of it).   If you really want to see if the disk can still be used, you could rebuild it on a new disk; and then rebuild it again on the original disk (which would give you the new disk as a backup for the data).

 


I'd also note that the attributes are not in isolation. A reported uncorrectable would likely also create a pending or reallocated sector. But we are not inside the disk to understand exactly what it does. It is quite possible that an uncorrectable situation would trigger a retry, and that on retry the read would succeed, and that the disk would turn around and remap the sector. That's probably what you'd want to happen.

 

Again, the attributes should not be looked at as individual conditions each requiring a spot solution. The attributes should be looked at to help make a holistic diagnosis of the drive. (The drive is great at spot solutions, and for a young drive with a weak spot on the media (extremely rare situation in my experience) it can handle a couple events and the drive can be restored to fully functioning status.) But if the drive is failing, reallocations, pending sectors, reported uncorrects, offline uncorrectables, runtime bad blocks, end to end errors, and other attributes start to rise. If you start to see them, it is a good idea to start looking for hard disks on sale.

 

The one attribute that will sometimes start to rise which is not an indication of a failing drive is the UDMA_CRC_ERROR_COUNT. That attribute is an indication of a problem with the connection between the drive and the controller. Could be a lot of things but normally a symptom of a bad or loose cable.
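
As a rough illustration of looking at the attributes together, this Python sketch separates rising media-related attributes from a rising CRC count. It assumes two snapshots of raw values as dicts keyed by attribute ID. The ID-to-name tables use the commonly seen numbers, which not every manufacturer reports, and `triage` is a made-up helper name.

```python
# Commonly used attribute IDs (an assumption; vendors differ in what they report).
MEDIA_ATTRIBUTES = {
    5: "Reallocated_Sector_Ct",
    184: "End-to-End_Error",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}
LINK_ATTRIBUTES = {199: "UDMA_CRC_Error_Count"}


def triage(previous, current):
    """Group attributes whose raw value rose into 'media' and 'link' lists."""
    rising = {"media": [], "link": []}
    for group, table in (("media", MEDIA_ATTRIBUTES), ("link", LINK_ATTRIBUTES)):
        for attr_id, name in table.items():
            old = previous.get(attr_id)
            new = current.get(attr_id)
            if old is not None and new is not None and new > old:
                rising[group].append(name)
    return rising
```

Anything in the `media` list is a reason to start watching the drive closely. If only the `link` list has entries, check the cable and connectors before blaming the disk.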

 

 
