Bad Drive or Cable Issue?



On 8/16/2017 at 3:26 AM, johnnie.black said:

Possibly a cable issue, but the disk dropped offline, so there's no SMART report. Reboot and get new diags.

PS: disk13 does need a new cable since there are a lot of CRC errors.

 

Thanks. I replaced the cables on both drives. I see far fewer CRC errors in the latest log, but I'm still seeing some. Can you tell me which drives are getting the CRC errors now? And also, where are you pulling that info from? I'm only seeing ATA references in the log, but no drive names/serials.

tower-diagnostics-20170819-1749.zip

  • 3 weeks later...
On 8/19/2017 at 6:00 PM, johnnie.black said:

The beginning of the syslog shows which ATA port is which disk. ata2 is disk13, and there are still problems there:

 


Aug 19 14:16:54 Tower kernel: ata2.00: ATA-9: ST8000AS0002-1NA17Z,             Z840S3KD, RT17, max UDMA/133
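For anyone wanting to do the port-to-disk lookup described above without scrolling the whole log, here is a minimal Python sketch. It assumes the syslog has already been extracted from the diagnostics zip to a plain text file, and that the identify lines look like the one quoted above. It also assumes (not confirmed in this thread) that CRC-related kernel error lines name the port as `ataN` or `ataN.MM` and contain the token `BadCRC` or `ICRC`. The function names `map_ports`, `count_crc_errors` and `summarize` are hypothetical.

```python
import re
import sys
from collections import Counter

# Identify lines such as:
#   "... kernel: ata2.00: ATA-9: ST8000AS0002-1NA17Z,   Z840S3KD, RT17, max UDMA/133"
ID_RE = re.compile(r"kernel: (ata\d+)\.\d+: ATA-\d+: ([^,]+),\s*([^,]+),")
# Assumption: error lines start with "kernel: ataN:" or "kernel: ataN.MM:".
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")
# Assumption: these tokens mark CRC-related errors in the kernel log.
ERROR_TOKENS = ("BadCRC", "ICRC")


def map_ports(lines):
    """Return {port: (model, serial)} built from the ATA identify lines."""
    ports = {}
    for line in lines:
        m = ID_RE.search(line)
        if m:
            ports[m.group(1)] = (m.group(2).strip(), m.group(3).strip())
    return ports


def count_crc_errors(lines):
    """Count log lines per ATA port that mention one of ERROR_TOKENS."""
    counts = Counter()
    for line in lines:
        m = PORT_RE.search(line)
        if m and any(tok in line for tok in ERROR_TOKENS):
            counts[m.group(1)] += 1
    return counts


def summarize(lines):
    """Pair each port's CRC-related line count with its model and serial."""
    ports = map_ports(lines)
    report = []
    for port, n in count_crc_errors(lines).most_common():
        model, serial = ports.get(port, ("unknown model", "unknown serial"))
        report.append(f"{port} ({model}, {serial}): {n} CRC-related line(s)")
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python crc_by_port.py syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for row in summarize(fh.read().splitlines()):
            print(row)
```

This only tells you which port (and therefore which drive, via its serial) the errors are logged against; mapping the serial to disk13, Parity2 and so on still comes from the unRAID array assignments.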

 

 

So I replaced the SATA cables on both drives, and the BadCRC errors disappeared completely for the last few weeks. Now, out of left field, the same drive just got errors again. No BadCRC errors this time, but I'm seeing I/O errors. Not sure what to try next...

Help would be greatly appreciated.

 

PS: Since it's Parity2 that has errors, do I need to do anything drastic while troubleshooting? Or can I leave the Array alone and essentially operate with only one valid parity drive?

 

tower-diagnostics-20170905-1244.zip

Edited by newoski
17 minutes ago, johnnie.black said:

Parity2 dropped offline, so there's no SMART report. Reboot and post new diags.

 

Hmmmm. So after the reboot, the Parity2 disk is completely MIA. It doesn't show up as red-balled, nor does it show up at all as an unassigned device, either in Unassigned Devices or after stopping the Array...

 

Where to from here? Should I post diagnostics, or reseat the cables and reboot to try to get that disk visible again?


Weird. So I reseated the cables and rebooted. The drive still didn't show up anywhere. I swapped drive slots and rebooted again. Now it shows up when the Array is stopped, but it doesn't show up in Unassigned Devices. Would diagnostics help in this scenario, or do I need to get it to show up in Unassigned Devices first?

I'm stumped.

  • 2 weeks later...
On 9/5/2017 at 5:26 PM, johnnie.black said:

SMART looks fine. Since you already swapped slots, re-sync parity and see if it holds up.

 

So the saga continues. After swapping the parity slots back on the 5th, both parity drives were OK for about 2 weeks. Now Parity2 is red-balled, which suggests to me that the problem lies with that slot rather than the hard drive. That said, I'm not sure how to go about testing all the hardware in that chain.

 

1. Replace the SATA cable to that drive, rebuild parity, and see if the issue is resolved, yes?

2. ?

3. ?

 

Diagnostics attached

tower-diagnostics-20170919-1220.zip


So, my system had been rock solid until just the other day, when a drive got a red X on me. I powered down, jiggled (technical term) the SATA cables in their respective connectors, powered on, and had unRAID rebuild the array. All is now well.

 

Just wondering if this is a common problem?  Is there a specific brand of high quality SATA cables people recommend?

 

While I do try to do backups regularly, I am going to install a 2nd parity drive, just for the added peace of mind of not being solely dependent on a single parity drive during array rebuilds.

 

Thoughts?
