Sync errors on parity check [SOLVED]



54 minutes ago, bjp999 said:

Here are the controllers I tested back in 2011.

 

SuperMicro C2SEE-O MB

IBM Br10i controller

Cheap Hong Kong PCIe x1 controller

Adaptec 1430sa controller

SuperMicro AOC-SASLP-MV8

SuperMicro AOC-SAS2LP-MV8

Promise TX2 controller (2 eSata ports)

ASUS P5B VM DO MB

JMicron JMB383 (on the ASUS MB)

 

As mentioned above, only the BR10i failed to recognize drives larger than 2TB (2.2TB, to be more precise).
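The "2T" versus "2,2T" wording lines up with the well-known 32-bit LBA ceiling: 2^32 sectors of 512 bytes is exactly 2 TiB, or about 2.2 TB in the decimal units drive makers use. Here is a quick arithmetic sketch. It assumes (the thread doesn't say so) that the BR10i's limit comes from 32-bit sector addressing with 512-byte sectors, and the function names are made up for illustration:

```python
# Hypothetical illustration: capacity ceiling of 32-bit sector addressing.
# Assumption: 512-byte logical sectors and 32-bit logical block addresses (LBAs).
SECTOR_BYTES = 512
MAX_SECTORS = 2 ** 32  # a 32-bit sector address can name this many sectors


def lba32_limit_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity addressable with 32-bit LBAs for a given sector size."""
    return MAX_SECTORS * sector_bytes


def exceeds_lba32(drive_bytes: int) -> bool:
    """True if a drive is too big to be fully addressed with 32-bit LBAs."""
    return drive_bytes > lba32_limit_bytes()


limit = lba32_limit_bytes()
print(f"{limit:,} bytes")          # 2,199,023,255,552 bytes
print(f"{limit / 10**12:.1f} TB")  # 2.2 TB (decimal, as drives are marketed)
print(f"{limit / 2**40:.1f} TiB")  # 2.0 TiB (binary)
print(exceeds_lba32(2 * 10**12))   # False: a "2TB" drive fits
print(exceeds_lba32(3 * 10**12))   # True: a "3TB" drive does not
```

That is why a controller with this kind of limit handles a "2TB" drive but not anything bigger.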

That's all good to know. When I set up my server I thought the Supermicro cards would be the best bet, since I intended to upgrade my motherboard to a Supermicro one in the future (which I recently did). Oh well, Dell 310s it will be, and hopefully my problems will go away...


I also had a SuperMicro SAS2LP and SASLP (2 cards) in my server for years and they worked great. But I guess time marches on, and new kernels revealed problems with virtualization technology.

 

I would suggest looking at the LSI SAS9201-xx cards. They are pure HBAs, so there's no need to flash them. Prices of the 9201-8i are similar to the Dell's on eBay.

 

The 9201-16e is a very good deal. It has 4 SAS connectors (16 drives). The only drawback is that they are external, and you'd have to feed the cables back into the case. But putting one of those in a PCIe 2.0 x8 slot is equivalent to two -8i cards for ~$50. They do use a different connector (SFF-8088 vs SFF-8087), so you wouldn't be able to reuse the cables from the SuperMicro.
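For a rough sense of what "equivalent to two -8i cards" means for throughput, here is a back-of-envelope sketch. It assumes PCIe 2.0's 5 GT/s per lane with 8b/10b encoding and ignores protocol overhead, so real numbers will be lower; the helper names are hypothetical:

```python
# Theoretical PCIe 2.0 bandwidth, one direction, before protocol overhead.
# Assumption: 5 GT/s per lane, 8b/10b line coding (8 payload bits per 10 sent).
TRANSFERS_PER_SEC = 5_000_000_000  # one bit per transfer on each lane


def pcie2_mb_per_s(lanes: int) -> int:
    """Usable payload bandwidth of a PCIe 2.0 link in decimal MB/s."""
    payload_bits = TRANSFERS_PER_SEC * 8 // 10 * lanes
    return payload_bits // 8 // 1_000_000  # bits -> bytes -> MB


def share_per_drive(lanes: int, drives: int) -> float:
    """Even split of the link when every drive is streaming at once."""
    return pcie2_mb_per_s(lanes) / drives


print(pcie2_mb_per_s(8))       # 4000
print(share_per_drive(8, 16))  # 250.0 (one x8 card, 16 drives)
print(share_per_drive(8, 8))   # 500.0 (one x8 card, 8 drives)
```

So with all 16 drives busy at once (for example during a parity check), each gets at most roughly 250 MB/s of the slot, half of what 8 drives on their own x8 card would get. Whether that matters depends on how fast the drives actually are.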

5 hours ago, bjp999 said:

I also had a SuperMicro SAS2LP and SASLP (2 cards) in my server for years and they worked great. But I guess time marches on, and new kernels revealed problems with virtualization technology.

 

I would suggest looking at the LSI SAS9201-xx cards. They are pure HBAs, so there's no need to flash them. Prices of the 9201-8i are similar to the Dell's on eBay.

 

The 9201-16e is a very good deal. It has 4 SAS connectors (16 drives). The only drawback is that they are external, and you'd have to feed the cables back into the case. But putting one of those in a PCIe 2.0 x8 slot is equivalent to two -8i cards for ~$50. They do use a different connector (SFF-8088 vs SFF-8087), so you wouldn't be able to reuse the cables from the SuperMicro.

Maybe that's a good plan. I think it's just one of my SAS2LP cards that's the issue. If I replace it with a 9201-16e and a couple of SFF-8088-to-SATA cables, I can add the other couple of cables as I add more HDDs.

5 hours ago, HellDiverUK said:

The SAS2LP still works fine in unRAID.  Of course, a failing card will cause all sorts of grief, but don't tar all SAS2LP cards with the one brush.

 

I have two unRAID boxes at work as backup destinations, both use SAS2LP cards, and both work fine.

True, I think it's just one of the cards that is the issue rather than the cards in general. It's frustrating doing an upgrade and realising that the problem is the card I bought in advance, which had been sitting on a shelf awaiting the upgrade... hey ho, that's life I suppose!

 

  • 1 month later...

OK, so it has been a while since I last updated this query. I ordered a Dell 9201-16e card from the US as a working pull and ordered the new cables to accompany the card. It took a couple of weeks for the card to arrive and, unfortunately, by the time it did, I was suffering from a severe deep tissue infection in my leg as a reaction to being bitten by something. I've been on bed rest for 5 weeks now and am pretty much recovered (although I rattle with the amount of pills I have had to swallow over the last 5 weeks).

 

So I removed the old SAS2LP cards, installed my lovely new 9201-16e card and hey presto, absolutely nothing connected to the new card is detected by unRAID. I have tried double-checking the cables, swapping the cables around and moving the card to different ports, and nothing works. I eventually put the SAS2LP card back in and am now back to where I was 6 weeks ago. I can't think of anything more I can do with the card, and I don't have any other way to test it. I've ordered an LSI 9201-16e card from a different vendor to see if I was just unlucky and the card was dead. Unless anyone has any better ideas?

 

I'll keep this thread updated on how the new card goes when it arrives in the next week or so...


I am running an LSI SAS9201-16e. Works fine for me. I'm assuming you got the SFF-8088 cables and understand that you have to pull the little round piece out slightly to insert the cable and have it lock into place.

 

Was the card recognized during boot up but no drives detected? Or was the card totally not recognized?

 

Not saying it can't happen, but I cannot remember a similar story of a dead controller card. 

20 hours ago, bjp999 said:

I am running an LSI SAS9201-16e. Works fine for me. I'm assuming you got the SFF-8088 cables and understand that you have to pull the little round piece out slightly to insert the cable and have it lock into place.

 

Was the card recognized during boot up but no drives detected? Or was the card totally not recognized?

 

Not saying it can't happen, but I cannot remember a similar story of a dead controller card. 

Yep, I bought SFF-8088-to-SATA cables to replace the SFF-8087-to-SATA cables I had for my other cards, and I made sure they were locked in place in each port. I also bought the cables from 3 different suppliers to make sure I didn't get a bad batch!

 

I believe the card was not recognised, and all drives connected to it were missing...

  • 2 weeks later...

Well, to all you helpful chaps who have contributed before: I am continuing this thread as my replacement card arrived yesterday. As above, this was an LSI SAS9201-16e. I opened my server and removed the second of my SAS2LP cards. I then moved the original SAS2LP card into the PCIe x8 slot and put the new SAS9201-16e into the PCIe x16 slot. I connected two SFF-8088-to-4x-SATA cables into the first two ports of my new card and connected them to the drives that had been connected to my 2nd SAS2LP card.

 

Booted the machine and yes, the new LSI card works out of the box exactly as I hoped. The only issue is that I still have a red ball on Disk 2 (my relatively new 8TB HDD).

 

My server is connected like this:

Parity & Disk 1: mobo

Disks 2 to 5: 16e

Disks 6 to 8: mobo

Disks 9 to 16: 2LP

Disks 17 to 19: 16e

2 cache disks: mobo

That leaves 1 spare connection on one of the 16e's ports and 2 free ports on the 16e for future drives.

 

Attached are the diagnostics from this morning. I stopped a parity check so I could write this and hopefully get some more advice. Should I try Disk 2 on the mobo, for example?

tower-diagnostics-20170719-0657.zip

13 minutes ago, itimpi said:

reset the array configuration.

And lose all changes that have been made to the red balled slot. Keep in mind that normally a red balled disk slot is still available to read and write, thanks to parity protection. If you reset the array instead of rebuilding the disk, all activity to that slot since the red ball occurred will be lost.

 

The usual correct action is to rebuild the disk.

28 minutes ago, itimpi said:

Once a disk has been ‘red-balled’ the only way to remove this status is to either rebuild the disk or reset the array configuration.

 

12 minutes ago, jonathanm said:

And lose all changes that have been made to the red balled slot. Keep in mind that normally a red balled disk slot is still available to read and write, thanks to parity protection. If you reset the array instead of rebuilding the disk, all activity to that slot since the red ball occurred will be lost.

 

The usual correct action is to rebuild the disk.

The issue I have, as described further up the thread, is that I don't believe the disk necessarily had a problem: it was the controller that had the issue rather than the disk, hence the hardware changes I made. The question now is: the disk is connected to the new controller but still shows an issue. Is this a hangover from the controller problem, or is it actually a problem with the disk?


Whether or not the disk has a problem isn't relevant to rebuilding. You will need to rebuild that slot if you wish to retain any changes that have occurred since the red ball event.

 

You can rebuild on to the same disk if you believe the disk is OK, or you can rebuild to a new disk if it's not.

 

Try to separate the data slot from the physical disk in your mind, because what's currently on that data slot is NOT what is on the physical disk right now, whether the drive is OK or not.

 

11 minutes ago, aspdend said:

The question now is: the disk is connected to the new controller but still shows an issue. Is this a hangover from the controller problem, or is it actually a problem with the disk?

A red ball is triggered when a write to a drive fails, regardless of what caused the failure: the drive, the interface, the cable, whatever. After the slot is red balled, all further writes are done to the virtual drive that is mathematically constructed from all the rest of your drives. Until you rebuild the drive, or tell unRAID to forget the changes to that slot and rebuild parity from what is actually on the physical drive, the red ball will remain.

 

Think of the red ball as an indicator of whether all your drives are in sync with parity, not an error on that physical drive. What caused that drive to get out of sync with parity could be any number of things, but until it's synced back up, it will remain red.
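The "virtual drive" idea can be shown with a toy model of single parity, where the parity disk holds the byte-wise XOR of all data disks. This is a simplified sketch, not unRAID's actual code: real parity works on whole disks at the block-device level, dual parity adds a second, different syndrome, and every name below is hypothetical. It shows a red-balled slot being read and written through parity, the stale physical disk failing a parity check, and a rebuild bringing it back in sync:

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data: List[bytes]) -> bytes:
    """Single parity: XOR of every data disk."""
    return xor_blocks(data)


def read_virtual(data: List[bytes], parity: bytes, slot: int) -> bytes:
    """Emulate a disabled slot: XOR parity with every *other* data disk."""
    others = [d for i, d in enumerate(data) if i != slot]
    return xor_blocks(others + [parity])


def write_virtual(data: List[bytes], parity: bytes, slot: int, new: bytes) -> bytes:
    """Write to a disabled slot: only parity changes, the physical disk stays stale."""
    old = read_virtual(data, parity, slot)
    return xor_blocks([parity, old, new])


def sync_errors(data: List[bytes], parity: bytes) -> int:
    """Toy parity check: count byte positions where parity disagrees with the data."""
    return sum(e != p for e, p in zip(compute_parity(data), parity))


disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(disks)
parity = write_virtual(disks, parity, 1, b"\xaa\xbb")  # slot 1 red-balled, then written
print(read_virtual(disks, parity, 1))  # b'\xaa\xbb': the emulated slot has the new data
print(sync_errors(disks, parity))      # 2: the stale physical disk no longer matches parity
disks[1] = read_virtual(disks, parity, 1)  # "rebuild": copy emulated contents to the disk
print(sync_errors(disks, parity))      # 0
```

The last step is the rebuild. The alternative, trusting the physical disk and recomputing parity from it (`compute_parity(disks)` with the stale disk), would also end with zero sync errors, but the write made while the slot was red-balled would be gone.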

 

1 hour ago, jonathanm said:

Whether or not the disk has a problem isn't relevant to rebuilding. You will need to rebuild that slot if you wish to retain any changes that have occurred since the red ball event.

 

You can rebuild on to the same disk if you believe the disk is OK, or you can rebuild to a new disk if it's not.

 

Try to separate the data slot from the physical disk in your mind, because what's currently on that data slot is NOT what is on the physical disk right now, whether the drive is OK or not.

 

A red ball is triggered when a write to a drive fails, regardless of what caused the failure: the drive, the interface, the cable, whatever. After the slot is red balled, all further writes are done to the virtual drive that is mathematically constructed from all the rest of your drives. Until you rebuild the drive, or tell unRAID to forget the changes to that slot and rebuild parity from what is actually on the physical drive, the red ball will remain.

 

Think of the red ball as an indicator of whether all your drives are in sync with parity, not an error on that physical drive. What caused that drive to get out of sync with parity could be any number of things, but until it's synced back up, it will remain red.

 

OK, that makes a lot of sense to me and pushes my understanding forward (I hope). So, if I've got this right, I should stop the array, unassign disk 2, start the array, stop the array, assign disk 2 as disk 2 again and then rebuild disk 2 onto the existing drive. This way parity will be maintained, no data will be lost and the array will be happy again (assuming there are no underlying issues with Disk 2, of course).

17 hours ago, jonathanm said:

Yes, like you said, assuming that physical disk is ok. I have not looked at your diagnostics, is the current SMART report for that drive clean? Has it recently passed a long SMART test? You may want to manually run a long SMART test to be sure if not. Those typically take a few hours.

Well, it was working fine until I added the second SAS2LP card, and unRAID shows a healthy pass on the SMART summary. I will do a long SMART test tonight to make sure though!

 

I started the extended SMART test late last night and it was only 70% through when I had to leave this morning, so I will be able to see the results tonight when I get home...


OK, so the SMART tests completed and my disk 2 passed with flying colours, as expected. I therefore stopped the array, unassigned disk 2, started the array, stopped the array, reassigned disk 2 and commenced the data rebuild onto disk 2. At the current rate it should finish late tonight, and then I can see what happens.


Well, a further update for anyone still reading this!

 

Rebuilt Disk 2: all fine, and disk 2 is happy and back in the array. However, during the process, 2 other disks (disk 12 and disk 18) flagged up read errors. Disk 12 then red-balled, but as I am again confident in that disk, I pulled the other SAS2LP controller (the one I thought was running fine) from my machine and re-cabled all the disks to the new LSI 9201-16e. I rebooted and then did a data rebuild on disk 12, which all went fine (except for read errors on disk 18). I am now in the process of moving the data off disk 18 so I can remove it from the array.

Out of interest, the reason I changed my motherboard and upgraded my array was to use a second box to house some additional disks so I can expand the array (looking to the future when I can get some rack-mounted kit). The disks in the second box are generally older units that I had in a media PC and then used in the original unRAID box before they were retired when disks were upgraded. So they have been knocking around for about 7 years and have had some serious usage in that time. Disk 18 is one of those disks, so I am fairly sure it is no good and I will just get rid of it. Hopefully I can then finally be error free. I will see after the parity check I shall run tonight...


OK, as a final note on this issue: Disk 12 had some issues after the data rebuild, as it became unmountable. I ran xfs_repair on it and this seemed to resolve all the problems. I then did a parity check, and this time it flagged up some errors on disk 1... it seems the legacy of the SAS2LP controllers is not going away easily for me... Anyway, I ran xfs_repair on disk 1 after the parity check and all seemed fine. I ran another parity check and all is looking good: green balls across the line! Yes.

 

However, after a reboot I noticed I couldn't access the shares from my network. I tried for a while, thinking it was a Windows issue, but surprisingly my shares have all disappeared now! Anyway, I will close this thread as solved and start a new one, as I presume the issues are not necessarily connected with the SAS2LP controllers that are no longer in my system.

 

A big thank you to the ever friendly and helpful community here, which is one of the reasons I went with unRAID over some of the other choices, and thanks to all who helped steer a newbie like me through the problems with my array!

