RMA'd Seagate drives but getting more errors with replacements



I was using my Unraid server only for Plex. I started noticing that my Unraid was offline without me turning it off and I would just boot it back up again. After finally logging in and taking a look, I noticed that my Seagate drives were showing errors in the error tab, and after I tried to get them working one just became unusable. I put both drives in my Windows PC and ran SeaTools and they passed with flying colors, but I decided to advance RMA them anyway.

 

I received the replacement drives yesterday and promptly put them in my Unraid. I also added a 500GB Samsung SSD as a cache drive. One of my old drives (WD Red) still had my old data on it, but I had copied everything off onto my Windows PC beforehand, so I changed the FS from xfs to btrfs and back to xfs to format it and start clean. I let the parity check run overnight and this morning started to copy files over through the Windows share I created.

 

Halfway through the first 100GB, the first disk that it was writing to (ST4000VN000-2AH166_ZDH15K2P) started showing errors and Unraid disabled it.

 

I tried running a SMART test on the drive and it says:

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

So I'm not sure what to do here. I haven't shipped the old disks back. Was planning on doing it today but I will wait a bit. 

 

Am I doing something wrong with setting things up? I thought I kept it pretty simple. 

 

Also, I don't think my cache disk was being used. I just stuck it in and told Unraid it was a cache disk. Do I need to do anything else to set it up?

 

I've attached my old diagnostics report from before the RMA in case it's useful. The current one is dated today and is labeled NEW at the front.

NEW-tower-diagnostics-20170621-0800.zip

OLD-tower-diagnostics-20170608-1624.zip

Link to comment
2 minutes ago, SimpleMind said:

I started noticing that my Unraid was offline without me turning it off and I would just boot it back up again.

I can't think of a way that a bad drive would cause those symptoms.

 

We need more info about your hardware, and the condition of things when you noticed the server was offline.

 

Was the tower completely powered down? If not, was there an error message on the screen?

 

My immediate gut reaction is a bad power supply, but without more info that's a pretty useless guess.

Link to comment

Also these were the errors I received when copying the files over:

Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528136
Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528144
Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528152
Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528160
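In case it helps anyone digging through a longer syslog, here is a minimal Python sketch that tallies md write errors per disk and reports the sector range. The function name `summarize_write_errors` and the regex are just made up for illustration, and it assumes every error line has the same shape as the four lines quoted above.

```python
import re
from collections import defaultdict

# Matches lines shaped like the ones quoted above, e.g.
# "Tower kernel: md: disk1 write error, sector=5870528136"
MD_WRITE_ERROR = re.compile(r"md: (disk\d+) write error, sector=(\d+)")


def summarize_write_errors(lines):
    """Return {disk: (error_count, lowest_sector, highest_sector)}.

    Hypothetical helper, not part of Unraid. Lines that do not match
    the pattern are ignored.
    """
    sectors = defaultdict(list)
    for line in lines:
        match = MD_WRITE_ERROR.search(line)
        if match:
            sectors[match.group(1)].append(int(match.group(2)))
    return {disk: (len(s), min(s), max(s)) for disk, s in sectors.items()}


# The four lines from this post, used as sample input.
sample = [
    "Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528136",
    "Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528144",
    "Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528152",
    "Jun 21 07:05:52 Tower kernel: md: disk1 write error, sector=5870528160",
]

summary = summarize_write_errors(sample)
for disk, (count, first, last) in summary.items():
    print(f"{disk}: {count} write errors, sectors {first}-{last}")
```

To run it against a real log, pass the lines of the syslog file from the diagnostics zip instead of `sample`.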

 

Link to comment

Thanks, johnnie.black. I've never updated any of the firmware or the BIOS on here. I just realized I could have set up BMC. I'm going to work on all this and update here when I'm finished.

 

15 minutes ago, johnnie.black said:

Update the firmware of the Marvell controller to see if it helps; they are a known problem on those boards.

 

Link to comment

I set up BMC and I've updated 3 things: BIOS, BMC and the Marvell SE 9230 FW. 

 

Link to my mobo website: http://www.asrockrack.com/general/productdetail.asp?Model=C2750D4I#Download

 

From there I downloaded BMC 00.30.00, BIOS 2.90 and there is a link at the top to Marvell SE 9230 FW. I created a DOS bootable USB and used that to update the FW successfully.

 

I just realized this motherboard has a million SATA ports. Most are plugged into SATA 3 but I'm not sure which controller.

 

What should be my next step now? Is there an easy way to blow the drive configuration away and start over? I don't have anything copied on there I care about. 

 

I've attached some pictures of the box and the firmware update.

2017-06-21 10.45.52 (Large).jpg

2017-06-21 10.45.48 (Large).jpg

2017-06-21 10.46.02 (Large).jpg

firmware.png

Edited by SimpleMind
Link to comment
3 minutes ago, SimpleMind said:

What should be my next step now? Is there an easy way to blow the drive configuration away and start over? I don't have anything copied on there I care about. 

 

You can make unRAID forget all assignments by going to Tools and clicking on New Config, but this won't delete any data. If you want to do that, the easiest way is to change to a different filesystem, format, change back to the intended filesystem and format one more time.

Link to comment

The blue cable by the exhaust fan is connected to the Intel controller. The 2 closest to the memory are SATA3; the other 4 are SATA2. The cables on the right by the drive cage are all Marvell. The 2 closest to the memory are the 9172, the bottom 4 are the 9230, and all are SATA3.

 

I noticed from the pictures that you have the OEM fans. Your board won't be able to control those. They will just run at full speed. I also noticed the dust. If you use the magnetic filters you'll have none. With this case you'll also need to cut a piece of cardboard or plastic that is as long as the drive cage and a little wider than the distance from the motherboard to the drive cage. Otherwise your hard drives will run hot.

 

As for your problems, the firmware updates should fix everything. My first reaction was the power supply, although I had 7 3.5" and 4 2.5" drives in mine with the same power supply. But recently I replaced a drive with a 4TB Seagate and soon after my server would randomly freeze every day. It might have been something else I was doing with Dockers. I ordered a Corsair 600 SFX and put the 450 in my backup server. Both are running fine now.

 

 

 

 

Link to comment
Thanks for taking a look, dmacias. Do you think I should switch up what everything is cabled to?
 
I've never heard of magnetic filters! Should I buy three of these?
 
Would this be a good PSU to replace my current one with?

Your case should have come with one side filter that covers the two 120mm fans and one on top for the power supply intake, similar to the link you posted. Silverstone probably has replacements or might send you some if you didn't get any.

That power supply is full size; you would need an SFX one. This is the one I got. https://www.amazon.com/gp/aw/d/B01CGI5M24/ref=mp_s_a_1_1?ie=UTF8&qid=1498069747&sr=8-1&pi=AC_SX236_SY340_FMwebp_QL65&keywords=corsair+sfx+600&dpPl=1&dpID=51KMMloK5gL&ref=plSrch

But I would wait and see if your board still keeps shutting off. I had that same power supply with more drives and it worked fine. It's still working in my backup server. Maybe it was just a power outage or flicker. If you don't have a UPS, that would be a better investment. If it's just freezing, you can log in to the IPMI and load up the console to see if there are any errors, or hook up a monitor.
Link to comment
