(SOLVED) Super slow data rebuild



Hello!

 

System specs:

AMD Sempron 140 AM3 CPU

ASUS M4A785T-M/CSM Mobo

Corsair 2GB DDR3-1333 Ram

Corsair TX650 PSU

Unraid 5.0-rc12 - monthly parity checks

Parity: 2TB WD Green

Cache: 500GB Hitachi

Array: 3x2TB WD Green, 1x500GB WD Blue

 

I recently transplanted my setup into a bigger case (Antec 300 > Zalman MS800) and decided to use some 5x3 cages (Supermicro CSE-M35T-1).

Only have one powered for now. Ran into some issues. The drive in question (a 2TB Green - sdf) was not originally being detected by unraid. Reseated SATA cables.

AHCI is enabled in the BIOS (SATA 1-4, SATA 5-6). The drive was finally detected, and it shows as orange.

 

Rebuild is going at ~4MB/s. I'm not quite sure what I did wrong. It can't be this slow to rebuild a 2TB drive.
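For scale, a back-of-the-envelope estimate shows what ~4MB/s means for a 2TB drive. This is only a minimal sketch: it assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a constant speed for the whole rebuild, the function name `rebuild_hours` is made up for illustration, and the 50MB/s figure is an assumed comparison value, not a measurement.

```python
def rebuild_hours(disk_bytes: float, speed_bytes_per_s: float) -> float:
    """Hours needed to write disk_bytes at a constant speed."""
    if speed_bytes_per_s <= 0:
        raise ValueError("speed must be positive")
    return disk_bytes / speed_bytes_per_s / 3600

TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

# 4 MB/s is the observed speed; 50 MB/s is an assumed comparison figure.
for speed in (4, 50):
    hours = rebuild_hours(2 * TB, speed * MB)
    print(f"{speed:>2} MB/s -> {hours:.1f} h ({hours / 24:.1f} days)")
```

At a steady 4MB/s the rebuild would take roughly 139 hours, which is nearly six days; at the assumed 50MB/s it would take about 11 hours.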

 

1) Is it the 5x3 cage causing the slow speeds?

2) What could have caused the drive to have its contents invalid - amber/yellow sphere?

3) If the 2TB Green - sdf is dying, can I simply pop in another 2TB in its place and have a go at rebuilding again?

 

Any tips or hints will be greatly appreciated. Thanks! Attached are my syslog and smart status on the drive in question.

smart-report-sdf.txt

syslog-2013-05-13.txt

Link to comment

Nothing looks unusual, let the re-construction continue.  /dev/sdf looks fine.

 

It is reading all the disks and writing the one being reconstructed.  You'll get better performance if you disable any add-on processes that also access the disks.  Turn off cache_dirs (if you have it installed) by typing

cache_dirs -q

 

Link to comment

Thanks for the quick reply! Forgot to mention that I've already reverted to a stock go file and temporarily renamed my plug-ins folders to something else.

Good to know that this seems normal. Rebuild speed is up to 5-6 MB/s. Could be better. Will update if anything changes.

Link to comment

Your rebuild speed will increase a good bit when it crosses the 500GB point and the 500GB WD Blue is no longer involved.  [Actually, that may have already happened, since it's bumped up 50% (4MB/s to 6MB/s) since you originally posted.]
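The reasoning here can be sketched as a toy model: at any offset, the rebuild can go no faster than the slowest disk that still has data at that offset, so once the rebuild passes the capacity of the smallest disk, that disk drops out. This is a simplification, and every speed below is a hypothetical placeholder rather than a measured value; `rebuild_segments` and `total_hours` are illustrative names only.

```python
def rebuild_segments(disks):
    """disks: list of (size_gb, speed_mb_s) for parity and data disks.

    Returns (start_gb, end_gb, speed_mb_s) regions, where the speed of a
    region is the slowest disk that still extends into that region.
    """
    segments = []
    start = 0
    for end in sorted({size for size, _ in disks}):
        active = [speed for size, speed in disks if size > start]
        segments.append((start, end, min(active)))
        start = end
    return segments

def total_hours(segments):
    """Total time in hours, assuming 1 GB = 1000 MB."""
    seconds = sum((end - start) * 1000 / speed for start, end, speed in segments)
    return seconds / 3600

# Hypothetical speeds: four 2000GB disks at 50 MB/s, one 500GB disk at 6 MB/s.
array = [(2000, 50)] * 4 + [(500, 6)]
for start, end, speed in rebuild_segments(array):
    print(f"{start}-{end} GB at {speed} MB/s")
print(f"total: {total_hours(rebuild_segments(array)):.1f} h")
```

The cache drive is left out because it is not part of the parity-protected array.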

 

Also, as Joe noted, if you have ANY other activity on the array (e.g. streaming a video) it will slow everything down a LOT (especially during a rebuild, as that activity would cause EVERY disk to thrash).

Link to comment

Untitled.jpg

 

I've been monitoring the speed of the rebuild and it has been up to 10MB/s now. However, just now I noticed all activity halted. Here's an updated syslog. Disk3 (/dev/sdf) is throwing errors and is now RED.

 

Under maintenance mode, I took the array offline and powered it down. About to head off to work (running late). What should I do now? Replace Disk3 with another 2TB?

 

Edit: When I get home, I'm thinking of stopping the array, unassigning /dev/sdf, starting the array, stopping it again, then reassigning /dev/sdf and starting the rebuild again. Am I right to assume the smart report on the drive is OK?

syslog-may132013-update1.txt

Link to comment

Same drive - rebuild failed.

Sata cable replaced, Same drive - rebuild failed.

Sata cable replaced, New 2TB drive - rebuild going at 50MB/s right off the bat!

 

Untitled3.jpg

 

Looks like it was the drive. Hope the rebuild goes through without any problems. Thanks again everyone for the help! RMA the drive ASAP.

Edit: But first, preclearing the new drive.

Link to comment

Uh oh. Experiencing the same symptoms. Slow rebuild and unresponsive UI.

 

Untitled4.jpg

 

Getting this:

 

May 14 04:00:01 Tower logger: mover started
May 14 04:00:01 Tower logger: skipping */
May 14 04:00:01 Tower logger: mover finished
May 14 06:00:01 Tower logger: mover started
May 14 06:00:01 Tower logger: skipping */
May 14 06:00:01 Tower logger: mover finished
May 14 07:52:36 Tower kernel: mdcmd (22): spindown 2
May 14 08:00:01 Tower logger: mover started
May 14 08:00:01 Tower logger: skipping */
May 14 08:00:01 Tower logger: mover finished
May 14 10:00:01 Tower logger: mover started
May 14 10:00:01 Tower logger: skipping */
May 14 10:00:01 Tower logger: mover finished
May 14 12:00:01 Tower logger: mover started
May 14 12:00:01 Tower logger: skipping */
May 14 12:00:01 Tower logger: mover finished
May 14 13:12:43 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 14 13:12:43 Tower kernel: ata1.00: failed command: READ DMA EXT
May 14 13:12:43 Tower kernel: ata1.00: cmd 25/00:00:c7:98:02/00:04:7f:00:00/e0 tag 0 dma 524288 in
May 14 13:12:43 Tower kernel:          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 14 13:12:43 Tower kernel: ata1.00: status: { DRDY }
May 14 13:12:43 Tower kernel: ata1: hard resetting link
May 14 13:12:53 Tower kernel: ata1: softreset failed (device not ready)
May 14 13:12:53 Tower kernel: ata1: hard resetting link
May 14 13:13:02 Tower kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 14 13:13:02 Tower kernel: ata1.00: configured for UDMA/133
May 14 13:13:02 Tower kernel: ata1: EH complete
May 14 13:13:33 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 14 13:13:33 Tower kernel: ata1.00: failed command: READ DMA EXT
May 14 13:13:33 Tower kernel: ata1.00: cmd 25/00:00:c7:a4:02/00:04:7f:00:00/e0 tag 0 dma 524288 in
May 14 13:13:33 Tower kernel:          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 14 13:13:33 Tower kernel: ata1.00: status: { DRDY }
May 14 13:13:33 Tower kernel: ata1: hard resetting link
May 14 13:13:43 Tower kernel: ata1: softreset failed (device not ready)
May 14 13:13:43 Tower kernel: ata1: hard resetting link
May 14 13:13:53 Tower kernel: ata1: softreset failed (device not ready)
May 14 13:13:53 Tower kernel: ata1: hard resetting link
May 14 13:14:03 Tower kernel: ata1: link is slow to respond, please be patient (ready=0)
May 14 13:14:09 Tower kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 14 13:14:09 Tower kernel: ata1.00: configured for UDMA/133
May 14 13:14:09 Tower kernel: ata1: EH complete

 

edit: wrong screenie. Anything odd in the log snippet? Heading to work. Hopefully just a hiccup.
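A long syslog like this is easier to judge when the ATA events are counted per port. The following is a minimal sketch, assuming the line format shown in the snippet above; the function name `summarize_ata_events` and the two event labels are made up for illustration, and the sample lines are copied from that snippet.

```python
import re
from collections import Counter

# Matches kernel lines such as "kernel: ata1.00: failed command: ..."
# and "kernel: ata1: hard resetting link".
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

def summarize_ata_events(lines):
    """Count failed commands and hard link resets per ATA port."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        if message.startswith("failed command"):
            counts[(port, "failed command")] += 1
        elif message.startswith("hard resetting link"):
            counts[(port, "hard reset")] += 1
    return counts

sample = [
    "May 14 12:00:01 Tower logger: mover started",
    "May 14 13:12:43 Tower kernel: ata1.00: failed command: READ DMA EXT",
    "May 14 13:12:43 Tower kernel: ata1: hard resetting link",
    "May 14 13:12:53 Tower kernel: ata1: hard resetting link",
]
for (port, event), n in sorted(summarize_ata_events(sample).items()):
    print(port, event, n)
```

Repeated timeouts and resets on a single port point at whatever is attached to that port (drive, cable, or backplane slot), though the counts alone cannot say which.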

Link to comment

Ahh -- it seems your mover script is starting up; and doing this DURING a rebuild causes a HUGE amount of disk thrashing across all disks; which will slow things to a CRAWL ... exactly what you're seeing.

 

It shouldn't actually cause errors => as you can see the "Errors" column is all zeroes;  but it is NOT a good idea to let this run during a rebuild.

 

I don't use a cache drive, so I'm not sure exactly what options you have available to you "real time".    If you can simply disable the mover, that's what I'd do.  You simply do not want ANY activity on the array during a rebuild EXCEPT the rebuild itself .. no reads, no writes, no "mover", etc.    ANY of those will cause excessive thrashing and simply slow everything to a crawl.

 

If you can't turn off the mover, you have two choices:

(a)  Just hope the mover doesn't have too much to "move", and that it will finish soon (after which the rebuild should pick up speed again);

(b)  Cancel the rebuild;  stop the array;  unassign the cache drive; and then start the array and start the rebuild again.  This time, with no cache, there won't be any "moves" started  :)

 

Hopefully you can simply turn off the mover.

Link to comment

Thank you for the reply. I'll try turning off the mover script or unassigning the cache drive altogether.

 

Could it really be the cause though? There's nothing to be moved. Cache is empty. No one else is accessing the shares. Can the mere fact that the script is being invoked (and skipped) during a rebuild cause hiccups?

 

I'll take a look when I get home. Thanks again :)

Link to comment

Could it really be the cause though? There's nothing to be moved.

 

The cache does appear to be nearly empty, but just the fact that the mover script is running is going to cause disk accesses ... which will cause a lot of thrashing.

 

On the other hand, it shouldn't take too long to decide there's nothing to do; so mover should terminate ... and the rebuild speed should then go back up.  See what it looks like when you get home -- the picture you posted is already 55% done;  if it picks up speed again, it may very well be best to simply "wait it out" and let the rebuild finish.    But in the future, before doing any "all disks involved" activity [rebuild, parity check] I'd disable the mover and any other script that causes disk access [speaking of which, do you have Cache_Dirs running?].

 

Link to comment

I just got home. Rebuild was complete. Was going to do a parity check afterwards but I noticed disk1 had hundreds of thousands of errors. I kid you not. So I took the array offline. Now disk1 is red balled.

Goodness, going to have to RMA this one too?

Syslog was huge, so I took a snippet. Mostly write errors. https://dl.dropboxusercontent.com/u/5332835/unraid/syslog-may14.txt

 

What should I do now? Try rebuilding disk1, or swap it out with another new drive? (+ parity check afterwards)

 

edit: formatting

edit: Decided to try rebuilding disk1. Will update if anything changes.

 

 

 

Link to comment

Disk 1 !!??    But you just did a rebuild on Disk 3 -- right?

 

Was the Disk 3 rebuild successful?  (i.e. no indicated errors?)

 

Something definitely sounds "fishy" here ... but I suppose rebuilding Disk 1 is okay to try, as long as the parity still shows that it's good.

 

Are Disk 1 and Disk 3 by any chance on the same power splitter?

 

Link to comment

I know you mentioned you disabled plugins, but some don't consider SimpleFeatures a plugin. If you have it enabled and open in your browser, it will slow down data rebuilds and parity checks. That is because it polls the hard drives in the background, slowing down the whole process. Just close it in your browser.

Link to comment

That's a good option to consider. I was originally using a CX430. Upped to an extra TX650 v1 lying around. Maybe I'll try a different line of molex connectors next time around.

 

Rebuild is still ongoing...at a crawl. No write errors yet.  If SimpleFeatures is part of unMENU, then I'm definitely sure it's disabled. My go file is stock and /config and /extra folders have been temporarily renamed to something else. Thanks for the tip.

 

My plan, assuming the rebuild completes, is to add a couple of drives and preclear them first via a Supermicro SAS2LP card and breakout cable. Then replace the older drives with the precleared ones. Then preclear the older drives to further stress test them. The 500GB Blue will be out the door and replaced with another new 2TB.

 


 

Link to comment

SimpleFeatures is NOT part of unMENU.

To disable it (and everything else it installs) you'll need to rename

/boot/plugins

and

/boot/config/plugins

in addition to what you've already done, and THEN reboot. (you've apparently already renamed /boot/extra and reverted to the stock /boot/config/go file.)
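The renaming step can be sketched as a small script. This is a hedged illustration only: it assumes Python is available wherever the flash drive is mounted (it is not part of a stock install as far as this thread shows), the `.disabled` suffix and the function name `disable_plugin_dirs` are arbitrary choices, and the demo runs against a temporary directory rather than the real /boot. It only touches `plugins` and `config/plugins`, never `config` itself.

```python
import tempfile
from pathlib import Path

def disable_plugin_dirs(boot_root, suffix=".disabled"):
    """Rename <boot>/plugins and <boot>/config/plugins if they exist.

    Returns the new paths relative to boot_root. The config folder
    itself is never renamed.
    """
    boot = Path(boot_root)
    renamed = []
    for rel in ("plugins", "config/plugins"):
        src = boot / rel
        if src.is_dir():
            dst = src.with_name(src.name + suffix)
            src.rename(dst)
            renamed.append(dst.relative_to(boot).as_posix())
    return renamed

# Demo on a throwaway directory tree instead of the real flash drive.
with tempfile.TemporaryDirectory() as fake_boot:
    (Path(fake_boot) / "plugins").mkdir()
    (Path(fake_boot) / "config" / "plugins").mkdir(parents=True)
    print(disable_plugin_dirs(fake_boot))
```

A reboot is still needed afterwards, and renaming the folders back restores the plugins.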

 

Joe L.

 

Link to comment

Did you actually rename the config folder as you stated, or did you mean the config/plugins folder? The config folder is very important and must not be renamed.

Link to comment

Rebuild still ongoing. 6MB/s at about 81%.  No errors but still somewhat slow. *knock on wood*

Thanks for the heads up on SimpleFeatures. So it's another plug-in altogether? Well if that's the case then I'm 100% sure I don't have it. I only had cache_dirs and unMENU as plug-ins.

 

Oops. I meant config/plugins and /extra, not config/ itself. Sorry!

Link to comment

So far I've noticed the rebuild slows to a crawl when the 500GB is spinning. When all the reads are done on the Blue, speed picks up dramatically. The faulty Green drive also led to slower speeds.

 

Also swapped out the faulty drive. I tried preclearing the drive for the hell of it but never got far...only one hour in. RMAing it.

 

Preclearing two new 2TB Greens. One to replace the 500GB, the other as another drive in the array.

 

Edit: Rebuild done. Now parity checking while preclearing two other drives. I will mark this as solved unless something changes.

 


 

Link to comment
  • 3 years later...
On 5/13/2013 at 1:22 PM, Joe L. said:

Nothing looks unusual, let the re-construction continue.  /dev/sdf looks fine.

 

It is reading all the disks and writing the one being reconstructed.  You'll get better performance if you disable any add-on processes that also access the disks.  Turn off cache_dirs (if you have it installed) by typing

cache_dirs -q

 

Thanks for that, just disabled it from the plugin UI, rebuild went from 46MB/s to 140MB/s.

Link to comment
