48TB server, under 35 watts idle (now under 24 watts)



DiskSpeed graph below. 75MB/s at 70% into the disk makes sense, since that's about half the drive's sequential speed there, but seeing the same speed or only a little more at the beginning of the disk makes me believe I'm hitting some limitation; I don't have a clue what it is.

 

Parity check with these disks closely follows the graph.

[Attachment: DT01ACA100.png – disk speed graph]

Link to comment

Yeah that graph definitely rules out my potential explanation for what you're seeing.  Interesting.

 

Your numbers using SSDs would seem to disprove any theory that it's the operating system, or motherboard, or disk controllers.  That leads me to start thinking along the lines of it having something to do with rotational latency.  But I'm way out of my area of expertise on this; it's just the only thing I can think of so far that might prevent effective write speeds from scaling with increased drive *sequential* throughput.  With SSDs not suffering from rotational latency, they would exhibit effective write speeds much more reflective of their sequential throughput.  Again, just reaching for *something* in the way of an explanation for what you're seeing; I could easily be way off.

 

In addition to trying various md_write_limit values, did you also try the so-called Turbo Write Mode (md_write_method = 1)?

Link to comment

With turbo write on I max out gigabit.

 

I also think it may have something to do with rotational latency. The 8TB Seagate has a very similar performance curve, so in theory it should perform about the same, but if rotational latency is a factor it should be noticeably slower. If I have time I’ll do some tests this weekend.

[Attachment: ST8000AS0002.png – disk speed graph]

Link to comment


I shouldn't complain anymore. I get 140MB/sec average on my keeper drives and they are all 75% or more full. Once we get into 8-12TB land it will just take too long to move that amount of data with the controllers we have today. I guess once that data is on that 8TB drive it is staying there. Period. Too bad we all can't afford some nice enterprise-grade stuff.

 

Link to comment

I did some tests on my server with the 8TB Seagates, doing two copies:

 

to disk3 (38% full): ~60MB/s (disk speed at this position ~175MB/s)

to disk1 (97% full): ~50MB/s (disk speed at this position ~100MB/s)

 

Based on these and the Toshiba results, I’m thinking that rotational latency does limit max write speed in unRAID: 5400/5900rpm disks max out at ~60MB/s and 7200rpm disks at ~80MB/s.
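
A rough back-of-envelope model fits these numbers surprisingly well. This is only a sketch: the chunk size and the assumption that each chunk written pays roughly one full platter revolution for the read-modify-write turnaround are guesses on my part, not measurements.

```python
# Sketch of a rotational-latency model for read-modify-write writes.
# Assumption: each chunk written to the array costs the streaming time for
# the chunk plus roughly one full platter revolution (read old data/parity,
# wait for the same sectors to come around again, write new data/parity).

CHUNK_MB = 1.0  # assumed size of each write burst, in MB

def effective_write_speed(seq_mb_s, rpm, chunk_mb=CHUNK_MB):
    """Estimate array write speed from the disk's sequential speed and RPM."""
    rev_time = 60.0 / rpm                # seconds per platter revolution
    stream_time = chunk_mb / seq_mb_s    # time to stream one chunk
    return chunk_mb / (stream_time + rev_time)

print(round(effective_write_speed(175, 5900)))  # ~63 MB/s (5900rpm, outer tracks)
print(round(effective_write_speed(200, 7200)))  # ~75 MB/s (7200rpm, outer tracks)
```

With those assumed inputs the model lands near the observed ~60MB/s and ~80MB/s, which at least makes the rotational-latency explanation plausible.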

[Attachment: disk1.png – copy speed to disk1 (97% full)]

[Attachment: disk3.png – copy speed to disk3 (38% full)]

Link to comment

You're definitely on to something JB.

 

Once I hit a week straight of stable uptime on my new server (another day and a half to go), I'll be adding another 8TB drive, then I'll be in a position to test write speeds to an empty one of these drives.

 

I still wonder if motherboard/disk controllers might have something to do with it, not to the point of even slowing down SSDs, but enough to cause the rotational latency to be a problem, versus some other configuration that lets them be written to faster.  I guess we'll only know if that's the case if/when a counterexample is produced, where someone is able to write to these drives faster than ~60MB/s.  I'm hoping to be that counterexample, but am not really expecting to be.

 

Can you share details of the motherboard/disk controllers used in your test?

Link to comment

The Toshiba tests were done on a Supermicro X9SCM-F using the onboard controller; the Seagates were on an HP N54L MicroServer, also using its onboard controller, both in AHCI mode.

 

Note that both servers can write at 100MB/s+ using the same controllers with turbo write on, and that mode needs considerably more bandwidth than the normal write mode, so I believe the controllers are not the reason for the limits.

 

Link to comment

Yeah it wouldn't be so much the raw throughput of the controllers (your SSD test confirmed that's not a problem) as some timing issues that, combined with the drives' rotational latency and unRAID's read-modify-write technique, result in the behavior you're seeing.
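
For anyone following along, here is a minimal sketch of the two write strategies being discussed (my own illustration with made-up block values; unRAID's internals are more involved than this):

```python
# Single-parity updates, two ways. Integers stand in for disk blocks.

def rmw_parity(old_parity, old_data, new_data):
    """Read-modify-write: touch only the target disk and the parity disk.
    Costs 2 reads + 2 writes, and each disk must re-write the very sectors
    it just read, so it waits out most of a platter revolution."""
    return old_parity ^ old_data ^ new_data

def reconstruct_parity(new_data, other_disks_data):
    """Turbo / reconstruct write: read the same block from every other data
    disk and XOR everything. Costs (N-1) reads + 2 writes, all streaming,
    with no read-then-rewrite turnaround, but far more total bandwidth."""
    parity = new_data
    for block in other_disks_data:
        parity ^= block
    return parity

# Tiny check with two data disks: both strategies produce the same parity.
old_data, other, new_data = 0b1010, 0b0110, 0b0001
old_parity = old_data ^ other
assert rmw_parity(old_parity, old_data, new_data) == reconstruct_parity(new_data, [other])
```

That difference in I/O pattern would explain why turbo write can max out gigabit while the normal mode tops out around 60-80MB/s, and why any timing interaction between the controllers and rotational latency would only show up in the read-modify-write path.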

 

Back to the build: I hit a solid week of uptime with zero issues.  I'll be adding a 3-bay hot swap drive cage next; I anticipate about a 3-watt hit for that.

 

Just started a Parity Check, after first capturing the tunables I used last time in a script, so I can easily reapply them.  This is because I see md_write_limit being set to a different value in disk.cfg, and md_sync_thresh isn't even listed.  Five minutes in and I'm showing over 200MB/s, so far so good . . .
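
For what it's worth, the re-apply part of that script boils down to something like this (a rough sketch only; the mdcmd path and the "set <tunable> <value>" form are assumptions based on how I've seen it used, and the values below are placeholders rather than my actual settings):

```python
#!/usr/bin/env python3
# Re-apply a saved set of md tunables after a reboot or upgrade.
# Assumptions: mdcmd lives at /usr/local/sbin/mdcmd and accepts
# "set <tunable> <value>"; the values here are placeholders only.
import subprocess

TUNABLES = {
    "md_write_limit": "1024",   # placeholder value
    "md_sync_thresh": "192",    # placeholder value
    "md_write_method": "0",     # 0 = normal read-modify-write, 1 = turbo write
}

def apply_tunables(tunables):
    for name, value in tunables.items():
        subprocess.run(["/usr/local/sbin/mdcmd", "set", name, value], check=True)
        print(f"set {name} = {value}")

if __name__ == "__main__":
    apply_tunables(TUNABLES)
```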

 

[Attachment: Luna-week.png – uptime screenshot]

Link to comment

UPDATE

 

Parity Check finished in 16 hours 10 minutes, within 3.5 minutes of the previous run.  I had increased the poll_spindown frequency from 1/1800 to 1/60, so that might be related.

 

On the power consumption front: starting with 29.0 watts idle, I replaced the 2x 16GB ECC UDIMMs with 4x 8GB ECC UDIMMs (I needed the 16GB ECC UDIMMs on another Avoton board).  Idle power consumption went up to 29.6 watts.  Then I added the three-bay hot-swap drive cage, and idle power consumption went up to 30.8 watts.

 

So now I have nine-drive capacity, although I've yet to enable the other six SATA ports.

 

Here is the drive cage:

 

iStarUSA BPN-DE230SS-BLACK 2 x 5.25" to 3 x 3.5" SAS/SATA Trayless Hot-Swap Cage - OEM

 

http://www.newegg.com/Product/Product.aspx?Item=N82E16816215240

 

I'll be adding another data drive at some point.  The plan is that the parity drive and a warm spare drive will be in the hot-swap cage, plus one extra bay for a guest drive.  If/when dual parity becomes available, that warm spare drive will turn into the second parity drive.

Link to comment

That’s a good average speed, but with some tuning you could probably shave an hour. These are from an HP N54L with 4 x 8TB drives, before and after tuning:

 

16h46m13s – 132.5MB/s

15h01m14s – 148MB/s
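
For reference, a quick check that those durations line up with the quoted averages, assuming the check length is set by the 8TB parity disk (8e12 bytes):

```python
# Average parity-check speed from elapsed time, assuming an 8TB parity disk.
PARITY_BYTES = 8e12

def avg_speed_mb_s(hours, minutes, seconds):
    return PARITY_BYTES / (hours * 3600 + minutes * 60 + seconds) / 1e6

print(round(avg_speed_mb_s(16, 46, 13), 1))  # ~132.5 MB/s before tuning
print(round(avg_speed_mb_s(15, 1, 14), 1))   # ~148 MB/s after tuning
```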

 

Nice improvement for sure.

 

I actually typo'd that last post; my times are all in the 15-hour range . . . you can see my before-and-after-tuning improvement, where I pick up around half an hour (~142MB/s -> ~147MB/s).

[Attachment: Luna-hist.png – parity check history]

Link to comment


OK, that’s very close to optimal for these disks; in that case you’re good.

Link to comment

UPDATE

 

Found that the Intel SATA ports support hotswap whereas the Marvell SATA ports don't appear to.

 

So I enabled all the Marvell SATA ports, switched the six 8TB drives to connect via those ports, and ran a Parity Check.

 

The Parity Check completed in one second less time than with the six 8TB drives connected via the Intel SATA ports.

 

Power consumption is at 32.1W idle now, ready to connect more drives via the three-bay hotswap cage.

Link to comment
  • 2 weeks later...

I swapped out the 4x 8GB ECC UDIMMs for 2x 16GB ECC UDIMMs . . . power consumption dropped back down under 32 watts.  Also upgraded to unRAID 6.1.9.

 

Connected the three-bay hotswap cage to the motherboard (had to wait for 18" versions of those StarTech SATA cables to get here, 12" was not enough).

 

Just plugged in a small drive, with the system powered up and the array started, to test the hotswap capability of the SATA ports.  No problem, drive recognized, doing a preclear on it now.

 

I'm at 96% capacity now, so I'll be adding another data drive soon.  Once the dust settles on that, I'll be thinking about upgrading to 6.2.0; I see there's a beta20 release now.  I have beta19 on another server and it's working well so far.

Link to comment
  • 3 weeks later...

Started seeing "kernel: BUG: unable to handle kernel paging request at ..." today in the log.  Every 17 or so minutes for several hours.

 

Rebooted, running Parity Check, log is clean so far, over an hour in.

 

Searches of the forums seem to indicate that this is a potential memory corruption problem.  And difficult to track down (that part I know, I develop software myself).

 

If anyone else has seen this or knows what might be causing it, I'd be interested to hear about it.

 

In the meantime I'll be keeping an eye out for this showing up again.

Link to comment
  • 1 month later...

30+ days later and no sign of the apparent kernel bug.

 

Still holding steady at 32 watts (idle) for 40TB net capacity; the three hotswap bays remain empty.

 

I have 2TB unused capacity, so it will be a little while before I add another data drive . . . most likely I'll move to Dual Parity at that time.

 

Then adding a warm spare (as unused Cache drive) will round out the build: six data drives, two parity drives, one warm spare, with the data drives inside the chassis and the spare/parity drives in the hotswap bays.  I believe I'll be under 36 watts idle in that configuration, so I should be back under 3/4 watt per protected TB.

Link to comment
  • 3 months later...

UPDATE - Lots of changes:

 

increased RAM from 32GB to 64GB

increased protected capacity from 40TB to 48TB by adding 6th data drive

upgraded to unRAID 6.2.0-beta21 (will go to rc4 or later at my next power cycle)

added second Parity drive

added warm spare (as unused Cache drive)

 

Power consumption at idle hovers in the mid-34-watt range.  So easily under 35 watts.  Result is about 0.72 watts idle per protected terabyte.

 

I welcome people posting their capacity/power numbers here.  I understand that energy efficiency is not everyone's priority; still, if anyone has impressive numbers, I'd be interested to see them.

Link to comment
  • 11 months later...

Went to write some files to this server earlier today and it was unresponsive.  No video, wouldn't boot.  Motherboard looks to have failed.

 

I just finished moving the drives to a hastily-assembled system, using parts I had laying around.  No data loss as far as I can tell.  Got everything backed up and am running a parity check now.

 

By coincidence I had just ordered another motherboard, to start a build of an online backup (mirror) system for this one.  Looks like that new board (ASRock Rack C236 WSI) will instead replace this failed Avoton board.

 

Link to comment

I presume you tried another power supply ... "just in case" >:(

 

"No video" makes this unlikely -- but are you sure it's not just hanging due to a switch in the boot device (if it's attempting to boot from a hard drive instead of the USB flash drive it will generally hang with a blank screen.    This can happen after a power failure ... especially if the motherboard battery has failed.   But you should still be able to boot to the BIOS, so assuming you have a keyboard and display connected, the "no video" comment likely excludes this.   Just wanted to be sure you'd considered it.

 

Link to comment

Thanks for the advice Garycase.

 

I tested the power supply with a power-supply tester, so it's not 'obviously' failed, but could still be off just enough to prevent that board from booting.  Will circle back for that.

 

No video comes out at any time over the course of what would be the boot sequence.  Everything powers up (drives, fans) and LEDs on the motherboard come on, but no video.  That might be acceptable for now if it would finish the boot sequence, but no joy.

 

I also cycled through the DIMMs, one at a time in the primary DIMM slot, to rule out a DIMM failure.

 

I also reset the CMOS (by removing the battery for several minutes).  I'm using a UPS so should not have experienced any power loss.

 

There was a series of serious-looking errors in the log (I was 'tail -f'-ing the log in Emacs/Cygwin).  Something about I/O and locking, I think.  I was hoping a reboot would clear it up, so I didn't save it.

Link to comment

The new motherboard and processor are in place, working well so far.  Will run a Parity Check soon.

 

Only have 4GB of non-ECC memory in there for now, but the change gave a substantial improvement in power consumption: 23.0 watts at idle, compared to mid-33's before.  *And* more processing power (i3-6100T).  That's over 2TB of single-parity-protected storage per idle watt.

 

It will creep up more towards 24 watts when I put more memory in.

Link to comment
