Throttle parity sync/check according to I/O



I would like an option in the GUI to toggle the throttle, preferably with the ability to switch back and forth during a check. I have programs running on the unRAID box itself that may prevent the parity check from ever reaching full speed, and I'd like the option to starve them in exchange for a faster parity check completion.

Link to comment

i guess this is one of those 'issues' where people start arguing....

like for me, unraid is a black box (that is actually dark gray 19'' rack case - stoneage, gutted and modded) sitting in our basement, making a hellish amount of noise most of the time (left the commercial server fans in there, they are just highly effective). in it is (obviously) the hardware to run a unraidserver  ::) - at the moment v.5 / new gui. only real plugin for me is subsonic.

i see it as a highly configurable nas box. that means, thanks to not-too-shabby hardware, it has no issues with parity checks running. a check doesn't even get the mobo/cpu/ram anywhere close to having to work seriously.

 

then we have the 'pimp it out' user, where a full-scale parity check can bring the system close to a crash, or crash it outright.

 

i just don't see the point! basically, case with plenty (at the moment 15) hdd slots was sitting around here useless. so got mobo and decent psu (gigabyte 970A-DS3 and OCZ 750W) for $150, used old amd athlon II and 8gb ram i had collecting dust here. so basically box running for $150 + hdds (which i add as needed - at the moment 30tb + parity + 250gb cache) all is good. perfect highend nas + subsonic.

everything else is handled by my htpc (it is actually more a pimped out audio pc - since i dont care really about movies etc) and the gaming rig for the kids.

 

all is good, i can 'sink' my money into a pimped out audio pc, all my files sit in the basement and no need to really throttle any kind of bs to make a parity check happen  8)

 

Link to comment

I would suggest that it would be better to put in place a means where addons can be controlled or suspended so that parity has a clear run.

 

unRAID proper is about secure and reliable data storage... addons are not unRAID proper.

 

In essence there is a case to be made that throttling the parity is against the fundamental unRAID mantra.

Link to comment

I would suggest that it would be better to put in place a means where addons can be controlled or suspended so that parity has a clear run.

 

unRAID proper is about secure and reliable data storage... addons are not unRAID proper.

 

In essence there is a case to be made that throttling the parity is against the fundamental unRAID mantra.

 

:) i think you put in a very short and clear form  what i tried to say  ;D

thx

 

i am fully with you, the main purpose of having a 'failsafe' storage server is to keep your actual files as safe as possible against hardware failure. everything else (let's call it end user apps) should have no home on your storage.

Link to comment

If it's done right, like the standard Linux MD drivers, you can set a minimum rebuild/check speed and a maximum rebuild/check speed.

Using those two numbers you could set what works for your own system.

 

 

When I used to have a crash that started a resync, I would set the maximum to an amount that let me work.

When I was done, I would reset the maximum so that the rebuild could finish without my intervention.

Later revs of the kernel did this automatically.
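
For reference, the stock Linux md driver exposes those two numbers as `/proc/sys/dev/raid/speed_limit_min` and `/proc/sys/dev/raid/speed_limit_max`, in KiB/s per device. Below is a minimal sketch of the "lower the maximum while I work, restore it afterwards" routine. It assumes stock Linux md; whether unRAID's own driver exposes equivalent tunables is not established in this thread, and the function names are made up for illustration.

```python
from pathlib import Path
from typing import Tuple

# Stock Linux md tunables (KiB/s per device). Assumption: the box uses
# the standard md driver; unRAID's driver may not expose these files.
MD_SYSCTL_DIR = Path("/proc/sys/dev/raid")


def read_limits(sysctl_dir: Path = MD_SYSCTL_DIR) -> Tuple[int, int]:
    """Return (speed_limit_min, speed_limit_max) in KiB/s."""
    low = int((sysctl_dir / "speed_limit_min").read_text().strip())
    high = int((sysctl_dir / "speed_limit_max").read_text().strip())
    return low, high


def set_max_speed(kib_per_sec: int, sysctl_dir: Path = MD_SYSCTL_DIR) -> None:
    """Cap the resync/check speed. Needs root when aimed at the real /proc."""
    if kib_per_sec <= 0:
        raise ValueError("speed must be positive")
    (sysctl_dir / "speed_limit_max").write_text(f"{kib_per_sec}\n")
```

Typical use would be to remember the value from `read_limits()`, call `set_max_speed()` with a low cap while working, and write the remembered maximum back when done.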

 

 

Also, using values from /proc/mdcmd, a plugin author can determine whether a parity operation is in progress.

Perhaps Tom can add a set of external events for 'start of parity operation' and 'end of parity operation'.

This way a plugin author can determine whether an app should be shut down and restarted around these operations.

An owner can also add the start/stop operations in their own go scripts.
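
A rough sketch of how a plugin could derive such start/end events by polling the driver's status text. The `key=value` layout, the `mdResync` key (non-zero meaning "operation in progress"), and the event names are all assumptions made for illustration, not a documented unRAID interface.

```python
from typing import Dict, Optional


def parse_status(text: str) -> Dict[str, str]:
    """Parse 'key=value' lines into a dict, ignoring anything else."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            status[key.strip()] = value.strip()
    return status


def parity_running(status: Dict[str, str], key: str = "mdResync") -> bool:
    """Assumption: a non-zero integer under `key` means parity work is active."""
    try:
        return int(status.get(key, "0")) != 0
    except ValueError:
        return False


def detect_event(was_running: bool, is_running: bool) -> Optional[str]:
    """Turn a state change between two polls into a (hypothetical) event name."""
    if is_running and not was_running:
        return "parity_start"
    if was_running and not is_running:
        return "parity_end"
    return None
```

A go script or plugin would poll at some interval, keep the previous `parity_running()` result, and stop or restart its apps whenever `detect_event()` returns something other than `None`.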

Link to comment

I suppose this would be okay, although I'm of the "basic UnRAID" school of thought => I treat it as a NAS appliance providing reliable storage.    I fundamentally don't use it at all during a parity check; and certainly do NOT want a drive rebuild throttled at all, to minimize the "at risk" time.

 

I suspect, however, that for users like me, any throttling mechanism would simply never kick in -- so it would still perform at full speed.

 

Link to comment

As an appliance, I don't want to even think about whether a parity check is running or whether that will slow things down. Ideally this is all automatic and transparent to the user. Parity should be throttled as needed so that the end user can't even tell it's running. This is especially true when the end user is not the same as the administrator.

Link to comment

The problem with this approach is that it is entirely possible that the throttle would have to be in place all the time. As soon as you let high level application demands alter the low level unRAID core parity work you open a can of worms.

 

And therein lies the conceptual architecture problem with this. unRAID cannot be all things to all people by default.

 

 

The first compromise should be making sure that parity work happens, as far as possible, when unRAID is used the least. Notice I specifically don't say "evening", as again that will not be a universal assumption. unRAID should track historical usage stats anyway, and this could be one use for them.

 

Next, unRAID should use these stats to produce health alarms, i.e. if it is too busy too often, to the point where general NAS performance is likely being impacted, tell the system admin that the server is over-utilized.
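
To make the two ideas above concrete, here is a small sketch under invented assumptions: usage history is reduced to 24 hourly load averages between 0 and 1, and the alarm thresholds are arbitrary placeholders. Nothing here reflects stats that unRAID actually collects.

```python
from typing import Sequence


def quietest_start_hour(hourly_load: Sequence[float], duration_hours: int) -> int:
    """Return the start hour (0-23) of the quietest window of the given length."""
    if len(hourly_load) != 24:
        raise ValueError("expected 24 hourly load averages")
    if not 1 <= duration_hours <= 24:
        raise ValueError("duration must be between 1 and 24 hours")

    def window_load(start: int) -> float:
        # Wrap around midnight so a window may span two calendar days.
        return sum(hourly_load[(start + i) % 24] for i in range(duration_hours))

    return min(range(24), key=window_load)


def over_utilized(hourly_load: Sequence[float],
                  busy_threshold: float = 0.8,
                  max_busy_hours: int = 12) -> bool:
    """Flag the server when too many hours sit at or above the busy threshold."""
    busy_hours = sum(1 for load in hourly_load if load >= busy_threshold)
    return busy_hours > max_busy_hours
```

The first function answers "when should the check start", the second "should the admin be warned", and neither one touches the parity code itself.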

 

Once we have these basics in place I suspect most of the resource conflict will have either gone away or can be attributed to addon over use or server under spec.

 

Only at this point should we consider letting applications dictate RAID actions.

 

Letting anyone create addons and then letting these addons dictate how reliable unRAID is would be folly.

Link to comment

I can see both sides of this discussion. (or maybe 3 or 4 sides  :) )

For me, if unRAID is just a storage medium, it's of limited value.

I use it as a media server...I have unRAID because I can watch/listen to the media stored on it. 

The end-use is more important than the storage.

 

As currently set up, I run a monthly parity check...starting late on Sunday night, and I simply plan for unRAID to be unusable for two days.

I'd be happy with a rate limiter of 50%...allowing unRAID to be always usable, even if the parity check took twice as long.

 

(I suppose the rate should be configurable, I don't know that 50% is the right number...but the idea is to balance

  • 'amount of time with all the disks spun up while performing parity check' but with the array essentially offline.

versus

  • 'a longer amount of time with all the disks spun up while performing parity check' and the array usable)

If someone is running an unRAID server that is constantly running other programs/plugins, their tradeoff is similar...

how many of their plugins are going to be usable while a parity check runs, versus how long will the parity check run?

A configurable throttle would allow everyone to make that decision on their own.

Link to comment
Once we have these basics in place I suspect most of the resource conflict will have either gone away or can be attributed to addon over use or server under spec.

 

With the 3T drives I now have, a parity check completes in <9 hours @ 95MBps. This tells me the hardware doesn't really need upgrading. Yet I have always had to either pause the check to play a media file or just not use the server, neither of which is very acceptable. This has been the same through many versions of unRAID, from nothing added to a number of plug-ins, and through different hard drives. Heck, I can't even play back from the cache while a parity check is running. The hardware is quite capable of doing all I need of it, except that the parity check hits some sort of bottleneck bad enough to affect media playback.
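
As a sanity check on those numbers, and to show what a fixed rate limiter would cost, here is the arithmetic as a sketch. It assumes decimal units (3T = 3e12 bytes, 1 MB = 1e6 bytes) and a constant average rate over the whole disk; the `throttle` fraction is an illustrative parameter, not an existing setting.

```python
def check_hours(drive_bytes: float, mb_per_sec: float, throttle: float = 1.0) -> float:
    """Hours to read one full drive at an average rate scaled by `throttle`."""
    if not 0 < throttle <= 1:
        raise ValueError("throttle must be in (0, 1]")
    seconds = drive_bytes / (mb_per_sec * 1e6 * throttle)
    return seconds / 3600


full = check_hours(3e12, 95)          # roughly 8.8 hours, matching "<9 hours"
half = check_hours(3e12, 95, 0.5)     # roughly 17.5 hours at a 50% limiter
print(f"full speed: {full:.1f} h, 50% throttle: {half:.1f} h")
```

So a 50% limiter on this particular server would turn an overnight check into one that runs most of a day.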

 

Link to comment

I am sympathetic but equally it is not unusual for arrays to run slow when degraded or building. This is not a new caveat specific to unRAID.

 

But what we are saying here is:

 

"i want to use my array rather than have it do required RAID stuff"

 

or

 

"i want RAID stuff to always take priority as data integrity is more important than using the array"

 

 

and the proposed solution to have the camps compromise is "i want to use my array and have all the RAID stuff run slow so i can get my data timely"

 

.... which to me is no different than saying "i don't really care about finding problems quickly, just give me the data"

 

That is not a bad position to take; it's just not my interpretation of what unRAID has been about up to this point. I just feel sorry for the inevitable Murphy's law case where some poor soul loses data because of this compromise.

 

 

This, though, is file share access by humans I am talking about. Addons are a whole different can of worms.

 

Link to comment

unRAID is a data server.......not being capable of serving data due to a background process running is rather unacceptable. A parity check is not a data rebuild or a degraded array due to a failed drive. It's a simple background health check.

 

How can doing a monthly parity check and finding a problem an hour earlier make any difference when the problem could have existed for 29 days???

 

Also, there is no way to be notified of an issue during a parity check using stock unRAID so what difference does it make time wise unless you're sitting at a PC refreshing the main screen and syslog every 10 or 15 minutes during the check. If you do a no-correct check then the stock unRAID does nothing when a parity problem is found except log it. You won't know there is an issue unless you go look for it.

 

At the end of the day, it is a data server and NOT some kind of secure data storage vault. If you have critical data stored on the server then you NEED to have a backup of the data.

 

At least my check is now only 9 hours. Dale has his server down 2 days every month. That kind of unavailability is total BS.

 

Link to comment

unRAID is a data server.......not being capable of serving data due to a background process running is rather unacceptable. A parity check is not a data rebuild or a degraded array due to a failed drive. It's a simple background health check.

 

That is the reality of the type of RAID unRAID uses. You can obviously choose not to use parity and avoid the issue, at the expense of having no redundancy.

 

 

How can doing a monthly parity check and finding a problem an hour earlier make any difference when the problem could have existed for 29 days???

 

Statistically it makes very little difference. However to the person that loses data because of this hour it makes all the difference in the world.

 

Also, there is no way to be notified of an issue during a parity check using stock unRAID so what difference does it make time wise unless you're sitting at a PC refreshing the main screen and syslog every 10 or 15 minutes during the check. If you do a no-correct check then the stock unRAID does nothing when a parity problem is found except log it. You won't know there is an issue unless you go look for it.

Totally agree this is a MAJOR deficit of unRAID

 

At the end of the day, it is a data server and NOT some kind of secure data storage vault. If you have critical data stored on the server then you NEED to have a backup of the data.

 

 

It is a redundant data server using a type of redundancy that has a large number of advantages and a few disadvantages, this being one of them.

 

 

I think we are at the point of saying the same thing in different ways over and over now.

 

So to summarise: I do not disagree that this is a valid use case, but I do not like the fact that we are for the first time considering a less redundant compromise solution by default. It may be the way to go, and I believe I could be convinced that it is necessary for human interactions, but I fundamentally disagree that addons should be able to elicit the same compromise. This is especially true since addon dev is currently the "wild west". Until we find a way to eliminate this problem, I do not think this is a good idea at all.

Link to comment

... we are for the first time considering a less redundant compromise solution by default...

 

Throttling the parity check doesn't make it "less redundant" => it simply slows down the checks. Considering that many folks do "non-correcting" checks anyway, it really makes no difference, as the check isn't going to fix parity issues even if it finds them (I never do non-correcting checks ... there's little statistical reason to do so; and if there IS an error I want it FIXED !)

 

I really think for most this whole discussion is almost moot => a parity check is a relatively rare event; and most of us do them at times when the array isn't likely to be in use anyway (thus the desire to have them be as quick as possible).  But if they ran slower, it really wouldn't make much difference -- and if the throttling mechanism was dynamic, they wouldn't even run slower for most of us.

 

A more compelling issue is drive re-builds.  Since you're running "at risk" in this case, I definitely would not want these to be slowed down.    In fact, whenever possible, it's a good idea to make NO use of the array during a rebuild.

 

 

... since addon dev is currently the "wild west" ...

 

Agree !!  This is far more of a problem than whether or not parity checks are throttled !!

 

Link to comment

I am not sure there is a better term than "less redundant" to describe a lessening of the processes relating to redundancy but I admit it is easy to take the wrong way.

 

I think it is fundamentally wrong though to assume that a slower parity check is ok. The whole parity process should be the prime goal of unRAID.

 

I am less concerned with the specifics here than with the paradigm shift it represents. In my experience one compromise sets a precedent and leads to another.

Link to comment

For what my opinion is worth, I am in the camp that believes the most important part of unraid is the protection of its data.

 

However, perhaps giving the user a choice is a compromise to suit both parties. A priority list, where the user must choose (and in turn accept the consequence of that choice) to put a certain plugin above the value of parity check.
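
One way to read that priority list, sketched with hypothetical names: the user orders plugins and a parity marker from highest to lowest priority; anything above the marker is something the parity check yields to, and anything below it gets suspended while the check runs. The marker string, the dictionary keys, and the "downloader" plugin are invented for illustration.

```python
from typing import Dict, List

PARITY = "parity-check"  # hypothetical marker for the parity operation


def split_by_priority(priorities: List[str]) -> Dict[str, List[str]]:
    """Split an ordered list (highest priority first) around the parity marker."""
    if PARITY not in priorities:
        raise ValueError(f"priority list must contain {PARITY!r}")
    cut = priorities.index(PARITY)
    return {
        # Explicitly ranked above parity: the check throttles for these.
        "parity_yields_to": priorities[:cut],
        # Ranked below parity: candidates for suspension during the check.
        "suspend_during_parity": priorities[cut + 1:],
    }


plan = split_by_priority(["subsonic", PARITY, "downloader"])
```

A list that starts with the marker reproduces the current behavior, where the parity check outranks everything, so the default would not change for anyone who never edits it.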

Link to comment

Geeze, just provide min and max throttling settings to allow it to be tuned. Throwing a fit claiming it's the start of the end of unRAID as we know it is being a little ridiculous. The question is about a parity check, not a parity build.

 

Does the stock unRAID even have a mechanism to allow periodic parity checks now? I haven't used a stock unRAID in so long I have no clue but I suspect maybe that unfinished left hanging 5.0 plug-in interface is the only thing that would have it. It seems odd to complain about the world ending due to throttling ruining unRAID if unRAID never does checks anyhow. In other words, your scheduled check is an "add-on".

 

 

Link to comment

..just provide...

 

Just makes it sound so easy... but yes this is probably the best way to go and have it by default mimicking the current behavior. Yet one more complication for new users we would have to live with, one more thing to support and one more thing to break.

 

The reason this thread became a bit more of a deal than it first seemed is that all of a sudden we were like "sure, slow down parity checking, that's fine, everyone will be happy with that and there's no downside". Not everyone is happy and there are plenty of downsides :)

 

Link to comment

There aren't plenty of downsides. There is 1 - the parity check happens to give errors due to a failing drive and you immediately get it swapped and rebuild right before (we're talking an hour or 2 before at the most) another drive fails.

 

If your parity checks are regularly finding errors without drive issues, or without you fixing the issue, then your array is suspect and you're just asking to lose data. You should NEVER normally have parity errors.

 

You're arguing like these parity checks are necessary to keep the parity correct which would allow a drive rebuild and that's simply not true.

 

Link to comment

It's certainly true that nothing suggested here has any impact on the integrity of the data => the computation of parity during normal operation wouldn't change at all.    Throttling the parity checks would simply mean a parity check would take longer -- the effective result might be that you learn about an error a bit later than you otherwise would have.

 

HOWEVER, that's more than just one downside => any time you add complexity to code, it increases the likelihood of some "bug" that may have unforeseen consequences.    Modifying the parity check process to include system monitoring and automatic throttling certainly adds complexity to that code.

 

As I've noted earlier, throttling should have zero impact on those of us who don't do other things during a parity check (it would simply run at full speed); and would also satisfy those who want it to automatically throttle down when other activities were running.    So I'm basically neutral on the concept -- but it is NOT "risk free" to implement it.

 

Link to comment

...

You're arguing like these parity checks are necessary to keep the parity correct which would allow a drive rebuild and that's simply not true.

 

No I am not.

 

We simply slipped into talking about parity checking but the proposal does not differentiate between sync and check.

 

...

Instead of pause/restart mechanism for parity check/sync, there should be code added that detects I/O on the array and throttles back any parity check/sync operation automatically.

 

 

Link to comment

I certainly agree that neither syncs nor drive rebuilds should be any slower than they have to be ==> any time you're running "at risk" that period should be minimized.

 

On the other hand, as long as you don't do other things on the array while those operations are in process, any properly implemented throttling code wouldn't slow things down anyway => so the user effectively controls whether or not they would happen at full speed.
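
A minimal sketch of that behavior, assuming the throttle can measure current user I/O and knows a rough per-disk throughput budget (both are assumptions, as are all the names). The parity operation gets whatever the users leave over, clamped between a configured minimum and maximum.

```python
def target_check_speed(user_io_kib_s: int,
                       min_speed_kib_s: int,
                       max_speed_kib_s: int,
                       disk_budget_kib_s: int) -> int:
    """Pick a parity speed: leftover bandwidth, clamped to [min, max]."""
    if min_speed_kib_s > max_speed_kib_s:
        raise ValueError("min speed must not exceed max speed")
    leftover = disk_budget_kib_s - user_io_kib_s
    return max(min_speed_kib_s, min(max_speed_kib_s, leftover))


# Idle array: the check runs at the full configured maximum.
idle = target_check_speed(0, 1000, 95000, 95000)
# Heavy user I/O: the check drops to the guaranteed minimum, never to zero.
busy = target_check_speed(94500, 1000, 95000, 95000)
```

With no user I/O the result equals the maximum, which is the "wouldn't slow things down anyway" case; the non-zero minimum is what keeps a sync or rebuild from stalling completely under load.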

 

Link to comment