johnnie.black

[Feature Request] Perform a clean shutdown if disk reaches critical temp


That is indeed a scenario which may happen if the disk is actively in use. However, it will not immediately spin down the disk again unless the temperature starts to climb.

You'd be looking at a spin down followed by an immediate spin up (or a slight delay, depending upon when the next read/write actually takes place). Odds are that the drive would never actually stay spun down long enough to let the temp drop, and repeated spin ups/downs/ups/downs can't be good for the drive at all.

 

The temperature doesn't need to drop; a subsequent spin down command will only happen when the temperature has gone up, though this may lead to a never-ending story...
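A minimal Python sketch of that rule, assuming per-disk state is kept in memory: a spin-down is issued when the disk is at or above the critical threshold, and repeated only if the reading is higher than at the previous spin-down. `SpinDownGate` and `should_spin_down` are hypothetical names, not part of unRAID.

```python
from typing import Optional


class SpinDownGate:
    """Hypothetical per-disk gate: (re)issue a spin-down only on a rising temperature."""

    def __init__(self, critical_c: int) -> None:
        self.critical_c = critical_c
        # Temperature at the last spin-down we issued; None means none issued yet.
        self.last_spindown_temp: Optional[int] = None

    def should_spin_down(self, temp_c: int) -> bool:
        if temp_c < self.critical_c:
            return False  # below critical: nothing to do
        if self.last_spindown_temp is None or temp_c > self.last_spindown_temp:
            self.last_spindown_temp = temp_c
            return True  # first time over critical, or hotter than last time
        return False  # same or lower reading: do not issue another spin-down
```

With a slowly climbing temperature, every one-degree rise still triggers another spin-down, which is the never-ending cycle mentioned.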


I think we need to be clear that we are only catering for environmental problems and not inherent system design issues, i.e. "it's too hot a day" and not "if I spin up 10 disks on any day, the system overheats."

 

Once we accept that, it is obvious that once something overheats, it will likely be hours before the room cools again. Since we don't know the air temperature universally or reliably, we can't use this factor, so we have to play it safe and keep the disks spun down until someone intervenes manually.
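The "spin down until manual intervention" policy amounts to a latch. A sketch, assuming a single critical threshold per disk; `OverheatLatch` and `manual_reset` are hypothetical names:

```python
class OverheatLatch:
    """Hypothetical latch: once tripped, the disk stays spun down until a human resets it."""

    def __init__(self, critical_c: int) -> None:
        self.critical_c = critical_c
        self.tripped = False

    def update(self, temp_c: int) -> bool:
        """Feed a new reading; return True if the disk must stay spun down."""
        if temp_c >= self.critical_c:
            self.tripped = True
        # Later, lower readings do NOT clear the latch on their own.
        return self.tripped

    def manual_reset(self) -> None:
        """Called only by an administrator after the cause has been dealt with."""
        self.tripped = False
```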


To me, the easiest solution is to provide one more setting on the Display Settings page, immediately after the current "Critical disk temperature threshold": a "Shutdown disk temperature threshold". You really don't need a delay or any other option, because if a disk reaches the shutdown temp, you have already received the Warning temp notification and then the Critical temp notification. If a temp reaches the Shutdown temp, powerdown is called. I would default it to a high number like 80, so it does not surprise anyone, and I think we can all agree a temp of 80 should shut the server down. Most would set the number considerably lower. This should be easy to implement.
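A sketch of how a reading could be mapped onto the Warning, Critical and proposed Shutdown thresholds, assuming the comparisons are inclusive ("reaches") and using the proposed default of 80 for Shutdown. `Level` and `classify` are hypothetical; acting on `Level.SHUTDOWN` (calling powerdown) is left to the caller.

```python
from enum import IntEnum


class Level(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2
    SHUTDOWN = 3


def classify(temp_c: int, warning_c: int, critical_c: int, shutdown_c: int = 80) -> Level:
    """Return the highest threshold a reading has reached (all comparisons inclusive)."""
    if temp_c >= shutdown_c:
        return Level.SHUTDOWN
    if temp_c >= critical_c:
        return Level.CRITICAL
    if temp_c >= warning_c:
        return Level.WARNING
    return Level.OK
```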

 

You mean something like this?

 

That's a nice enhancement for the Critical Temp, but I would still prefer an additional threshold for Shutdown Temp. It's easier to understand, I think, and it should be set high enough that there's no question about what should happen: high enough that you don't care about what might be going on or might be interfered with, you just want the system off. Then drop the Shutdown option from the Critical Temp choices and add a Stop Array choice to the Critical Temp actions. All threshold triggers would send notifications, including the action that will be taken. For spin downs, you might add a 60 second delay, which should spread them out if repeated.

 

If a user wants, they can always set Shutdown Temp to be only one or two degrees hotter than Critical Temp. Just my opinion, but I would probably set the 3 temps to 45, 60, and 70, with the Critical Temp action set to Stop Array. If you have a way to wait for the array to be fully stopped and then send a Spin All Down, that would be a nice bonus!
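The Stop Array then Spin All Down sequence could look roughly like this. `stop_array`, `array_stopped` and `spin_down` are hypothetical callbacks standing in for whatever the system actually provides (they are not real unRAID APIs), and the 5 s poll interval and 10 minute timeout are assumptions. The `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time
from typing import Callable, Iterable


def stop_array_then_spin_down(
    stop_array: Callable[[], None],
    array_stopped: Callable[[], bool],
    spin_down: Callable[[str], None],
    disks: Iterable[str],
    poll_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical Critical-temp action: stop the array, wait, then spin all disks down.

    Returns False (and leaves the disks alone) if the array does not report
    stopped within timeout_s.
    """
    stop_array()
    waited = 0.0
    while not array_stopped():
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
    # Array is fully stopped: now it is safe to spin every disk down.
    for disk in disks:
        spin_down(disk)
    return True
```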


I second RobJ: some kind of two-stage approach would be beneficial, probably a first threshold that tries to spin down and a second that unconditionally shuts down.


One special case has to be considered here. If a drive is reporting a high temperature due to a sensor problem or something similar, you could create a scenario where unRaid shuts down immediately after rebooting and gives the user no opportunity to diagnose the problem.


How likely is that scenario compared to, say, someone really having a hot disk and it failing? To me this is just a problem of getting the logging right.


Could also occur if someone fat fingers the value and saves it. Poof, server shutdown and won't come back up! :)


Maybe this "feature" could be disabled when one enters Safe Mode?


This time of year, my home office reaches the high 80s (F) in the late afternoon. Especially on parity check day (like today), I get a temp warning from one disc (&^% 5TB tosh). It reminds me to put a floor fan down in front of it until the check finishes. I'd hate to have it shut down if I wasn't here to be reminded. :P

 

These big 6TB discs have to churn for nearly 20 hours straight once a month. They get kinda warm.


One special case has to be considered here. If a drive is reporting a high temperature due to a sensor problem or something similar, you could create a scenario where unRaid shuts down immediately after rebooting and gives the user no opportunity to diagnose the problem.

Do you have any ideas to avoid that?  The user would have received notifications about the exact drive, probably multiple notifications.  I do know it's possible, as I have one drive right now where the reported temp can bounce from 62 to 94, currently reporting 61 but seems about the same temp (mid 30's) as the others.  The drive has issues, but I still use it for low value old videos, in a Windows station.
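One way to keep a flaky sensor (like a reading that bounces from 62 to 94) from triggering a shutdown is to require several consecutive over-threshold readings. This is a debouncing technique swapped in for illustration; `Debouncer` and the default of 3 readings are assumptions. A sensor stuck at a high value would still trip it.

```python
from collections import deque


class Debouncer:
    """Hypothetical filter: report 'over threshold' only after N consecutive high readings."""

    def __init__(self, threshold_c: int, required: int = 3) -> None:
        self.threshold_c = threshold_c
        # Keeps only the last `required` over/under results.
        self.recent = deque(maxlen=required)

    def update(self, temp_c: int) -> bool:
        self.recent.append(temp_c >= self.threshold_c)
        # True only once the window is full and every reading in it was high.
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```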

 

Could also occur if someone fat fingers the value and saves it. Poof, server shutdown and won't come back up! :)

I don't think this is a problem, because the temp settings could be checked for validity before accepted (e.g. Critical temp must be at least one degree above Warning temp, and Shutdown temp must be at least one degree above Critical temp, or settings aren't accepted/saved).  You would have had to have fat fingered all of them.
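The validity check described above can be sketched as a function that returns a list of problems, with the settings saved only if the list is empty. The function name and messages are hypothetical; the only rules enforced are the ones stated (each threshold at least one degree above the previous one).

```python
from typing import List


def validate_thresholds(warning_c: int, critical_c: int, shutdown_c: int) -> List[str]:
    """Return a list of problems; an empty list means the settings may be saved."""
    errors: List[str] = []
    if critical_c < warning_c + 1:
        errors.append("Critical temp must be at least 1 degree above Warning temp")
    if shutdown_c < critical_c + 1:
        errors.append("Shutdown temp must be at least 1 degree above Critical temp")
    return errors
```

A typo that still keeps the ordering intact (say 25/26/27) would pass, so this guards against mis-ordered values rather than every fat-fingered entry.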

 

This time of year, my home office reaches high 80s late afternoon.

Since drive temps are always reported in Celsius, we tend to always do so too. Your high 80s would be low 30s to the drives, much lower than the 40s to 70s we are talking about. My suggested default of 70 is about 158 F.
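The Celsius/Fahrenheit arithmetic as two small helpers, using the standard formulas F = C × 9/5 + 32 and its inverse:

```python
def c_to_f(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def f_to_c(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9
```

`c_to_f(70)` gives 158.0 and `f_to_c(88)` gives about 31.1, so a room in the high 80s F is roughly 30 to 32 C.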

 

Disabling an overtemp shutdown might be a good idea in Safe Mode.  And before shutting down the first time, it could disable auto-start of the array.
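A sketch of both suggestions, assuming a hypothetical in-memory `Settings` object (the field names are not real unRAID setting names). In a real system the auto-start change would have to be written to persistent storage before powering down; here `power_down` is just a callback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Settings:
    # Hypothetical stand-ins, not actual unRAID configuration keys.
    auto_start_array: bool = True
    safe_mode: bool = False


def overtemp_shutdown(settings: Settings, power_down: Callable[[], None]) -> bool:
    """Return True if a power-down was requested."""
    if settings.safe_mode:
        return False  # overtemp shutdown disabled in Safe Mode
    # Make the next boot come up with the array stopped, so the user can diagnose.
    settings.auto_start_array = False
    power_down()
    return True
```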


Do you have any ideas to avoid that?  The user would have received notifications about the exact drive, probably multiple notifications.  I do know it's possible, as I have one drive right now where the reported temp can bounce from 62 to 94, currently reporting 61 but seems about the same temp (mid 30's) as the others.  The drive has issues, but I still use it for low value old videos, in a Windows station.

 

Safe mode is an interesting idea, but entering safe mode most likely has nothing to do with temperature, and I'm not sure we'd want this feature disabled for safe mode.

 

One simple idea is to document how to manually edit a settings file to change the critical temperature or turn off the feature.

 

Another idea is to disable the feature for some period of time from boot (say 10 mins) to give the user the opportunity to change their configuration before it shuts down the server.
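The boot grace period could be a simple time check. A sketch, assuming the monitor starts at boot and uses a monotonic clock; `BootGrace` is hypothetical, and the 600 s default mirrors the "say 10 mins" suggestion. The clock is injectable for testing.

```python
import time
from typing import Callable


class BootGrace:
    """Hypothetical guard: overtemp shutdown is not allowed until grace_s after start."""

    def __init__(self, grace_s: float = 600.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.grace_s = grace_s
        self.start = clock()  # assumed to be (close to) boot time

    def shutdown_allowed(self) -> bool:
        return self.clock() - self.start >= self.grace_s
```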


Any updates on this feature? I recently had drives heating up to over 50 C in some cases during a parity check. This would be a great feature to have, thanks.


I honestly think this feature would be wonderful to have, especially for those of us who do not expose our systems to the internet and so cannot shut them down remotely. I understand there are some logistics to this, as not all systems are the same. However, this would be a nice kind of insurance if a fan goes bad and no owner is around to correct the problem before it becomes severe!


 

I would also like this feature :)

 

On 1.7.2015 at 2:54 PM, bonienl said:

 

You mean something like this?

 

[Image: critical-disk-action.png]

On 25-6-2015 at 10:12 AM, Fireball3 said:

How about removing the disk load?

That means pausing/stopping the load generating process.

Idle disks shouldn't run hot even if cooling has failed.

 

Interesting... 

 

It does not sound too difficult to GLOBALLY exclude a disk the moment it shows some kind of misery...

 

Global exclusion could be triggered by temperature, but also by SMART values, a read error, etc...

 

Actually, it does not sound like a bad idea, and the basic functionality is already there.

 

(Of course, this would only work if people use user shares... disk shares could still be written to.)
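A sketch of the global exclusion idea as a filter over disk status, assuming hypothetical `DiskStatus` fields for temperature, SMART health and read-error count. It only decides which disks a user-share allocator may pick; writes through disk shares would bypass it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskStatus:
    # Hypothetical status record; not an actual unRAID data structure.
    name: str
    temp_c: int
    smart_ok: bool = True
    read_errors: int = 0


def writable_disks(disks: List[DiskStatus], critical_c: int) -> List[str]:
    """Names of disks a user-share allocator may still write to."""
    return [
        d.name
        for d in disks
        # Exclude a disk that is too hot, has failing SMART, or has read errors.
        if d.temp_c < critical_c and d.smart_ok and d.read_errors == 0
    ]
```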

Edited by Helmonder

Copyright © 2005-2017 Lime Technology, Inc. unRAID® is a registered trademark of Lime Technology, Inc.