[Feature Request] Perform a clean shutdown if disk reaches critical temp



Would love an option for the server to perform a clean shutdown if a disk reaches critical temp.

 

Yesterday a fan in a 5in3 cage went bad. Thanks to the notification system I got an email and was able to shut down my server remotely, but then I thought: what if I was somewhere without internet? Or if the fan broke during the night? I would wake up to 5 cooked disks  :'(

 

Thanks in advance.

 

Link to comment

My only question would be, is it a good idea to turn off a disk in a hot state?

I know that sounds weird, but off means no fan spinning (if the fan is broken it's a moot point), so it is not being actively cooled.

 

I think of an overheating engine and shutting it off... You do not want the temp to rise, but turning it off kills the water pump, radiator fan, etc., and the heat just sits there until it dissipates passively.

So, if you're going to implement this, it deserves some consideration.

Also, what about a staged response?

 

If all we were to do is shut off the server, why not just shut down that disk? It is effectively the same, however the fans (the ones that are still working) are still moving air.

This can go the opposite way as there are still devices causing heat, but I think it is still something that should be pondered.

Link to comment

My only question would be, is it a good idea to turn off a disk in a hot state?

 

I believe it is always better than leaving it on and overheating; the disk will cool down slowly. You can also set a low critical temp. In my case my disks normally sit at around 35C/95F, my Hot warning is at 40C/104F and Critical at 45C/113F. I know that's a low critical temp, but if any of my disks reach 45C there's something wrong with the cooling.

 

Would like to add to my request:

send email before shutdown like "disk x reached critical temp, server shutdown in 60 seconds"

not so important but nice, possibility to abort the shutdown

 

Thanks
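
A minimal sketch of the warn-then-shutdown flow requested above, with the option to abort during the countdown. All the names here (`shutdown_with_grace`, `notify`, `abort_requested`, `shutdown`) are hypothetical placeholders for illustration; none of them is an actual unRAID API.

```python
import time
from typing import Callable


def shutdown_with_grace(
    disk: str,
    notify: Callable[[str], None],
    abort_requested: Callable[[], bool],
    shutdown: Callable[[], None],
    delay_s: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Warn, wait delay_s seconds, then shut down unless aborted.

    Returns True if shutdown() was called, False if the user aborted.
    All callables are hypothetical hooks supplied by the caller.
    """
    notify(f"{disk} reached critical temp, server shutdown in {delay_s} seconds")
    for _ in range(delay_s):
        # Check once per second whether the user cancelled the shutdown.
        if abort_requested():
            notify(f"Shutdown for {disk} aborted")
            return False
        sleep(1)
    shutdown()
    return True
```

Passing `sleep` in as a parameter keeps the countdown testable without actually waiting 60 seconds.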

Link to comment

I have advocated for this a number of times but never got traction. Maybe it will this time. But I would say that no other OS or appliance has this sort of feature. Not a good reason not to do it, but obviously others have considered and not done it. I think about bad sensors and attempts to diagnose heat related issues. How annoying would it be to have to fight with the feature? So there would have to be ways to disable it, at least temporarily. The only heat related failure I can remember is someone who had an overnight guest throw a blanket on top of a server because it was too noisy. But these stories are not common.

Link to comment

I think those sensors are quite reliable these days.

I can't remember a single hard disk, even an old one, with a bad temp reading.

Certainly there has to be some possibility to adjust the individual bounds, but it shouldn't be an issue.

 

obviously others have considered and not done it

Perhaps they did it on the other side?

e.g. they monitor fans etc.

Link to comment


SEE THIS POST

 

Still think this is a good idea - but there are bigger fish to fry. How about dual parity?

Link to comment


Slightly off topic, but food for thought in this case. I've seen USB fans.

If the motherboard can supply USB power while in a powered-down state, then a USB fan in the case could help exhaust heat while idle.

 

Fringe case, but still viable.

I remember years ago there used to be PCI slot fans that would stay spinning for a certain amount of time on power supply standby power.

Link to comment


This is true, but as you state, it'd be a fringe case.

I think this in general would be a valuable addition as a feature in whatever way it is implemented.

The idea as posed is mainly for a critical temp, where off is much better than continued heat build-up.

However, I also think it could be done better to help avoid getting to that point, and thus requiring a completely off state.

 

Link to comment

I would think that the best option for this is a more general one.  Along with the notifications for browser, email, agents, add another one "user defined scripts".  This would allow a script to respond to any particular notification and take action based upon it.
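
A rough sketch of what such a "user defined scripts" notification agent could look like. The event name `disk_temp_critical`, the script path, and the `USER_SCRIPTS` table are all made up for illustration; this is not how unRAID's notification system is actually implemented.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical mapping from notification event name to a user script.
# Both the event name and the path are assumptions for this example.
USER_SCRIPTS: Dict[str, List[str]] = {
    "disk_temp_critical": ["/boot/custom/on_critical_temp.sh"],
}


def build_command(event: str, subject: str) -> List[str]:
    """Return the command line for an event, or [] if no script is registered."""
    script = USER_SCRIPTS.get(event)
    if script is None:
        return []
    # Pass the event name and subject to the script as arguments.
    return script + [event, subject]


def dispatch(event: str, subject: str, run: Callable = subprocess.run) -> bool:
    """Run the user script registered for this event, if any."""
    cmd = build_command(event, subject)
    if not cmd:
        return False
    run(cmd, check=False)
    return True
```

With something like this, a clean shutdown on critical temp becomes just one possible script rather than a hard-coded feature.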

Link to comment

How feasible would it be to reply to the service status email in some way with a command in the subject line, such as SHUTDOWN, and have the server then perform a safe shutdown? An example would be an overheating server or a failing disk while away from the house.

 

Pretty nice idea although there might be a considerable delay until the user response is available.

Therefore only a reasonable solution for less critical errors.

Link to comment

To me, the easiest solution is to provide one more setting on the Display Settings page, immediately after the current "Critical disk temperature threshold", for "Shutdown disk temperature threshold".  You really don't need a delay or any other option, because if it reaches the shutdown temp, then you already have received the Warning temp notification, and then the Critical temp notification.  If a temp reaches the Shutdown temp, powerdown is called.  I would default it to a high number like 80, so it does not surprise anyone, and I think we can all agree a temp of 80 should shut the server down.  Most would set the number considerably lower.  This should be easy to implement.
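
The three-threshold idea above can be sketched roughly as follows. The shutdown default of 80 comes from this post; the warning and critical defaults of 40 and 45 are just the example values mentioned earlier in the thread, and the `Level` and `classify` names are hypothetical.

```python
from enum import Enum


class Level(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2
    SHUTDOWN = 3


def classify(
    temp_c: float,
    warning: float = 40,   # example value from earlier in the thread
    critical: float = 45,  # example value from earlier in the thread
    shutdown: float = 80,  # high default suggested in this post
) -> Level:
    """Map a disk temperature in Celsius to the highest level it has reached."""
    if not warning <= critical <= shutdown:
        raise ValueError("thresholds must satisfy warning <= critical <= shutdown")
    if temp_c >= shutdown:
        return Level.SHUTDOWN
    if temp_c >= critical:
        return Level.CRITICAL
    if temp_c >= warning:
        return Level.WARNING
    return Level.OK
```

A caller would trigger powerdown only when `classify(temp)` returns `Level.SHUTDOWN`, after the warning and critical notifications have already gone out at the lower levels.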

Link to comment


You mean something like this?

[Image: critical-disk-action.png]

Link to comment

The only problem with spin down disk / spin down array is that the odds are extremely good that the drive in question is spun up because the system is actively using it (eg:  my cache drive is an ancient drive.  Consistently goes over my warnings set at 40C during high activity)

 

So, it goes over temperature.  We spin it down / spin down the array.  First thing that's going to happen is the system spins it back up.  But wait... It's over temperature.  Let's spin it down again.  Whoops...  System accessing the drive.  Spin it up.  Ad nauseam...

And that situation is going to be far worse for the drive than merely running a little hot.

Link to comment


That is indeed a scenario which may happen if the disk is actively in use. Though it will not immediately spin the disk down again unless the temperature starts to climb.

 

On the other hand a system shutdown action is quite drastic and may interfere with anything going on at that point.

 

I am open to any bright ideas how to deal with this in a smart way :)

 

Link to comment

I would say that if disks are spun down due to an over-temp state, then the alert is sent and manual intervention is needed to bring them back up.

 

Anything other than this just caters for edge cases where certain users are happy to run hotter and they would either need to turn off the feature or change the over temp levels.

 

The problem of the "disk in use" is a global one for unRAID that needs a global solution. It is certainly relevant here, but it's a separate issue.
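
One way to picture the "manual intervention" behaviour suggested at the start of this post is a simple latch: the over-temp action fires once and stays tripped until someone resets it. The `OverTempLatch` class is a hypothetical illustration only. It stops the monitor from repeating the spin-down, but it does not by itself stop the system from spinning the disk back up on access.

```python
class OverTempLatch:
    """Fire the over-temp action once, then stay tripped until reset manually."""

    def __init__(self, limit_c: float) -> None:
        self.limit_c = limit_c
        self.tripped = False

    def update(self, temp_c: float) -> bool:
        """Return True only on the reading that first reaches the limit."""
        if self.tripped:
            # Already acted; wait for manual intervention.
            return False
        if temp_c >= self.limit_c:
            self.tripped = True
            return True
        return False

    def reset(self) -> None:
        """Manual intervention: re-arm the latch."""
        self.tripped = False
```

Because the latch ignores further readings while tripped, the spin down / spin up / spin down loop described earlier is not retriggered by the monitor itself.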

Link to comment

That is indeed a scenario which may happen if the disk is actively in use. Though it will not immediately spin the disk down again unless the temperature starts to climb.

You'd be looking at a spin down followed by an immediate spin up (or a slight delay, depending upon when the next read / write actually takes place). Odds are that the drive would never actually stay spun down long enough to let the temp drop, and repeated spin ups / downs can't be good for the drive.
Link to comment

Western Digital drives report their temp without spinning up the disk. It would be nice if the Web GUI would continue to report temps on spun down WDs. That way if there is something very wrong and the temps were getting very high, the computer would be able to shut itself down. Otherwise, this shut down would never happen on a sleeping array.

Link to comment

If your drives are overheating while they are spun down the system should probably email 911 because you've probably got more serious problems happening elsewhere.
Link to comment

Western Digital drives report their temp without spinning up the disk. It would be nice if the Web GUI would continue to report temps on spun down WDs. That way if there is something very wrong and the temps were getting very high, the computer would be able to shut itself down. Otherwise, this shut down would never happen on a sleeping array.

 

True, but then some logic needs to go in to determine the drive manufacturer. A Seagate drive will spin up (at least the ones I have)...
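
A small sketch of the kind of per-manufacturer logic this would need. The vendor set below is an assumption based only on the anecdotes in this thread (WD reports without spinning up, the Seagates mentioned here spin up), not on vendor documentation, and the function name is hypothetical.

```python
# Assumption from anecdotes in this thread, not from vendor documentation:
# drives from these vendors report temperature without spinning up.
REPORTS_TEMP_WHILE_SPUN_DOWN = {"western digital"}


def should_poll_temperature(vendor: str, spun_down: bool) -> bool:
    """Decide whether polling the temperature is safe for this drive.

    A spinning drive can always be polled. A spun-down drive is polled
    only if its vendor is believed not to wake up on a temperature read.
    """
    if not spun_down:
        return True
    return vendor.strip().lower() in REPORTS_TEMP_WHILE_SPUN_DOWN
```

In practice the table would probably need to be per model or user-configurable rather than per vendor.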

 

Link to comment

Western Digital drives report their temp without spinning up the disk. It would be nice if the Web GUI would continue to report temps on spun down WDs. That way if there is something very wrong and the temps were getting very high, the computer would be able to shut itself down. Otherwise, this shut down would never happen on a sleeping array.

If your drives are overheating while they are spun down the system should probably email 911 because you've probably got more serious problems happening elsewhere.

 

Excellent observation  :D

Link to comment
