[Plugin] IPMI for unRAID 6.1+


Working with local again  :)  I have one of the S2600CP2 systems and notice that the processor Therm Margin displays as negative (margin or headroom) but when selected as one of the four pull-downs it is displayed as ##.  Could this be changed to display the same as the main Sensor display?

 

Thanks for the quick update.

 

There was a conversation about this earlier in the thread. It would be nice if I could figure out what zero is but it's processor dependent. Then I could display the actual temp.

 

I will remove the restriction on negative numbers so they show in the footer. Zero will still show ##, because if an HDD temp is used as a display, a reading shown while the drives are spun down would be misleading.
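
For anyone following along, here is a rough sketch of the footer rule described above: negative readings are shown, while zero (and, as an assumption of mine, N/A) still shows ##. The function names `footer_value` and `margin_to_temp` are hypothetical and not taken from the plugin. The second helper only illustrates why a margin cannot be turned into an actual temperature without a per-processor reference value, called `tjmax_c` here as an assumed name.

```shell
# Hypothetical helper: decide what the footer shows for one sensor reading.
# Assumption: readings arrive as text such as "40.00", "-60.00", "0.00" or "N/A".
footer_value() {
    awk -v r="$1" 'BEGIN {
        # Non-numeric (e.g. N/A) or exactly zero: show the placeholder
        if (r !~ /^-?[0-9]+(\.[0-9]+)?$/ || r + 0 == 0) { print "##"; exit }
        # Anything else, including negative margins, is shown rounded
        printf "%.0f\n", r
    }'
}

# Converting a margin to an absolute temperature needs a reference value
# that differs per processor, so it has to be supplied by the caller.
margin_to_temp() {   # usage: margin_to_temp <tjmax_c> <margin_c>
    awk -v t="$1" -v m="$2" 'BEGIN { printf "%.0f\n", t + m }'
}

footer_value -60.00   # a negative margin is displayed
footer_value 0.00     # a spun-down drive still shows the placeholder
footer_value N/A
```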

Link to comment

The plugin is working great.  I do have a request though :) After my last bios update, ipmi-sensors added a bunch of useless entries (everything below with a reading of N/A, plus CPU_AP1 temp):

 

root@Tower:/# ipmi-sensors
ID | Name            | Type        | Reading    | Units | Event
3  | ATX+5VSB        | Voltage     | 4.95       | V     | 'OK'
4  | +3VSB           | Voltage     | 3.44       | V     | 'OK'
5  | Vcore1          | Voltage     | 1.79       | V     | 'OK'
6  | Vcore2          | Voltage     | N/A        | V     | N/A
7  | VCCM1           | Voltage     | 1.51       | V     | 'OK'
8  | VCCM2           | Voltage     | N/A        | V     | N/A
9  | +1.10_PCH       | Voltage     | 1.07       | V     | 'OK'
10 | +1.50_PCH       | Voltage     | N/A        | V     | N/A
11 | CPU VTT1        | Voltage     | 1.00       | V     | 'OK'
12 | CPU VTT2        | Voltage     | N/A        | V     | N/A
13 | BAT             | Voltage     | 3.26       | V     | 'OK'
14 | +3V             | Voltage     | 3.38       | V     | 'OK'
15 | +5V             | Voltage     | 5.10       | V     | 'OK'
16 | +12V            | Voltage     | 12.00      | V     | 'OK'
17 | CPU_FAN1_1      | Fan         | 1200.00    | RPM   | 'OK'
18 | CPU_FAN2_1      | Fan         | N/A        | RPM   | N/A
19 | REAR_FAN1       | Fan         | 500.00     | RPM   | 'OK'
20 | REAR_FAN2       | Fan         | N/A        | RPM   | N/A
21 | FRNT_FAN1       | Fan         | 500.00     | RPM   | 'OK'
22 | FRNT_FAN2       | Fan         | N/A        | RPM   | N/A
23 | FRNT_FAN3       | Fan         | N/A        | RPM   | N/A
24 | FRNT_FAN4       | Fan         | N/A        | RPM   | N/A
25 | CPU_FAN1_2      | Fan         | N/A        | RPM   | N/A
26 | CPU_FAN2_2      | Fan         | N/A        | RPM   | N/A
27 | MB Temperature  | Temperature | 39.00      | C     | 'OK'
28 | TR1 Temperature | Temperature | N/A        | C     | N/A
30 | CPU_BSP1 Temp   | Temperature | 40.00      | C     | 'OK'
31 | CPU_AP1 Temp    | Temperature | 0.00       | C     | 'OK'

 

This clutters the readings and all the dropdowns with useless information.

 

Adding "--ignore-not-available-sensors" to the command gets rid of everything N/A, and "-R 31" gets rid of the useless temperature:

 

root@Tower:/# /usr/sbin/ipmi-sensors --ignore-not-available-sensors -R 31
ID | Name            | Type        | Reading    | Units | Event
3  | ATX+5VSB        | Voltage     | 4.95       | V     | 'OK'
4  | +3VSB           | Voltage     | 3.44       | V     | 'OK'
5  | Vcore1          | Voltage     | 1.79       | V     | 'OK'
7  | VCCM1           | Voltage     | 1.51       | V     | 'OK'
9  | +1.10_PCH       | Voltage     | 1.07       | V     | 'OK'
11 | CPU VTT1        | Voltage     | 1.01       | V     | 'OK'
13 | BAT             | Voltage     | 3.26       | V     | 'OK'
14 | +3V             | Voltage     | 3.38       | V     | 'OK'
15 | +5V             | Voltage     | 5.10       | V     | 'OK'
16 | +12V            | Voltage     | 12.00      | V     | 'OK'
17 | CPU_FAN1_1      | Fan         | 1200.00    | RPM   | 'OK'
19 | REAR_FAN1       | Fan         | 500.00     | RPM   | 'OK'
21 | FRNT_FAN1       | Fan         | 500.00     | RPM   | 'OK'
27 | MB Temperature  | Temperature | 39.00      | C     | 'OK'
30 | CPU_BSP1 Temp   | Temperature | 40.00      | C     | 'OK'

 

So I was wondering if we could add an IGNORE entry to ipmi.cfg, for me it would look like this:

  IGNORE="--ignore-not-available-sensors -R 31"

and then every time you call ipmi-sensors you would add the $IGNORE variable to the call.  I'd be ok if I had to add/edit that line in ipmi.cfg manually.

 

Or if you think everyone would benefit from --ignore-not-available-sensors (I think they would), then include that by default everywhere, meaning my file ipmi.cfg would just have:

  IGNORE="-R 31"
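
A minimal sketch of what this request amounts to, assuming ipmi.cfg holds plain KEY="value" lines. The helper names `read_cfg_value` and `sensors_cmd` are made up for illustration, and the command is only printed, not executed.

```shell
# Hypothetical: pull one KEY="value" setting out of a config file.
read_cfg_value() {   # usage: read_cfg_value <file> <key>
    awk -F'"' -v k="$2" '$1 == k "=" { v = $2 } END { print v }' "$1"
}

# Build (but do not run) the ipmi-sensors call with the IGNORE options added.
sensors_cmd() {      # usage: sensors_cmd <cfg-file>
    ignore=$(read_cfg_value "$1" IGNORE)
    echo "/usr/sbin/ipmi-sensors${ignore:+ $ignore}"
}

# Demo with a throwaway config file
cfg=$(mktemp)
echo 'IGNORE="--ignore-not-available-sensors -R 31"' > "$cfg"
sensors_cmd "$cfg"
rm -f "$cfg"
```

If the IGNORE line is missing, the sketch falls back to the plain command, so an unedited ipmi.cfg would behave as before.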

 

What do you think?

 

Ignore n/a was originally part of the ipmi-sensors options. However, when using fan control, it created a problem when the hard drives were spun down and the fans turned off. The corresponding fan sensors would then go n/a and drop from the lists and views. That could cause confusion and other problems.

 

1. I could remove all n/a sensors from dropdowns except fans

 

2. and also from the readings display.

 

3. I think I will add an ignore dropdown checklist with all the sensors listed. Then you could pick any sensor to ignore. They would be added to an IGNORE setting in ipmi.cfg, then applied to ipmi-sensors with the -R option.

 

#3 I'll definitely do. Also I think removing any n/a temps from the footer and fan control dropdowns makes sense. Let me know what you think.
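
Options 1 and 2 could be approximated with a filter over the ipmi-sensors output instead of --ignore-not-available-sensors. This is only a sketch: `filter_na_except_fans` is a hypothetical name, and it assumes the pipe-separated layout shown in the output posted earlier in the thread.

```shell
# Hypothetical filter: hide sensors whose reading is N/A, but keep every
# fan row so a stopped fan does not vanish from lists and views.
filter_na_except_fans() {
    awk -F'|' '
        NR == 1 { print; next }              # keep the header row
        {
            type = $3; reading = $4
            gsub(/^ +| +$/, "", type)        # trim padding around fields
            gsub(/^ +| +$/, "", reading)
            if (reading != "N/A" || type == "Fan") print
        }'
}

# Sample rows in the same layout as the posted output
filter_na_except_fans <<'EOF'
ID | Name            | Type        | Reading    | Units | Event
5  | Vcore1          | Voltage     | 1.79       | V     | 'OK'
6  | Vcore2          | Voltage     | N/A        | V     | N/A
18 | CPU_FAN2_1      | Fan         | N/A        | RPM   | N/A
28 | TR1 Temperature | Temperature | N/A        | C     | N/A
EOF
```

In real use the function would sit at the end of a pipe fed by ipmi-sensors; the here-document just stands in for that output.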

 

Also did you update the bmc when you updated the bios? Since they are usually separate. The last one for me changed my board manufacturer from ASRock to ASRockRack in the bmc plus some other sensor names.

 

Link to comment

Can you telnet into your system and type "ipmi-sensors", then copy/paste the output here?

 

 

 

root@Tower:~# ipmi-sensors
ID | Name             | Type                        | Reading    | Units | Event
1  | Pwr Unit Status  | Power Unit                  | N/A        | N/A   | 'OK'
2  | IPMI Watchdog    | Watchdog 2                  | N/A        | N/A   | 'OK'
3  | Physical Scrty   | Physical Security           | N/A        | N/A   | 'OK'
4  | SMI Timeout      | OEM Reserved                | N/A        | N/A   | 'OK'
5  | System Event Log | Event Logging Disabled      | N/A        | N/A   | 'OK'
6  | System Event     | System Event                | N/A        | N/A   | 'OK'
7  | Button           | Button/Switch               | N/A        | N/A   | 'OK'
9  | VR Watchdog      | Voltage                     | N/A        | N/A   | 'OK'
10 | SSB Therm Trip   | Temperature                 | N/A        | N/A   | 'OK'
11 | BMC FW Health    | Management Subsystem Health | N/A        | N/A   | 'OK'
12 | System Airflow   | Other Units Based Sensor    | 0.00       | CFM   | 'OK'
14 | BB EDGE Temp     | Temperature                 | 31.00      | C     | 'OK'
15 | SSB Temp         | Temperature                 | 47.00      | C     | 'OK'
16 | BB BMC Temp      | Temperature                 | 39.00      | C     | 'OK'
17 | BB P2 VR Temp    | Temperature                 | 31.00      | C     | 'OK'
18 | BB MEM VR Temp   | Temperature                 | 30.00      | C     | 'OK'
19 | LAN NIC Temp     | Temperature                 | 46.00      | C     | 'OK'
20 | System Fan 4     | Fan                         | 1176.00    | RPM   | 'OK'
21 | Processor 1 Fan  | Fan                         | 1274.00    | RPM   | 'OK'
22 | Processor 2 Fan  | Fan                         | 1274.00    | RPM   | 'OK'
23 | Rear Fan         | Fan                         | 1078.00    | RPM   | 'OK'
24 | P1 Status        | Processor                   | N/A        | N/A   | 'Processor Presence detected'
25 | P2 Status        | Processor                   | N/A        | N/A   | 'Processor Presence detected'
26 | P1 Therm Margin  | Temperature                 | -60.00     | C     | 'OK'
27 | P2 Therm Margin  | Temperature                 | -58.00     | C     | 'OK'
28 | P1 Therm Ctrl %  | Temperature                 | 0.00       | %     | 'OK'
29 | P2 Therm Ctrl %  | Temperature                 | 0.00       | %     | 'OK'
30 | P1 ERR2          | Processor                   | N/A        | N/A   | 'OK'
31 | P2 ERR2          | Processor                   | N/A        | N/A   | 'OK'
32 | CATERR           | Processor                   | N/A        | N/A   | 'OK'
33 | P1 MSID Mismatch | Processor                   | N/A        | N/A   | 'OK'
34 | CPU Missing      | Processor                   | N/A        | N/A   | 'OK'
35 | P1 DTS Therm Mgn | Temperature                 | -60.00     | C     | 'OK'
36 | P2 DTS Therm Mgn | Temperature                 | -58.00     | C     | 'OK'
37 | P2 MSID Mismatch | Processor                   | N/A        | N/A   | 'OK'
38 | P1 VRD Hot       | Temperature                 | N/A        | N/A   | 'OK'
39 | P2 VRD Hot       | Temperature                 | N/A        | N/A   | 'OK'
40 | P1 MEM01 VRD Hot | Temperature                 | N/A        | N/A   | 'OK'
41 | P1 MEM23 VRD Hot | Temperature                 | N/A        | N/A   | 'OK'
42 | P2 MEM01 VRD Hot | Temperature                 | N/A        | N/A   | 'OK'
43 | P2 MEM23 VRD Hot | Temperature                 | N/A        | N/A   | 'OK'
44 | DIMM Thrm Mrgn 1 | Temperature                 | -56.00     | C     | 'OK'
45 | DIMM Thrm Mrgn 2 | Temperature                 | N/A        | C     | N/A
46 | DIMM Thrm Mrgn 3 | Temperature                 | -58.00     | C     | 'OK'
47 | DIMM Thrm Mrgn 4 | Temperature                 | N/A        | C     | N/A
48 | Mem P1 Thrm Trip | Memory                      | N/A        | N/A   | 'OK'
49 | Mem P2 Thrm Trip | Memory                      | N/A        | N/A   | 'OK'
50 | BB +12.0V        | Voltage                     | 11.78      | V     | 'OK'
51 | BB +5.0V         | Voltage                     | 4.92       | V     | 'OK'
52 | BB +3.3V         | Voltage                     | 3.34       | V     | 'OK'
53 | BB +5.0V STBY    | Voltage                     | 4.96       | V     | 'OK'
54 | BB +3.3V AUX     | Voltage                     | 3.28       | V     | 'OK'
55 | BB +1.05V P1Vccp | Voltage                     | 0.81       | V     | 'OK'
56 | BB +1.05V P2Vccp | Voltage                     | 0.77       | V     | 'OK'
57 | BB +1.5 P1DDR AB | Voltage                     | N/A        | V     | N/A
58 | BB +1.5 P1DDR CD | Voltage                     | N/A        | V     | N/A
59 | BB +1.5 P2DDR AB | Voltage                     | N/A        | V     | N/A
60 | BB +1.5 P2DDR CD | Voltage                     | N/A        | V     | N/A
61 | BB +1.8V AUX     | Voltage                     | 1.79       | V     | 'OK'
62 | BB +1.1V STBY    | Voltage                     | 1.08       | V     | 'OK'
63 | BB VBAT          | Voltage                     | 3.15       | V     | 'OK'
64 | BB +1.35 P1LV AB | Voltage                     | 1.34       | V     | 'OK'
65 | BB +1.35 P1LV CD | Voltage                     | 1.35       | V     | 'OK'
66 | BB +1.35 P2LV AB | Voltage                     | 1.35       | V     | 'OK'
67 | BB +1.35 P2LV CD | Voltage                     | 1.35       | V     | 'OK'
71 | NM Capabilities  | OEM Reserved                | N/A        | N/A   | N/A
74 | P1 MTT           | Memory                      | N/A        | %     | N/A
75 | P2 MTT           | Memory                      | N/A        | %     | N/A
root@Tower:~#

Link to comment

Maybe something like this

50dcd20f5d99f84ed58ef9b154975ac7.jpg

Link to comment

Also did you update the bmc when you updated the bios? Since they are usually separate. The last one for me changed my board manufacturer from ASRock to ASRockRack in the bmc plus some other sensor names.

 

Ah you're right, it must have been the bmc update that changed everything, I did both at the same time. 

 

Ignore n/a was originally part of the ipmi-sensors options. However, when using fan control, it created a problem when the hard drives were spun down and the fans turned off. The corresponding fan sensors would then go n/a and drop from the lists and views. That could cause confusion and other problems.

 

Can definitely see how that would be a problem.  I didn't know valid sensors would go n/a (!) I would have expected them to drop to 0 rpm instead.

 

 

#3 I'll definitely do. Also I think removing any n/a temps from the footer and fan control dropdowns makes sense. Let me know what you think.

 

It seems like this would have the same problem as the --ignore n/a setting?  I'm happy with the ability to ignore individual sensors.

 

3. I think I will add an ignore dropdown checklist with all the sensors listed. Then you could pick any sensor to ignore. They would be added to an IGNORE setting in ipmi.cfg, then applied to ipmi-sensors with the -R option.

 

The screenshot looks great!

 

Could we add the current sensor value next to the sensor name? Something like this:

 

CPU_FAN1_1 (192.168.69.50) - 1200.00 RPM
CPU_FAN2_2 (192.168.69.50) - N/A

 

That would make it easier to know which sensors to ignore.
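
As a sketch of the label format suggested here, the rows from ipmi-sensors could be turned into checklist entries like this. `sensor_labels` is a hypothetical name, the host is passed in by the caller, and the pipe-separated layout from the posted output is assumed.

```shell
# Hypothetical: turn ipmi-sensors rows into checklist labels that include
# the current reading. The host string is passed in by the caller.
sensor_labels() {    # usage: ipmi-sensors ... | sensor_labels <host>
    awk -F'|' -v host="$1" '
        NR == 1 { next }                     # skip the header row
        {
            for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)
            if ($4 == "N/A") printf "%s (%s) - N/A\n", $2, host
            else             printf "%s (%s) - %s %s\n", $2, host, $4, $5
        }'
}

# Two sample rows standing in for live output
sensor_labels 192.168.69.50 <<'EOF'
ID | Name            | Type        | Reading    | Units | Event
17 | CPU_FAN1_1      | Fan         | 1200.00    | RPM   | 'OK'
26 | CPU_FAN2_2      | Fan         | N/A        | RPM   | N/A
EOF
```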

 

Link to comment

Would it be a bit more intuitive if you reversed it, changing "Ignored Sensors List" to "Usable Sensors List" or "Valid Sensors List" or "Sensors List (unchecked will be ignored)"?

Yes that's a good idea. It was something I put together real quick and posting checked boxes is easier. But I haven't done anything else. I'll see what I can do.
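
If the list is reversed so that checked means usable, the ignore setting becomes "every sensor that is not checked". A small sketch of that step, with the hypothetical name `exclude_option`; the comma-separated form for -R follows the ipmi-sensors documentation for record ID lists, but treat the exact wiring into the plugin as an assumption.

```shell
# Hypothetical: given all sensor IDs and the IDs left checked ("usable"),
# emit the -R option that excludes everything unchecked.
exclude_option() {   # usage: exclude_option "<all ids>" "<checked ids>"
    ids=""
    for id in $1; do
        case " $2 " in
            *" $id "*) ;;                        # checked: keep the sensor
            *) ids="${ids:+$ids,}$id" ;;         # unchecked: ignore it
        esac
    done
    if [ -n "$ids" ]; then
        echo "-R $ids"
    fi
}

# Fans 17-21 exist, only 17, 19 and 21 are checked
exclude_option "17 18 19 20 21" "17 19 21"
```

When every box is checked the function prints nothing, so no -R option would be added at all.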

Link to comment

 

 

 

Thanks. Yes. Adding any sensor info is easy since all the sensors are stored in a global array by their id number.

 

I also need to add a fix for ASRock boards with 10 fans like yours. The first 2 fans (CPU_FAN1_1 & CPU_FAN2_2) can't be controlled right now since it looks for CPU_FAN1 & CPU_FAN2. I believe the last 2 fans are not controllable at all.

Link to comment

I also need to add a fix for ASRock boards with 10 fans like yours. The first 2 fans (CPU_FAN1_1 & CPU_FAN2_2) can't be controlled right now since it looks for CPU_FAN1 & CPU_FAN2. I believe the last 2 fans are not controllable at all.

 

Sorry, I guess that's a side effect of my recent bmc update.  The system only has 3 fans, not sure why it feels the need to expose 10!

 

On the plus side, the system is actually doing a decent job of regulating the fan speed on its own now.  Prior to my recent bios/bmc updates I had to set the fans to a fixed rpm.  "auto" mode works now.

Link to comment

I need the fan control. My CPU is fanless and the system doesn't produce enough heat to spin the fans fast enough to cool the hard drives based on CPU temp. I was using auto but I had to skew the fans to come on sooner and run faster. Now I just run the rear exhaust on auto and the 2 intake/hdd fans based on hdd temps.

Link to comment

Thanks for the fan control! I can now run a parity check without worrying about temperatures.

No problem. You're welcome. I just run two of my fans based on hard drive temps. I let the other run on auto and don't control it, so it's CPU driven.

Just wanted to post a thank-you for this great plugin! The last update was a biggie for me. Being able to modify the thresholds so easily is fantastic!

Glad you find it useful. I use it to set my 2 hard drive fans so no events are generated when the fan control shuts them off. I also use the load config option since my sensor configuration doesn't save permanently. So if I restart, my thresholds and event options are reloaded.

 

 

Btw I dropped another little update with a Dashboard page and sensor ignore options.

Link to comment

Btw I dropped another little update with a Dashboard page and sensor ignore options.

 

Thanks for this!  The ability to ignore sensors is awesome, and the dashboard was a nice surprise!

 

I suppressed all the voltage readings from the dashboard, but if one of them throws an alarm can it still be displayed?

 

BTW, I submitted pull requests to fix two issues:

  • Footer content doesn't display when using local IPMI
  • Footer content doesn't display unless you ignore at least one sensor

Also, I was wondering if you would mind adding the Event ID column to the Archived Events page?  This will let me sort events that have an invalid date stamp.

 

Thanks!

Link to comment

I wanted to activate the fan control for my main server (ASRock - E3C224D4I-14S) but the plugin is showing: "Your board is not currently supported"

Is that a correct statement? My understanding was that all ASRock boards are being supported but this might be a misunderstanding on my side.

Thanks for your plugin!!!

Link to comment

First make sure you are on the latest version. I just pushed an update. What does the command ipmi-fru show? Have you also updated your BIOS and BMC? They made some recent changes to ASRockRack models.

 

It won't work completely for all ASRock boards but should for most.  It should work with your model and mine since those are really the only two that have been tested thoroughly using the ipmi raw commands and hex values to match those to corresponding fans.  Fan control works by fan name and board manufacturer name.  There's a multidimensional array (/boot/config/plugins/ipmi/boards.json) that the plugin checks for board manufacturer name. Right now there's just ASRock and ASRockRack. It works by matching the fan name with the position of the fan name in the array and then creates the ipmi raw hex command in the right order. On some boards the cpu fan control won't work (those with CPU_FAN1_1 and CPU_FAN2_1) until I create a fix. Other fans should work.
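
To make the lookup a bit more concrete, here is a sketch of the idea: find the board by manufacturer name, then emit one hex byte per fan in the board's own order. Everything specific in it is invented for illustration: the function names, the fan order, and the choice of 00 for fans without a setting. The real boards.json layout and the ipmi raw command prefix are not shown in this thread, so the sketch stops at the byte sequence.

```shell
# Hypothetical board table: manufacturer -> fan names in raw-command order.
# The real boards.json layout and fan order are not shown in this thread.
board_fans() {
    case "$1" in
        ASRock|ASRockRack) echo "CPU_FAN1 REAR_FAN1 FRNT_FAN1 FRNT_FAN2" ;;
        *) return 1 ;;                       # board not in the table
    esac
}

# Emit one hex byte per fan, in board order. Fans without a requested
# setting get 00 here (an assumption, not the plugin's behaviour).
fan_bytes() {        # usage: fan_bytes <manufacturer> NAME=PERCENT ...
    fans=$(board_fans "$1") || {
        echo "Your board is not currently supported" >&2
        return 1
    }
    shift
    out=""
    for fan in $fans; do
        pct=0
        for kv in "$@"; do
            if [ "${kv%%=*}" = "$fan" ]; then pct=${kv#*=}; fi
        done
        out="${out:+$out }$(printf '%02x' "$pct")"
    done
    echo "$out"
}

fan_bytes ASRockRack REAR_FAN1=50 FRNT_FAN1=100
```

Matching by position is what makes the fan names matter: a name that is not in the board's list simply never lands in the byte sequence.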

 

Also, right now boards.json is automatically updated from GitHub periodically or if missing. I am going to add a button under fan control to manually update boards.json and remove the auto update, so the array can be edited and not overwritten unless you choose to.

Link to comment

Thanks again for the help. I removed the footer network check and combined the all and ignore logic. I also changed the Dashboard to override the ignore setting and show any sensor whose state is not nominal. I'm not sure you'd get the results you want from the Archive page and Event ID. I left it off because the ID restarts every time you clear all the events, so you could end up with several 1, 2, 3... event IDs. I'm not sure if this will get you what you want, but if you click the header sort a third time it will go to unsorted. This should be the order in which the events happened, no matter the date. The archive is just a dump of the events in ID order if you do a Clear All, and subsequent clears are just appended. But if you cleared them individually then that wouldn't help.

Link to comment

I had installed 2016.05.15 and just upgraded to 2016.05.16 which was making no difference. BIOS Version was already P3.20 but I had to upgrade BMC from 0.12 to 0.16.

However that was not resolving the issue:

root@Tower:~# ipmi-fru
FRU Inventory Device: Default FRU Device (ID 00h)
  FRU Error: board info area checksum invalid
  FRU Product Manufacturer Name: ASRockRack

 

The syslog doesn't show any errors.

Link to comment

It's not working because of the checksum error. I'll shorten the grep to Manufacturer instead of Board Manufacturer.  That should fix it for you. But I'd still check into that error.  bmc-device --cold-reset  will reset the bmc as if you unplugged it from the wall. A reflash or factory reset of the BMC may be necessary.
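
Roughly what the shortened match would do, as a sketch only: `fru_manufacturer` is a hypothetical name and the plugin's real grep is not shown in this thread. Matching any line that contains "Manufacturer" still finds a name when the board info area cannot be read.

```shell
# Hypothetical: extract the manufacturer from ipmi-fru output by matching
# any "Manufacturer" line, so a missing board info area does not matter.
fru_manufacturer() {
    awk -F': *' '/Manufacturer/ { print $2; exit }'
}

# Sample in the shape of the ipmi-fru output with the checksum error
fru_manufacturer <<'EOF'
FRU Inventory Device: Default FRU Device (ID 00h)
  FRU Error: board info area checksum invalid
  FRU Product Manufacturer Name: ASRockRack
EOF
```

The first matching line wins, so on a healthy BMC a board manufacturer line would be picked up if it comes before the product one.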

Link to comment

Ah, thanks for the explanation of how Clear All affects the Event ID.  I agree, sorting the archive by event id wouldn't really make sense given the ids can be reused.

 

The dashboard changes look great, and the footer works perfectly both locally and over the network!

 

Sorry if this was there yesterday and I didn't notice it, but the Fan Settings page isn't honoring the "ignore" options, it displays all fans all the time.

 

Also, can the helper scripts in sbin be modified to use the "ignore" settings too?

 

Thanks!

Link to comment

For the Dash settings I think I'll just add a complete list of sensors and then you can select which to show.  And still show any sensors that aren't nominal even if not selected.

 

I didn't add the ignore to the fan control because I hadn't explored all the effects on the actual ipmifan script yet.

 

I'll add $@ to the scripts so any command line args are passed to ipmisensors along with the network options and the ignore setting.
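
A sketch of what forwarding "$@" in a helper script could look like. The variable names NETWORK, IPADDR and IGNORE and their values are assumptions for this example, credentials are left out, and the command is echoed rather than executed.

```shell
# Hypothetical wrapper body: network options and the ignore list come
# from settings; "$@" forwards whatever the caller typed.
# NETWORK, IPADDR and IGNORE are assumed names and values for this sketch.
NETWORK="enable"
IPADDR="192.168.69.50"
IGNORE="-R 31"

ipmisensors_cmd() {
    set -- $IGNORE "$@"                       # ignore options, then caller args
    if [ "$NETWORK" = "enable" ]; then
        set -- -h "$IPADDR" "$@"              # remote BMC: add the host option
    fi
    echo /usr/sbin/ipmi-sensors "$@"          # echo instead of exec for the demo
}

ipmisensors_cmd --output-sensor-state
```

Quoting "$@" keeps each caller argument intact, while $IGNORE is left unquoted on purpose so its options split into separate words.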

Link to comment

bmc-device --cold-reset gave me a syslog entry:
IPMI message handler: BMC returned incorrect response, expected netfn b cmd 20, got netfn 0 cmd 0

 

Also a reset via the ASRock webUI was not working.

Link to comment
