[ZBXNEXT-300] Add support for ipmi discrete sensor values Created: 2010 Apr 15  Updated: 2013 Jul 23  Resolved: 2013 Apr 02

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: None
Affects Version/s: 1.8.2
Fix Version/s: 2.1.0

Type: Change Request Priority: Major
Reporter: Mark Carbonaro Assignee: Unassigned
Resolution: Fixed Votes: 21
Labels: ipmi
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: IBM_System_x3550M2_sdr_elist_all.txt, IBM_System_x3550M2_sensor_list_all.txt, IBM_System_x3550_IPMI_sensors.txt, ProLiant-BL460c-G7-sensors-hpblade.txt, ProLiant-BL685c-G7-sensors-sdr-hpblade.txt, ProLiant_BL460c_G1_sdr_elist_all.txt, ProLiant_DL360_G7_sdr_elist_all.txt, ProLiant_DL360_G7_sensor_list_all.txt, ProLiant_DL360_G7_v_sdr_elist_all.txt, ProLiant_DL360_G7_v_sensor_list_all.txt, checks_ipmi.c.patch
Issue Links:
Duplicate
is duplicated by ZBX-6812 [IPMI agent] Error while getting valu... Reopened

 Description   

At present, discrete sensor values are not supported. They are needed to monitor drive failure information on most Sun x86 servers.

See this forum thread for some information: http://zabbix.com/forum/showthread.php?t=10762



 Comments   
Comment by Mark Carbonaro [ 2010 Apr 15 ]

Here is the patch from the forum thread, updated to apply to 1.8.2. I don't think it is a complete and proper implementation, but unfortunately I don't possess the skills to fix it up.

Comment by Andy Lubel [ 2010 May 12 ]

We would prefer to monitor the discrete results that show things like whether the service LED is on or whether there is a fault, rather than have 30+ objects monitored per physical server along with the trigger complexity (each system's critical thresholds are different). We can read this discrete data universally on every Sun system with a recent ILOM.

example on a sun server:
PSx/PRSNT
PSx/VINOK
PSx/PWROK

Comment by Alex Deiter [ 2010 May 18 ]

Thanks a lot for the patch!

To Alexei: could you please review this patch ?

Patch successfully tested on Sun Fire x4150/x4200/x4450:

14722:20100518:135325.027 In init_ipmi_host([10.1.1.2]:623)
14722:20100518:135325.027 In get_ipmi_host([10.1.1.2]:623)
14722:20100518:135325.027 In get_ipmi_sensor_by_name() FB0/FM0/PRSNT@[10.1.1.2]:623
14722:20100518:135325.027 In read_ipmi_sensor() FB0/FM0/PRSNT@[10.1.1.2]:623
14722:20100518:135325.195 In get_ipmi_sensor()

> sensor info test(30.0)
Sensor
Name: test(30.0).FB0/FM0/PRSNT
LUN: 0
Number: 161
Event Reading Type: 8
Event Reading Type Name: discrete_device_presense
Type: 37
Type Name: entity_presense
Event Support: per state
Init Scanning: true
Init Events: true
Init Thresholds: false
Init Hysteresis: false
Init Type: true
Init Power Up Events: true
Init Power Up Scanning: true
Ignore If No Entity: false
Auto Rearm: true
OEM1: 0
Id: FB0/FM0/PRSNT
Event
Offset: 0
Name: device removed/absent
Supports: assertion
Event
Offset: 1
Name: device inserted/present
Supports: assertion
>

Comment by Thomas Lohmüller [ 2011 Nov 14 ]

I tried to monitor the Service LED on HP servers. It's also a discrete value and does not work. Is there anything going on here? It's still not fixed in 1.8.8.

Comment by Oleksii Zagorskyi [ 2012 Jan 19 ]

An example of a sensor list. The FAN sensors work well; the PS Status sensor does not.
root@zabbix:~# ipmitool -U ADMIN -H 10.10.10.10 -I lanplus -L user sensor list
Password:
FAN 1 | 5476.000 | RPM | ok | 400.000 | 576.000 | 784.000 | 33856.000 | 34225.000 | 34596.000
... trim
FAN 8 | na | RPM | na | na | na | na | na | na | na
CPU1 Vcore | 0.872 | Volts | ok | 0.776 | 0.800 | 0.824 | 1.352 | 1.376 | 1.400
... trim
VBAT | 3.216 | Volts | ok | 2.880 | 2.904 | 2.928 | 3.648 | 3.672 | 3.696
CPU1 Temp | 0x0 | discrete | 0x0000| na | na | na | na | na | na
CPU2 Temp | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Temp | 30.000 | degrees C | ok | -9.000 | -7.000 | -5.000 | 75.000 | 77.000 | 79.000
P1-DIMM1A | 37.000 | degrees C | ok | -9.000 | -7.000 | -5.000 | 65.000 | 70.000 | 75.000
... trim
Chassis Intru | 0x0 | discrete | 0x0000| na | na | na | na | na | na
PS Status | 0x1 | discrete | 0x01ff| na | na | na | na | na | na
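
A minimal sketch of how the listing above separates into threshold and discrete sensors. It assumes the pipe-separated layout shown in this comment (name, reading, unit, status, then thresholds); `split_sensors` and `SAMPLE` are hypothetical names used only for illustration, and this is not how Zabbix or OpenIPMI read sensors.

```python
# A few lines copied from the listing above.
SAMPLE = """\
FAN 1 | 5476.000 | RPM | ok | 400.000 | 576.000 | 784.000 | 33856.000 | 34225.000 | 34596.000
CPU1 Temp | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Temp | 30.000 | degrees C | ok | -9.000 | -7.000 | -5.000 | 75.000 | 77.000 | 79.000
Chassis Intru | 0x0 | discrete | 0x0000| na | na | na | na | na | na
PS Status | 0x1 | discrete | 0x01ff| na | na | na | na | na | na
"""


def split_sensors(text):
    """Return (threshold, discrete) dicts keyed by sensor name."""
    threshold, discrete = {}, {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4:
            continue  # skip prompts and "... trim" markers
        name, reading, unit, status = fields[:4]
        if unit == "discrete":
            # Keep the raw hex strings; their meaning is sensor-specific.
            discrete[name] = (reading, status)
        elif reading != "na":
            threshold[name] = (float(reading), unit)
    return threshold, discrete


threshold, discrete = split_sensors(SAMPLE)
print(sorted(discrete))   # the sensors that were not readable at this point
print(sorted(threshold))  # the analog sensors that already worked
```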

Comment by Jc Duss [ 2012 Apr 17 ]

Hi all,

Is this ticket still in development and what is the release target please?

Thank you for your work on it.

JC.

Comment by Oleksii Zagorskyi [ 2012 Apr 17 ]

As I understand it, implementing this feature turned out to be absolutely NOT trivial.
It requires first settling on a model/approach and then some complex development.

Comment by richlv [ 2012 Jul 24 ]

i thought i described it somewhere, but just can't find it... so the situation is like this.

discrete sensors may return a value where any bit can be meaningful. so first bit may be some status of the first psu, second of the second one etc. or maybe one is intrusion sensor, another is humidity sensor etc.

patch returns position of the first set bit (0001000011101100 would get you 4). but that would completely ignore the remaining set bits.

possible approaches and their drawbacks :

a) just allow to store discrete values in text items, and write triggers against them matching strings. triggers become very hard to manage, there is no separate history for individual bits and there is no possibility to graph this (up/down), in the history there is no way to figure out what's what (without knowing bits by heart)

b) for each discrete sensor auto-create individual items for each bit. trigger functions are the same, history for each bit is separate, graphing possible, value mapping can be set up and history is easy to use. but if all you need is a single bit from the discrete value, this is an overkill.

maybe the solution is some mix (or both) of the above, but that's far from trivial at that point

<Sasha>
c) just allow to store discrete values in numeric items and add support for bitwise operators in trigger expressions. Graphing and value mapping are not possible.

d) or add preprocessing for numeric items and store only useful bits.

<richlv> as per openipmi docs cited below, "each bit may represent the initialization state of a piece of software". in this case the suggested patch that just returns first set bit would be best, so it would be highly desirable to have such functionality as an option
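
The trade-off discussed in this comment can be sketched as follows. This is an illustration only: bit indexes here count from the least significant bit starting at 0, which is an assumption of the sketch and may differ from the numbering the patch uses, and the function names are hypothetical.

```python
def set_bits(value, width=16):
    """Indexes of all set bits, counted from the least significant bit (0)."""
    return [i for i in range(width) if (value >> i) & 1]


def lowest_set_bit(value):
    """Index of the lowest set bit only, or None when no bit is set."""
    bits = set_bits(value)
    return bits[0] if bits else None


def per_bit_items(value, width=16):
    """Approach (b): one 0/1 value per bit, as if each bit were its own item."""
    return {f"bit{i}": (value >> i) & 1 for i in range(width)}


reading = 0b0001000011101100  # the example value from the comment above

print(lowest_set_bit(reading))  # a single number; the other set bits are lost
print(set_bits(reading))        # every set bit is kept
print(per_bit_items(reading)["bit5"])

# Approach (c): keep the whole number and test one bit with a mask.
print(bool(reading & (1 << 12)))
```

Returning only one bit index is enough when the states are mutually exclusive, while the per-bit and mask variants keep independent states visible.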

Comment by richlv [ 2012 Jul 25 ]

http://openipmi.sourceforge.net/IPMI.pdf has some details as well. citing it :

Sensor monitor something about an object. IPMI defines many types of sensors, but groups them into two
main categories: Threshold and discrete. Threshold sensors are “analog”, they have continuous (or mostly
continuous) readings. Things like fans speed, voltage, or temperature.
Discrete sensors have a set of binary readings that may each be independently zero or one. In some
sensors, these may be independent. For instance, a power supply may have both an external power failure
and a predictive failure at the same time. In other cases they may be mutually exclusive. For instance, each
bit may represent the initialization state of a piece of software.

Discrete sensors report their readings in a 16-bit bitmask, each bit generally representing a discrete state.
For instance, consider the slot/connector sensor. Bit 0 tells if there is a fault. Bit 2 tells if a device is present
in the slot. Bit 5 tells if power is off on the slot. Each bit tells a completely independent state and they may
each be zero or one independently.
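
As a small illustration of the slot/connector example quoted above, the sketch below decodes a reading using only the three bit meanings the citation gives (bits 0, 2 and 5). The names `SLOT_CONNECTOR_STATES` and `decode` are hypothetical, and the remaining bits are deliberately left undefined.

```python
# Bit meanings taken from the cited paragraph; other bits are not listed there.
SLOT_CONNECTOR_STATES = {
    0: "fault",
    2: "device present",
    5: "power off",
}


def decode(reading, states):
    """Return the names of all states whose bit is set in the bitmask."""
    return [name for bit, name in sorted(states.items()) if reading & (1 << bit)]


# Bits 2 and 5 set: a device is present and slot power is off.
print(decode(0b100100, SLOT_CONNECTOR_STATES))
# Bit 0 only: a fault, independent of the other states.
print(decode(0b000001, SLOT_CONNECTOR_STATES))
```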

Comment by Oleksii Zagorskyi [ 2012 Aug 10 ]

bitwise operators in triggers requested in ZBXNEXT-1329

Comment by Andris Mednis [ 2013 Jan 11 ]

Document at http://openipmi.sourceforge.net/IPMI.pdf is from February 10, 2006.
To get a newer version download the latest source code archive from http://sourceforge.net/projects/openipmi/. For example, OpenIPMI-2.0.20-rc1/doc/IPMI.pdf is from July 2, 2012.

Comment by Andris Mednis [ 2013 Jan 15 ]

While setting up a Zabbix server (from trunk/) test system on Debian GNU/Linux 64-bit testing ("Wheezy"), I noticed that:

  • using Zabbix with Debian OpenIPMI 2.0.16-1.3 packages produces error:
    11206:20130115:112404.996 In setup_done() phost:0x2784470 host:'[xxx.xx.xx.xx]:623'
    11206:20130115:112404.996 setup_done() fail: [33554441] Unknown error 33554441
    11206:20130115:112404.996 End of setup_done():NETWORK_ERROR
    11206:20130115:112404.996 In domain_closed() phost:0x2784470 host:'[xxx.xxx.xx.xx]:623'
    11206:20130115:112404.996 End of domain_closed()
    11206:20130115:112404.996 End of init_ipmi_host():0x2784470
    11206:20130115:112404.996 Item [xxxxxxx:baseboard_temp] error: cannot connect to IPMI host: [33554441] Unknown error 33554441
  • using Zabbix with locally compiled OpenIPMI-2.0.19 works fine (with analog sensors).

Comment by Andris Mednis [ 2013 Feb 07 ]

1) Can you tell which discrete sensors are the most important to support in Zabbix ?

2) Things like "Cooling/Fan fault detected", "Drive Fault" do not show up under sensors, they belong to "chassis status". Should they be supported as if they were discrete sensors ?

Comment by Sergey Syreskin [ 2013 Feb 08 ]

Hi Andris,

Here is a list of sensors for IBM SystemX 3550 M2, which we use in our production environment. We definitely need to know about power supply and power supply fan failures. After those, I think: Host Power, One of PCI Error, One of the DIMMs, One of the CPUs, Sys Board Fault, CPU 1 OverTemp, CPU 2 OverTemp.

Regards,
Sergey

Comment by Andris Mednis [ 2013 Feb 08 ]

Thanks, Sergey!

Your list of IBM SystemX sensors is very helpful as an example.
Can you also send the output of "ipmitool .... sdr elist all" ?

Does anybody have examples of "ipmitool ... sensor list all" and "ipmitool .... sdr elist all" on a blade enclosure with multiple blades ?

Best regards,
Andris

Comment by Sergey Syreskin [ 2013 Feb 08 ]

This is not a blade server; however, I have uploaded both files, just in case.

andris Thanks! You helped me to understand more.

Comment by Tomasz Pawelczak [ 2013 Feb 11 ]

IPMI sensor info for ProLiant BL460c G7 blades. With HP, only the blades have IPMI sensors, not the whole chassis.

See ProLiant-BL460c-G7-sensors-hpblade.txt

Comment by Andrej Kacian [ 2013 Feb 11 ]

Attaching also sdr+sensors listing for a HP ProLiant 685c G7 blade.

Comment by Sergey Syreskin [ 2013 Feb 11 ]

Added ipmitool output for HP ProLiant DL360 G7.

Comment by Andris Mednis [ 2013 Feb 11 ]

Thanks for samples!

Comment by Andris Mednis [ 2013 Feb 14 ]

Hi!
Initial draft of specifications at https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-300.

Trunk-based development branch
svn://svn.zabbix.com/branches/dev/ZBXNEXT-300-trunk
contains work done so far (reading discrete sensors, bitwise function AND for testing bits in trigger expressions).
Waiting for your feedback...

Comment by Oleksii Zagorskyi [ 2013 Feb 15 ]

I think that new trigger function name (and) is not the best.
Maybe we could select something more intuitively clear ? For example "bit" or "bitwise" or something similar.

Maybe then mode (AND|OR) could be an additional function parameter ? (just an idea)

Also 2nd function parameter (sec or #num) should be optional as well.

Wouldn't it be more "readable" for users to use 1st function parameter in binary form ?
i.e. binary "00001000" instead of decimal "8"

Also a phrase "Sensor "Power Unit Stat" has "Event/Reading Type Code" 0x6f and "Sensor Type Code" 0x9" is not clear to me, I mean it contains twice "Sensor Type Code" with different values. Is this ok ?

Comment by Andris Mednis [ 2013 Feb 15 ]

How about using bitwise function names as in Lua: band(), bnot(), bor(), bxor() ?
bitwise(and, 1, #1) instead of band(1, #1) ? I don't know...

Ok, let's make 2nd function parameter (sec or #num) optional.

Is binary "00001000" sufficient ? Or somebody would want "0x08", too ?

"Event/Reading Type Code" and "Sensor Type Code" are two different codes which characterize a sensor.
Examples of "Event/Reading Type Code": 01h = Threshold (a.k.a. analog), 02h = Discrete (DMI-based “Usage State” STATES), 03h, ... 06h = ‘digital’ Discrete (DIGITAL/DISCRETE EVENT STATES), ...
Examples of "Sensor Type Code": 01h = Temperature, 02h = Voltage, 03h = Current, 04h = Fan, 05h = Physical Security (Chassis Intrusion), 06h = Platform Security Violation Attempt, 07h = Processor, 08h = Power Supply, ...
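
To make the distinction between the two codes concrete, here is a minimal sketch that uses only the code values quoted in this comment. Treating every Event/Reading Type Code other than 01h as discrete is an assumption of the sketch, and `reading_kind` and `sensor_type_name` are hypothetical helper names, not part of Zabbix or OpenIPMI.

```python
# Sensor Type Codes exactly as listed in the comment above.
SENSOR_TYPES = {
    0x01: "Temperature",
    0x02: "Voltage",
    0x03: "Current",
    0x04: "Fan",
    0x05: "Physical Security (Chassis Intrusion)",
    0x06: "Platform Security Violation Attempt",
    0x07: "Processor",
    0x08: "Power Supply",
}


def reading_kind(event_reading_type):
    """Classify a sensor by its Event/Reading Type Code."""
    # Assumption: 01h is threshold (analog); anything else is handled as discrete.
    return "threshold" if event_reading_type == 0x01 else "discrete"


def sensor_type_name(sensor_type):
    """Look up a Sensor Type Code; codes not quoted above stay unnamed."""
    return SENSOR_TYPES.get(sensor_type, f"unknown (0x{sensor_type:02x})")


# The "Power Unit Stat" sensor mentioned earlier in this thread: 0x6f and 0x9.
print(reading_kind(0x6F), "/", sensor_type_name(0x09))
print(reading_kind(0x01), "/", sensor_type_name(0x04))
```

The two lookups are independent: the first code says how a reading is encoded, the second says what kind of thing is being measured.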

Comment by Oleksii Zagorskyi [ 2013 Feb 15 ]

Idea about band(), bnot(), bor(), bxor() looks good, in any case it's much better than just and().

If we select the "bitwise(and, 1, #1)" style, then I'd put MODE (and|or) not in the 1st position but in the 2nd or last, and a default value should be defined for MODE (probably "and").
Then most often used function will look more readable, simple, say like: "bitwise(00001000)"

Please ignore my comment about "Sensor Type Code", i was wrong, that phrase in the spec is just a bit complicated.

Comment by Felipe de Moura Vieira [ 2013 Mar 08 ]

An output of ipmitool for a Proliant BL460C G1

Comment by Andris Mednis [ 2013 Mar 08 ]

Thanks, Felipe!
One of sensors is "Virtual Fan"

Comment by Andris Mednis [ 2013 Mar 14 ]

Hi!
You are welcome to read specifications at https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-300
and test development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-300-trunk
for:

  • support of discrete sensors,
  • a new function "band()" (bitwise AND) for trigger expressions and calculated items,
  • function "count()" with added operator "band" for trigger expressions and calculated items.

Known problems so far:

  • frontend's "Expression constructor" does not yet support entering "band" operator for count() function.
    Enter expressions without constructor.
  • documentation is not updated. Use specifications at
    https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-300
    as current documentation and see examples below.

As IPMI discrete sensors change their state rarely, I took a frequently changing item "system.cpu.switches" as an input for testing "band()" and "count()".
Examples I used for testing:

  • calculated items:
    band("system.cpu.switches",#1,18446744073709551360,0) <--- a 64 bit mask to set 8 least significant bits to "0".
    band("system.cpu.switches",#1,1) <--- a value "AND-ed" with mask 1 gives the least significant bit of value.
    band("system.cpu.switches",#1,1,1d)
    band("system.cpu.switches",#100,1,1d)
    band("system.cpu.switches",0,1)
    band("system.cpu.switches",0,1,1d)
    band("system.cpu.switches",100,1)
    band("system.cpu.switches",100,1,1d)
    count("system.cpu.switches",600,0,band) <--- mask 0 "matches" any value. Same as count("system.cpu.switches",600).
    count("system.cpu.switches",600,1,band) <--- how many values in last 10 minutes have "1" in the least significant bit.
    count("system.cpu.switches",600,1,band,60)
    count("system.cpu.switches",600,2/3,band) <--- how many values in last 10 minutes have "10" in 2 least significant bits.
    <--- Bitwise AND with mask 3 sets all bits to "0" except the 2 least significant bits.
    <--- The result is compared to 2 ("10" in binary).
  • trigger expressions:
    {localhost1:system.cpu.switches.count(0,6/7,band)}>0 <--- Does the last value contain "110" in the 3 least significant bits ?
    {test_db:Power_Unit_Stat.band(#1,1)}=1 <--- Does the last reading of discrete sensor Power_Unit_Stat have "1" in the least significant bit ?
    {test_db:Power_Unit_Stat.band(#1,1,1d)}=1
    {test_db:Power_Unit_Stat.band(#20,1)}=1
    {test_db:Power_Unit_Stat.band(#20,1,1d)}=1
    {test_db:Power_Unit_Stat.band(0,1)}=1
    {test_db:Power_Unit_Stat.band(0,1,1d)}=1
    {test_db:Power_Unit_Stat.band(20,1)}=1
    {test_db:Power_Unit_Stat.band(20,1,1d)}=1
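
The arithmetic behind these examples can be sketched in a few lines. This models only the bitwise part as described in the arrows above (a plain mask for band(), and a "compare/mask" pattern for count() with the "band" operator); it is not the Zabbix implementation, it ignores the time and "#num" parameters, and the helper names and sample values are hypothetical.

```python
def band(value, mask):
    """Bitwise AND of one value with a mask."""
    return value & mask


def parse_pattern(pattern):
    """'2/3' -> compare with 2 after masking with 3; '1' -> compare 1, mask 1."""
    if "/" in pattern:
        compare, mask = pattern.split("/")
        return int(compare), int(mask)
    return int(pattern), int(pattern)


def count_band(values, pattern):
    """How many values equal `compare` after being AND-ed with `mask`."""
    compare, mask = parse_pattern(pattern)
    return sum(1 for v in values if (v & mask) == compare)


# The 64-bit mask from the first example clears the 8 least significant bits.
assert 18446744073709551360 == 0xFFFFFFFFFFFFFF00
print(hex(band(0x1234, 18446744073709551360)))

values = [0b110, 0b111, 0b010, 0b1110]  # made-up sample history
print(count_band(values, "0"))    # mask 0 matches every value
print(count_band(values, "1"))    # values with "1" in the lowest bit
print(count_band(values, "2/3"))  # values ending in binary "10"
print(count_band(values, "6/7"))  # values ending in binary "110"
```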

Waiting for your thoughts...

Comment by Andris Mednis [ 2013 Mar 14 ]

(1) Documented a new function "band" and a new operator "band" for function "count" in
https://www.zabbix.com/documentation/2.2/manual/appendix/triggers/functions

<richlv> apparently some discrete sensor related documentation has also been added to https://www.zabbix.com/documentation/2.2/manual/config/items/itemtypes/ipmi - anywhere else ?
still missing - whatsnew

andris "whatsnew" documentation added at https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220?&#support_for_ipmi_discrete_sensors

<richlv> thanks - i reordered it a bit and added a couple of links - please check that it still looks ok; if so, this can be closed

andris Thanks. CLOSED

Comment by Toms (Inactive) [ 2013 Mar 19 ]

Frontend ready for testing in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-300-trunk r34462

Comment by Andris Mednis [ 2013 Mar 22 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-300-trunk

Comment by Andris Zeila [ 2013 Mar 28 ]

Successfully tested

andris Thanks! Proposed changes accepted.

Comment by Andris Mednis [ 2013 Mar 28 ]

Fixed in version pre-2.1.0 rev.34705

Comment by richlv [ 2013 Apr 02 ]

reopen to close subissue

Generated at Thu Apr 25 19:23:24 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.