[ZBXNEXT-300] Add support for ipmi discrete sensor values Created: 2010 Apr 15 Updated: 2013 Jul 23 Resolved: 2013 Apr 02 |
|
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | None |
Affects Version/s: | 1.8.2 |
Fix Version/s: | 2.1.0 |
Type: | Change Request | Priority: | Major |
Reporter: | Mark Carbonaro | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 21 |
Labels: | ipmi |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Attachments: | IBM_System_x3550M2_sdr_elist_all.txt IBM_System_x3550M2_sensor_list_all.txt IBM_System_x3550_IPMI_sensors.txt ProLiant-BL460c-G7-sensors-hpblade.txt ProLiant-BL685c-G7-sensors-sdr-hpblade.txt ProLiant_BL460c_G1_sdr_elist_all.txt ProLiant_DL360_G7_sdr_elist_all.txt ProLiant_DL360_G7_sensor_list_all.txt ProLiant_DL360_G7_v_sdr_elist_all.txt ProLiant_DL360_G7_v_sensor_list_all.txt checks_ipmi.c.patch |
Issue Links: |
|
Description |
At present, discrete sensor values are not supported. They are needed to monitor drive failure information on most Sun x86 servers. See this forum thread for some information: http://zabbix.com/forum/showthread.php?t=10762 |
Comments |
Comment by Mark Carbonaro [ 2010 Apr 15 ] |
Here is the patch from the forum thread, updated to apply to 1.8.2. I don't think it is a complete and proper implementation, but unfortunately I don't possess the skills to fix it up. |
Comment by Andy Lubel [ 2010 May 12 ] |
We would prefer to monitor the discrete results, which show things like whether the service LED is on or there is a fault, rather than have 30+ objects monitored per physical server plus the trigger complexity (each system's critical thresholds are different). We can read this discrete data universally on every Sun system with a recent ILOM. Example on a Sun server: |
Comment by Alex Deiter [ 2010 May 18 ] |
Thanks a lot for the patch! To Alexei: could you please review this patch? Patch successfully tested on Sun Fire x4150/x4200/x4450: 14722:20100518:135325.027 In init_ipmi_host([10.1.1.2]:623) > sensor info test(30.0) |
Comment by Thomas Lohmüller [ 2011 Nov 14 ] |
I tried to monitor the Service LED on HP servers. It's also a discrete value and does not work. Is there anything going on here? It's still not fixed in 1.8.8. |
Comment by Oleksii Zagorskyi [ 2012 Jan 19 ] |
An example of a sensor list. FAN sensors work well, the PS Status sensor does not. |
Comment by Jc Duss [ 2012 Apr 17 ] |
Hi all, Is this ticket still in development and what is the release target please? Thank you for your work on it. JC. |
Comment by Oleksii Zagorskyi [ 2012 Apr 17 ] |
As I understood the implementation of this feature turned out to be absolutely NOT trivial. |
Comment by richlv [ 2012 Jul 24 ] |
i thought i described it somewhere, but just can't find it... so the situation is like this.
discrete sensors may return a value where any bit can be meaningful. so first bit may be some status of the first psu, second of the second one etc. or maybe one is intrusion sensor, another is humidity sensor etc.
patch returns position of the first set value (0001000011101100 would get you 4). but that would completely ignore the remaining set bits.
possible approaches and their drawbacks :
a) just allow to store discrete values in text items, and write triggers against them matching strings. triggers become very hard to manage, there is no separate history for individual bits and there is no possibility to graph this (up/down), in the history there is no way to figure out what's what (without knowing bits by heart)
b) for each discrete sensor auto-create individual items for each bit. trigger functions are the same, history for each bit is separate, graphing possible, value mapping can be set up and history is easy to use. but if all you need is a single bit from the discrete value, this is an overkill.
maybe the solution is some mix (or both) of the above, but that's far from trivial at that point
<Sasha> d) or add preprocessing for numeric items and store only useful bits.
<richlv> as per openipmi docs cited below, "each bit may represent the initialization state of a piece of software". in this case the suggested patch that just returns first set bit would be best, so it would be highly desirable to have such functionality as an option |
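The gap between what the patch returns and what approach (b) would expose can be sketched in a few lines. This is only an illustration in Python, not code from the attached patch. The function names are made up, and it assumes the bit string is read left to right with positions starting at 1, which is what the "would get you 4" example suggests.

```python
def first_set_position(bits: str) -> int:
    """Return the 1-based position (from the left) of the first '1', or 0 if none is set."""
    return bits.find("1") + 1


def set_positions(bits: str) -> list:
    """Return the 1-based positions (from the left) of every '1' in the bit string."""
    return [i + 1 for i, ch in enumerate(bits) if ch == "1"]


reading = "0001000011101100"
print(first_set_position(reading))  # 4: the single value the patch would report
print(set_positions(reading))       # [4, 9, 10, 11, 13, 14]: one entry per candidate item in approach (b)
```

The first function loses five of the six set bits in this reading, which is the drawback described above. The second keeps them all, at the cost of one item per bit.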
Comment by richlv [ 2012 Jul 25 ] |
http://openipmi.sourceforge.net/IPMI.pdf has some details as well. citing it : Sensor monitor something about an object. IPMI defines many types of sensors, but groups them into two Discrete sensors report their readings in a 16-bit bitmask, each bit generally representing a discrete state. |
Comment by Oleksii Zagorskyi [ 2012 Aug 10 ] |
bitwise operators in triggers requested in |
Comment by Andris Mednis [ 2013 Jan 11 ] |
Document at http://openipmi.sourceforge.net/IPMI.pdf is from February 10, 2006. |
Comment by Andris Mednis [ 2013 Jan 15 ] |
While setting up a Zabbix server (from trunk/) test system on Debian GNU/Linux 64-bit testing ("Wheezy"), I noticed that:
|
Comment by Andris Mednis [ 2013 Feb 07 ] |
1) Can you tell which discrete sensors are the most important to support in Zabbix ? 2) Things like "Cooling/Fan fault detected", "Drive Fault" do not show up under sensors, they belong to "chassis status". Should they be supported as if they were discrete sensors ? |
Comment by Sergey Syreskin [ 2013 Feb 08 ] |
Hi Andris, Here is a list of sensors for IBM SystemX 3550 M2, which we use in our production environment. We definitely need to know about power supply and power supply fans failures. Then I think, Host Power, One of PCI Error, One of the DIMMs, One of the CPUs, Sys Board Fault, CPU 1 OverTemp, CPU 2 OverTemp. Regards, |
Comment by Andris Mednis [ 2013 Feb 08 ] |
Thanks, Sergey! Your list of IBM SystemX sensors is very helpful as an example. Does anybody have examples of "ipmitool ... sensor list all" and "ipmitool .... sdr elist all" output from a blade enclosure with multiple blades? Best regards, |
Comment by Sergey Syreskin [ 2013 Feb 08 ] |
This is not a blade server; however, I am uploading both files, just in case. andris Thanks! You helped me to understand more. |
Comment by Tomasz Pawelczak [ 2013 Feb 11 ] |
IPMI sensor info for ProLiant BL460c G7 blades. For HP, only the blades have IPMI sensors, not the whole chassis. See ProLiant-BL460c-G7-sensors-hpblade.txt |
Comment by Andrej Kacian [ 2013 Feb 11 ] |
Also attaching the sdr+sensors listing for an HP ProLiant 685c G7 blade. |
Comment by Sergey Syreskin [ 2013 Feb 11 ] |
Added ipmitool output for HP ProLiant DL360 G7. |
Comment by Andris Mednis [ 2013 Feb 11 ] |
Thanks for samples! |
Comment by Andris Mednis [ 2013 Feb 14 ] |
Hi! Trunk-based development branch |
Comment by Oleksii Zagorskyi [ 2013 Feb 15 ] |
I think that the new trigger function name (and) is not the best. Maybe the mode (AND|OR) could be an additional function parameter? (just an idea) Also, the 2nd function parameter (sec or #num) should be optional as well. Wouldn't it be more "readable" for users to use the 1st function parameter in binary form? Also, the phrase "Sensor "Power Unit Stat" has "Event/Reading Type Code" 0x6f and "Sensor Type Code" 0x9" is not clear to me; I mean it contains "Sensor Type Code" twice with different values. Is this ok? |
Comment by Andris Mednis [ 2013 Feb 15 ] |
How about using bitwise function names as in Lua: band(), bnot(), bor(), bxor() ? Ok, let's make 2nd function parameter (sec or #num) optional. Is binary "00001000" sufficient ? Or somebody would want "0x08", too ? "Event/Reading Type Code" and "Sensor Type Code" are two different codes which characterize a sensor. |
Comment by Oleksii Zagorskyi [ 2013 Feb 15 ] |
Idea about band(), bnot(), bor(), bxor() looks good, in any case it's much better than just and(). If we would select "bitwise(and, 1, #1)" style then I'd set MODE (and|or) to not 1st position, but to 2nd or last, and there should be defined default value for MODE ("and" probably). Please ignore my comment about "Sensor Type Code", i was wrong, that phrase in the spec is just a bit complicated. |
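To make the notation question concrete, here is a small Python sketch of a bitwise-AND check with the same mask written in binary and in hex. It is not Zabbix code. The `band` helper below is a hypothetical stand-in, and which notations the real trigger function accepts is not settled in this thread.

```python
def band(value: int, mask: int) -> int:
    """Return the bitwise AND of a sensor reading and a mask."""
    return value & mask


# The same mask in the two notations discussed above.
MASK_BIN = int("00001000", 2)   # binary form
MASK_HEX = int("0x08", 16)      # hex form

print(MASK_BIN == MASK_HEX)         # True: both select the same single bit
print(band(0b00001100, MASK_BIN))   # 8: the masked bit is set
print(band(0b00000100, MASK_HEX))   # 0: the masked bit is clear
```

Whatever notation is chosen, a check of this kind reduces to comparing the masked value with the mask (bit set) or with zero (bit clear).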
Comment by Felipe de Moura Vieira [ 2013 Mar 08 ] |
An output of ipmitool for a Proliant BL460C G1 |
Comment by Andris Mednis [ 2013 Mar 08 ] |
Thanks, Felipe! |
Comment by Andris Mednis [ 2013 Mar 14 ] |
Hi!
Known problems so far:
As IPMI discrete sensors change their state rarely, I took a frequently changing item "system.cpu.switches" as an input for testing "band()" and "count()".
Waiting for your thoughts... |
Comment by Andris Mednis [ 2013 Mar 14 ] |
(1) Documented a new function "band" and a new operator "band" for function "count" in <richlv> apparently some discrete sensor related documentation has also been added to https://www.zabbix.com/documentation/2.2/manual/config/items/itemtypes/ipmi - anywhere else ? andris "whatsnew" documentation added at https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220?&#support_for_ipmi_discrete_sensors <richlv> thanks - i reordered it a bit and added a couple of links - please check that it still looks ok; if so, this can be closed andris Thanks. CLOSED |
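As a rough illustration of how a "band" operator could work inside a counting function, the Python sketch below counts the history values whose masked bits equal a given number. This is an assumption about the intended semantics and not the Zabbix implementation. `count_band` and its parameters are hypothetical names.

```python
def count_band(values, mask, compare_with=None):
    """Count values whose bitwise AND with mask equals compare_with.

    If compare_with is omitted, the mask itself is used, i.e. every
    bit in the mask must be set for a value to be counted.
    """
    if compare_with is None:
        compare_with = mask
    return sum(1 for v in values if (v & mask) == compare_with)


# Hypothetical history of discrete sensor readings.
history = [0b0000, 0b1000, 0b1100, 0b0100, 0b1000]

print(count_band(history, 0b1000))          # 3: readings with bit 0b1000 set
print(count_band(history, 0b1100, 0b0100))  # 1: readings with 0b0100 set and 0b1000 clear
```

The second call shows why a separate comparison value is useful: it lets one check require some bits to be set and others to be clear.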
Comment by Toms (Inactive) [ 2013 Mar 19 ] |
Frontend ready for testing in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-300-trunk r34462 |
Comment by Andris Mednis [ 2013 Mar 22 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-300-trunk |
Comment by Andris Zeila [ 2013 Mar 28 ] |
Successfully tested andris Thanks! Proposed changes accepted. |
Comment by Andris Mednis [ 2013 Mar 28 ] |
Fixed in version pre-2.1.0 rev.34705 |
Comment by richlv [ 2013 Apr 02 ] |
reopen to close subissue |