[ZBX-8899] sensor[] item does not work due to a different /sys/class/hwmon file system layout Created: 2014 Oct 14 Updated: 2017 May 30 Resolved: 2015 Dec 14 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.4.1 |
Fix Version/s: | 2.2.11rc1, 2.4.7rc1, 3.0.0alpha3 |
Type: | Incident report | Priority: | Major |
Reporter: | Aleksandrs Saveljevs | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 3 |
Labels: | sensors | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
$ uname -a |
Issue Links: |
|
Description |
Support for sensors on newer Linux kernels was implemented in $ sensors -u | sed -n '10,16 p' coretemp-isa-0000 Adapter: ISA adapter Physical id 0: temp1_input: 35.000 temp1_max: 85.000 temp1_crit: 105.000 temp1_crit_alarm: 0.000 However, the following item does not work, even though documentation gives a similar example: $ sbin/zabbix_agentd -t 'sensor[coretemp-isa-0000,temp1]' sensor[coretemp-isa-0000,temp1] [m|ZBX_NOTSUPPORTED] [Cannot obtain sensor information.] |
Comments |
Comment by Aleksandrs Saveljevs [ 2014 Oct 14 ] |
According to Igors' comment in
However, this is different on my machine: $ find /sys/class/hwmon/hwmon1/ | sort /sys/class/hwmon/hwmon1/ /sys/class/hwmon/hwmon1/device /sys/class/hwmon/hwmon1/name /sys/class/hwmon/hwmon1/power /sys/class/hwmon/hwmon1/power/async /sys/class/hwmon/hwmon1/power/autosuspend_delay_ms /sys/class/hwmon/hwmon1/power/control /sys/class/hwmon/hwmon1/power/runtime_active_kids /sys/class/hwmon/hwmon1/power/runtime_active_time /sys/class/hwmon/hwmon1/power/runtime_enabled /sys/class/hwmon/hwmon1/power/runtime_status /sys/class/hwmon/hwmon1/power/runtime_suspended_time /sys/class/hwmon/hwmon1/power/runtime_usage /sys/class/hwmon/hwmon1/subsystem /sys/class/hwmon/hwmon1/temp1_crit /sys/class/hwmon/hwmon1/temp1_crit_alarm /sys/class/hwmon/hwmon1/temp1_input /sys/class/hwmon/hwmon1/temp1_label /sys/class/hwmon/hwmon1/temp1_max /sys/class/hwmon/hwmon1/temp2_crit /sys/class/hwmon/hwmon1/temp2_crit_alarm /sys/class/hwmon/hwmon1/temp2_input /sys/class/hwmon/hwmon1/temp2_label /sys/class/hwmon/hwmon1/temp2_max /sys/class/hwmon/hwmon1/temp3_crit /sys/class/hwmon/hwmon1/temp3_crit_alarm /sys/class/hwmon/hwmon1/temp3_input /sys/class/hwmon/hwmon1/temp3_label /sys/class/hwmon/hwmon1/temp3_max /sys/class/hwmon/hwmon1/temp4_crit /sys/class/hwmon/hwmon1/temp4_crit_alarm /sys/class/hwmon/hwmon1/temp4_input /sys/class/hwmon/hwmon1/temp4_label /sys/class/hwmon/hwmon1/temp4_max /sys/class/hwmon/hwmon1/temp5_crit /sys/class/hwmon/hwmon1/temp5_crit_alarm /sys/class/hwmon/hwmon1/temp5_input /sys/class/hwmon/hwmon1/temp5_label /sys/class/hwmon/hwmon1/temp5_max /sys/class/hwmon/hwmon1/uevent It can be seen that *_input files are located directly inside /sys/class/hwmon/hwmon1, even though they are expected to be found in /sys/class/hwmon/hwmon1/device. |
Comment by Oleksii Zagorskyi [ 2015 Jul 11 ] |
Note - duplicating |
Comment by Trever L. Adams [ 2015 Jul 12 ] |
Per https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface lines 33-42: Each chip gets its own directory in the sysfs /sys/devices tree. To Up to lm-sensors 3.0.0, libsensors looks for hardware monitoring attributes lines 79-91 are also interesting. So, I cannot tell you when things changed, but there is your documentation proving where the name file can/should be. |
Comment by Trever L. Adams [ 2015 Jul 12 ] |
As an aside, https://bugs.launchpad.net/ubuntu/+source/conky/+bug/435571 covers |
Comment by Aleksandrs Saveljevs [ 2015 Jul 13 ] |
Thank you, Trever! These are useful references. |
Comment by Volker Fröhlich [ 2015 Aug 17 ] |
I've been looking at lm_sensor 3.4.0's lib/sysfs.c sysfs_read_attr(). The function is almost the same as your's, except that it takes two arguments: *device and *attr, not only *device. The following code clearly shows what their approach is: 820 /* Get the adapter name from the classdev "name" attribute 821 * (Linux 2.6.20 and later). If it fails, fall back to 822 * the device "name" attribute (for older kernels). */ 823 entry.adapter = sysfs_read_attr(path, "name"); 824 if (!entry.adapter) 825 entry.adapter = sysfs_read_attr(path, "device/name"); So, all you have to do is to put the same logic into either your function or the invocation part. |
Comment by Glebs Ivanovskis (Inactive) [ 2015 Aug 21 ] |
I'm afraid in his comment Igor misinterpreted lines 33-42 of https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface and Trever is gently pushing us to acknowledge this fact. "Physical" sensor will actually have two paths, for example on my machine I have:
Both have the same contents and both have device folder in them, which actually points to ../../../coretemp.0. I guess "virtual" sensor will have it's own path is /sys/class but no entry in /sys/devices (or /sys/devices/platform, not sure what is more precise). The correct way in my opinion would be checking /sys/class/hwmon/hwmon* for name and all other files and ignoring /sys/class/hwmon/hwmon*/device folder. It would be interesting to find out why our approach worked previously. Is /sys/class/hwmon/hwmon*/device a relatively new addition? |
Comment by Glebs Ivanovskis (Inactive) [ 2015 Aug 28 ] |
Fixed for version 2.2 in development branch svn://svn.zabbix.com/branches/dev/ZBX-8899-22 r55208. As volter suggested, we will now check both /sys/class/hwmon/hwmon* and /sys/class/hwmon/hwmon*/device for sensor name attribute and look for other attributes in the same place. (1) As a side effect of work on this fix we dug deeper into sensor interface specification and found out how complicated it is. For example, zabbix can report voltages if you tell him sensor[chip-name,in*] but sometimes these values need scaling, sometimes not even device-model-specific scaling but device-specimen-specific scaling (see http://lm-sensors.org/wiki/VoltageLabelsAndScaling). This aspect should be mentioned in documentation. glebs.ivanovskis Added footnote about voltage scaling in:
Unfortunately, lm-sensors.org seems to be down, provided link directly to sysfs specification. Please review. RESOLVED sasha CLOSED |
Comment by Andris Zeila [ 2015 Sep 02 ] |
(2) An output parameter was added to sysfs_read_attr() function which points to a static buffer: static char *sysfs_read_attr(const char *device, char *attr_location) We should either use dynamic buffer or pass the buffer size as additional parameter: static char *sysfs_read_attr(const char *device, char **attr_location) // or static char *sysfs_read_attr(const char *device, char *attr_location, size_t attr_location_size) glebs.ivanovskis RESOLVED in r55837, see details in (3). wiper CLOSED |
Comment by Andris Zeila [ 2015 Sep 03 ] |
(3) sysfs_read_attr() is used in two places - first to get device name and location, then to get bus name. So now it can return bus name where previously returned NULL and agent might incorrectly classify sensor's bus. glebs.ivanovskis I accepted the patch you suggested to check different possible locations of sensor attributes in a more elegant way. I just made char *locations[] array global and sysfs_read_attr() wil now return pointer of locations element where it found attributes. The attribute itself is passed to caller in second sysfs_read_attr() parameter. In the second place where sysfs_read_attr() was called we now make sure that subfolder we found attribute in is "/device". wiper CLOSED, please review changes in r55861 |
Comment by Glebs Ivanovskis (Inactive) [ 2015 Sep 30 ] |
Taking in mind what was said in (1) about driver complexity I would strongly encourage users interested in sensor monitoring on Linux (as well as developers interested in easier respective code maintenance ) to vote for ZBXNEXT-2929. |
Comment by Andris Zeila [ 2015 Sep 30 ] |
Successfully tested |
Comment by Glebs Ivanovskis (Inactive) [ 2015 Sep 30 ] |
Fixed in pre-2.2.11rc1 r55871. Reviewed and accepted changes in r55861. Please review conflict resolution in r55873. wiper The merge looks ok, but please check (4) glebs.ivanovskis (4) Fixed in pre-2.2.11rc1 r55896, pre-2.4.7rc1 r55897, pre-3.0.0alpha3 (trunk) r55898. |
Comment by Andris Zeila [ 2015 Oct 01 ] |
(4) Problem found by coverity: if (0 > dev_len) { /* No device link? Treat device as virtual */ err = get_device_info(devicepath, NULL, device_info, &subfolder); } else { deviced[dev_len] = '\0'; device_p = strrchr(deviced, '/') + 1; if (0 == strcmp(device, device_p)) { zbx_snprintf(device_info, sizeof(device_info), "%s", device); err = SUCCEED; } else { err = get_device_info(devicepath, device_p, device_info, &subfolder); } } if (SUCCEED == err && 0 == strcmp(device_info, device)) { // CID 130939: Memory - illegal accesses (UNINIT) // Using uninitialized value "subfolder" when calling "__zbx_zbx_snprintf". zbx_snprintf(devicepath, sizeof(devicepath), "%s/%s%s", DEVICE_DIR, deviceent->d_name, subfolder); Indeed if dev_len > 0 and device matches device_p then subfolder is not initialized. Apparently it can happen as we have code handling this situation. No idea what the subfolder should be though. glebs.ivanovskis In case of device matching device_p user provided us actual device name instead of name printed by sensors -u command. $ zabbix_agentd -t 'sensor[f8000-isa-0a00,in0]' sensor[f8000-isa-0a00,in0] [d|1.688000] $ zabbix_agentd -t 'sensor[f71882fg.2560,in0]' sensor[f71882fg.2560,in0] [d|1.688000] $ zabbix_agentd -t 'sensor[coretemp-isa-0000,temp,avg]' sensor[coretemp-isa-0000,temp,avg] [d|30.000000] $ zabbix_agentd -t 'sensor[coretemp.0,temp,avg]' sensor[coretemp.0,temp,avg] [d|30.000000] We simply need to initialize subfolder properly like we do usually because it can be either "" or "/device". Fixed in revision 55881. RESOLVED wiper CLOSED |
Comment by Andris Zeila [ 2015 Oct 01 ] |
Succesfully tested |
Comment by Glebs Ivanovskis (Inactive) [ 2015 Oct 13 ] |
(5) Documented in What's new:
Added footnote about voltage scaling (subissue (1)) in:
wiper reviewed, though not sure we should add 'simple' bugfixes to what's new zalex_ua I'm completely agree with wiper, we do not mention such bug fixes on whatsnew pages. sasha REOPENED according to remarks glebs.ivanovskis Removed from What's new. RESOLVED sasha CLOSED |
Comment by Oleksii Zagorskyi [ 2015 Dec 01 ] |
It caused a regression in |