[ZBX-8899] sensor[] item does not work due to a different /sys/class/hwmon file system layout Created: 2014 Oct 14  Updated: 2017 May 30  Resolved: 2015 Dec 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.4.1
Fix Version/s: 2.2.11rc1, 2.4.7rc1, 3.0.0alpha3

Type: Incident report Priority: Major
Reporter: Aleksandrs Saveljevs Assignee: Unassigned
Resolution: Fixed Votes: 3
Labels: sensors
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

$ uname -a
Linux hostname 3.16-2-amd64 #1 SMP Debian 3.16.3-2 (2014-09-20) x86_64 GNU/Linux


Issue Links:
Duplicate
is duplicated by ZBX-9678 /sys/class/hwmon/hwmon0/device/name i... Closed

 Description   

Support for sensors on newer Linux kernels was implemented in ZBX-282. According to https://www.zabbix.com/documentation/2.4/manual/appendix/items/sensor, we should be able to construct proper arguments for the item if we look at the output of sensors -u. For instance:

$ sensors -u | sed -n '10,16 p'
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:
  temp1_input: 35.000
  temp1_max: 85.000
  temp1_crit: 105.000
  temp1_crit_alarm: 0.000

However, the following item does not work, even though documentation gives a similar example:

$ sbin/zabbix_agentd -t 'sensor[coretemp-isa-0000,temp1]'
sensor[coretemp-isa-0000,temp1]               [m|ZBX_NOTSUPPORTED] [Cannot obtain sensor information.]


 Comments   
Comment by Aleksandrs Saveljevs [ 2014 Oct 14 ]

According to Igors' comment in ZBX-282, this is the expected file system layout:

All sensor chips are located in /sys/class/hwmon/hwmon*. If there is no /sys/class/hwmon/hwmon*/device directory, then the device is treated as virtual. In this case, the files are located inside the hwmon* directory, otherwise, inside hwmon*/device directory.

However, this is different on my machine:

$ find /sys/class/hwmon/hwmon1/ | sort
/sys/class/hwmon/hwmon1/
/sys/class/hwmon/hwmon1/device
/sys/class/hwmon/hwmon1/name
/sys/class/hwmon/hwmon1/power
/sys/class/hwmon/hwmon1/power/async
/sys/class/hwmon/hwmon1/power/autosuspend_delay_ms
/sys/class/hwmon/hwmon1/power/control
/sys/class/hwmon/hwmon1/power/runtime_active_kids
/sys/class/hwmon/hwmon1/power/runtime_active_time
/sys/class/hwmon/hwmon1/power/runtime_enabled
/sys/class/hwmon/hwmon1/power/runtime_status
/sys/class/hwmon/hwmon1/power/runtime_suspended_time
/sys/class/hwmon/hwmon1/power/runtime_usage
/sys/class/hwmon/hwmon1/subsystem
/sys/class/hwmon/hwmon1/temp1_crit
/sys/class/hwmon/hwmon1/temp1_crit_alarm
/sys/class/hwmon/hwmon1/temp1_input
/sys/class/hwmon/hwmon1/temp1_label
/sys/class/hwmon/hwmon1/temp1_max
/sys/class/hwmon/hwmon1/temp2_crit
/sys/class/hwmon/hwmon1/temp2_crit_alarm
/sys/class/hwmon/hwmon1/temp2_input
/sys/class/hwmon/hwmon1/temp2_label
/sys/class/hwmon/hwmon1/temp2_max
/sys/class/hwmon/hwmon1/temp3_crit
/sys/class/hwmon/hwmon1/temp3_crit_alarm
/sys/class/hwmon/hwmon1/temp3_input
/sys/class/hwmon/hwmon1/temp3_label
/sys/class/hwmon/hwmon1/temp3_max
/sys/class/hwmon/hwmon1/temp4_crit
/sys/class/hwmon/hwmon1/temp4_crit_alarm
/sys/class/hwmon/hwmon1/temp4_input
/sys/class/hwmon/hwmon1/temp4_label
/sys/class/hwmon/hwmon1/temp4_max
/sys/class/hwmon/hwmon1/temp5_crit
/sys/class/hwmon/hwmon1/temp5_crit_alarm
/sys/class/hwmon/hwmon1/temp5_input
/sys/class/hwmon/hwmon1/temp5_label
/sys/class/hwmon/hwmon1/temp5_max
/sys/class/hwmon/hwmon1/uevent

It can be seen that *_input files are located directly inside /sys/class/hwmon/hwmon1, even though they are expected to be found in /sys/class/hwmon/hwmon1/device.

Comment by Oleksii Zagorskyi [ 2015 Jul 11 ]

Note - duplicating ZBX-9678 suggests a patch.

Comment by Trever L. Adams [ 2015 Jul 12 ]

Per ZBX-9678 you need documentation that my patch is correct and/or the current code is wrong:

https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface

lines 33-42:

Each chip gets its own directory in the sysfs /sys/devices tree. To
find all sensor chips, it is easier to follow the device symlinks from
/sys/class/hwmon/hwmon*.

Up to lm-sensors 3.0.0, libsensors looks for hardware monitoring attributes
in the "physical" device directory. Since lm-sensors 3.0.1, attributes found
in the hwmon "class" device directory are also supported. Complex drivers
(e.g. drivers for multifunction chips) may want to use this possibility to
avoid namespace pollution. The only drawback will be that older versions of
libsensors won't support the driver in question.

lines 79-91 are also interesting.

So, I cannot tell you when things changed, but there is your documentation proving where the name file can/should be.

Comment by Trever L. Adams [ 2015 Jul 12 ]

As an aside, https://bugs.launchpad.net/ubuntu/+source/conky/+bug/435571 covers ZBX-282 and followed the directory recommendations made there (not saying they even knew of the bug, only that they followed the same fix), which was deemed incorrect by ZABBIX SIA and turns out now to be the correct solution.

Comment by Aleksandrs Saveljevs [ 2015 Jul 13 ]

Thank you, Trever! These are useful references.

Comment by Volker Fröhlich [ 2015 Aug 17 ]

I've been looking at lm_sensor 3.4.0's lib/sysfs.c sysfs_read_attr().

The function is almost the same as your's, except that it takes two arguments: *device and *attr, not only *device. The following code clearly shows what their approach is:

820     /* Get the adapter name from the classdev "name" attribute
821      * (Linux 2.6.20 and later). If it fails, fall back to
822      * the device "name" attribute (for older kernels). */
823     entry.adapter = sysfs_read_attr(path, "name");
824     if (!entry.adapter)
825         entry.adapter = sysfs_read_attr(path, "device/name");

So, all you have to do is to put the same logic into either your function or the invocation part.

Comment by Glebs Ivanovskis (Inactive) [ 2015 Aug 21 ]

I'm afraid in his comment Igor misinterpreted lines 33-42 of https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface and Trever is gently pushing us to acknowledge this fact.

"Physical" sensor will actually have two paths, for example on my machine I have:

  • /sys/class/hwmon/hwmon0
  • /sys/devices/platform/coretemp.0/hwmon/hwmon0

Both have the same contents and both have device folder in them, which actually points to ../../../coretemp.0.

I guess "virtual" sensor will have it's own path is /sys/class but no entry in /sys/devices (or /sys/devices/platform, not sure what is more precise).

The correct way in my opinion would be checking /sys/class/hwmon/hwmon* for name and all other files and ignoring /sys/class/hwmon/hwmon*/device folder. It would be interesting to find out why our approach worked previously. Is /sys/class/hwmon/hwmon*/device a relatively new addition?

Comment by Glebs Ivanovskis (Inactive) [ 2015 Aug 28 ]

Fixed for version 2.2 in development branch svn://svn.zabbix.com/branches/dev/ZBX-8899-22 r55208.

As volter suggested, we will now check both /sys/class/hwmon/hwmon* and /sys/class/hwmon/hwmon*/device for sensor name attribute and look for other attributes in the same place.

(1) As a side effect of work on this fix we dug deeper into sensor interface specification and found out how complicated it is. For example, zabbix can report voltages if you tell him sensor[chip-name,in*] but sometimes these values need scaling, sometimes not even device-model-specific scaling but device-specimen-specific scaling (see http://lm-sensors.org/wiki/VoltageLabelsAndScaling). This aspect should be mentioned in documentation.

glebs.ivanovskis Added footnote about voltage scaling in:

Unfortunately, lm-sensors.org seems to be down, provided link directly to sysfs specification. Please review. RESOLVED

sasha CLOSED

Comment by Andris Zeila [ 2015 Sep 02 ]

(2) An output parameter was added to sysfs_read_attr() function which points to a static buffer:

static char	*sysfs_read_attr(const char *device, char *attr_location)

We should either use dynamic buffer or pass the buffer size as additional parameter:

static char	*sysfs_read_attr(const char *device, char **attr_location)
// or
static char	*sysfs_read_attr(const char *device, char *attr_location, size_t attr_location_size)

glebs.ivanovskis RESOLVED in r55837, see details in (3).

wiper CLOSED

Comment by Andris Zeila [ 2015 Sep 03 ]

(3) sysfs_read_attr() is used in two places - first to get device name and location, then to get bus name. So now it can return bus name where previously returned NULL and agent might incorrectly classify sensor's bus.

glebs.ivanovskis I accepted the patch you suggested to check different possible locations of sensor attributes in a more elegant way. I just made char *locations[] array global and sysfs_read_attr() wil now return pointer of locations element where it found attributes. The attribute itself is passed to caller in second sysfs_read_attr() parameter. In the second place where sysfs_read_attr() was called we now make sure that subfolder we found attribute in is "/device".
Changes of r55837 address both (2) and (3). RESOLVED

wiper CLOSED, please review changes in r55861

Comment by Glebs Ivanovskis (Inactive) [ 2015 Sep 30 ]

Taking in mind what was said in (1) about driver complexity I would strongly encourage users interested in sensor monitoring on Linux (as well as developers interested in easier respective code maintenance ) to vote for ZBXNEXT-2929.

Comment by Andris Zeila [ 2015 Sep 30 ]

Successfully tested

Comment by Glebs Ivanovskis (Inactive) [ 2015 Sep 30 ]

Fixed in pre-2.2.11rc1 r55871.
Fixed in pre-2.4.7rc1 r55873.
Fixed in pre-3.0.0alpha3 (trunk) r55874.

Reviewed and accepted changes in r55861.

Please review conflict resolution in r55873.

wiper The merge looks ok, but please check (4)

glebs.ivanovskis (4) Fixed in pre-2.2.11rc1 r55896, pre-2.4.7rc1 r55897, pre-3.0.0alpha3 (trunk) r55898.

Comment by Andris Zeila [ 2015 Oct 01 ]

(4) Problem found by coverity:

		if (0 > dev_len)
		{
			/* No device link? Treat device as virtual */
			err = get_device_info(devicepath, NULL, device_info, &subfolder);
		}
		else
		{
			deviced[dev_len] = '\0';
			device_p = strrchr(deviced, '/') + 1;

			if (0 == strcmp(device, device_p))
			{
				zbx_snprintf(device_info, sizeof(device_info), "%s", device);
				err = SUCCEED;
			}
			else
			{
				err = get_device_info(devicepath, device_p, device_info, &subfolder);
			}
		}

		if (SUCCEED == err && 0 == strcmp(device_info, device))
		{
//   CID 130939:  Memory - illegal accesses  (UNINIT)
//   Using uninitialized value "subfolder" when calling "__zbx_zbx_snprintf".
			zbx_snprintf(devicepath, sizeof(devicepath), "%s/%s%s", DEVICE_DIR, deviceent->d_name,
					subfolder);

Indeed if dev_len > 0 and device matches device_p then subfolder is not initialized. Apparently it can happen as we have code handling this situation. No idea what the subfolder should be though.

glebs.ivanovskis In case of device matching device_p user provided us actual device name instead of name printed by sensors -u command.

$ zabbix_agentd -t 'sensor[f8000-isa-0a00,in0]'
sensor[f8000-isa-0a00,in0]                    [d|1.688000]
$ zabbix_agentd -t 'sensor[f71882fg.2560,in0]'
sensor[f71882fg.2560,in0]                     [d|1.688000]
$ zabbix_agentd -t 'sensor[coretemp-isa-0000,temp,avg]'
sensor[coretemp-isa-0000,temp,avg]            [d|30.000000]
$ zabbix_agentd -t 'sensor[coretemp.0,temp,avg]'
sensor[coretemp.0,temp,avg]                   [d|30.000000]

We simply need to initialize subfolder properly like we do usually because it can be either "" or "/device". Fixed in revision 55881. RESOLVED

wiper CLOSED

Comment by Andris Zeila [ 2015 Oct 01 ]

Succesfully tested

Comment by Glebs Ivanovskis (Inactive) [ 2015 Oct 13 ]

(5) Documented in What's new:

Added footnote about voltage scaling (subissue (1)) in:

wiper reviewed, though not sure we should add 'simple' bugfixes to what's new

zalex_ua I'm completely agree with wiper, we do not mention such bug fixes on whatsnew pages.
Also, in documentation we don't use phrases like "We fixed ..."

sasha REOPENED according to remarks

glebs.ivanovskis Removed from What's new. RESOLVED

sasha CLOSED

Comment by Oleksii Zagorskyi [ 2015 Dec 01 ]

It caused a regression in ZBX-10126

Generated at Sat Apr 20 12:31:28 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.