I found an issue and want to recount my attempt to fix it. There may well be a better way, but I thought it might be useful to show how I got there.
I've been investigating the use of the SMART template with agent2 on several test hosts. I noticed something very odd today while working on improving the SMART template: a bunch of disks that I got around the same time, ~2 years ago, were all reporting the EXACT same "Power on hours".
Last value for "SMART [sdc sat]: ID 9 Power_On_Hours" is '85'.
Last value for "SMART [sdc sat]: Power on hours" is "3h 47m 27s".
I checked the graph, which I would expect to climb steadily as power-on hours increase, but instead I found a flat line!
Well, that's not right. And the drives shouldn't all report the same value either. Are other items wrong too?
I went looking at the hard drive itself. This is straight from the smartctl program:
Ah, that makes sense. The template is pulling back the normalized "VALUE", which is not at all what is needed here. What is needed is the "RAW_VALUE". That also means my suspicion that all the other values are wrong is probably correct.
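To make the VALUE vs RAW_VALUE distinction concrete, here is a minimal Python sketch that reads `smartctl --json -A` style output and pulls both fields for each attribute. It assumes smartctl's JSON layout (`ata_smart_attributes.table`, with a normalized `value` and a nested `raw` object holding `value` and `string`). The `SAMPLE` document is a hypothetical, trimmed example built only from the numbers quoted in this post, not real captured output.

```python
import json

# Hypothetical, trimmed sample shaped like `smartctl --json -A /dev/sdc` output.
# Only the fields needed here are included; the numbers come from this post.
SAMPLE = """
{
  "ata_smart_attributes": {
    "table": [
      {"id": 9, "name": "Power_On_Hours", "value": 85,
       "raw": {"value": 13647, "string": "13647"}}
    ]
  }
}
"""


def attribute_values(smartctl_json: str) -> dict:
    """Map attribute name -> (normalized VALUE, RAW_VALUE number, RAW_VALUE string)."""
    data = json.loads(smartctl_json)
    result = {}
    for attr in data.get("ata_smart_attributes", {}).get("table", []):
        raw = attr.get("raw", {})
        result[attr["name"]] = (attr["value"], raw.get("value"), raw.get("string"))
    return result


if __name__ == "__main__":
    for name, (normalized, raw_value, raw_string) in attribute_values(SAMPLE).items():
        print(f"{name}: VALUE={normalized} RAW_VALUE={raw_value} ({raw_string!r})")
```

The normalized VALUE (85 here) is the drive's own health-style score, which is why it sat flat and looked identical across similar disks, while the raw value is the actual hour count.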
What does that look like in JSON? (snipping only the relevant bit)
Capturing the first (normalized) value isn't as useful for monitoring as the raw value. Now to update the template, template_module_smart_agent2.yaml:
And a recheck...
Well, some values are much better. For example, my disk is NOT running at 60°C! It's at 40°C. That's a plus.
In fact, a lot of the values make more sense. Better yet, they aren't all the same across every disk!
However, Power_On_Hours is still not right. It went from "85" to "19297288074576" when it should be "13647". I also broke other fields: my Temperature_Celsius (which reports the same temperature as Airflow_Temperature_Cel) is showing "107374182440"!!!
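Those two "broken" numbers aren't random. An ATA SMART raw field is 6 bytes (48 bits), and some drives pack extra data into the upper bytes. The Python sketch below assumes smartctl's `raw.value` is that whole 48-bit number and simply splits it into the top 16 bits and the low 32 bits. The helper name `split_raw` is made up for illustration. What the upper bits mean depends on the drive firmware. For the temperature attribute it is plausibly a min/max temperature, which is the sort of thing smartctl decodes into the raw string.

```python
def split_raw(raw_value: int) -> tuple:
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits)."""
    upper = raw_value >> 32          # bytes 4-5: vendor-packed extra data on some drives
    lower = raw_value & 0xFFFFFFFF   # bytes 0-3: the number we actually want here
    return upper, lower


if __name__ == "__main__":
    # The two values reported above after switching to the raw value.
    for label, raw in [("Power_On_Hours", 19297288074576),
                       ("Temperature_Celsius", 107374182440)]:
        upper, lower = split_raw(raw)
        print(f"{label}: upper={upper} lower={lower}")
```

The low 32 bits come out as 40 for the temperature, matching the real 40°C, and 13648 for the power-on hours. That is one more than the 13647 smartctl showed, which is consistent with a reading taken an hour or so later.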
So clearly something is busted in my fix. Looking at the SMART JSON output again, sure enough: for some attributes the raw "value" matches the "string", but for others it really is the packed raw number. That means the string might be the better field to read in. However, I have no idea how to deal with multiple value types inside a single discovery item, and I don't really want to create a separate item for every potential attribute here.
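One way to read the string but still get a single numeric type is to keep only the leading integer of the raw string. The Python sketch below does that. The second sample string is an illustrative example of the "value (extra info)" style smartctl uses for some attributes, not output from my drives. The same pattern could presumably go in a Zabbix "Regular expression" preprocessing step, with pattern `^\s*(\d+)` and output `\1`. I haven't verified that against this template, so treat it as an assumption.

```python
import re
from typing import Optional

# Matches an integer at the very start of a smartctl raw "string" field.
LEADING_INT = re.compile(r"^\s*(\d+)")


def leading_number(raw_string: str) -> Optional[int]:
    """Return the integer at the start of raw_string, or None if it doesn't start with one."""
    match = LEADING_INT.match(raw_string)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Illustrative raw strings: a plain count and a "value (extra info)" style string.
    for s in ["13647", "40 (Min/Max 25/45)"]:
        print(repr(s), "->", leading_number(s))
```

Dropping everything after the first number throws away the extra info in the string, but it keeps every attribute numeric, so one item prototype can cover them all.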
Here's my attempt to fix it again in template_module_smart_agent2.yaml:
Is it the right fix? :shrug: But it works, and I'm getting correct values.
Zabbix isn't displaying "SMART [sdc sat]: Power on hours" correctly because it treats the value as seconds: the "3h 47m 27s" from earlier is exactly 13647 seconds, i.e. the hour count shown as if it were seconds. Changing the unit to hours just gives "13.65 Kh" instead of a years/months/days/hours breakdown. I simply removed the unit so it displays the raw number, but maybe there's a better solution there too. Everything else is looking much better, though.
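For reference, here is the breakdown I was hoping Zabbix would show, as a small Python sketch. It assumes 365-day years and 30-day months, which is a simplification, and `format_hours` is a hypothetical helper, not part of Zabbix. An untested idea on the Zabbix side: add a "Custom multiplier" preprocessing step of 3600 and keep the unit as "s". The stored value would then really be seconds, which might let Zabbix's own time formatting do this.

```python
def format_hours(hours: int) -> str:
    """Render an hour count as 'Xy Xm Xd Xh' (365-day years, 30-day months)."""
    days, rem_hours = divmod(hours, 24)
    years, days = divmod(days, 365)
    months, days = divmod(days, 30)
    return f"{years}y {months}m {days}d {rem_hours}h"


if __name__ == "__main__":
    print(format_hours(13647))  # the Power_On_Hours value from this post
```

With 13647 hours that works out to 568 days and 15 hours, i.e. "1y 6m 23d 15h", which fits disks bought roughly two years ago far better than "3h 47m 27s".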
Hopefully this helps others with this template issue.