  2. ZBX-19284

Bad values in the agent2 SMART template



    • Team INT
    • Sprint 80 (Sep 2021), Sprint 81 (Oct 2021)
    • 0.5


      I found an issue and am recounting my journey in attempting to fix it. It's possible that there's a better way and I thought it might be useful to know how I got there.

      I've been investigating the use of the SMART template with agent2 on several test hosts. I noticed something very odd today while working on improving the SMART template. A bunch of disks that I bought at around the same time, about 2 years ago, were all reporting the EXACT same "Power on hours".
      Last value for "SMART [sdc sat]: ID 9 Power_On_Hours" is '85'.
      Last value for "SMART [sdc sat]: Power on hours" is "3h 47m 27s".
      I checked the graph, which I assumed would climb as power-on hours increase, but instead I found a flat line!

      Well, that's not right. And the drives shouldn't all report the same value either. Are other items wrong too?

      I went looking at the hard drive itself. This is straight from the smartctl program:

      9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 13647 (130 84 0)

      Ah. That makes sense. The template is pulling back the "VALUE" column (the normalized 085), which is not what is needed here. What is needed is the "RAW_VALUE". Which means my suspicion that all the other values are wrong too is probably correct.
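
To make the VALUE versus RAW_VALUE distinction concrete, here is a minimal sketch that splits the smartctl row quoted above into named columns. It assumes the usual column order of the `smartctl -A` attribute table; `parseAttributeRow` and `COLUMNS` are hypothetical names used only for illustration and are not part of the template.

```javascript
// Column order assumed for the `smartctl -A` attribute table.
var COLUMNS = ["id", "name", "flag", "value", "worst", "thresh",
               "type", "updated", "when_failed", "raw_value"];

// Hypothetical helper: split one attribute row into named fields.
function parseAttributeRow(line) {
    var parts = line.trim().split(/\s+/);
    var row = {};
    // Every column except the last is a single whitespace-free token.
    for (var i = 0; i < COLUMNS.length - 1; i++) {
        row[COLUMNS[i]] = parts[i];
    }
    // RAW_VALUE is the last column and may itself contain spaces.
    row.raw_value = parts.slice(COLUMNS.length - 1).join(" ");
    return row;
}

var row = parseAttributeRow(
    "9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 13647 (130 84 0)");
// row.value is the normalized "085"; row.raw_value is "13647 (130 84 0)".
```

The normalized column is what the template was reading; the hour count only appears at the start of the raw column.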

      What does that look like in JSON? (snipping only the relevant bit)

       "id": 9,
       "name": "Power_On_Hours",
       "value": 85,
       "worst": 85,
       "thresh": 0,
       "when_failed": "",
       "flags": {
         "value": 50,
         "string": "-O--CK ",
         "prefailure": false,
         "updated_online": true,
         "performance": false,
         "error_rate": false,
         "event_count": true,
         "auto_keep": true
       },
       "raw": {
         "value": 39973260637519,
         "string": "13647 (36 91 0)"
       }

      Capturing the first "value" field (the normalized one) isn't as useful for monitoring as the raw value. Now to update the template template_module_smart_agent2.yaml:

      <                     - '$[?(@.disk_name==''{#NAME}'')].ata_smart_attributes.table[?(@.id=={#ID})].value.first()'
      >                     - '$[?(@.disk_name==''{#NAME}'')].ata_smart_attributes.table[?(@.id=={#ID})].raw.value.first()'

      And a recheck...

      Well, some values are much better. For example, my disk is NOT running at 60 degrees C! But it is at 40C. That's a plus.

      190 Airflow_Temperature_Cel 0x0022 060 052 040 Old_age Always - 40 (Min/Max 33/43)

      In fact, a lot of the values make more sense. Better yet, they aren't all the same across every disk!

      However, Power_On_Hours is still not right. It went from "85" to "19297288074576" when it should be "13647". And I also broke other fields, as my Temperature_Celsius (which has the same temperature as Airflow_Temperature_Cel) is showing "107374182440"!!!

      So clearly something is busted in my fix. Investigating the SMART JSON output again, sure enough: for some attributes the raw "value" matches the raw "string", but for others it really is a raw number that looks nothing like it. Which means the string might be the better field to read in. Though, I have no idea how to deal with multiple value types inside a single discovery item. And I don't really want to create a new value for every potential item here.
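
The large numbers look less random once they are split at the 32-bit boundary. The following is a minimal sketch, assuming these drives keep the main counter in the low 32 bits of the raw field and pack extra data above it. That layout fits the three values quoted in this report, but it is vendor-specific and nothing in the JSON guarantees it. `splitRaw` is a hypothetical helper name.

```javascript
// Hypothetical helper: split a raw SMART number into its low 32 bits
// and whatever sits above them. Plain arithmetic is used because
// JavaScript bitwise operators truncate their operands to 32 bits.
function splitRaw(raw) {
    var TWO_32 = 4294967296;
    return { low: raw % TWO_32, high: Math.floor(raw / TWO_32) };
}

var hoursJson = splitRaw(39973260637519); // raw.value from the JSON snippet
var hoursItem = splitRaw(19297288074576); // what the Power_On_Hours item showed
var tempItem = splitRaw(107374182440);    // what Temperature_Celsius showed

// hoursJson.low is 13647, hoursItem.low is 13648, tempItem.low is 40.
// hoursJson.high is 9307, which equals 36 * 256 + 91, the same 36 and 91
// that appear in the raw string "13647 (36 91 0)".
```

If that assumption holds, the number is not garbage, just several fields packed together, which is why reading the raw string (or only part of the raw number) gives sensible results.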

      Here's my second attempt at a fix in template_module_smart_agent2.yaml:

      <                     - '$[?(@.disk_name==''{#NAME}'')].ata_smart_attributes.table[?(@.id=={#ID})].value.first()'
      >                     - '$[?(@.disk_name==''{#NAME}'')].ata_smart_attributes.table[?(@.id=={#ID})].raw.string.first()'
      >                 -
      >                   type: JAVASCRIPT
      >                   parameters:
      >                     - |
      >                       // Keep the first space-separated token, e.g. "13647" from "13647 (36 91 0)".
      >                       var parsed = value.split(" ").filter(function(e){ return e === 0 || e })[0];
      >                       // Then keep only the part before any "+".
      >                       parsed = parsed.split("+").filter(function(e){ return e === 0 || e })[0];
      >                       return parsed;
      >                 -
      >                   type: RTRIM
      >                   parameters:
      >                     - h

      Is it the right fix? :shrug: but it works. I'm getting correct values.
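
To see what those preprocessing steps do outside of Zabbix, here is a small sketch that wraps the same logic in a function. `firstToken` is a hypothetical name, the regular expression only stands in for the RTRIM "h" step, and the third sample string is an assumed example of an hours-with-suffix format rather than output from the disks in this report.

```javascript
// Same logic as the JAVASCRIPT step, followed by a stand-in for RTRIM "h".
function firstToken(value) {
    // Drop empty pieces, then keep the first space-separated token.
    var parsed = value.split(" ").filter(function (e) { return e; })[0];
    // Keep only the part before any "+".
    parsed = parsed.split("+").filter(function (e) { return e; })[0];
    // Strip trailing "h" characters, as the RTRIM step would.
    return parsed.replace(/h+$/, "");
}

var fromHours = firstToken("13647 (36 91 0)");      // "13647"
var fromTemp = firstToken("40 (Min/Max 33/43)");    // "40"
var fromSuffix = firstToken("13647h+12m+30.000s");  // "13647" (assumed format)
```

Note that an empty raw string would make the first `[0]` undefined and the second `split` throw, so a guard may be worth adding if such values can occur.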

      Zabbix isn't displaying "SMART [sdc sat]: Power on hours" correctly because it reads the value in as seconds, and changing the unit to hours just gives "13.65 Kh" instead of translating it into years/months/days/hours. I just removed the unit so the raw value is displayed, but maybe there's a better solution there too. Everything else is looking much better.
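
One possible alternative to dropping the unit, offered as an untested assumption rather than a confirmed fix, is to keep the unit in seconds and convert hours to seconds in preprocessing (a custom multiplier of 3600 would do the same job). The sketch below shows the arithmetic and also why the item showed "3h 47m 27s": that is simply 13647 treated as seconds. `hoursToSeconds` and `asHms` are hypothetical helpers, and `asHms` is only an illustration, not Zabbix's own formatter.

```javascript
// Hypothetical conversion: SMART reports hours, the item unit expects seconds.
function hoursToSeconds(hours) {
    return parseInt(hours, 10) * 3600;
}

// Illustration only: render a number of seconds as hours, minutes, seconds.
function asHms(seconds) {
    var h = Math.floor(seconds / 3600);
    var m = Math.floor((seconds % 3600) / 60);
    var s = seconds % 60;
    return h + "h " + m + "m " + s + "s";
}

var misread = asHms(13647);                // "3h 47m 27s", hours misread as seconds
var converted = hoursToSeconds("13647");   // 49129200 seconds
```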

      Hopefully this helps others with this template issue.





            mchudinov Maxim Chudinov
            cstackpole Chris Stackpole