  1. ZABBIX BUGS AND ISSUES
  2. ZBX-16096

system.cpu.util[<core>] key sometimes returns a wrong huge value (like 307343286799559360.000000)

    Details

    • Type: Problem report
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.27, 4.0.7, 4.2.1
    • Component/s: Agent (G)
    • Labels:
    • Environment:
      ZBX server v4.2.0 on (CentOS Linux release 7.6.1810 (Core))
      ZBX agent v4.2.1 (previously v3.4.11 or v3.4.14) on (CentOS Linux release 7.5.1804 (Core))
    • Team:
      Team A
    • Sprint:
      Sprint 52 (May 2019)
    • Story Points:
      0.125

      Description

      We have had this issue for quite a long time already (on versions 3.4.11 and 4.2.1) and are reporting it only now because it has become too annoying.

      We have 3 servers with 80 CPU cores each, and our zabbix_server.log is filled with messages like these:

       10639:20190506:111426.753 item "system_ovn3.domain:system.cpu.util[58,user]" became not supported: Value 307445734561825856.000000 is too small or too large.
       10648:20190506:111439.777 item "system_ovn2.domain:system.cpu.util[51,user]" became not supported: Value 308010420332435328.000000 is too small or too large.
       10639:20190506:111526.913 item "system_ovn3.domain:system.cpu.util[58,user]" became supported
       10643:20190506:111539.947 item "system_ovn2.domain:system.cpu.util[51,user]" became supported
       10643:20190506:111606.996 item "system_ovn1.domain:system.cpu.util[58,user]" became not supported: Value 307496984059169024.000000 is too small or too large.
       10648:20190506:111706.938 item "system_ovn1.domain:system.cpu.util[58,user]" became supported
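      To see which items are affected and how often, the log can be summarised per item. This is only a sketch: count_unsupported is a hypothetical helper name, and it assumes the log lines have exactly the format shown above.

```shell
# Count "too small or too large" events per item in a Zabbix server log.
# count_unsupported is a hypothetical helper; it reads log lines on stdin
# and prints "<count> <host:key>", most frequent first.
count_unsupported() {
    grep 'became not supported: Value .* is too small or too large' |
        sed -E 's/.*item "([^"]+)" became not supported.*/\1/' |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption and depends on the installation):
#   count_unsupported < /var/log/zabbix/zabbix_server.log
```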
      

      Each such host has ~600 items, of which 328 are different "system.cpu.util*" keys (80*4 per-core items for idle, iowait, system and user, plus 8 items for the whole CPU).
      The update interval is 1m for all the "system.cpu.util" items.
      Each host also has 3 "system.cpu.load[percpu,avg5]"-style items (one each for avg1, avg5 and avg15).

      We ran the 'system.cpu.util[58,user]' key with the zabbix_get tool in a loop with a 5-second delay, and within 5 minutes received the huge value once on one server:

      #  while true; do echo "`date` - `zabbix_get -s system_ovn1.domain -k 'system.cpu.util[58,user]'`"; sleep 5; done
      
      Mon May  6 12:58:32 CEST 2019 - 0.083333
      Mon May  6 12:58:37 CEST 2019 - 0.083333
      Mon May  6 12:58:42 CEST 2019 - 0.083333
      Mon May  6 12:58:47 CEST 2019 - 0.083333
      Mon May  6 12:58:52 CEST 2019 - 0.083319
      Mon May  6 12:58:57 CEST 2019 - 0.016661
      Mon May  6 12:59:02 CEST 2019 - 0.016664
      Mon May  6 12:59:07 CEST 2019 - 0.033333
      Mon May  6 12:59:12 CEST 2019 - 0.016667
      Mon May  6 12:59:17 CEST 2019 - 0.016664
      Mon May  6 12:59:22 CEST 2019 - 0.000000
      Mon May  6 12:59:27 CEST 2019 - 0.000000
      Mon May  6 12:59:32 CEST 2019 - 0.000000
      Mon May  6 12:59:37 CEST 2019 - 0.000000
      Mon May  6 12:59:42 CEST 2019 - 0.000000
      Mon May  6 12:59:47 CEST 2019 - 0.000000
      Mon May  6 12:59:52 CEST 2019 - 0.000000
      Mon May  6 12:59:57 CEST 2019 - 0.000000
      Mon May  6 13:00:02 CEST 2019 - 0.000000
      Mon May  6 13:00:07 CEST 2019 - 307343286799559360.000000
      Mon May  6 13:00:12 CEST 2019 - 0.000000
      Mon May  6 13:00:17 CEST 2019 - 0.000000
      Mon May  6 13:00:22 CEST 2019 - 0.000000
      Mon May  6 13:00:27 CEST 2019 - 0.000000
      Mon May  6 13:00:32 CEST 2019 - 0.000000
      Mon May  6 13:00:37 CEST 2019 - 0.066678
      Mon May  6 13:00:42 CEST 2019 - 0.083319
      Mon May  6 13:00:47 CEST 2019 - 0.083333
      Mon May  6 13:00:52 CEST 2019 - 0.083333
      

      Yes, we have many items, and you could argue that not all of them are useful or required, but the Zabbix agent should not return a wrong value.

       

      We tried to run the "system.cpu.util" key in a similar loop for a longer period, on all 3 servers, and did not catch the issue.
      Maybe we were just not lucky enough, who knows ...
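      For longer runs it may help to print only the suspicious samples instead of every value. A minimal sketch, assuming that any value outside 0..100 is wrong (the key returns a percentage); is_valid_util and watch_util are hypothetical helper names, and only the zabbix_get options already used above (-s, -k) are relied on.

```shell
# is_valid_util VALUE: succeeds only if VALUE is a plain decimal number
# within 0..100, the valid range for a CPU utilisation percentage.
is_valid_util() {
    awk -v v="$1" 'BEGIN {
        exit !(v ~ /^[0-9]+(\.[0-9]+)?$/ && v + 0 >= 0 && v + 0 <= 100)
    }'
}

# watch_util HOST KEY [DELAY]: poll the agent forever and print a timestamped
# line only for samples that fail the range check. Error strings returned by
# zabbix_get are printed as well, since they are not valid numbers.
watch_util() {
    host=$1
    key=$2
    delay=${3:-5}
    while true; do
        value=$(zabbix_get -s "$host" -k "$key")
        is_valid_util "$value" || echo "$(date) - $value"
        sleep "$delay"
    done
}

# Usage:
#   watch_util system_ovn1.domain 'system.cpu.util[58,user]' 5
```

      Running one such loop per host and key keeps the output short enough to leave it unattended for hours.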

        Attachments

        1. .cproject
          11 kB
        2. PS-571-testing_agents_P1_P2-logs.tar.gz
          1.05 MB
        3. system_ovn1.domain-cat_proc_stat
          11 kB
        4. system_ovn2.domain-cat_proc_stat
          10 kB
        5. system_ovn3.domain-cat_proc_stat
          10 kB
        6. ZBX-16096_1.diff
          1 kB
        7. ZBX-16096_2.diff
          0.8 kB
        8. ZBX-16096-4.0.diff
          1.0 kB


            People

            • Assignee:
              vso Vladislavs Sokurenko
              Reporter:
              zalex_ua Oleksiy Zagorskyi
            • Votes:
              1
              Watchers:
              6
