ZABBIX BUGS AND ISSUES / ZBX-4551

Item checks fail the majority of the time in 1.9.8


    • Type: Incident report
    • Resolution: Won't fix
    • Priority: Blocker
    • Fix Version/s: None
    • Affects Version/s: 1.9.8 (beta)
    • Component/s: Server (S)
    • Labels: None
    • Environment: RHEL6

      I upgraded from 1.9.6 to 1.9.8, updated the database with no errors, and had the server and agent binaries running.

      Immediately, the server log started recording failures for item checks. The graph for any affected item shows huge gaps where no data was collected, and the queue grows to hundreds of items.
      Strangely, I never got an error when fetching the same items with zabbix_get.

      Rolled back to 1.9.6 and all is fine again.
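
The zabbix_get comparison described above could be repeated in a loop to see whether direct checks ever fail. The following is only a rough sketch: the function names `zabbix_get_cmd` and `probe`, the attempt count, and the timeout are illustrative assumptions, and port 10050 is assumed because it is the agent's default listen port. It also assumes that an empty reply or a non-zero exit status means failure, which may not catch every error that zabbix_get reports as text.

```python
import subprocess


def zabbix_get_cmd(host, key, port=10050):
    """Build the argv for one passive check via zabbix_get."""
    return ["zabbix_get", "-s", host, "-p", str(port), "-k", key]


def probe(host, key, attempts=10, timeout=5.0, run=subprocess.run):
    """Run the same check repeatedly and return how many attempts failed.

    `run` is injectable so the loop can be exercised without an agent.
    """
    failed = 0
    for _ in range(attempts):
        try:
            result = run(
                zabbix_get_cmd(host, key),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, OSError):
            # Timed out, or the zabbix_get binary could not be started.
            failed += 1
            continue
        # Assumption: empty output or a non-zero exit status is a failure.
        if result.returncode != 0 or not result.stdout.strip():
            failed += 1
    return failed
```

For example, `probe("lin020", "vfs.fs.size[/var, free]")` would run the check ten times against one of the hosts from the log and return the number of failed attempts.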

      The following is an example of the errors seen in the server log:

      13719:20120116:140618.123 Zabbix agent item [vfs.fs.size[/tmp, pfree]] on host [lin020] failed: first network error, wait for 15 seconds
      13719:20120116:140629.126 Zabbix agent item [vfs.dev.read[/dev/disk/by-id/dm-name-rootvg-root,sectors]] on host [lin021] failed: first network error, wait for 15 seconds
      13721:20120116:140633.910 Zabbix agent item [vfs.fs.size[/var, free]] on host [lin020] failed: another network error, wait for 15 seconds
      13721:20120116:140644.912 Zabbix agent item [vfs.dev.read[/dev/disk/by-id/dm-name-rootvg-root,sectors]] on host [lin021] failed: another network error, wait for 15 seconds
      13721:20120116:140648.917 resuming Zabbix agent checks on host [lin020]: connection restored
      13717:20120116:140649.133 Zabbix agent item [vfs.fs.size[/var, pfree]] on host [lin020] failed: first network error, wait for 15 seconds
      13721:20120116:140659.919 Zabbix agent item [vfs.dev.read[/dev/disk/by-id/dm-name-rootvg-root,sectors]] on host [lin021] failed: another network error, wait for 15 seconds
      13721:20120116:140704.920 resuming Zabbix agent checks on host [lin020]: connection restored
      13718:20120116:140705.141 Zabbix agent item [net.if.total[eth0,dropped]] on host [lin020] failed: first network error, wait for 15 seconds
      13721:20120116:140714.923 resuming Zabbix agent checks on host [lin021]: connection restored
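
To gauge how often each host flaps, log lines like the ones above can be tallied per host. This is a minimal sketch that assumes the line layout seen in this excerpt (`PID:YYYYMMDD:HHMMSS.mmm message`); the regular expressions and the `summarize` helper are illustrative and not part of Zabbix.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")
FAIL_RE = re.compile(
    r"item \[(?P<key>.+)\] on host \[(?P<host>[^\]]+)\] failed: "
    r"(?P<kind>first|another) network error"
)
RESUME_RE = re.compile(
    r"resuming Zabbix agent checks on host \[(?P<host>[^\]]+)\]"
)


def summarize(lines):
    """Count network-error and recovery messages per host."""
    failures, recoveries = Counter(), Counter()
    for line in lines:
        parsed = LINE_RE.match(line)
        if not parsed:
            continue  # skip anything that is not a timestamped log line
        message = parsed.group(5)
        fail = FAIL_RE.search(message)
        if fail:
            failures[fail.group("host")] += 1
            continue
        resume = RESUME_RE.search(message)
        if resume:
            recoveries[resume.group("host")] += 1
    return failures, recoveries
```

Applied to the ten lines above, this would count four failures and two recoveries for lin020, and three failures and one recovery for lin021, all within about one minute.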

      I turned on debug logging and saw entries like these as well:

      13593:20120116:131456.257 In substitute_simple_macros() data:'vfs.fs.size[/var, free]'
      13593:20120116:131456.257 In substitute_simple_macros() data:EMPTY
      13593:20120116:131456.257 In deactivate_host() hostid:10047 itemid:28946 type:0
      13593:20120116:131456.257 deactivate_host() errors_from:0 available:1

        1. zabbix_server-120119.log.bz2
          380 kB
        2. zabbix_server.log.bz2
          518 kB
        3. schema.diff
          19 kB

            Assignee: Unassigned
            Reporter: Andrew Howell (ahowell)
            Votes: 0
            Watchers: 0
