ZABBIX BUGS AND ISSUES / ZBX-4640

another network error retrying to get a value


    • Type: Incident report
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • 1.9.7 (beta), 1.9.8 (beta)
    • Server (S)
    • None
    • Debian x32 & x64

      I have problems with the server retrying to get a value.
      I first saw this in version 1.9.7 (fresh install), and upgrading to 1.9.9 did not fix it. Tested on 2 servers with lots of clients.
      I found some fixed issues describing similar errors, but they do not seem to be completely fixed, since the upgrade to 1.9.9 does not help.

      Logs are populated with the following:
      17808:20120210:161916.259 resuming Zabbix agent checks on host [lari-casino]: connection restored
      17821:20120210:161923.164 resuming Zabbix agent checks on host [gw.viaden.com]: connection restored
      17796:20120210:161925.241 Zabbix agent item [system.swap.size[,pfree]] on host [lari-poker] failed: first network error, wait for 20 seconds
      17751:20120210:161929.219 Zabbix agent item [system.cpu.load[,avg15]] on host [lari-casino] failed: first network error, wait for 20 seconds
      17812:20120210:161949.182 resuming Zabbix agent checks on host [lari-casino]: connection restored
      17782:20120210:161958.749 Zabbix agent item [vm.memory.size[total]] on host [gw.viaden.com] failed: first network error, wait for 20 seconds
      17782:20120210:162005.730 Zabbix agent item [vfs.fs.size[/,pfree]] on host [lari-casino] failed: first network error, wait for 20 seconds
      17819:20120210:162018.302 resuming Zabbix agent checks on host [gw.viaden.com]: connection restored
      17816:20120210:162025.407 resuming Zabbix agent checks on host [lari-casino]: connection restored
      17785:20120210:162102.411 Zabbix agent item [vfs.fs.inode[/home,pfree]] on host [lari-casino] failed: first network error, wait for 20 seconds
      17806:20120210:162122.346 resuming Zabbix agent checks on host [lari-casino]: connection restored
      17714:20120210:162124.508 Zabbix agent item [system.cpu.util[,idle,avg1]] on host [gw.viaden.com] failed: first network error, wait for 20 seconds
      17793:20120210:162126.288 Zabbix agent item [vm.memory.inactive] on host [gw.viaden.com] failed: another network error, wait for 20 seconds
      17726:20120210:162140.805 Zabbix agent item [system.cpu.load[,avg1]] on host [lari-casino] failed: first network error, wait for 20 seconds
      17805:20120210:162146.459 resuming Zabbix agent checks on host [gw.viaden.com]: connection restored
      17805:20120210:162200.672 resuming Zabbix agent checks on host [lari-casino]: connection restored
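
      To see how the failures in an excerpt like the one above are spread across hosts, the log lines can be tallied with a short script. This is only a sketch: the regular expressions are derived from the lines quoted in this report, and the function name summarize is made up for illustration, so other server versions may need adjusted patterns.

```python
import re
from collections import Counter

# Generic shape of the server log lines quoted in this report:
#   PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")

# "... item [KEY] on host [HOST] failed: first|another network error ..."
# KEY may itself contain brackets, e.g. system.swap.size[,pfree].
FAIL_RE = re.compile(
    r"Zabbix agent item \[(?P<key>.+)\] on host \[(?P<host>[^\]]+)\] failed: "
    r"(?P<kind>first|another) network error"
)

# "resuming Zabbix agent checks on host [HOST]: connection restored"
RESUME_RE = re.compile(
    r"resuming Zabbix agent checks on host \[(?P<host>[^\]]+)\]"
)


def summarize(lines):
    """Count 'first', 'another' and 'restored' events per host."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a log line in the expected format
        message = match.group(4)
        fail = FAIL_RE.search(message)
        if fail:
            counts[(fail["host"], fail["kind"])] += 1
            continue
        resume = RESUME_RE.search(message)
        if resume:
            counts[(resume["host"], "restored")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < zabbix_server.log
    for (host, kind), n in sorted(summarize(sys.stdin).items()):
        print(f"{host:20} {kind:10} {n}")
```

      A host that shows many "first" events paired with "restored" events but few "another" events would match the pattern described here: single checks fail and the host recovers on the next attempt.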

      Note that the item keys and hosts differ between entries.
      I tested different UnreachableDelay values (from 5 to 20).
      This is not a connectivity issue: multiple zabbix_get requests run at the same time returned no errors at all.
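
      The parallel zabbix_get test can be approximated in a script that sends several passive checks to one agent at once. This is a rough sketch under stated assumptions, not the server's poller logic: it assumes the agent accepts a plain-text key terminated by a newline, answers with a "ZBXD\x01" header followed by an 8-byte little-endian length, and then closes the connection. The function names, the default port 10050 and the 3-second timeout are illustrative choices.

```python
import socket
import struct
from concurrent.futures import ThreadPoolExecutor

HEADER = b"ZBXD\x01"  # assumed protocol signature plus flags byte


def parse_agent_response(data: bytes) -> str:
    """Strip the ZBXD header, if present, and return the value as text."""
    if data.startswith(HEADER):
        if len(data) < 13:
            raise ValueError("truncated header")
        (length,) = struct.unpack("<Q", data[5:13])
        payload = data[13:13 + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        return payload.decode("utf-8", "replace")
    # No header: treat the whole reply as the value.
    return data.decode("utf-8", "replace").rstrip("\n")


def passive_check(host, key, port=10050, timeout=3.0):
    """Request one item value from an agent, similar to zabbix_get -s HOST -k KEY."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(key.encode() + b"\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # agent closed the connection
                break
            chunks.append(chunk)
    return parse_agent_response(b"".join(chunks))


def run_parallel(host, keys, workers=8):
    """Run several checks concurrently; map each key to a value or an exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {key: pool.submit(passive_check, host, key) for key in keys}
        for key, future in futures.items():
            try:
                results[key] = future.result()
            except (OSError, ValueError) as exc:
                results[key] = exc
    return results
```

      For example, run_parallel("lari-casino", ["system.cpu.load[,avg1]", "vfs.fs.size[/,pfree]"]) would return a dict in which a failed check shows up as an exception object instead of a value, which makes intermittent errors easy to count over repeated runs.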

      The agent log with debug enabled shows no errors: the agent always sends data back.
      tcpdump shows a lot of RST flags sent by the server, which does not look like a normal TCP session teardown.

      I tried disabling checks on the host, waiting until the queue cleared, and then starting monitoring again. It does not help.
      Agent and server restarts sometimes help, sometimes not. The issue occurs randomly and can disappear after some time (usually a few hours) or persist for a long time.
      There are no unusual spikes on the internal Zabbix monitoring graphs (except for housekeeping tasks); network activity and polling are stable.

            Assignee: Unassigned
            Reporter: Anton Ryabchenko (sineex)
            Votes: 0
            Watchers: 2
