Type: Incident report
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: 1.9.7 (beta), 1.9.8 (beta)
Component/s: Server (S)
Labels: None
Environment: Debian x32 & x64

I have a problem with the server repeatedly retrying to get item values.
It was first found in version 1.9.7 (fresh install); upgrading to 1.9.9 did not fix it. Tested on 2 servers with many clients.
I found some resolved issues about similar errors, but they do not seem to be completely fixed.

Logs are populated with the following:
17808:20120210:161916.259 resuming Zabbix agent checks on host [lari-casino]: connection restored
17821:20120210:161923.164 resuming Zabbix agent checks on host [gw.viaden.com]: connection restored
17796:20120210:161925.241 Zabbix agent item [system.swap.size[,pfree]] on host [lari-poker] failed: first network error, wait for 20 seconds
17751:20120210:161929.219 Zabbix agent item [system.cpu.load[,avg15]] on host [lari-casino] failed: first network error, wait for 20 seconds
17812:20120210:161949.182 resuming Zabbix agent checks on host [lari-casino]: connection restored
17782:20120210:161958.749 Zabbix agent item [vm.memory.size[total]] on host [gw.viaden.com] failed: first network error, wait for 20 seconds
17782:20120210:162005.730 Zabbix agent item [vfs.fs.size[/,pfree]] on host [lari-casino] failed: first network error, wait for 20 seconds
17819:20120210:162018.302 resuming Zabbix agent checks on host [gw.viaden.com]: connection restored
17816:20120210:162025.407 resuming Zabbix agent checks on host [lari-casino]: connection restored
17785:20120210:162102.411 Zabbix agent item [vfs.fs.inode[/home,pfree]] on host [lari-casino] failed: first network error, wait for 20 seconds
17806:20120210:162122.346 resuming Zabbix agent checks on host [lari-casino]: connection restored
17714:20120210:162124.508 Zabbix agent item [system.cpu.util[,idle,avg1]] on host [gw.viaden.com] failed: first network error, wait for 20 seconds
17793:20120210:162126.288 Zabbix agent item [vm.memory.inactive] on host [gw.viaden.com] failed: another network error, wait for 20 seconds
17726:20120210:162140.805 Zabbix agent item [system.cpu.load[,avg1]] on host [lari-casino] failed: first network error, wait for 20 seconds
17805:20120210:162146.459 resuming Zabbix agent checks on host [gw.viaden.com]: connection restored
17805:20120210:162200.672 resuming Zabbix agent checks on host [lari-casino]: connection restored
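
To see whether the failures cluster on particular hosts, the log entries above can be tallied per host. The following is a minimal sketch, assuming the log lines keep exactly the wording shown in this report; the function name `summarize` and the regular expressions are illustrative and not part of Zabbix.

```python
import re
from collections import Counter

# Patterns for the two message kinds that appear in the excerpt above.
FAIL_RE = re.compile(
    r"Zabbix agent item \[(?P<key>.+)\] on host \[(?P<host>[^\]]+)\] failed: "
    r"(?P<kind>first|another) network error"
)
RESUME_RE = re.compile(
    r"resuming Zabbix agent checks on host \[(?P<host>[^\]]+)\]"
)


def summarize(lines):
    """Count 'network error' and 'connection restored' entries per host."""
    failures = Counter()
    resumes = Counter()
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            failures[match.group("host")] += 1
            continue
        match = RESUME_RE.search(line)
        if match:
            resumes[match.group("host")] += 1
    return failures, resumes
```

Passing an open log file (or any iterable of lines) to `summarize` returns two counters keyed by host name, which makes it easier to compare how often each host flaps.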

Note that the item keys and hosts differ from entry to entry.
I tested different UnreachableDelay values (from 5 to 20).
This is not a connectivity issue: at the same time I ran multiple zabbix_get requests and got no errors at all.
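
The repeated zabbix_get test could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes `zabbix_get` is on the PATH and that the agent listens on the default port 10050; the helper names `zabbix_get_cmd` and `poll`, and the `runner` parameter, are hypothetical. A non-zero exit status, empty output, or a timeout is counted as a failure, which may not match how every zabbix_get version reports errors.

```python
import subprocess


def zabbix_get_cmd(host, key, port=10050):
    """Build the zabbix_get command line for one passive check."""
    return ["zabbix_get", "-s", host, "-p", str(port), "-k", key]


def poll(host, key, attempts=100, timeout=5.0, runner=subprocess.run):
    """Run the same check repeatedly and return the number of failed attempts."""
    failed = 0
    for _ in range(attempts):
        try:
            result = runner(
                zabbix_get_cmd(host, key),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, OSError):
            # Timed out, or zabbix_get could not be started at all.
            failed += 1
            continue
        # Assumption: a failed check gives a non-zero status or no output.
        if result.returncode != 0 or not result.stdout.strip():
            failed += 1
    return failed
```

For example, `poll("lari-casino", "system.cpu.load[,avg1]")` would query one of the items from the log 100 times and return how many attempts failed.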

The agent log with debug enabled shows no errors; the agent always sends data back.
tcpdump shows a lot of RST flags from the server, which does not look like a proper TCP session termination.

I tried disabling checks on the host, waiting until the queue cleared, and then starting monitoring again. It did not help.
Restarting the agent and server sometimes helps and sometimes does not. The issue occurs randomly and can disappear after some time (usually a few hours) or persist for a long time.
There are no strange spikes on the internal Zabbix monitoring graphs (except for housekeeping tasks); network activity and polling are stable.


Attachment: connection refused.png (17 kB, 2012 Feb 13 16:09)

Assignee: Unassigned
Reporter: Anton Ryabchenko
Votes: 0
Watchers: 2
Created: 2012 Feb 10 16:11
Updated: 2017 May 30 18:07
Resolved: 2012 Feb 13 17:46
