[ZBX-8566] Zabbix pollers get stuck Created: 2014 Aug 01  Updated: 2017 May 30  Resolved: 2014 Aug 02

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.2.5
Fix Version/s: None

Type: Incident report Priority: Critical
Reporter: Raimonds Treimanis Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: snmp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 6.5/Ubuntu 14.04
MySQL 5.6


Attachments: JPEG File z2.jpg     JPEG File z3.jpg     JPEG File z4.jpg    
Issue Links:
Duplicate
duplicates ZBX-8528 random lost UDP packets lead to not b... Closed

 Description   

Zabbix poller processes somehow get "stuck", which results in failed polls and "first network error, wait for 15 seconds" messages in the log.
Most of my items are SNMPv2 and, as I already commented in ZBX-7426, the percentage of busy pollers on proxies keeps rising without any change to the configuration; you can see this in the attached graphs. The big drops in busy pollers are proxy restarts. The third graph shows how the min/max/avg busy pollers change after a restart.
I have noticed that, as time passes, more and more pollers report:
zabbix_proxy: poller #123 [got x values in 30.034998 sec, idle 1 sec]
In my configuration the timeout is set to 30 sec, so polls longer than that should end in an error. What is unclear is why they report that they actually got some values. Anyway, the number of such processes builds up, and they are not the same processes each time but different ones:
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
zabbix 31378 0.1 0.6 599500 52160 ? S 13:57 0:02 zabbix_proxy: poller #2 [got 1 values in 30.042152 sec, getting values]
zabbix 31394 0.0 0.5 597364 47632 ? S 13:57 0:01 zabbix_proxy: poller #18 [got 3 values in 30.057145 sec, getting values]
zabbix 31407 0.2 0.6 599500 53756 ? S 13:57 0:03 zabbix_proxy: poller #28 [got 3 values in 30.004999 sec, getting values]
zabbix 31410 0.1 0.6 599988 52160 ? S 13:57 0:02 zabbix_proxy: poller #31 [got 2 values in 30.052261 sec, getting values]
zabbix 31430 0.2 0.6 599972 54824 ? S 13:57 0:04 zabbix_proxy: poller #51 [got 2 values in 30.052217 sec, getting values]
zabbix 31432 0.1 0.6 599168 51448 ? S 13:57 0:02 zabbix_proxy: poller #53 [got 3 values in 30.056933 sec, getting values]
zabbix 31447 0.1 0.6 597312 48920 ? S 13:57 0:02 zabbix_proxy: poller #63 [got 8 values in 30.354670 sec, getting values]
zabbix 31449 0.1 0.6 597312 49200 ? S 13:57 0:02 zabbix_proxy: poller #65 [got 1 values in 30.041501 sec, getting values]
zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 [got 1 values in 30.039372 sec, getting values]
zabbix 31511 0.1 0.6 599840 52844 ? R 13:57 0:01 zabbix_proxy: poller #122 [got 1 values in 30.039643 sec, getting values]
zabbix 31515 0.1 0.6 599300 52316 ? S 13:57 0:02 zabbix_proxy: poller #126 [got 2 values in 30.050534 sec, getting values]
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
zabbix 31407 0.2 0.6 599500 53756 ? S 13:57 0:03 zabbix_proxy: poller #28 [got 3 values in 30.004999 sec, getting values]
zabbix 31410 0.1 0.6 599988 52160 ? S 13:57 0:02 zabbix_proxy: poller #31 [got 2 values in 30.052261 sec, getting values]
zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 [got 1 values in 30.039372 sec, getting values]
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
zabbix 31407 0.2 0.6 599500 53756 ? S 13:57 0:03 zabbix_proxy: poller #28 [got 3 values in 30.004999 sec, getting values]
zabbix 31410 0.1 0.6 599988 52160 ? S 13:57 0:02 zabbix_proxy: poller #31 [got 2 values in 30.052261 sec, getting values]
zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 [got 1 values in 30.039372 sec, getting values]
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
zabbix 31410 0.1 0.6 599988 52160 ? S 13:57 0:02 zabbix_proxy: poller #31 [got 2 values in 30.052261 sec, getting values]
zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 [got 1 values in 30.039372 sec, getting values]
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 [got 1 values in 30.039372 sec, getting values]
zabbix 31536 0.0 0.1 596804 10964 ? S 13:57 0:00 zabbix_proxy: unreachable poller #15 [got 1 values in 30.033028 sec, idle 4 sec]
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 [got 1 values in 30.039372 sec, getting values]
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 [got 1 values in 30.039372 sec, getting values]
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
admin@zbx-prxy1 ~ >ps aux|grep poll |grep "in 30"
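
The repeated ps/grep checks above could also be scripted. The following is a minimal sketch, assuming the process title format shown in the output above; the function name slow_pollers and the sample list are illustrative and not part of Zabbix.

```python
import re

# Matches process titles in the form seen in the ps output above, e.g.
# "zabbix_proxy: poller #28 [got 3 values in 30.004999 sec, getting values]"
TITLE_RE = re.compile(
    r"zabbix_proxy: (?P<kind>[a-z ]*poller) #(?P<num>\d+) "
    r"\[got (?P<values>\d+) values in (?P<secs>\d+\.\d+) sec, (?P<state>[^\]]+)\]"
)


def slow_pollers(ps_lines, timeout=30.0):
    """Return one dict per poller whose last reported batch took >= timeout seconds.

    ps_lines is an iterable of lines in `ps aux` format (PID in the second column).
    """
    hits = []
    for line in ps_lines:
        match = TITLE_RE.search(line)
        if match is None:
            continue  # not a poller title in the expected format
        secs = float(match.group("secs"))
        if secs >= timeout:
            hits.append({
                "pid": int(line.split()[1]),
                "kind": match.group("kind"),
                "number": int(match.group("num")),
                "values": int(match.group("values")),
                "secs": secs,
                "state": match.group("state"),
            })
    return hits


# Two lines copied from the output above.
SAMPLE = [
    "zabbix 31497 0.0 0.6 599428 49556 ? S 13:57 0:01 zabbix_proxy: poller #108 "
    "[got 1 values in 30.039372 sec, getting values]",
    "zabbix 31536 0.0 0.1 596804 10964 ? S 13:57 0:00 zabbix_proxy: unreachable poller #15 "
    "[got 1 values in 30.033028 sec, idle 4 sec]",
]

for hit in slow_pollers(SAMPLE):
    print(hit["pid"], hit["kind"], hit["number"], hit["values"], hit["secs"], hit["state"])
```

Feeding it the live output of ps aux at regular intervals would show whether the set of affected PIDs changes over time, which is what the manual checks above suggest.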



 Comments   
Comment by Oleksii Zagorskyi [ 2014 Aug 02 ]

Next time please attach graphs as PNG and with original size.

This may be caused by reasons described in ZBX-8528.

Received values even after 30 seconds waiting is not a strange thing. Poller may process different hosts in one iteration.

Comment by Oleksii Zagorskyi [ 2014 Aug 02 ]

I'm closing it as duplicate of ZBX-8528.
Feel free to reopen if you don't agree.

Comment by richlv [ 2014 Aug 08 ]

The cause might be even simpler than ZBX-8528. I would strongly recommend against setting the timeout to 30 seconds; returning it to the default of 3 seconds is suggested.

Comment by Raimonds Treimanis [ 2014 Aug 08 ]

Lots of my hosts respond slower than 3 sec, so setting the timeout to 3 sec will only speed up the degradation of bulk requests to 1 per request, as mentioned in ZBX-8528.

Comment by Raimonds Treimanis [ 2014 Aug 08 ]

I don't get what you mean by "Received values even after 30 seconds waiting is not a strange thing. Poller may process different hosts in one iteration."
I thought a poller polls only 1 host in 1 "cycle" before reporting the number of values it got.
So it turns out the poller polls 1 host, gets some values, then polls a second, gets some values, and on the third times out, but returns the values it managed to get successfully and reports "got xxx values in yyy sec" in the log?

Comment by Aleksandrs Saveljevs [ 2014 Aug 08 ]

According to src/zabbix_server/poller/poller.c, the "xxx" and "yyy" values in "got xxx values in yyy sec" are updated in two cases: (a) when the poller process has nothing to do and goes to sleep, or (b) when the poller process is constantly taking items from the queue, but has not updated its status in 5 seconds. So in line "got 3 values in 30.004999 sec" above, for instance, it is not possible to know in how many iterations the poller acquired 3 values: one, two, or three. But it is known for sure that in the last iteration it hit a timeout.
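
The two update conditions can be illustrated with a simplified model. This is only a sketch of the behaviour described in the comment above, not the code from poller.c: the function simulate_status, the tuple layout, and the example durations are assumptions made for illustration.

```python
STAT_INTERVAL = 5.0  # seconds between forced status updates, per the comment above


def simulate_status(iterations, stat_interval=STAT_INTERVAL):
    """Return the status strings a poller would report in this simplified model.

    iterations is a list of (values_got, seconds_taken, queue_empty) tuples,
    one per polling iteration. Counters accumulate across iterations and are
    reported (then reset) when (a) the queue is empty and the poller goes to
    sleep, or (b) at least stat_interval seconds have passed since the last report.
    """
    reports = []
    total_values = 0
    total_secs = 0.0
    for values_got, seconds_taken, queue_empty in iterations:
        total_values += values_got
        total_secs += seconds_taken
        if queue_empty or total_secs >= stat_interval:
            reports.append(f"got {total_values} values in {total_secs:.6f} sec")
            total_values = 0
            total_secs = 0.0
    return reports


# Hypothetical timings: two fast iterations followed by one that hits a 30 sec timeout.
three_iterations = [(2, 0.002, False), (1, 0.003, False), (0, 30.0, False)]
# Hypothetical timings: a single iteration that returns 3 values and takes just over 30 sec.
one_iteration = [(3, 30.005, False)]

print(simulate_status(three_iterations))
print(simulate_status(one_iteration))
```

In this model both scenarios produce the same status line, which is why the number of iterations behind a given "got xxx values in yyy sec" message cannot be recovered from the process title alone.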
