[ZBX-14312] Proxy->Agent communication drops intermittently Created: 2018 May 01 Updated: 2024 Apr 10 Resolved: 2018 May 13 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 3.4.8 |
Fix Version/s: | 3.4.10rc1, 4.0.0alpha7, 4.0 (plan) |
Type: | Incident report | Priority: | Critical |
Reporter: | Hari Vittal | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 0 |
Labels: | deadlock, problems | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix Server 3.4.8 (RHEL 7.4) |
Attachments: |
||||||||||||||||||||
Issue Links: |
|
||||||||||||||||||||
Team: | |||||||||||||||||||||
Sprint: | Sprint 33 | ||||||||||||||||||||
Story Points: | 1 |
Description |
We are seeing agents intermittently fail agent.ping checks. The proxy in use appears to have a large queue. Other servers report through the same proxy (rhlappzab405) without issue, though. The failures are specific to Windows hosts/agents and happen on around 100 hosts. The proxy manages between 400 and 500 hosts in total. Proxy logs: zabbix_proxy.log.zip. Server logs: zabbix_server.log.zip |
Comments |
Comment by Alexey Pustovalov [ 2018 May 01 ] |
It looks like you have performance issues with the proxy. |
Comment by Alexey Pustovalov [ 2018 May 01 ] |
Please attach graphs from proxy monitoring (template "Template App Zabbix Proxy"). |
Comment by Hari Vittal [ 2018 May 01 ] |
Hi Alexey, please find attached graphs for proxy performance. The unreachable poller is maxed out at 100%; on the other hosts these pollers are quite busy but not maxed out at 100%. We use the defaults for proxy parameters apart from the following:
DBHost=rhldatzab405
DBName=zabbix_proxy1
DBPassword=XXXXXXXX
DBUser=zabbix
DBPort=3303
LogFileSize=100
LogFile=/tmp/zabbix_proxy.log
Server=zabbix-corp.fairisaac.com
Timeout=30
StartPollers=40
StartPollersUnreachable=5
StartPingers=5
CacheSize=128M
JavaGateway=zabbix-proxy-shk
JavaGatewayPort=10052
StartJavaPollers=5
ConfigFrequency=300
PidFile=/tmp/zabbix_proxy.pid
|
Comment by Hari Vittal [ 2018 May 01 ] |
I have increased the following:
StartPollersUnreachable=20
StartPingers=20
That's made some improvements but the queue is still larger than usual...
Unreachable Poller processes have dropped to around 70% busy:
It is unclear whether this means we still have to increase these two configuration parameters.
|
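The overrides in the comment above can be compared against the earlier parameter list with a small script. This is a minimal sketch, assuming the parameters are pasted as whitespace-separated KEY=VALUE pairs as they appear in the comments; the function names `parse_params` and `diff_params` are hypothetical helpers and not part of Zabbix.

```python
def parse_params(text):
    """Parse whitespace-separated KEY=VALUE pairs into a dict."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not KEY=VALUE
            params[key] = value
    return params


def diff_params(before, after):
    """Return {key: (old, new)} for keys in `after` whose value differs from `before`."""
    return {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}


# Values copied from the two comments above (subset of the original list).
original = parse_params(
    "Timeout=30 StartPollers=40 StartPollersUnreachable=5 StartPingers=5"
)
updated = parse_params("StartPollersUnreachable=20 StartPingers=20")

changes = diff_params(original, updated)
for key, (old, new) in sorted(changes.items()):
    print(f"{key}: {old} -> {new}")
```

The diff only reports what changed between the two pasted lists; it does not say whether the new values are sufficient, which still has to be judged from the proxy's process-busy graphs.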
Comment by Alexey Pustovalov [ 2018 May 01 ] |
After these changes, you need to check host availability using the information in Administration->Queue->Details. Actually, the issue is not a bug in Zabbix. |
Comment by Hari Vittal [ 2018 May 02 ] |
Everything that is delayed (~200 items) appears to be from hosts that are currently unreachable:
Otherwise the queue appears to have caught up. What is still not clear is why the load on the proxy started to have an effect recently, since there has not been any major change in the number of servers managed by the proxy. So far we have not had any repeats of the issue, so I think we can close this ticket. |
Comment by Vladislavs Sokurenko [ 2018 May 11 ] |
Fixed in:
|