[ZBX-14312] Proxy->Agent communication drops intermittently Created: 2018 May 01  Updated: 2024 Apr 10  Resolved: 2018 May 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.4.8
Fix Version/s: 3.4.10rc1, 4.0.0alpha7, 4.0 (plan)

Type: Incident report Priority: Critical
Reporter: Hari Vittal Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: deadlock, problems
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix Server 3.4.8 (RHEL 7.4)
Zabbix Proxy 3.4.8 (RHEL 6.9)
Zabbix Agent 2.4.1 (Microsoft Windows Server 2008 R2 SE SP 1 x64)


Attachments: File 60720_eng.log     PNG File Screen Shot 2018-05-01 at 15.56.39.png     PNG File Screen Shot 2018-05-01 at 15.57.47.png     PNG File agent_ping.png     Text File image-2018-05-02-12-11-03-871.png     Text File image-2018-05-02-12-11-49-813.png     PNG File proxy_perf1.png     PNG File proxy_perf2.png     PNG File proxy_perf3.png     PNG File proxy_perf4.png     PNG File proxy_queue.png     Zip Archive zabbix_proxy.log.zip     Zip Archive zabbix_server.log.zip    
Issue Links:
Causes
caused by ZBXNEXT-1649 Fine grained control of tasks perform... Closed
caused by ZBX-11426 Events removed by housekeeper can cau... Closed
Duplicate
duplicates ZBX-14307 Agent communication breaks frequently Closed
Team: Team A
Sprint: Sprint 33
Story Points: 1

 Description   

We are seeing agents intermittently failing agent.ping checks.

The proxy in use appears to have a large queue:

We do have other servers reporting through the same proxy (rhlappzab405) without issue, though. The failures are specific to Windows hosts/agents and affect around 100 hosts. The proxy manages between 400 and 500 hosts in total.

Proxy Logs: zabbix_proxy.log.zip

Server Logs: zabbix_server.log.zip



 Comments   
Comment by Alexey Pustovalov [ 2018 May 01 ]

It looks like you have performance issues with the proxy.

Comment by Alexey Pustovalov [ 2018 May 01 ]

Please attach graphs from proxy monitoring (template "Template App Zabbix Proxy").

Comment by Hari Vittal [ 2018 May 01 ]

Hi Alexey,

Please find attached the graphs for proxy performance. The unreachable poller is maxed out at 100%; on the other hosts these are quite busy, but not maxed out at 100%.

We use the defaults for proxy parameters apart from the following:

 

DBHost=rhldatzab405
DBName=zabbix_proxy1
DBPassword=XXXXXXXX
DBUser=zabbix
DBPort=3303
LogFileSize=100
LogFile=/tmp/zabbix_proxy.log
Server=zabbix-corp.fairisaac.com
Timeout=30
StartPollers=40
StartPollersUnreachable=5
StartPingers=5
CacheSize=128M
JavaGateway=zabbix-proxy-shk
JavaGatewayPort=10052
StartJavaPollers=5
ConfigFrequency=300
PidFile=/tmp/zabbix_proxy.pid
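
For anyone comparing settings like these across proxies, a minimal Python sketch for reading the Key=Value lines is shown here. The helper name parse_proxy_conf and the shortened sample text are illustrative assumptions, not part of Zabbix; only a few of the values listed in this comment are reproduced.

```python
def parse_proxy_conf(text):
    """Parse Key=Value lines, skipping blank lines and '#' comments.

    Hypothetical helper for illustration; not a Zabbix tool.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a Key=Value line, ignore it
        settings[key.strip()] = value.strip()
    return settings


# A shortened sample using values quoted in this comment.
CONF = """\
Timeout=30
StartPollers=40
StartPollersUnreachable=5
StartPingers=5
CacheSize=128M
"""

settings = parse_proxy_conf(CONF)

# Pick out the process-count parameters (the ones starting with "Start").
process_counts = {k: int(v) for k, v in settings.items() if k.startswith("Start")}
print(process_counts)
```

Listing only the Start* parameters makes it easier to see at a glance which process pools were changed from their defaults.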

 

Comment by Hari Vittal [ 2018 May 01 ]

I have increased the following:

 

StartPollersUnreachable=20
StartPingers=20

 

That's made some improvements but the queue is still larger than usual...

 

Unreachable Poller processes have dropped to around 70% busy:

 

It is unclear whether this means we still have to increase these two conf parameters...
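
As a rough way to reason about that question, the Python sketch below estimates a process count from a busy percentage. It rests on a loud assumption: that the total workload stays constant and the busy percentage scales inversely with the number of processes, which may not hold when many hosts are unreachable and checks wait out the full Timeout. The function name estimate_processes and the 50% target are hypothetical choices for illustration, not Zabbix recommendations.

```python
import math


def estimate_processes(current_count, busy_pct, target_busy_pct):
    """Estimate how many processes would bring utilisation down to a target.

    Assumes (hypothetically) that busy% scales inversely with process count,
    i.e. the total workload stays constant.
    """
    # Workload expressed as "fully busy process equivalents".
    workload = current_count * busy_pct / 100.0
    return math.ceil(workload * 100.0 / target_busy_pct)


# Values from this comment: 20 unreachable pollers at about 70% busy.
# The 50% target is an arbitrary example, not an official guideline.
needed = estimate_processes(current_count=20, busy_pct=70, target_busy_pct=50)
print(needed)
```

Under these assumptions the estimate is only a starting point; the busy graphs should be rechecked after any change.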

 

Comment by Alexey Pustovalov [ 2018 May 01 ]

After these changes you need to check host availability using the information in Administration->Queue->Details. Actually, the issue is not a bug in Zabbix.

Comment by Hari Vittal [ 2018 May 02 ]

Everything that is delayed (~200 items) appears to be from hosts that are currently unreachable:

 

 

Appears to have caught up otherwise... 

I think what's not clear is why this issue with the load on the proxy had an effect recently... There hasn't been any major change in the volume of servers managed by the proxy.

So far we have not had any repeats of the issue, so I think we can close this ticket.

Comment by Vladislavs Sokurenko [ 2018 May 11 ]

Fixed in:

  • pre-3.4.10rc1 r80725
  • pre-4.0.0alpha7 (trunk) r80726
Generated at Thu Apr 25 01:33:46 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.