[ZBX-11590] Zabbix Server stop data gathering Created: 2016 Dec 13  Updated: 2018 Apr 20  Resolved: 2018 Apr 20

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Server (S)
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Jose Augusto Ferrronato Assignee: Unassigned
Resolution: Unsupported version Votes: 0
Labels: action, alerter, database, history, housekeeper, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 6.8
MySQL 5.5.54
Zabbix Server 3.2.1


Attachments: PNG File cpu_user.png     PNG File cpu_user_graph.png     PNG File data_gathering.png     JPEG File history_syncer.jpg     PNG File internal_process.png     PNG File screenshot-1.png     File zabbix_agentd.log-20161210.gz     JPEG File zabbix_info.jpg     File zabbix_server.conf     Zip Archive zabbix_server_log.zip    

 Description   

Zabbix has consistently failed to send alerts because of data collection problems.
The Zabbix server also stops collecting data and raises the error "Zabbix agent on {HOST.NAME} is unreachable for 5 minutes" for the monitored machines. Usually Zabbix recovers by itself after a while; at other times we have to restart the Zabbix server service.

The graph shows "no data". When collection resumes on its own (on average about 5 minutes later), the missing data is filled in on the graph and collection continues. After a manual restart, the gap is not filled and all data for that period is lost.

This error started after a migration from Zabbix 2.4 to Zabbix 3.2.

The housekeeper is currently disabled; the problem also occurred with the housekeeper enabled.



 Comments   
Comment by Aleksandrs Saveljevs [ 2016 Dec 13 ]

Related issue: ZBX-11586.

Comment by Aleksandrs Saveljevs [ 2016 Dec 13 ]

Similar to ZBX-11586, it seems to have to do with poller and history syncer busyness.

Would it be possible to post screenshots that compare their busyness before and after the upgrade? How many pollers and history syncers do you have? Did you change any configuration after the upgrade except Zabbix version? Do you use proxies?

Based on "internal_process.png", history syncers get loaded every hour. Do you have an idea what is causing this? Do you get unusual item traffic every hour? If so, what are those items?

Comment by Adriane Ázara [ 2016 Dec 13 ]

I have attached an image covering the last 3 months.

The upgrade was performed on 11/21.

The parameters that were changed were the cache and poller settings.

There are 2 Zabbix proxies in different locations.

Comment by Jose Augusto Ferrronato [ 2016 Dec 13 ]

Hi,
We have no specific item for this. Here are the items we have:

Items report
==================

[INFO] Total items: 72698
[INFO] Items enabled: 70005
[INFO] Items disabled: 134
[ERRO] Items not supported: 2559

Items by type
==============
[INFO] Items Zabbix Agent (passive): 593
[INFO] Items Zabbix Agent (active): 0
[INFO] Items Zabbix Trapper: 213
[INFO] Items Zabbix Internal: 73
[INFO] Items Zabbix Aggregate: 0
[INFO] Items SNMPv1: 174
[INFO] Items SNMPv2: 66396
[INFO] Items SNMPv3: 0
[INFO] Items SNMP Trap: 0
[INFO] Items JMX: 0
[INFO] Items IPMI: 0
[INFO] Items SSH: 0
[INFO] Items Telnet: 0
[INFO] Items Web: 180
[INFO] Items Simple Check: 1504
[INFO] Items Calculated: 1261
[INFO] Items External Check: 863
[INFO] Items Database: 1487

Additional info

[INFO] Number of items with icmpping key history for more than 7 days: 146
[INFO] Number of non-numeric items (active): 6822

NVPS: 158.7
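
For a sense of scale, the figures in this report can be turned into rough proportions. The following is only an illustrative sketch: the counts and the NVPS value are copied from the report above, the five-minute window is borrowed from the "unreachable for 5 minutes" trigger in the description, and the names `share` and `values_in_window` are hypothetical helpers, not part of Zabbix.

```python
# Non-zero item counts copied from the "Items by type" report above.
ITEMS_BY_TYPE = {
    "Zabbix Agent (passive)": 593,
    "Zabbix Trapper": 213,
    "Zabbix Internal": 73,
    "SNMPv1": 174,
    "SNMPv2": 66396,
    "Web": 180,
    "Simple Check": 1504,
    "Calculated": 1261,
    "External Check": 863,
    "Database": 1487,
}
TOTAL_ITEMS = 72698  # "Total items" line of the report
NVPS = 158.7         # new values per second, as reported


def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


def values_in_window(nvps, seconds):
    """Return how many values arrive in `seconds` at a steady `nvps` rate."""
    return nvps * seconds


snmpv2_share = share(ITEMS_BY_TYPE["SNMPv2"], TOTAL_ITEMS)
backlog_5_min = values_in_window(NVPS, 5 * 60)  # assumed 5-minute gap

print(f"SNMPv2 share of all items: {snmpv2_share:.1f}%")
print(f"Values expected during a 5-minute gap: {backlog_5_min:.0f}")
```

Under these assumptions, SNMPv2 items make up the large majority of the configuration, so poller load is probably dominated by SNMP checks rather than by agent items.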

Comment by Jose Augusto Ferrronato [ 2016 Dec 13 ]

History syncer busyness increased a lot after the upgrade.

Comment by Matthew ISIDORE [ 2016 Dec 14 ]

I currently have the same problem. Did you try lowering the cache settings?
CacheSize=512M to something like 128M or less
HistoryCacheSize=512M to default
HistoryIndexCacheSize=128M to default
TrendCacheSize=512M to default
ValueCacheSize=512M to default

Why am I telling you this? I suppose that if your cache is too big, the history syncer processes will have too many items to process and (referring to your config file) will do that every 120 seconds. I don't think you need that much cache. I have 480 NVPS and I use less cache than you.

That's my only clue for solving your problem ^^
Tell me if it changes anything for you.

Edit:
Note that if the server crashes, it's because the values are too low.
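
To check which of these cache parameters are actually set before changing them, the relevant lines of zabbix_server.conf can be listed with a short script. This is a minimal sketch, not a Zabbix tool: the parameter names come from this comment, the sample text reuses the values quoted above rather than the attached file, and `read_cache_settings` is a hypothetical helper. One sample line is commented out purely to show how an unset parameter is reported.

```python
# Cache-related parameters named in this comment.
CACHE_PARAMS = (
    "CacheSize",
    "HistoryCacheSize",
    "HistoryIndexCacheSize",
    "TrendCacheSize",
    "ValueCacheSize",
)


def read_cache_settings(conf_text):
    """Return {parameter: value} for cache parameters that are explicitly set."""
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments; commented-out parameters are not set.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name in CACHE_PARAMS:
            settings[name] = value.strip()
    return settings


# Illustrative input only: values quoted in this comment, with
# ValueCacheSize commented out to demonstrate the "not set" case.
SAMPLE_CONF = """\
CacheSize=512M
HistoryCacheSize=512M
HistoryIndexCacheSize=128M
TrendCacheSize=512M
# ValueCacheSize=512M
"""

settings = read_cache_settings(SAMPLE_CONF)
for param in CACHE_PARAMS:
    print(f"{param}: {settings.get(param, '(not set, server default applies)')}")
```

To run it against a real file, read the file contents into a string and pass that to `read_cache_settings`; the path depends on the installation.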

Comment by Jose Augusto Ferrronato [ 2016 Dec 14 ]

The error already occurred before we changed the Cache values. It continued even after we adjusted with larger values. Thanks for the help.

Generated at Sat Apr 05 11:48:28 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.