[ZBX-11586] Zabbix hosts lost connections Created: 2016 Dec 12  Updated: 2018 Aug 10  Resolved: 2018 Aug 10

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Matthew ISIDORE Assignee: Unassigned
Resolution: Unsupported version Votes: 0
Labels: agent, triggers, usability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 7
8 vCPU
8 GB RAM


Attachments: PNG File after_update.png     PNG File before_update.png     PNG File proxys_322.png     PNG File queue-after.png     PNG File queue_before.png     PNG File zabbix_error.png    

 Description   

Hi,

I updated my Zabbix from 3.0 to 3.2.1 because some features of 3.2.1 were interesting to me.
I use Zabbix to monitor roughly 400-450 hosts such as Fortigate, MikroTik, Windows and Linux servers.
Before the update everything was OK, but since last Thursday many hosts sometimes lose their connection to the Zabbix server and trigger the alert. They come back about 5 or 10 minutes after the alert. It looks like there is a huge item queue just before the problem, but that's all I noticed.

It's very annoying for the support team because a hundred or so alerts pop up on the screens for nothing: the hosts and Internet links are OK.

I attached screenshots of some graphs from the last 4 days. If you need anything more, just tell me and I will send it to you.

Thanks for the help!



 Comments   
Comment by Matthew ISIDORE [ 2016 Dec 12 ]

Update: I stress tested a lot of agents to see if there is any problem, and they are OK.
I checked whether it was related to backup execution; it's not.
The last thing I tested was disabling the housekeeper in the frontend, and the bug has not occurred since. But I prefer to wait longer to see if anything changes.

Comment by Aleksandrs Saveljevs [ 2016 Dec 13 ]

When did you upgrade your server? Could you please show the process busyness graphs and queue graphs with data both before and after the upgrade? Did anything else change beyond the Zabbix server version? Do you use proxies?

Also, you may wish to upgrade to Zabbix 3.2.2, because a lot of important issues were fixed. To fix (or work around) the problem, consider increasing the amount of pollers and history syncers. How many do you have now and how many did you have before the upgrade?
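
To answer the "how many do you have now" question, the current values can be read from the server configuration file. The following is only a minimal sketch, assuming the pollers and history syncers are set through the `StartPollers` and `StartDBSyncers` parameters of zabbix_server.conf; the function name `read_settings` and the sample values are hypothetical and not taken from this installation.

```python
def read_settings(conf_text, names=("StartPollers", "StartDBSyncers")):
    """Return {name: int} for the requested parameters found in conf_text.

    Commented-out lines (starting with '#') and blank lines are ignored,
    so a parameter left at its built-in default is simply absent from the result.
    """
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in names:
            found[key.strip()] = int(value.strip())
    return found


# Hypothetical excerpt of a zabbix_server.conf; the numbers are placeholders.
sample = """\
# StartPollers=5
StartPollers=50
StartDBSyncers=8
"""

print(read_settings(sample))
```

Running the same function on the configuration saved before the upgrade and on the current one would show whether either value changed.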

Comment by Matthew ISIDORE [ 2016 Dec 13 ]

I did the update on the day of the 3.2.2 release, I think. Yeah, I was mad when I saw it, as I had just finished the update.
I use proxies (5 of them), but I also have active agents and SNMP.
I didn't change my configuration (until this morning). It's maybe not as good as it could be, but it worked. I have 150 pollers and I had 25 history syncers (now set to 100 to check).

I attached graphs covering 3 days before the update and after the update.

Comment by Matthew ISIDORE [ 2016 Dec 13 ]

I'm downloading the source of 3.2.2 to try another update.

edit1:
After the update to 3.2.2 on the server and all proxies, I noticed that some items are stuck in the queue for more than 10 minutes and take a very long time to get out of it
(I attached a screenshot of that too).

edit2:
I just had the same bug even with the new 3.2.2 update.
It seems to be history syncer related.

Comment by Aleksandrs Saveljevs [ 2016 Dec 13 ]

Related issue: ZBX-11590.

Comment by Matthew ISIDORE [ 2016 Dec 13 ]

Yes, in fact I cloned the Zabbix VM and used a snapshot to go back to just before the upgrade, because we use it in production.
I will continue to dig into it, but for now I need to revert all changes.

Comment by Matthew ISIDORE [ 2016 Dec 14 ]

Same after the revert: last night some hosts lost their connection (fewer than in 3.2.1), again at the same time as the history syncer. It never happened before the update.

Comment by Rostislav Palivoda [ 2017 Oct 23 ]

Can you confirm this is still relevant?

Comment by Matthew ISIDORE [ 2017 Oct 24 ]

Hi, it's been a long time since this event. I finished my assignment and I'm back at school now. I can't confirm whether it's still relevant because I don't have access to the Zabbix installation anymore. But I'm guessing that yes, it still is. (Not 100% sure.)

Generated at Thu Apr 17 08:36:28 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.