[ZBX-12549] Items can be stuck when host becomes reachable Created: 2017 Aug 22 Updated: 2024 Apr 10 Resolved: 2017 Aug 24 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | None |
Fix Version/s: | 3.4.1rc1, 3.4 (plan), 4.0.0alpha1 |
Type: | Problem report | Priority: | Blocker |
Reporter: | Andris Zeila | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 15 |
Labels: | None |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Sprint: | Sprint 15 |
Description |
Steps to repeat:
After the agent is started, the item is polled once and the host becomes reachable. However, the item is not polled again until the server is restarted. |
Comments |
Comment by Andris Zeila [ 2017 Aug 22 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-12549 |
Comment by Kamil Porembinski [ 2017 Aug 23 ] |
This is a very critical issue. In my case, ALL agents remain unreachable after start until the server is restarted. |
Comment by dimir [ 2017 Aug 23 ] |
Yes, paszczak000, we know it's very critical. The fix is being tested and will be available in 3.4.1rc1 ASAP. |
Comment by Mikhail Shepelev [ 2017 Aug 23 ] |
I can confirm this issue, but in my case I restart the proxy server. |
Comment by sles [ 2017 Aug 23 ] |
> Fixed in development branch
Is it possible to get a patch for 3.4.0? |
Comment by sles [ 2017 Aug 23 ] |
OK, it looks like dbconfig.c from svn works: after one agent restart, everything works. |
Comment by Constantine Volodin [ 2017 Aug 23 ] |
Is there any progress? Everyone is very much waiting. |
Comment by Andrey A. Pestretsov [ 2017 Aug 23 ] |
Waiting for the server-mysql RPM in the EL6 x64 repository. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 23 ] |
We understand the severity of the issue and are working on it. As you see in status, it's currently in testing. Unfortunately, your numerous comments can't make this process faster and only distract involved people. Please be patient. |
Comment by Alexey Asemov [ 2017 Aug 24 ] |
No press release about a fatal showstopper bug. The release has not been revoked until the fix is available, and there are no warnings. Is everything OK? |
Comment by Konstantin Barinov [ 2017 Aug 24 ] |
Please release a hotfix. This is indeed a very serious bug. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 24 ] |
Successfully tested! |
Comment by Andris Zeila [ 2017 Aug 24 ] |
Released in:
|
Comment by Hilton Kevin de Carvalho [ 2017 Aug 24 ] |
Do you have any prediction of when the .deb package will be made available with this fix? |
Comment by Constantine Volodin [ 2017 Aug 24 ] |
When can I expect a docker image? |
Comment by Giorgio Biondi [ 2017 Aug 24 ] |
Hi, any idea when the RPM will be in the Zabbix repo? Thanks a lot for your work. |
Comment by Hilton Kevin de Carvalho [ 2017 Aug 24 ] |
When will the fix be available in the repositories? |
Comment by Rodrigo Moreira [ 2017 Aug 25 ] |
I upgraded to rc1 and the error still persists |
Comment by Konstantin Barinov [ 2017 Aug 25 ] |
Please make the fixed version available in the repositories. Thank you! |
Comment by Rob Dekkers [ 2017 Aug 25 ] |
I upgraded to rc1 and stopped one agent. After it became unavailable, I started the agent and the host came back up. Nice job! |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 25 ] |
Dear insanemor, can you provide more information? |
Comment by dimir [ 2017 Aug 25 ] |
sbr2004 unfortunately it was decided not to release rc1 packages. However we plan to release 3.4.1 packages on Monday (28.08). |
Comment by Rodrigo Moreira [ 2017 Aug 25 ] |
Glebs Ivanovskis, after hosts become unavailable, they do not come back up ... |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 25 ] |
Dear insanemor, have you upgraded server, proxy or agent? |
Comment by Rodrigo Moreira [ 2017 Aug 25 ] |
I have upgraded the server! |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 25 ] |
Is agent monitored by server or by proxy? |
Comment by Rodrigo Moreira [ 2017 Aug 25 ] |
by server ... |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 25 ] |
Maybe we should move our questionnaire into IRC. |
Comment by Rodrigo Moreira [ 2017 Aug 25 ] |
how do I do that ? |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 25 ] |
Check http://zabbix.org/wiki/Getting_help#IRC or go directly to https://webchat.freenode.net/?channels=#zabbix |
Comment by Rodrigo Moreira [ 2017 Aug 25 ] |
Glebs tks !!! |
Comment by Christian Hagemeier [ 2017 Aug 25 ] |
Hi, I also had this issue. How can I fix it? |
Comment by Naxiwer Lee [ 2017 Aug 26 ] |
I used the source archive 3.4.1rc1.tgz and just replaced /usr/sbin/zabbix_server with the compiled binary. |
Comment by Christian Hagemeier [ 2017 Aug 26 ] |
Thanks, I compiled 3.4.1rc2 and replaced the zabbix_server binary. |
Comment by Andrey A. Pestretsov [ 2017 Aug 26 ] |
Confirmed, the binary from rc2 resolved the bug. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 26 ] |
Thanks, everyone, for testing! Glad to hear the issue is fixed. Hopefully, insanemor's problem is resolved too. Yet again, huge apologies for this mishap. |
Comment by Hilton Kevin de Carvalho [ 2017 Aug 26 ] |
Glebs, we thank you guys for all the work, I hope that on Monday an update will be available for debian repo. |
Comment by Jiří Káša [ 2017 Aug 28 ] |
Not sure if this is a related problem, but some of my items are in state "Enabled" yet have no data, while the same item on another host does. Likewise, for items from filesystem discovery, C: is not working while D: is. Any help? Edit: I restarted zabbix-server and now I get values; I hope it will not freeze again. |
Comment by Nicki Bo Otte [ 2017 Aug 28 ] |
^ Same problem here. |
Comment by Giorgio Biondi [ 2017 Aug 28 ] |
Hi, I am waiting for the package for Red Hat systems. In the meantime, I have worked around the problem by restarting zabbix-server every hour via crontab. All the best. Giorgio Biondi. |
Comment by Giorgio Biondi [ 2017 Aug 28 ] |
Hi all, great job. The RPM packages are now available! All the best. |
Comment by Christian Hagemeier [ 2017 Aug 28 ] |
Debian packages too. |
Comment by Misak Khachatryan [ 2017 Aug 29 ] |
Hi, after upgrading to 3.4.1 I see the same behavior. It's not fixed, at least for me. CentOS 7.3, Zabbix repo packages, PostgreSQL on a separate host. |
Comment by Constantine Volodin [ 2017 Aug 31 ] |
When can we expect a Docker image? |
Comment by Ilmar Soobik [ 2017 Sep 18 ] |
After upgrade to 3.4.1 we see the same behavior as well. |
Comment by Andris Zeila [ 2017 Sep 18 ] |
Could you please give more information about your setup (or, more specifically, about the problematic host)? Is the host monitored by a proxy or directly by the server? |
Comment by Ilmar Soobik [ 2017 Sep 18 ] |
Monitored directly by server. |
Comment by Andris Zeila [ 2017 Sep 18 ] |
Are there any more passive agent items on those hosts, or only agent.ping? agent.ping failing on all hosts: does that mean that all hosts become unreachable and then reachable again? And all of them at the same time? |
Comment by Ilmar Soobik [ 2017 Sep 18 ] |
agent.hostname and agent.version are also enabled. |
Comment by Andris Zeila [ 2017 Sep 18 ] |
I assume agent.hostname and agent.version stop being updated too? What is the update interval of the agent.ping check? How many pollers are being used? It's really strange behaviour, and might not even be caused by the internal requeueing bug (which was the cause of this problem report). Can you strace a poller after agent.ping starts to fail? It might be that pollers are getting stuck somewhere and data gathering stops when the server runs out of pollers. (You could also check in the process list whether the zabbix poller process title is being updated.) |
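The process-title check suggested above could be scripted roughly as follows. This is only a sketch: it assumes a Linux host where `ps -eo pid=,args=` is available and that regular poller titles contain the text `: poller #` (for example `zabbix_server: poller #1 [...]`). The helper names `poller_titles` and `stale_pollers` are hypothetical and not part of Zabbix.

```python
import subprocess
import time


def poller_titles(ps_output):
    """Map PID -> process title for lines that look like regular Zabbix pollers."""
    titles = {}
    for line in ps_output.splitlines():
        pid, _, title = line.strip().partition(" ")
        # Assumed title shape: "zabbix_server: poller #N [...]".
        if pid.isdigit() and ": poller #" in title:
            titles[int(pid)] = title.strip()
    return titles


def stale_pollers(before, after):
    """Return PIDs present in both snapshots whose title did not change."""
    return sorted(pid for pid, title in before.items() if after.get(pid) == title)


def snapshot():
    """Take one snapshot of poller titles from the live process list."""
    out = subprocess.run(
        ["ps", "-eo", "pid=,args="], capture_output=True, text=True, check=True
    ).stdout
    return poller_titles(out)


if __name__ == "__main__":
    first = snapshot()
    time.sleep(30)  # arbitrary gap between the two snapshots
    for pid in stale_pollers(first, snapshot()):
        print(f"poller {pid} title unchanged; candidate for: strace -p {pid}")
```

An unchanged title is only a hint, since an idle poller may legitimately show the same title twice; the strace output is what would show where a poller is actually stuck.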
Comment by Ilmar Soobik [ 2017 Sep 18 ] |
You assume correctly. Update interval is 20 seconds. (Made no difference when it was 1m) 500 pollers. 1000 ICMP pingers. (Increased for testing, no difference with 300/50) They don't start to fail exactly - they seem to fail all at once. |
Comment by Sebastien [ 2017 Sep 22 ] |
After the upgrade from Zabbix 3.2.6 to 3.4.1, every Zabbix server restart triggers a nodata alarm even if data was collected by the proxy. The issue is still there in 3.4.1. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Sep 22 ] |
Dear sfl, your issue is different. Please create a separate bug report. |
Comment by Elvar [ 2017 Oct 21 ] |
I see this is marked as Fixed and Closed but I am seeing this exact same behavior right now in 3.4.3. I have a number of hosts that are showing an active 'agent.ping.nodata(10m)}=1' but I can see agent.ping returning successfully despite the triggers not recovering. Did this issue resurface in 3.4.3? |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 22 ] |
Dear elvar, if you see data coming in, your issue is different. Maybe |
Comment by Elvar [ 2017 Oct 23 ] |
Hi Glebs, that definitely sounds similar, thanks! |
Comment by Sascha Guilliard [ 2017 Oct 23 ] |
I'm running version 3.4.3, and since this version I have some hosts that alert on agent.ping.nodata. I don't see any agent ping in "Latest data" for those hosts, but when I run the command via zabbix_get I get a response. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 23 ] |
Dear sguilliard, there are a number of reasons why you could see such behaviour, not necessarily related to this bug. No, as far as I know, there were no reasons for this bug to "resurface" in 3.4.3. |
Comment by Merphis Ellis [ 2018 Jan 23 ] |
I am having this same issue with 3.4.6 on two different systems. I have 1 server, 275 agents, 35611 items and 11339 triggers. I did not have any issues with 3.2.6. |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 23 ] |
Dear mellis3, please describe in a bit more detail what exactly you experience. When hosts become reachable/unreachable there must be messages in the log. It would be nice to see Latest data of the affected items. |
Comment by Ilmar Soobik [ 2018 Jan 24 ] |
Your config lacks: StartPreprocessors= Newer versions of zabbix will have that in the config by default, but if you've been upgrading and keeping your old config file, then it could be missing. |
Comment by Merphis Ellis [ 2018 Jan 24 ] |
Good morning. I had exported the templates and host, so I did an import. After about 2 hours I started to see unreachable problems on systems that were available. After a while, one by one, every host became unavailable. I have adjusted the time to 15 minutes due to the network performance on some of the remote sites. I do not see how to post a screenshot. At 09:37:00 host k20S01 was displayed on the dashboard. config file edits Looking on the server I do find that host, but I do not see any errors around that transaction; I do not find any errors at all in the server logs. I do have debug at 5. One other thing I have noticed: if you look at the internal processing graph, the lines stop at about the same time as the hosts become unavailable, so on this last restart the processes lasted about 25 minutes. It seems to me that zabbix-web is having issues getting the data or the data writes are too slow. This is a VM on an HP360 server with 4 vCPUs and a 7200rpm disk. The other VM is just file storage. |
Comment by Ilmar Soobik [ 2018 Jan 24 ] |
If you look at my previous comment: This amount of load was handled by 24 cores, 64GB RAM and 7200RPM disks. Our configuration: StartPollers=150 |
Comment by Glebs Ivanovskis (Inactive) [ 2018 Feb 10 ] |
Dear mellis3, please see available ways of getting help. Dear illukas, StartPreprocessors=300 is overkill IMHO. There is very little sense in having more preprocessor worker processes than the logical CPU cores you have available. |
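The sizing advice above can be illustrated with a small sketch that caps a requested `StartPreprocessors` value at the number of logical CPU cores. The function name `capped_preprocessors` is hypothetical, and the cap is a rule of thumb taken from the comment, not a Zabbix requirement.

```python
import os


def capped_preprocessors(requested, cores=None):
    """Cap a StartPreprocessors value at the number of logical CPU cores."""
    if cores is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cores = os.cpu_count() or 1
    return max(1, min(requested, cores))


if __name__ == "__main__":
    # Prints a config line sized for the machine this runs on.
    print(f"StartPreprocessors={capped_preprocessors(300)}")
```

On the 24-core machine mentioned earlier in the thread, a requested value of 300 would be reduced to 24 under this rule.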