[ZBX-9016] Some items in the queue indefinitely after host reboot Created: 2014 Nov 10 Updated: 2017 May 30 Resolved: 2014 Dec 04 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 2.4.1 |
Fix Version/s: | 2.2.8rc1, 2.4.3rc1, 2.5.0 |
Type: | Incident report | Priority: | Critical |
Reporter: | Stanislav Antic | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | cache, queue, unreachable | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
CentOS 6 and PostgreSQL 9.1 |
Attachments: | busy_unreachable.png no_data.png no_data_graph.png | ||||||||
Description |
After rebooting two of our servers, some of their items no longer get updated; they just stay in the queue. Attached are screenshots of the queue (detail view) and the latest data graph for one of these items. |
Comments |
Comment by Aleksandrs Saveljevs [ 2014 Nov 11 ] |
Are these items "Zabbix agent" or "Zabbix agent (active)"? Do we understand correctly that some items on these hosts do get updated, but some do not? Is there anything special about those items that do not? Are you monitoring these hosts through a proxy? |
Comment by Stanislav Antic [ 2014 Nov 11 ] |
These are all "Zabbix agent" items.
Yes, some items on those hosts are updated correctly. There are three hosts: Windows, FreeBSD and Linux.
I don't think there is anything special about them; some items come from discovery, some do not.
No, the hosts are monitored directly. |
Comment by Aleksandrs Saveljevs [ 2014 Nov 12 ] |
When you rebooted those servers, did you put those hosts into "no data" maintenance? If so, this looks to be a regression from |
Comment by Aleksandrs Saveljevs [ 2014 Nov 12 ] |
Actually, disregard that - |
Comment by Aleksandrs Saveljevs [ 2014 Nov 12 ] |
If your servers were down for a while, their monitoring might have been taken over by unreachable pollers. How many unreachable pollers do you have? Are you monitoring how busy they are using internal items such as "zabbix[process,unreachable poller,avg,busy]"? If so, could you please post a graph that shows whether they are stuck? If they are stuck, could you please run "strace" on them so that we know what they are stuck on? |
Comment by Stanislav Antic [ 2014 Nov 12 ] |
They do not look busy right now. Also, all "unreachable poller" processes show "got values 0". I left out one detail: these hosts were disabled while they were offline; they were not in maintenance during that time. |
Comment by Aleksandrs Saveljevs [ 2014 Nov 13 ] |
Without ZBXNEXT-2588, it might be a bit hard to arrive at the exact cause of the problem, but we have a conjecture. It might be that these hosts became disabled while these items were being processed. After items are processed, we call DCrequeue_items(). There, if a host is disabled, we do not requeue its items. However, we also do not change dc_item->location, so it stays ZBX_LOC_POLLER. And if it stays ZBX_LOC_POLLER, the item never gets back into the queue. |
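The conjecture above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Zabbix source code: the types, the LOC_* and HOST_* constants, and every function name below are hypothetical stand-ins for dc_item->location, ZBX_LOC_POLLER and DCrequeue_items(). It also assumes that re-enabling a host only schedules items that no poller currently owns. requeue_reset() shows one possible remedy; the thread does not describe the actual change made in the fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the configuration cache structures. */
enum item_location { LOC_NOWHERE, LOC_QUEUE, LOC_POLLER };
enum host_status { HOST_MONITORED, HOST_NOT_MONITORED };

struct host { enum host_status status; };
struct item { enum item_location location; };

/* A poller takes a queued item for processing and marks it as owned. */
static void take_for_polling(struct item *item)
{
	assert(NULL != item);

	if (LOC_QUEUE == item->location)
		item->location = LOC_POLLER;
}

/* Conjectured behaviour: for a disabled host we return early and leave
 * the location untouched, so the item still looks owned by a poller. */
static void requeue_buggy(struct item *item, const struct host *host)
{
	assert(NULL != item && NULL != host);

	if (HOST_MONITORED != host->status)
		return;

	item->location = LOC_QUEUE;
}

/* One possible remedy (an assumption, not the committed fix): always
 * release the item first, then requeue it only if the host is monitored. */
static void requeue_reset(struct item *item, const struct host *host)
{
	assert(NULL != item && NULL != host);

	item->location = LOC_NOWHERE;

	if (HOST_MONITORED == host->status)
		item->location = LOC_QUEUE;
}

/* Assumed scheduling rule: when a host is re-enabled, only items that
 * are not owned by anyone are put back into the queue. */
static void on_host_enabled(struct item *item, struct host *host)
{
	assert(NULL != item && NULL != host);

	host->status = HOST_MONITORED;

	if (LOC_NOWHERE == item->location)
		item->location = LOC_QUEUE;
}

typedef void (*requeue_fn)(struct item *, const struct host *);

/* Replays the scenario from the thread: the host is disabled while its
 * item is being polled and is enabled again later. Returns the final
 * location of the item. */
static enum item_location run_scenario(requeue_fn requeue)
{
	struct host host = { HOST_MONITORED };
	struct item item = { LOC_QUEUE };

	take_for_polling(&item);
	host.status = HOST_NOT_MONITORED;	/* disabled mid-poll */
	requeue(&item, &host);
	on_host_enabled(&item, &host);

	return item.location;
}
```

Calling run_scenario(requeue_buggy) leaves the item at LOC_POLLER after the host is enabled again, which matches the symptom of items that never leave the queue view, while run_scenario(requeue_reset) ends with the item back at LOC_QUEUE.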
Comment by Stanislav Antic [ 2014 Nov 13 ] |
The assumption looks reasonable. Also, we rebooted the server and it is OK now, so whatever was wrong was in the in-memory state. |
Comment by Aleksandrs Saveljevs [ 2014 Nov 14 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-9016 . |
Comment by Aleksandrs Saveljevs [ 2014 Nov 20 ] |
Issue |
Comment by Alexander Vladishev [ 2014 Dec 04 ] |
(1) Please review my changes in r50994 before a merge. asaveljevs Thank you! CLOSED. |
Comment by Aleksandrs Saveljevs [ 2014 Dec 04 ] |
Fixed in pre-2.2.8 r50997, pre-2.4.3 r50998, and pre-2.5.0 (trunk) r50999. |