[ZBX-2444] Queue is incorrectly calculated in case of Distributed Monitoring (and maybe also with standalone server) Created: 2010 May 19 Updated: 2017 May 30 Resolved: 2011 Aug 13 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Frontend (F) |
Affects Version/s: | 1.8, 1.8.1, 1.8.2 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Minor |
Reporter: | Gabriele Armao | Assignee: | Unassigned |
Resolution: | Cannot Reproduce | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix server 1.8.2 (the problem was also present in 1.8), configured with at least one node |
Attachments: |
Description |
I was getting too many queued items, so I decided to check the code that calculates them. In queue.php (side note: more comments in the source code would help), I noticed this on line 1732: $nextcheck = $delay * floor($now / $delay) + ($itemid % $delay); I'm not 100% sure what this code does, but finding $itemid here is quite strange, all the more because with distributed monitoring the itemid becomes a very large number, different for each item. I replaced $itemid with $now and the queue now looks normal, although I'm not sure it is the right change. |
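For reference, here is a minimal Python transcription of the quoted PHP line. The function name queue_nextcheck is hypothetical, and only this one line is modeled, not the code around it. For non-negative integers, Python's // gives the same result as floor($now / $delay). The item ID only contributes itemid % delay, a fixed offset inside each delay-sized window, so items with the same interval are spread across different seconds.

```python
def queue_nextcheck(itemid: int, delay: int, now: int) -> int:
    """Transcription of: $nextcheck = $delay * floor($now / $delay) + ($itemid % $delay);

    delay * (now // delay) is the start of the current delay-sized window;
    itemid % delay is a per-item offset inside that window.
    """
    window_start = delay * (now // delay)
    offset = itemid % delay
    return window_start + offset


if __name__ == "__main__":
    # Two items with a 60 s interval land on different seconds of each minute.
    print(queue_nextcheck(10078, 60, 0))  # 58
    print(queue_nextcheck(10079, 60, 0))  # 59
    # The result is the slot in the *current* window, which may already be past.
    now = 13800
    print(queue_nextcheck(10078, 3600, now), now)  # 13678 13800
```

Whether the surrounding code then moves such a past slot forward by one delay is not visible from the line quoted in the report.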
Comments |
Comment by Gabriele Armao [ 2010 May 19 ] |
Sorry, the file is actually includes/items.inc.php. |
Comment by Gabriele Armao [ 2010 May 24 ] |
Well then, I guess I need to be more specific. Note: I'm using the Zabbix Appliance just to make things clear; I get the same issue with a Zabbix server compiled from source.
1. Boot Zabbix Appliance 1.8.2 |
Comment by richlv [ 2010 May 25 ] |
According to the developers, the calculation is correct (though comments in the code could be better). As for the queue, maybe your Zabbix server simply cannot cope with all the items right after startup. |
Comment by Gabriele Armao [ 2010 May 25 ] |
The point is that I just started the Zabbix Appliance, downloaded from the website without any modification, and did the steps above. I didn't add any host; it's only checking the default local zabbix_agentd with the default Template_linux items. Even a virtual machine should be able to handle these checks without any issue. In fact, the queue looks normal until I switch zabbix_server to distributed mode, which, coincidentally, increases all the $itemid values, for example from 10078 to 100100000010078. |
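A sketch of why the conversion shifts the schedule. The ID layout used here (nodeid * 10**14 + nodeid * 10**11 + local id) is an assumption, chosen because it reproduces the 10078 to 100100000010078 example for node 1; the helper and constant names are hypothetical. With a one-hour interval, the per-item offset itemid % delay moves from minute 47 to minute 21.

```python
HISTORY_ID_SPAN = 10**14  # assumed per-node span, not taken from Zabbix source
CONFIG_ID_SPAN = 10**11   # assumed per-node span, not taken from Zabbix source


def to_distributed_id(local_id: int, nodeid: int) -> int:
    """Hypothetical ID conversion that reproduces the example in this comment."""
    return nodeid * HISTORY_ID_SPAN + nodeid * CONFIG_ID_SPAN + local_id


def schedule_offset(itemid: int, delay: int) -> int:
    """Seconds into each delay-sized window at which the item is scheduled."""
    return itemid % delay


if __name__ == "__main__":
    delay = 3600  # hourly item
    old_id = 10078
    new_id = to_distributed_id(old_id, nodeid=1)
    print(new_id)  # 100100000010078
    for label, itemid in (("standalone", old_id), ("node 1", new_id)):
        minutes, seconds = divmod(schedule_offset(itemid, delay), 60)
        print(f"{label}: minute {minutes}, second {seconds}")
    # standalone: minute 47, second 58
    # node 1: minute 21, second 18
```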
Comment by richlv [ 2010 May 27 ] |
Check your Zabbix server log file: did the server start up after the conversion? |
Comment by Gabriele Armao [ 2010 May 27 ] |
Here's a screenshot of the appliance's queue panel. I did everything as described in my last post: stopped the server, modified the config file, converted the DB, started the server, and added the node with the correct nodeid. This is the MySQL query I ran:
select host,key_,description,itemid,from_unixtime(lastclock),now(),from_unixtime(lastclock+delay) as "last+delay",delay from items,hosts where items.hostid=hosts.hostid and key_="vfs.file.cksum[/boot/vmlinuz]";
I attached a screenshot of the query and its result, so that everything is easy to see. I hope it's clear now. |
Comment by Gabriele Armao [ 2010 Jul 28 ] |
Any news on this? I think the Zabbix Appliance examples are pretty self-explanatory. Also, the item zabbix[queue] reports the correct number of queued items, which differs from the number shown in the web interface; this is another hint that the problem lies in how the PHP frontend calculates the queue. |
Comment by richlv [ 2011 Aug 10 ] |
I think I might have an idea of what causes this. Items are scheduled to be checked at specific points in time, and the scheduler spreads them semi-randomly, partly based on the item ID. So an item that is checked every hour is checked at the same minute and second of every hour. When the queue is calculated, Zabbix looks at when the last value arrived and when the item was scheduled to be checked.
Now, what happens if you convert the DB to a distributed setup? All the IDs change, so the item schedules suddenly change as well: an item that was checked every hour at minute 15 might now be scheduled for minute 3. Suppose you converted the DB at minute 13. For queue purposes, Zabbix looks at the item configuration and sees that the item is polled every 60 minutes. It then sees that the last value arrived almost one hour ago, but, based on the new schedule, a value was expected at minute 3, which by now is already overdue. The server's next check is only at minute 3 of the next hour, so that value will arrive a full hour late.
So immediately after converting the DB to distributed monitoring you will see a queue spike. It should settle down eventually, once each item has been polled at its new scheduled time (within one update interval of that item). |
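The scenario above can be reproduced with a small, simplified model. Everything here is an assumption for illustration: the helper names are hypothetical, the offsets (minute 15 before the conversion, minute 3 after) are the ones from the example rather than values computed from real IDs, and the "queued" rule (a value was due at the most recent scheduled slot but has not arrived) stands in for the frontend logic, which is not reproduced here. The model also contrasts that rule with the lastclock + delay comparison used in the MySQL query earlier in this issue.

```python
DELAY = 3600          # hourly item
OLD_OFFSET = 15 * 60  # scheduled at minute 15 before the conversion
NEW_OFFSET = 3 * 60   # scheduled at minute 3 after the IDs change


def latest_slot(offset: int, delay: int, now: int) -> int:
    """Most recent scheduled check time at or before `now`."""
    slot = delay * (now // delay) + offset
    return slot if slot <= now else slot - delay


def next_slot(offset: int, delay: int, after: int) -> int:
    """First scheduled check time strictly after `after` (assumed server behaviour)."""
    slot = delay * (after // delay) + offset
    return slot if slot > after else slot + delay


def schedule_says_queued(lastclock: int, offset: int, delay: int, now: int) -> bool:
    """Assumed queue rule: a value was due at the latest slot but has not arrived."""
    return lastclock < latest_slot(offset, delay, now)


def last_plus_delay_overdue(lastclock: int, delay: int, now: int) -> bool:
    """The 'last+delay' comparison from the MySQL query in this issue."""
    return lastclock + delay < now


if __name__ == "__main__":
    lastclock = OLD_OFFSET          # last value: minute 15 of hour 0
    converted_at = DELAY + 13 * 60  # DB converted at minute 13 of hour 1

    print(schedule_says_queued(lastclock, OLD_OFFSET, DELAY, converted_at))  # False
    print(schedule_says_queued(lastclock, NEW_OFFSET, DELAY, converted_at))  # True
    print(last_plus_delay_overdue(lastclock, DELAY, converted_at))           # False

    # The restarted server only polls at minute 3 of the next hour...
    arrival = next_slot(NEW_OFFSET, DELAY, converted_at)
    print(arrival - latest_slot(NEW_OFFSET, DELAY, converted_at))  # 3600
    # ...after which the item drops out of the queue again.
    print(schedule_says_queued(arrival, NEW_OFFSET, DELAY, arrival + 60))  # False
```

In this model the item is reported as queued from the conversion until the value arrives an hour later, while the last+delay check never flags it. That is consistent with the mismatch the reporter saw, though it does not prove the frontend uses exactly this rule.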
Comment by richlv [ 2011 Aug 13 ] |
The scenario with the conversion to distributed monitoring seems clear. Please reopen if you have repeatable steps with a standalone server. Thanks. |