[#ZBX-2444] Queue is incorrectly calculated in case of Distributed Monitoring (and maybe also with standalone server)

[ZBX-2444] Queue is incorrectly calculated in case of Distributed Monitoring (and maybe also with standalone server) Created: 2010 May 19 Updated: 2017 May 30 Resolved: 2011 Aug 13
Status:	Closed
Project:	ZABBIX BUGS AND ISSUES
Component/s:	Frontend (F)
Affects Version/s:	1.8, 1.8.1, 1.8.2
Fix Version/s:	None

Type:	Incident report
Priority:	Minor
Reporter:	Gabriele Armao
Assignee:	Unassigned
Resolution:	Cannot Reproduce
Votes:	
Labels:	None
Remaining Estimate:	Not Specified
Time Spent:	Not Specified
Original Estimate:	Not Specified
Environment:	zabbix server 1.8.2 (but the problem was present on 1.8 too) configured with at least one Node
Attachments:	mysql_query.png, zabbix_queue.png

Description

I was getting too many queued items, so I decided to check the code that calculates them. While checking queue.php (side note: some more comments in the source code would help), I noticed line 1732:

$nextcheck = $delay * floor($now / $delay) + ($itemid % $delay);

I'm not 100% sure what this code does, but finding $itemid here is quite strange, even more so because with distributed monitoring the itemid is a really large number and is different for each item. I changed it to $now and the queue seems to be normal, although I'm not sure I made the right change.
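As a rough illustration of what the quoted line computes, here is a Python transliteration. The function name `next_check` and the sample timestamps and delay are assumptions made for this sketch, not Zabbix code. It suggests that `itemid % delay` acts as a fixed per-item offset added to the start of the current `delay`-second window.

```python
def next_check(now: int, delay: int, itemid: int) -> int:
    """Python transliteration of the quoted PHP expression (illustrative only)."""
    # Start of the current delay-sized window, plus a per-item offset.
    return delay * (now // delay) + (itemid % delay)


if __name__ == "__main__":
    delay = 60        # hypothetical item interval in seconds
    itemid = 10078    # sample item id
    # Hypothetical timestamps, 30 seconds apart.
    for now in (1_274_250_000, 1_274_250_030, 1_274_250_060):
        print(now, next_check(now, delay, itemid))
```

Under this reading, the offset depends only on the item id and the delay, so a given item would land on the same position inside every window.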

Comments

Comment by Gabriele Armao [ 2010 May 19 ]

sorry, the file is actually: includes/items.inc.php

Comment by Gabriele Armao [ 2010 May 24 ]

well then, I guess I need to be more specific:

NOTE: I'm showing the Zabbix Appliance just to clear things up, I got the same issue with zabbix server compiled from sources.

1. boot Zabbix Appliance 1.8.2
2. connect to the zabbix interface and see that Administration -> Queue is completely empty
3. stop zabbix server: service zabbix-server stop
4. edit /etc/zabbix/zabbix_server.conf and change NodeId to 1
5. run zabbix-server -n 1 -c /etc/zabbix/zabbix_server.conf
6. start zabbix server: service zabbix-server start
7. connect to the zabbix interface and see that Administration -> Queue is now full of items that come and go after their check delay.

Comment by richlv [ 2010 May 25 ]

according to developers, calculation is correct (so comments in the code could be better)

as for queue, maybe your zabbix server simply is not able to cope with all items right after the startup

Comment by Gabriele Armao [ 2010 May 25 ]

The point is that I just started the zabbix appliance, downloaded from the website, and did the above steps without any other modification. I didn't add any host; it's just checking the default local zabbix_agentd with the default Template_linux items. I think even a virtual machine should be able to handle these checks without any issue. In fact, the queue looks normal until I switch zabbix_server to distributed mode, which, coincidentally, increases all the $itemid values, for example from 10078 to 100100000010078.
Also, the appliance has been running for 20 hours now and still shows the same queue.
Try it yourself with the appliance; no modification to the items is needed.
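To make the effect of the id change concrete, the sketch below compares the `itemid % delay` term from the quoted expression for the two ids mentioned above. The helper name `slot_offset` and the list of delays are assumptions chosen for illustration; only the two item ids come from this report.

```python
def slot_offset(itemid: int, delay: int) -> int:
    """Offset inside each delay window implied by `itemid % delay` (illustrative)."""
    return itemid % delay


if __name__ == "__main__":
    old_id = 10078              # itemid before the conversion (from the comment above)
    new_id = 100100000010078    # itemid after the conversion (from the comment above)
    for delay in (30, 60, 3600):  # hypothetical item intervals in seconds
        print(delay, slot_offset(old_id, delay), slot_offset(new_id, delay))
```

For these sample delays the two ids yield different offsets, so the same item would be expected at a different second within its interval after the conversion.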

Comment by richlv [ 2010 May 27 ]

check your zabbix server logfile - has the server started up after the conversion ?
did you do the conversion to distributed setup correctly - that is, run db conversion once only, change nodeid in the config file etc etc ?

Comment by Gabriele Armao [ 2010 May 27 ]

Here's a screenshot of the queue panel of the appliance. I did everything as I wrote in the last post: stopped the server, modified the config file, converted the db, started the server and added the node with the correct nodeid.
The screenshot shows items in the queue, but the queue is simply calculated wrongly. In an ssh session, just a second after taking the screenshot, I ran a mysql query that shows one of the items, its lastclock (the last time it was executed), the configured delay, and the calculated nextclock (obtained by simply adding the delay to the lastclock).
All times are UTC for the mysql query, but in the web interface they appear with Europe/Riga timezone (so two hours ahead), the default for php.ini on the appliance. I guess this isn't an issue.

This is the mysql query I ran: select host,key_,description,itemid,from_unixtime(lastclock),now(),from_unixtime(lastclock+delay) as "last+delay",delay from items,hosts where items.hostid=hosts.hostid and key_="vfs.file.cksum[/boot/vmlinuz]";

I attached a screenshot of the mysql query and result, so that it's easy to see everything.

I hope it's clear now. On my production server I changed "$itemid" to "$now" in the code above, and the queue shows the correct items with the correct "delayed by" value.
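For reference, the arithmetic of that substitution can be checked in isolation. The sketch below is a Python transliteration of the quoted expression with `$itemid` replaced by `$now`; the function name is hypothetical, and it says nothing about how the frontend uses the result. For a positive integer delay, the expression appears to reduce to `now` itself.

```python
def next_check_with_now(now: int, delay: int) -> int:
    """The quoted expression with $itemid replaced by $now (illustrative only)."""
    # Window start plus the position of `now` inside that window.
    return delay * (now // delay) + (now % delay)


if __name__ == "__main__":
    # Hypothetical timestamps and delays, chosen only to exercise the expression.
    for now in (1_274_250_000, 1_274_250_017):
        for delay in (30, 60, 3600):
            print(now, delay, next_check_with_now(now, delay) == now)
```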

Comment by Gabriele Armao [ 2010 Jul 28 ]

Any news on this? I think the examples with the zabbix appliance are pretty much self-explanatory. Also, the item zabbix[queue] reports the correct number of queued items, which differs from the number shown in the web interface, so this is another hint that there may be an issue with the data calculated by the php interface.

Comment by richlv [ 2011 Aug 10 ]

i think i might have an idea what's the cause of such data here

items are scheduled to be checked at some point in time. this scheduling distributes them semi-randomly and does that in part by item id. so an item that's checked every hour would be checked at the same minute and second every hour. when queue looks at this data, it checks when the last value arrived and when this item was scheduled to be checked.

now, what happens if you convert the db to distributed setup... all the ids change, so item schedule suddenly changes as well. item that was checked every hour on minute 15 might now be scheduled for minute 3. if you converted the db at minute 13, zabbix looks (for queue purposes) at item config, sees that it is to be polled every 60 minutes. then it sees that last value arrived almost one hour ago, BUT, based on the new scheduling, it was expected at 3rd minute. which, by now, is already one hour late. and the next check is scheduled on the 3rd minute - in the next hour.

so immediately after converting the db to distributed monitoring you will see queue spike. it should settle down eventually.
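The scenario described above can be sketched with a simplified model. Everything here is an assumption for illustration: the helper names `latest_slot` and `overdue_seconds`, the rule that an item counts as overdue when its last value is older than the most recent scheduled slot, and the arbitrary hour-aligned timestamp. It is not the actual queue code.

```python
def latest_slot(now: int, delay: int, offset: int) -> int:
    """Most recent scheduled check time at or before `now` (hypothetical helper).

    Assumes 0 <= offset < delay.
    """
    slot = delay * (now // delay) + offset
    return slot if slot <= now else slot - delay


def overdue_seconds(now: int, lastclock: int, delay: int, offset: int) -> int:
    """Seconds an item looks overdue under this simplified model (0 if on time)."""
    slot = latest_slot(now, delay, offset)
    return now - slot if lastclock < slot else 0


if __name__ == "__main__":
    delay = 3600                              # item polled every hour
    hour = 1_000_000 * delay                  # arbitrary hour-aligned timestamp
    old_offset, new_offset = 15 * 60, 3 * 60  # minute 15 before, minute 3 after
    lastclock = hour - delay + old_offset     # last value: minute 15 of the previous hour
    now = hour + 13 * 60                      # db converted at minute 13

    print("overdue under old schedule:", overdue_seconds(now, lastclock, delay, old_offset))
    print("overdue under new schedule:", overdue_seconds(now, lastclock, delay, new_offset))
    next_slot = latest_slot(now, delay, new_offset) + delay
    print("next check, seconds after the hour start:", next_slot - hour)
```

In this model the item is on time under the old offset but shows up as overdue under the new one, and it stays that way until the next slot of the new schedule, which is consistent with a queue spike that settles after one polling interval.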

Comment by richlv [ 2011 Aug 13 ]

the scenario with the conversion to dm seems to be clear. please reopen if you have repeatable steps with a standalone server, thanks

Generated at Wed Apr 16 03:39:46 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.
