[ZBX-7798] after upgrade to 2.2.2 zabbix queue graph looks anomalous due to icmp ping items Created: 2014 Feb 13 Updated: 2024 Apr 10 Resolved: 2019 Sep 16 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 2.2.2 |
Fix Version/s: | 4.0.13rc1, 4.2.7rc1, 4.4.0alpha3, 4.4 (plan) |
Type: | Problem report | Priority: | Minor |
Reporter: | Robert Jerzak | Assignee: | Michael Veksler |
Resolution: | Fixed | Votes: | 3 |
Labels: | icmpping, queue |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Attachments: | Screen Shot 2014-02-13 at 00.14.20.png, Screen Shot 2014-02-13 at 00.16.18.png, Screen Shot 2014-02-13 at 13.44.11.png, Screen Shot 2014-02-13 at 13.52.01.png, debugging-1.patch |
Team: | Team A |
Sprint: | Sprint 56 (Sep 2019), Sprint 55 (Aug 2019), Sprint 53 (Jun 2019), Sprint 54 (Jul 2019) | ||||||||||||||||
Story Points: | 1 |
Description |
After updating Zabbix from version 2.2.1 to 2.2.2, the Zabbix queue graph looks quite anomalous. The picture in the attachment shows it; I made the update around 14:00. Before the update the average value was around 1.4. After the update the average is around 32, but what's more interesting is that it looks very unnatural: it periodically jumps from a very low value to around 100. There are no other drawbacks I'm aware of. I've confirmed this behaviour on two of our Zabbix servers. If you consider the description or symptoms too general or irrelevant, feel free to close the ticket. |
Comments |
Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ] |
Would it be possible for you to identify item types that are queueing periodically? For instance, in "Administration" -> "Queue"? |
Comment by Oleksii Zagorskyi [ 2014 Feb 13 ] |
You need to investigate the server log, performance graphs and other data points. |
Comment by Oleksii Zagorskyi [ 2014 Feb 13 ] |
This project is for bug reports only. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ] |
In this particular case, I am very interested in the cause. Simply upgrading from 2.2.1 to 2.2.2 should not have such drastic consequences. |
Comment by Robert Jerzak [ 2014 Feb 13 ] |
According to "Administration" -> "Queue", it's the Simple check item type. It jumps from 0 to around 100. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ] |
Do you have an idea which simple checks might be queueing? Which simple checks do you use a lot? Do you use VMware monitoring? |
Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ] |
Do you use ping items a lot on a single host? Since |
Comment by Robert Jerzak [ 2014 Feb 13 ] |
The majority of my simple checks are icmpping and icmppingsec, so I would guess that these are the most suspicious. On a single host I usually have one icmpping item; some hosts have both icmpping and icmppingsec. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ] |
How busy is the pinger process? Could you please attach the graph which shows how busy Zabbix processes are, before and after the upgrade? |
Comment by Robert Jerzak [ 2014 Feb 13 ] |
Before the update (about 14:00), the Zabbix icmp pinger processes were about 24% busy; after the update, around 28%. I've added a graph with Zabbix process usage. |
Comment by Robert Jerzak [ 2014 Feb 13 ] |
In zabbix_server.conf I have: StartPingers=12 |
Comment by Aleksandrs Saveljevs [ 2014 Feb 13 ] |
Do you use the default settings for pinging? By how much are these simple checks delayed? |
Comment by Robert Jerzak [ 2014 Feb 13 ] |
Yes, default settings. These are literally "icmpping" and "icmppingsec" without additional parameters. The interval is 60s. Response time for the hosts is relatively low, usually around 1-2ms. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 14 ] |
Reopening, so that the issue is not forgotten. You mentioned previously that according to "Administration" -> "Queue" it is simple checks that are queueing. Could you please show how much they are delayed (i.e., which column, "5 seconds", "10 seconds", "30 seconds", ..., they are in)? |
Comment by Robert Jerzak [ 2014 Feb 14 ] |
The queue value jumps from 0 to around 100 only in the "5 seconds" column. On almost every browser refresh of the "Queue" page, the value in "5 seconds" is either 0 or ~100. |
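To make the column question concrete, here is a minimal Python sketch of how an item's delay could map to the Queue columns. The first three column names are the ones listed in the thread; the remaining names, and the rule that a column covers delays from its own threshold up to the next one, are assumptions about the Zabbix frontend rather than something taken from its source. The helpers `queue_column` and `queue_counts` are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Column thresholds in seconds, smallest first. The last three names are
# assumed from the Zabbix frontend, not taken from this thread.
QUEUE_COLUMNS = [
    (5, "5 seconds"),
    (10, "10 seconds"),
    (30, "30 seconds"),
    (60, "1 minute"),
    (300, "5 minutes"),
    (600, "More than 10 minutes"),
]


def queue_column(delay_s: float) -> Optional[str]:
    """Return the Queue column for an item delayed by delay_s seconds.

    Assumption: a column counts items delayed by at least its own threshold
    and less than the next one; items delayed under 5 s are not listed.
    """
    column = None
    for threshold, name in QUEUE_COLUMNS:
        if delay_s < threshold:
            break
        column = name
    return column


def queue_counts(delays: Iterable[float]) -> Dict[str, int]:
    """Count delayed items per column, skipping items below the first threshold."""
    counts = {name: 0 for _, name in QUEUE_COLUMNS}
    for delay in delays:
        column = queue_column(delay)
        if column is not None:
            counts[column] += 1
    return counts


# Example: 100 ping items that all finish about 7 s late show up only
# under "5 seconds".
print(queue_counts([7.0] * 100)["5 seconds"])  # prints 100
```

Under this assumption, a batch of about 100 ping items that is a few seconds late appears as a jump to ~100 in the "5 seconds" column and drops back to 0 once the batch is processed, which matches what Robert describes.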
Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ] |
How many hosts do you have, and how many of them have ping items? I shall try to reproduce the issue in our environment with the same settings. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 18 ] |
I have tried approximately 100 ICMP ping values per second, and the queue is always 0. If we provide a patch for you, for instance one that adds some debug logging to the "zabbix[queue]" item to print out the items that are delayed, would it be possible for you to recompile and run the patched server? |
Comment by Robert Jerzak [ 2014 Feb 18 ] |
I have about 1900 hosts; most of them have one "icmpping" item, and some of them have a second "icmppingsec" item. The interval of those items is 60s. Sure, I can run zabbix_server with your patch in my testing environment. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 18 ] |
Robert, I have attached debugging-1.patch. It does two things:
The log it will produce might contain private information. Either try to strip it out or send it to me by email at [my-first-name].[my-last-name]@zabbix.com. |
Comment by Robert Jerzak [ 2014 Feb 19 ] |
I've sent you an email with logs. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 19 ] |
The logs that Robert sent us were very useful. There seems to be no problem with Zabbix 2.2.2 compared to 2.2.1; it is just that the changes in So during the investigation we uncovered a behavior of fping that we were not aware of. Suppose we have 1 host to ping, with the default interval between pings of 1000 ms and the default timeout of 500 ms. The fping invocation with "-C3" (three pings) takes around 2500 ms in the worst case. Now, suppose we have 10 hosts to ping. We thought that it should also take 2500 ms to ping them; however, that is not true. The reason is that apart from the "-p" (interval) and "-t" (timeout) options, fping also has the "-i" option with a default value of 25 ms, which specifies the interval between successive ping packets (not just to one host, but to all hosts). So pinging 10 hosts takes 2000 + 9 * 25 + 500 = 2725 ms in the worst case. With around 100 hosts, as in Robert's case, it took nearly 7 seconds to ping all hosts. The problem is doubled by the fact that we launch both fping and fping6, so both invocations can take 14 seconds in total, and that is why there are spikes in the queue. There are two obvious ways to fix that:
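The timing argument above can be written as a small Python model. It is a simplified sketch under our own assumptions (packets sent round-robin over the hosts, no host replying, and the defaults quoted in the comment), not fping's actual scheduler, and `fping_worst_case_ms` is a hypothetical helper. It reproduces the 2500 ms and 2725 ms figures, and it makes one thing visible: once hosts × 25 ms exceeds the 1000 ms per-host interval (more than 40 hosts), the "-i" spacing rather than "-p" sets the length of each round.

```python
def fping_worst_case_ms(hosts: int, count: int = 3, p_ms: int = 1000,
                        i_ms: int = 25, t_ms: int = 500) -> int:
    """Worst-case duration of one fping run in ms (simplified model).

    Assumptions: packets go out round-robin over the hosts, any two
    successive packets (to any host) are at least i_ms apart (-i), two
    packets to the same host are at least p_ms apart (-p), and no host
    replies, so fping waits the full t_ms (-t) after the last packet.
    """
    if hosts < 1 or count < 1:
        raise ValueError("hosts and count must be positive")
    # One round sends one packet to every host. A round can be no shorter
    # than the per-host period (-p) or than hosts * i_ms.
    round_ms = max(p_ms, hosts * i_ms)
    # The last packet goes to the last host in the last round.
    last_send_ms = (count - 1) * round_ms + (hosts - 1) * i_ms
    return last_send_ms + t_ms


for n in (1, 10, 100):
    single = fping_worst_case_ms(n)
    # Zabbix launches both fping and fping6, so in the worst case the two
    # runs add up to twice the single-run time.
    print(f"{n:>3} hosts: {single} ms, with fping6 as well: {2 * single} ms")
```

For 100 unresponsive hosts the model gives 7975 ms, an upper bound under its assumptions; the "nearly 7 seconds" measured in Robert's case is in the same range, and real runs are likely to be somewhat shorter when hosts answer within a few milliseconds, since the final timeout is then not needed.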
Ideally, the solution is a combination of both, or a different approach altogether. |
Comment by Cicero Silva [ 2014 Mar 13 ] |
After upgrading from 2.2.1 to 2.2.2, my queue increased and the monitored packet loss values are all negative (-100%). ICMP loss 12 Mar 2014 21:33:25 -100 % |
Comment by Aleksandrs Saveljevs [ 2014 Mar 13 ] |
The icmppingloss issue was fixed already in |
Comment by Tomasz Chruściel [ 2014 May 29 ] |
Hi all, watch this output. It seems like some kind of 2 s timeout is added to the pinger's total execution time. I'm not 100% sure, but before 2.2.2 the pinger execution time was proportional to the number of pinged items.
zabbix(/root)# ps -ef|grep icmp
zabbix 8681 8649 0 09:03 ? 00:00:01 /usr/sbin/zabbix_server: icmp pinger #1 [got 0 values in 0.000003 sec, idle 1 s
Regards |
Comment by Aleksandrs Saveljevs [ 2014 May 29 ] |
Tomasz, the output you provided is correct and expected. If it takes a pinger 2 seconds to do its job, that comes from 3 pings with a 1-second delay between successive pings of the same host. |
Comment by peter erbst [ 2014 Oct 03 ] |
We plan to monitor about 4000 devices, about 2000 of them also with ping delay and ping loss. Currently, with ~1300 devices: "reduce MAX_ITEMS in pinger.c from 128 to a smaller value" - that variable is no longer in the file (Zabbix 2.2.4). |
Comment by Aleksandrs Saveljevs [ 2014 Oct 03 ] |
Peter, upgrading to Zabbix 2.4 will probably not help in this matter. However, the variable you are looking to reduce is MAX_PINGER_ITEMS in include/dbcache.h (currently 128). |
Comment by Vladislavs Sokurenko [ 2019 Jun 28 ] |
If anyone still experiences the issue, please share your configuration. |
Comment by Vladislavs Sokurenko [ 2019 Jun 28 ] |
One pinger can process 128 items at a time (only items with the same configuration are processed in bulk). The time to process those items depends on the configuration, but it can be predicted with the following command if we wish to send 3 pings to each IP address:
time fping -c3 -g 127.0.0.1/25
If fping6 is also used, this will take twice as long. |
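The batching rule above can be illustrated with a small Python sketch. What counts as "the same configuration" is our assumption here (number of pings, interval and timeout); the `PingItem` type and `make_batches` helper are hypothetical and do not mirror the Zabbix C code. Each resulting batch corresponds to one fping run, whose duration can be measured with the `time fping` command in the comment.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Batch limit mentioned in the thread (MAX_PINGER_ITEMS in include/dbcache.h).
MAX_PINGER_ITEMS = 128


@dataclass(frozen=True)
class PingItem:
    """Hypothetical ping item; field names are illustrative, not Zabbix's."""
    host: str
    count: int = 3           # number of pings per host
    interval_ms: int = 1000  # gap between pings to the same host
    timeout_ms: int = 500    # per-ping timeout


def make_batches(items: List[PingItem],
                 limit: int = MAX_PINGER_ITEMS) -> List[List[PingItem]]:
    """Group items by ping configuration, then split each group into
    batches of at most `limit` items (one fping run per batch)."""
    groups: Dict[Tuple[int, int, int], List[PingItem]] = defaultdict(list)
    for item in items:
        groups[(item.count, item.interval_ms, item.timeout_ms)].append(item)
    batches: List[List[PingItem]] = []
    for group in groups.values():
        for start in range(0, len(group), limit):
            batches.append(group[start:start + limit])
    return batches


# Example: 1900 hosts with one default ping item each.
items = [PingItem(host=f"host{n}") for n in range(1900)]
print([len(b) for b in make_batches(items)])
```

Under these assumptions, 1900 hosts with one default ping item each give 15 batches: 14 of 128 items and one of 108. Items whose parameters differ (for example a different number of pings) end up in separate batches, so they never share an fping run.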
Comment by Michael Veksler [ 2019 Sep 09 ] |
Available in:
|