[ZBX-4732] Events with wrong timestamp during high load on zabbix server -> wrong Availability report
Created: 2012 Mar 06 | Updated: 2017 May 30 | Resolved: 2015 Aug 13 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Frontend (F), Server (S) |
Affects Version/s: | 1.8.10 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Critical |
Reporter: | Daniel Kontsek | Assignee: | Unassigned |
Resolution: | Cannot Reproduce | Votes: | 1 |
Labels: | nodata, timer |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Environment: |
Linux (EL 6), MySQL (5.1) |
Attachments: | O_new_trigger_error.jpg e0.png e1.png zabbix_1.png zabbix_2.png |
Description |
Sometimes Zabbix produces false positive alerts (from the trigger {agent.ping.nodata(180)}=1) during high IO load on the server, e.g. while a backup runs on the filesystem that holds the Zabbix database. We suspect that this causes events to be stored in the wrong order, probably because of wrong clock values, even though the event IDs appear to be stored in the right order (please see the attached pictures). The Availability report then shows graphs with hosts mostly down. Maybe this is somehow related to # |
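The suspected mechanism can be sketched as a small simulation. This is a hypothetical model, not Zabbix code: it assumes that the timer stamps nodata() PROBLEM events with the current time, that the history (db) syncer stamps recovery events with the received value's own timestamp, and that IO load shows up as a fixed delay (`syncer_delay`) before values are written. `Event`, `simulate` and `inverted_pairs` are illustrative names. Under these assumptions eventids come out in the right order while clocks come out switched, which matches the symptom described above.

```python
from dataclasses import dataclass

NODATA_PERIOD = 180  # seconds, as in {agent.ping.nodata(180)}=1


@dataclass
class Event:
    eventid: int
    clock: int   # timestamp stored with the event
    value: str   # "PROBLEM" or "OK"
    source: str  # hypothetical process that generated the event


def simulate(syncer_delay: int, duration: int = 600) -> list[Event]:
    """agent.ping arrives every 60 s; each value is written `syncer_delay`
    seconds after it arrives. ASSUMPTION: timer events use the current time,
    syncer events use the value's own timestamp."""
    events: list[Event] = []
    next_id = 1
    state = "OK"
    last_stored_clock = 0   # assume a value was already stored at t=0
    pending: list[int] = []  # timestamps of values not yet written
    for now in range(duration):
        if now % 60 == 0:
            pending.append(now)  # value received, stamped with arrival time
        # The syncer writes every value that has waited long enough.
        while pending and now - pending[0] >= syncer_delay:
            value_clock = pending.pop(0)
            last_stored_clock = value_clock
            if state == "PROBLEM":
                state = "OK"
                events.append(Event(next_id, value_clock, "OK", "syncer"))
                next_id += 1
        # The timer evaluates nodata() at :00 and :30 against stored data only.
        if now % 30 == 0 and state == "OK" and now - last_stored_clock > NODATA_PERIOD:
            state = "PROBLEM"
            events.append(Event(next_id, now, "PROBLEM", "timer"))
            next_id += 1
    return events


def inverted_pairs(events: list[Event]) -> list[tuple[Event, Event]]:
    """Consecutive events (by eventid) whose clocks go backwards."""
    ordered = sorted(events, key=lambda e: e.eventid)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.clock < a.clock]


for e in simulate(syncer_delay=170)[:2]:
    print(e)
```

With `syncer_delay=170` the first event is a timer PROBLEM with eventid 1 and clock 210, followed by a syncer OK with eventid 2 and clock 60, so sorting by clock puts the recovery before the problem. With a short delay such as `syncer_delay=10`, no events are generated at all.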
Comments |
Comment by Oleksii Zagorskyi [ 2012 Mar 07 ] |
I think I hit this case several days ago while playing with a trigger expression that used the nodata() function with a period of 30 to 60 seconds.
Comment by Daniel Kontsek [ 2012 Mar 07 ] |
It's mentioned in the bug report: {agent.ping.nodata(180)}=1 |
Comment by Oleksii Zagorskyi [ 2012 Mar 15 ] |
A very similar issue is … (added later: also a similar issue …) |
Comment by Oleksii Zagorskyi [ 2012 Apr 14 ] |
Heh, I'd like to share my opinion. In the pictures attached by Daniel (the issue reporter) we see several PROBLEM events repeated in a row. It's not clear why that happened; I cannot imagine a reason. My guess is that data from the item arrived exactly at the start of a minute (processed by the "db syncer"), and we know that the "timer" runs every 30 seconds, exactly at :00 and :30. The events Daniel points to are less interesting to me than the others, and I can show a clearer case: see the attached "O_new_trigger_error.jpg" for all the details. There it is not clear why the "db syncer" decided that the trigger was in PROBLEM state. Possibly the "timer" had already changed it to PROBLEM (in some cache or in the table). So we should prevent such cases somehow. |
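The suspected interaction between the "timer" and the "db syncer" resembles a generic check-then-act race. The sketch below is hypothetical and does not reproduce Zabbix internals: two workers named after those processes both decide that a trigger should go to PROBLEM. If each compares against a state value it read before the other one wrote, both emit a PROBLEM event (duplicates in a row). If the comparison and the update happen under one lock, only one event is produced. `run_workers` and the dict-based trigger state are illustrative assumptions.

```python
import threading


def run_workers(decisions: list[tuple[str, str]], atomic: bool) -> list[tuple[str, str]]:
    """Run each (process, new_value) decision in its own thread and return
    the events generated. Hypothetical model, not Zabbix code."""
    state = {"value": "OK"}  # shared trigger state (hypothetical)
    state_lock = threading.Lock()
    events: list[tuple[str, str]] = []
    events_lock = threading.Lock()
    barrier = threading.Barrier(len(decisions))

    def worker(proc: str, new_value: str) -> None:
        if atomic:
            # Compare and update inside one critical section.
            with state_lock:
                changed = state["value"] != new_value
                state["value"] = new_value
        else:
            seen = state["value"]       # check ...
            barrier.wait()              # every worker has read before anyone writes
            changed = seen != new_value
            state["value"] = new_value  # ... then act on a possibly stale value
        if changed:
            with events_lock:
                events.append((proc, new_value))

    threads = [threading.Thread(target=worker, args=d) for d in decisions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


decisions = [("timer", "PROBLEM"), ("db syncer", "PROBLEM")]
print(len(run_workers(decisions, atomic=False)))  # 2: duplicate PROBLEM events
print(len(run_workers(decisions, atomic=True)))   # 1
```

The barrier only forces the worst-case interleaving so the race is reproducible; in a real system the same duplicate appears whenever two evaluations happen to overlap.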
Comment by Daniel Kontsek [ 2012 Apr 17 ] |
Item key: agent.ping =1 |
Comment by Daniel Kontsek [ 2012 Jun 05 ] |
Any news regarding this problem? |
Comment by Cristian Mammoli [ 2014 Jun 06 ] |
Hi, we are having exactly the same problem (see attachments) with Zabbix 2.2.3 and PostgreSQL 9.2 as the database. |
Comment by Roelof Spijker [ 2014 Sep 26 ] |
Seeing a very similar issue here on 2.2.3 with MySQL. Events are generated in the wrong order. The real order would be: Up, Down for 1 second, Up. But they are ordered as Up, Up, Down for 1 second. This causes the SLA to record the service as down until the next problem occurs and is resolved. It can be fixed by decreasing the clock in the DB for the events and service_alarms, but I'm not sure why it's happening in the first place. |
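The effect on the SLA can be reproduced with a simplified downtime calculation. This is a sketch under stated assumptions, not Zabbix's actual SLA algorithm: it replays (clock, value) events sorted by clock over a fixed window and counts the seconds spent in PROBLEM. The clocks 0/99/100/101 and the 1000-second window are made-up values chosen to mirror the "Up, Down for 1 second, Up" case; `downtime` is a hypothetical helper.

```python
def downtime(events: list[tuple[int, str]], start: int, end: int) -> int:
    """Seconds spent in PROBLEM within [start, end), replaying events
    sorted by clock. Assumes the state before the first event is OK."""
    down = 0
    state, since = "OK", start
    for clock, value in sorted(events, key=lambda e: e[0]):
        clock = min(max(clock, start), end)  # clip to the report window
        if state == "PROBLEM":
            down += clock - since
        state, since = value, clock
    if state == "PROBLEM":
        down += end - since
    return down


# Real sequence: Up, Down at t=100 for 1 second, Up again at t=101.
intended = [(0, "OK"), (100, "PROBLEM"), (101, "OK")]
# Same events with clocks that sort as Up, Up, Down.
stored = [(0, "OK"), (101, "PROBLEM"), (100, "OK")]
# Manual fix described above: decrease the PROBLEM event's clock.
corrected = [(0, "OK"), (99, "PROBLEM"), (100, "OK")]

for name, evs in [("intended", intended), ("stored", stored), ("corrected", corrected)]:
    print(name, downtime(evs, 0, 1000))
```

This prints 1 second of downtime for the intended and corrected sequences and 899 seconds for the stored one: once the PROBLEM event sorts last, the service counts as down until the end of the window, or in practice until the next problem is resolved.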
Comment by Oleksii Zagorskyi [ 2015 Aug 13 ] |
I feel that this issue is no longer relevant for recent Zabbix versions (2.4+). Feel free to ask to reopen it if you think I did the wrong thing. Just note that ZBX-8556 and ZBX-9432 may look similar to this issue, but they are different. |