ZABBIX BUGS AND ISSUES

Events with wrong timestamp during high load on zabbix server -> wrong Availability report

Details

  • Type: Bug
  • Status: Open
  • Priority: Critical
  • Resolution: Unresolved
  • Affects Version/s: 1.8.10
  • Fix Version/s: None
  • Component/s: Frontend (F), Server (S)
  • Labels:
  • Environment:
    Linux (EL 6), Mysql (5.1)
  • Zabbix ID:
    RTD

Description

We sometimes see Zabbix produce false-positive alerts (from the trigger {agent.ping.nodata(180)}=1) during high I/O load on the server, for example while a backup runs on the filesystem that holds the Zabbix database. We suspect that the events end up stored in swapped order, probably because of wrong clock values, even though the event IDs appear to be in the right order (please see the attached pictures). The Availability report then generates graphs that show the hosts as mostly down.
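
One way to check for the suspected symptom is to look for events whose clock goes backwards when they are read in event ID order. The following is only a minimal sketch: the Event tuple, the find_clock_inversions helper and the sample rows are hypothetical, and the field names merely mirror the eventid, clock and value columns shown in the attached pictures.

```python
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    eventid: int
    clock: int  # Unix timestamp stored with the event
    value: int  # 0 = OK, 1 = PROBLEM


def find_clock_inversions(events: List[Event]) -> List[Tuple[Event, Event]]:
    """Return neighbouring pairs (in eventid order) whose clock goes backwards."""
    by_id = sorted(events, key=lambda e: e.eventid)
    return [(a, b) for a, b in zip(by_id, by_id[1:]) if b.clock < a.clock]


# Hypothetical sample rows for a single trigger (not taken from the report).
sample = [
    Event(eventid=101, clock=1331030400, value=0),
    Event(eventid=102, clock=1331030581, value=1),
    Event(eventid=103, clock=1331030580, value=0),  # later ID, earlier clock
]

inversions = find_clock_inversions(sample)
for earlier, later in inversions:
    print(f"eventid {later.eventid} has an earlier clock than eventid {earlier.eventid}")
```

Any pair reported this way is a candidate for the swapped order described above; the sketch does not explain why the swap happened.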

Maybe this is somehow related to #ZBX-4466.
  1. e0.png
    192 kB
    2012 Mar 06 12:31
  2. e1.png
    192 kB
    2012 Mar 06 12:31
  3. O_new_trigger_error.jpg
    250 kB
    2012 Apr 14 18:50

Activity

Oleksiy Zagorskyi added a comment -

I believe I ran into this same case several days ago while experimenting with a trigger expression that used the nodata(30-60) function.

    ./zabbix_server18 -V
    Zabbix Server v1.8.11rc1 (revision 25522) (28 December 2011)
    Compilation time: Feb 22 2012 10:42:33
Daniel Kontsek added a comment -

It's mentioned in the bug report - {agent.ping.nodata(180)}=1

Oleksiy Zagorskyi added a comment - - edited

ZBX-4763 is a very similar issue; the root cause may even be the same.

<ADDED> ZBX-6170 is also a similar issue.

Oleksiy Zagorskyi added a comment -

Heh, I would like to share my opinion.

In the pictures attached by Daniel (the issue reporter) we see several PROBLEM events repeated in a row. It's not clear why that happened; I cannot explain it.
We also don't know the update interval for that item; maybe it is 180 seconds (the same as in the trigger function)?
But the events show that an OK event was generated exactly at the start of a minute. OK events can be generated only by the "db syncer" process, when a value is received and the trigger is in the PROBLEM state.
PROBLEM events can be generated only by the "timer" process, when the trigger is in the OK or UNKNOWN (because of a server restart) state.

So I suppose the data for the item arrived exactly at the start of the minute (and was processed by "db syncer"), and we know that "timer" runs every 30 seconds, exactly at seconds 00 and 30.
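
The division of work described above can be written down as a small model. This is only a sketch of the comment's reasoning, not of the server code: first_problem_tick and syncer_event are hypothetical names, clocks are plain seconds, and the strict comparison (a tick exactly on the deadline does not fire yet) is an assumption.

```python
from typing import Optional

OK, PROBLEM, UNKNOWN = 0, 1, 2
TIMER_STEP = 30      # "timer" runs at seconds 00 and 30, per the comment
NODATA_PERIOD = 180  # from nodata(180)


def first_problem_tick(last_value_clock: int) -> int:
    """First "timer" tick at which nodata(180) would be seen as true.

    Assumption: the tick must fall strictly after last value + period.
    """
    deadline = last_value_clock + NODATA_PERIOD
    return (deadline // TIMER_STEP + 1) * TIMER_STEP


def syncer_event(trigger_state: int) -> Optional[int]:
    """Event "db syncer" would generate for a new value, or None.

    Per the comment, an OK event appears only if the trigger is in PROBLEM.
    """
    return OK if trigger_state == PROBLEM else None


# Hypothetical case: the last value was stored 10 seconds into the timeline.
print(first_problem_tick(10))
print(syncer_event(PROBLEM), syncer_event(OK))
```

In this model a value that is delayed by more than the nodata period is enough for "timer" to raise PROBLEM on its next tick, and the next stored value makes "db syncer" raise OK.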

The events that Daniel draws attention to are less interesting to me than some of the other events.
For example:
10:16:00 - PROBLEM,
10:16:00 - OK
and
00:04:00 - PROBLEM,
00:04:01 - OK

I can show a clearer case. See the attached "O_new_trigger_error.jpg"; it has all the details.
We see that the trigger was processed by two processes at almost the same time:
11:02:31 = eventID 10959666 - "timer" process generated PROBLEM event
11:02:30 = eventID 10959668 - "db syncer" process generated OK event

It is not clear here why "db syncer" decided that the trigger was in the PROBLEM state. It's possible that "timer" had already changed it to PROBLEM (in some cache or in the table),
and that "db syncer" then changed the state to OK. :/

So we should prevent such cases somehow.
The Zabbix server version in this last example is 1.8.6.
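
To illustrate why this pair of events matters for reporting, the sketch below replays the two events listed above in both orders. It is not the frontend's actual calculation: last_state and problem_seconds_by_clock are hypothetical helpers, clocks are seconds since midnight, and the 12:00:00 period end with no further events is an assumption made only for the example.

```python
from typing import Callable, List, NamedTuple

OK, PROBLEM = 0, 1


class Event(NamedTuple):
    eventid: int
    clock: int  # seconds since midnight, for readability
    value: int


def hms(h: int, m: int, s: int) -> int:
    return h * 3600 + m * 60 + s


# The two events from the example above.
events = [
    Event(10959666, hms(11, 2, 31), PROBLEM),  # generated by "timer"
    Event(10959668, hms(11, 2, 30), OK),       # generated by "db syncer"
]


def last_state(events: List[Event], key: Callable[[Event], int]) -> int:
    """Trigger state after replaying the events in the given order."""
    return sorted(events, key=key)[-1].value


def problem_seconds_by_clock(events: List[Event], period_end: int) -> int:
    """Seconds spent in PROBLEM if the events are replayed in clock order."""
    ordered = sorted(events, key=lambda e: (e.clock, e.eventid))
    boundaries = [e.clock for e in ordered[1:]] + [period_end]
    return sum(end - e.clock
               for e, end in zip(ordered, boundaries)
               if e.value == PROBLEM)


print(last_state(events, key=lambda e: e.eventid))  # state by event ID order
print(last_state(events, key=lambda e: e.clock))    # state by clock order
print(problem_seconds_by_clock(events, hms(12, 0, 0)))
```

Replayed by event ID the trigger ends in OK, but replayed by clock it ends in PROBLEM and stays there until the next event. If a report orders events by clock, that would be consistent with hosts shown as mostly down.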

Daniel Kontsek added a comment -

Item key: agent.ping
Item type: Zabbix Agent
Item update interval: 60 s
Trigger: {agent.ping.nodata(180)}=1

Daniel Kontsek added a comment -

Any news regarding this problem?

