[ZBX-4852] external checks scheduled to run once a week no longer working Created: 2012 Apr 10  Updated: 2017 May 30  Resolved: 2012 Sep 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Proxy (P), Server (S)
Affects Version/s: 1.8.11rc1
Fix Version/s: 2.0.3rc1, 2.1.0

Type: Incident report Priority: Critical
Reporter: Aleksandrs Saveljevs Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: flexibleintervals, items, regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

r26149


Attachments: File calculate_item_nextcheck-new.c    

 Description   

We have a setup, described in ZBX-3961, and it has stopped working recently after an upgrade.

To recap, we have an external item with a flexible interval "1,12:45-12:46" with a delay of 60 seconds. So the item should run once a week on Mondays around that time.

It used to work fine, but on March 17, the server was upgraded to r26149. After that, the item was checked once, on Monday, March 19, but never since then.



 Comments   
Comment by richlv [ 2012 Apr 10 ]

is it known what version/revision the server was running before the upgrade ?

Comment by Aleksandrs Saveljevs [ 2012 Apr 10 ]

Unfortunately, no. We only have

{ZABBIX_REVISION}

in our logs for binaries built from svn checkout. The only thing I know is that before the upgrade the version also said "1.8.11rc1".

Comment by dimir [ 2012 Apr 11 ]

Cannot reproduce in the latest 1.8 doing manual system date adjust. Next week and the week after that the item is run. Is there a relevant zabbix_server.log part available?

Comment by Aleksandrs Saveljevs [ 2012 Apr 11 ]

Yes, there is a zabbix_server.log file that corresponds to DebugLevel=3, but no relevant errors there.

The only errors that are there are "[Z3005] query failed: [2006] MySQL server has gone away...", but that is a different story.

Comment by dimir [ 2012 Apr 11 ]

Could you give me the item id of that external check?

Comment by Aleksandrs Saveljevs [ 2012 Apr 12 ]

There are two such items: 23106 (checked at "1,03:45-03:46" with 60 second interval) and 23107 (checked at "1,12:45-12:46" with 60 second interval). Both were last checked on Monday, March 19: "19 Mar 2012 03:45:06" and "19 Mar 2012 12:45:07", respectively.

Comment by Aleksandrs Saveljevs [ 2012 Apr 12 ]

Both items have the main update interval set to 0. Both are "Active", no error messages in the "Error" column.

Comment by dimir [ 2012 Apr 16 ]

Still can't reproduce with latest 1.8. I have an "External check" item that prints system date. Here is what I get (I set system clock to the future):

mysql> select * from history_str

itemid clock value
23106 1335141906 Mon Apr 23 03:45:06 EEST 2012
23107 1335174307 Mon Apr 23 12:45:07 EEST 2012
23106 1335746706 Mon Apr 30 03:45:06 EEST 2012
23107 1335779107 Mon Apr 30 12:45:07 EEST 2012
23106 1336351506 Mon May 7 03:45:06 EEST 2012
23107 1336383907 Mon May 7 12:45:07 EEST 2012
23106 1336956306 Mon May 14 03:45:06 EEST 2012
23107 1336988707 Mon May 14 12:45:07 EEST 2012 <- zabbix server restart after this check
23106 1337561106 Mon May 21 03:45:06 EEST 2012
23107 1337593507 Mon May 21 12:45:07 EEST 2012 <- zabbix server restart after this check
23106 1338165906 Mon May 28 03:45:06 EEST 2012
23107 1338198307 Mon May 28 12:45:07 EEST 2012
23106 1338770706 Mon Jun 4 03:45:06 EEST 2012
23107 1338803107 Mon Jun 4 12:45:07 EEST 2012

What are the corresponding seconds of the check, in my case it's 6 and 7.

Comment by Aleksandrs Saveljevs [ 2012 Apr 16 ]

In my case, there are "...:06" and "...:07", too:

mysql> select itemid, lastclock, from_unixtime(lastclock), lastvalue from items where itemid in (23106, 23107);
---------------------------------------------------+

itemid lastclock from_unixtime(lastclock) lastvalue

---------------------------------------------------+

23106 1332121506 2012-03-19 03:45:06 1
23107 1332153907 2012-03-19 12:45:07 1

---------------------------------------------------+

Just a quick idea: on March 25, clock in Latvia was adjusted for summer time. Maybe that has something to do with it.

Comment by dimir [ 2012 Apr 17 ]

Yeah, first, sasha thought it could be the case but in theory it shouldn't.

Comment by Alexander Vladishev [ 2012 May 11 ]

The function calculate_item_nextcheck() not in all cases works correctly.

For example:

[itemid|flexible_interval] [UTC time] human readabe time => [UTC time] human readabe time [effective delay]
[23106|60/1,12:45-12:46] [1335779098] 2012.04.30 12:44:58 => [1335779106] 2012.04.30 12:45:06 [60]
[23106|60/1,12:45-12:46] [1335779099] 2012.04.30 12:44:59 => [1335779106] 2012.04.30 12:45:06 [60]
[23106|60/1,12:45-12:46] [1335779100] 2012.04.30 12:45:00 => [1356071106] 2012.12.21 08:25:06 [31536000]
[23106|60/1,12:45-12:46] [1335779101] 2012.04.30 12:45:01 => [1336383906] 2012.05.07 12:45:06 [60]

Comment by Andris Mednis [ 2012 Jun 20 ]

It seems that the problem is caused by function get_next_delay_interval() (file src/libs/zbxcommon/misc.c). If time happens to be within a flex interval, this function returns time equal to the last second of THIS interval (as the upper bound is excluded). Problem can be solved by correcting one line in get_next_delay_interval():
next = now - sec + sec2;
to
next = now - sec + sec2 + 1;

Comment by Andris Mednis [ 2012 Jun 25 ]

I propose 2 changes to fix this problem:

  • in get_next_delay_interval() change the line "next = now - sec + sec2;" to "next = now - sec + sec2 + 1;" for consistency (to return always BEGINNING of the next interval, not sometimes end of the current interval),
  • significantly change calculate_item_nextcheck() (see attachment calculate_item_nextcheck-new.c)

The proposed fix is in development branch svn://svn.zabbix.com/branches/dev/ZBX-4852, r28431.

Question 1. What is the right value of 'effective_delay' which comes out of calculate_item_nextcheck() ?
For example, if flexible interval is "60/1,12:45-12:46", itemid=60 and current time is "Mon Apr 30 12:44:59 2012", the nextcheck is after 1 second - at "12:45:00", but the 'effective_delay' is 60 seconds. After 1 second, at 12:45:00, the nextcheck moves to next week - to "Mon May 7 12:45:00 2012". What is the right 'effective_delay' now ? Still 60 sec ? 604800 sec (a difference between "Mon Apr 30 12:44:59" and "Mon May 7 12:45:00") ? A wrong 'effective_delay' value will produce a wrong value of internal parameter "zabbix["queue"<from>,<to>]".

<Sasha> I think here it is necessary to rework queue calculation. Having probably removed this parameter.

Question 2. Is it ok if calculate_item_nextcheck() returns the 'nextcheck' value up to "now+SEC_PER_YEAR -1" (as "now+SEC_PER_YEAR" is reserved for disabled checks) ?
Or 'nextcheck' must not exceed 'now+SEC_PER_WEEK-1" ? Zabbix PHP frontend apparently allows to set item update interval up to SEC_PER_DAY (once a day). Flexible intervals allow up to once a week.

<Sasha> Theoretically such situation shan't be .We don't calculate nextcheck for disabled items.

Comment by Andris Mednis [ 2012 Aug 16 ]

Errors were found in the previous fix.
A new fix is in development branch svn://svn.zabbix.com/branches/dev/ZBX-4852, r29614.

Comment by Andris Mednis [ 2012 Aug 20 ]

Errors were found in the previous fix, too.
A new fix is in development branch svn://svn.zabbix.com/branches/dev/ZBX-4852, r29650.

Comment by richlv [ 2012 Aug 20 ]

heh. once a solution is finalised, would be great to describe the logic in detail

Comment by Alexander Vladishev [ 2012 Aug 21 ]

(1) Function return an incorrect next check:

Date : 2012.08.20 18:08:37
Period : 2012.08.20 00:00:00 - 2012.08.21 00:00:00
itemid : 3600

delay | flexible intervals | now | expected | result |
---------------------------------------------------------------------------------------------------+------
3600 | 1/1,12:05-12:10;2/1,12:08-12:15 | 2012.08.20 12:14:58 | 2012.08.20 13:00:00 | 2012.08.20 12:15:00 | 3600/3600
3600 | 1/1,12:05-12:10;2/1,12:08-12:15 | 2012.08.20 12:14:59 | 2012.08.20 13:00:00 | 2012.08.20 12:15:00 | 3600/3600
---------------------------------------------------------------------------------------------------+------

Tests failed : 2

<Sasha> CLOSED

Comment by Alexander Vladishev [ 2012 Aug 21 ]

(2) Compilation warnings
misc.c: In function ‘calculate_item_nextcheck’:
misc.c:638: warning: ‘current_delay’ may be used uninitialized in this function
misc.c:626: warning: ‘nextcheck’ may be used uninitialized in this function

<Sasha> CLOSED

Comment by Andris Mednis [ 2012 Aug 21 ]

A new fix is in development branch svn://svn.zabbix.com/branches/dev/ZBX-4852, r29684.

Comment by Alexander Vladishev [ 2012 Aug 22 ]

(3) Please review my changes in r29691:29705

<Andris> Reviewed changes in C code (file src/libs/zbxcommon/misc.c). Agree.

<pavels> The code formatting doesn't match PHP guidelines (comments, underscores in variable names, missing brackets in some places).

<Sasha> RESOLVED in r29806 and r30052

<pavels> One more thing: the comments should be written using // and on a new line. Corrected a typo in r30068.

<Sasha> RESOLVED in r30096.

<pavels> CLOSED.

Comment by Andris Mednis [ 2012 Sep 04 ]

Fixed in versions pre-2.0.3 rev. 30098 and pre-2.1.0 rev. 30099.

Generated at Fri Mar 29 13:06:14 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.