[ZBX-4852] external checks scheduled to run once a week no longer working Created: 2012 Apr 10 Updated: 2017 May 30 Resolved: 2012 Sep 04 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Frontend (F), Proxy (P), Server (S) |
Affects Version/s: | 1.8.11rc1 |
Fix Version/s: | 2.0.3rc1, 2.1.0 |
Type: | Incident report | Priority: | Critical |
Reporter: | Aleksandrs Saveljevs | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | flexibleintervals, items, regression | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
r26149 |
Attachments: | calculate_item_nextcheck-new.c |
Description |
We have a setup, described in To recap, we have an external item with a flexible interval "1,12:45-12:46" with a delay of 60 seconds. So the item should run once a week on Mondays around that time. It used to work fine, but on March 17, the server was upgraded to r26149. After that, the item was checked once, on Monday, March 19, but never since then. |
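To make the schedule concrete, here is a minimal C sketch of how a period such as "1,12:45-12:46" could be matched against a local time. The helper name in_flex_period() and the single-day parsing are assumptions made for illustration; this is not the Zabbix parser. The exclusive upper bound follows the remark made in the comments of this issue.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not Zabbix code: returns 1 if the given local
 * weekday (1 = Monday ... 7 = Sunday) and time of day fall inside a
 * single-day period such as "1,12:45-12:46", 0 if they fall outside,
 * and -1 if the period cannot be parsed. The upper bound is excluded. */
static int in_flex_period(const char *period, int wday, int hour, int min, int sec)
{
	int d, h1, m1, h2, m2;
	long now, from, to;

	assert(NULL != period);

	if (5 != sscanf(period, "%d,%d:%d-%d:%d", &d, &h1, &m1, &h2, &m2))
		return -1;

	if (d != wday)
		return 0;

	/* compare as seconds since local midnight */
	now = hour * 3600L + min * 60L + sec;
	from = h1 * 3600L + m1 * 60L;
	to = h2 * 3600L + m2 * 60L;

	return (from <= now && now < to) ? 1 : 0;
}
```

With a 60 second delay and a 60 second window, at most one check fits into each Monday window, which matches the "once a week" behaviour described above.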
Comments |
Comment by richlv [ 2012 Apr 10 ] | ||||||||||||
Is it known which version/revision the server was running before the upgrade? | ||||||||||||
Comment by Aleksandrs Saveljevs [ 2012 Apr 10 ] | ||||||||||||
Unfortunately, no. We only have {ZABBIX_REVISION} in our logs for binaries built from an svn checkout. The only thing I know is that before the upgrade the version also said "1.8.11rc1". | ||||||||||||
Comment by dimir [ 2012 Apr 11 ] | ||||||||||||
Cannot reproduce in the latest 1.8 by manually adjusting the system date: the item is run the next week and the week after that. Is a relevant part of zabbix_server.log available? | ||||||||||||
Comment by Aleksandrs Saveljevs [ 2012 Apr 11 ] | ||||||||||||
Yes, there is a zabbix_server.log file that corresponds to DebugLevel=3, but no relevant errors there. The only errors that are there are "[Z3005] query failed: [2006] MySQL server has gone away...", but that is a different story. | ||||||||||||
Comment by dimir [ 2012 Apr 11 ] | ||||||||||||
Could you give me the item id of that external check? | ||||||||||||
Comment by Aleksandrs Saveljevs [ 2012 Apr 12 ] | ||||||||||||
There are two such items: 23106 (checked at "1,03:45-03:46" with 60 second interval) and 23107 (checked at "1,12:45-12:46" with 60 second interval). Both were last checked on Monday, March 19: "19 Mar 2012 03:45:06" and "19 Mar 2012 12:45:07", respectively. | ||||||||||||
Comment by Aleksandrs Saveljevs [ 2012 Apr 12 ] | ||||||||||||
Both items have the main update interval set to 0. Both are "Active", no error messages in the "Error" column. | ||||||||||||
Comment by dimir [ 2012 Apr 16 ] | ||||||||||||
Still can't reproduce with the latest 1.8. I have an "External check" item that prints the system date. Here is what I get (I set the system clock to the future):
mysql> select * from history_str
itemid clock value
What are the corresponding seconds of the check? In my case it's 6 and 7. | ||||||||||||
Comment by Aleksandrs Saveljevs [ 2012 Apr 16 ] | ||||||||||||
In my case, there are "...:06" and "...:07", too:
mysql> select itemid, lastclock, from_unixtime(lastclock), lastvalue from items where itemid in (23106, 23107);
Just a quick idea: on March 25, the clock in Latvia was adjusted for summer time. Maybe that has something to do with it. | ||||||||||||
Comment by dimir [ 2012 Apr 17 ] | ||||||||||||
Yeah, at first sasha thought that could be the case, but in theory it shouldn't be. | ||||||||||||
Comment by Alexander Vladishev [ 2012 May 11 ] | ||||||||||||
The function calculate_item_nextcheck() does not work correctly in all cases. For example: [itemid|flexible_interval] [UTC time] human readable time => [UTC time] human readable time [effective delay] | ||||||||||||
Comment by Andris Mednis [ 2012 Jun 20 ] | ||||||||||||
It seems that the problem is caused by the function get_next_delay_interval() (file src/libs/zbxcommon/misc.c). If the time happens to be within a flex interval, this function returns a time equal to the last second of THIS interval (as the upper bound is excluded). The problem can be solved by correcting one line in get_next_delay_interval(): | ||||||||||||
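The comment above hinges on the upper bound of a flexible interval being excluded. As an illustration only (this is not the one-line change to get_next_delay_interval(), and not Zabbix code), the following sketch shows that half-open convention for a weekly window. The helper name secs_until_window() and the seconds-since-Monday representation are assumptions, and daylight saving shifts are deliberately ignored.

```c
#include <assert.h>

#define SEC_PER_WEEK	(7 * 24 * 3600L)

/* Hypothetical helper, not the Zabbix implementation. All arguments are
 * seconds since the start of the week (Monday 00:00:00 local time) and
 * the window is half-open: [from, to). Returns how many seconds after
 * 'now' the window is next entered; 0 means 'now' is already inside. */
static long secs_until_window(long now, long from, long to)
{
	assert(0 <= from && from < to && to <= SEC_PER_WEEK);
	assert(0 <= now && now < SEC_PER_WEEK);

	if (from <= now && now < to)
		return 0;			/* inside the window */

	if (now < from)
		return from - now;		/* later this week */

	return SEC_PER_WEEK - now + from;	/* window has passed: next week */
}
```

For the Monday 12:45-12:46 window (45900 to 45960 seconds into the week), second 45959 is still inside, while second 45960 already belongs to the wait for next Monday.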
Comment by Andris Mednis [ 2012 Jun 25 ] | ||||||||||||
I propose 2 changes to fix this problem:
The proposed fix is in development branch svn://svn.zabbix.com/branches/dev/ZBX-4852, r28431.
Question 1. What is the right value of 'effective_delay' that comes out of calculate_item_nextcheck()?
<Sasha> I think the queue calculation needs to be reworked here, probably removing this parameter.
Question 2. Is it OK if calculate_item_nextcheck() returns a 'nextcheck' value of up to "now+SEC_PER_YEAR-1" (as "now+SEC_PER_YEAR" is reserved for disabled checks)?
<Sasha> Theoretically, such a situation should not occur. We don't calculate nextcheck for disabled items. | ||||||||||||
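The bound raised in Question 2 can be sketched as a simple cap. This is a hedged illustration, not the code from the development branch: the helper name cap_nextcheck() is hypothetical, and the value given to SEC_PER_YEAR here (365 days) is an assumption.

```c
#include <assert.h>
#include <time.h>

#define SEC_PER_YEAR	(365 * 24 * 3600L)	/* assumed value: 365 days */

/* Hypothetical helper: caps a computed next check time so that it never
 * reaches now + SEC_PER_YEAR, the value that Question 2 describes as
 * reserved for disabled checks. */
static time_t cap_nextcheck(time_t now, time_t candidate)
{
	time_t limit = now + SEC_PER_YEAR - 1;

	assert(candidate >= now);

	return candidate > limit ? limit : candidate;
}
```

As the reply to Question 2 notes, nextcheck is not calculated for disabled items, so in practice the cap should never collide with the reserved value.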
Comment by Andris Mednis [ 2012 Aug 16 ] | ||||||||||||
Errors were found in the previous fix. | ||||||||||||
Comment by Andris Mednis [ 2012 Aug 20 ] | ||||||||||||
Errors were found in the previous fix, too. | ||||||||||||
Comment by richlv [ 2012 Aug 20 ] | ||||||||||||
heh. once a solution is finalised, would be great to describe the logic in detail | ||||||||||||
Comment by Alexander Vladishev [ 2012 Aug 21 ] | ||||||||||||
(1) The function returns an incorrect next check:
Date: 2012.08.20 18:08:37
delay | flexible intervals | now | expected | result
Tests failed: 2
<Sasha> CLOSED | ||||||||||||
Comment by Alexander Vladishev [ 2012 Aug 21 ] | ||||||||||||
(2) Compilation warnings <Sasha> CLOSED | ||||||||||||
Comment by Andris Mednis [ 2012 Aug 21 ] | ||||||||||||
A new fix is in development branch svn://svn.zabbix.com/branches/dev/ZBX-4852, r29684. | ||||||||||||
Comment by Alexander Vladishev [ 2012 Aug 22 ] | ||||||||||||
(3) Please review my changes in r29691:29705
<Andris> Reviewed changes in C code (file src/libs/zbxcommon/misc.c). Agree.
<pavels> The code formatting doesn't match PHP guidelines (comments, underscores in variable names, missing brackets in some places).
<Sasha> RESOLVED in r29806 and r30052
<pavels> One more thing: the comments should be written using // and on a new line. Corrected a typo in r30068.
<Sasha> RESOLVED in r30096.
<pavels> CLOSED. | ||||||||||||
Comment by Andris Mednis [ 2012 Sep 04 ] | ||||||||||||
Fixed in versions pre-2.0.3 rev. 30098 and pre-2.1.0 rev. 30099. |