- Type: Problem report
- Resolution: Unresolved
- Priority: Trivial
- None
- Affects Version/s: 7.0.27
- Component/s: Server (S)
- None
- Support backlog
A trigger that uses a time-based function (nodata()) stops being recalculated on schedule and keeps its last value indefinitely, even though the data condition has changed. The trigger value is frozen, its state is normal, and no error is reported. Only a server restart or an edit to the trigger expression (which creates a new functionid) restores normal recalculation. Neither zabbix_server -R config_cache_reload nor disabling and re-enabling the trigger has any effect.
Time-based triggers are recalculated by per-function timers held in the configuration cache trigger queue. Once such a timer is removed and not recreated, the only event that re-evaluates the function is incoming data. For nodata(), whose result must change precisely when no data arrives, the time-based path is the only way it can change, so the trigger never recovers.
Root cause analysis (source: git tag 7.0.27, src/libs/zbxcacheconfig/dbconfig.c)
- The history syncer pops due timers and validates them in trigger_timer_validate() (~line 11219). For a function timer, validation returns FAIL and dc_remove_invalid_timer() (~line 11258) frees the timer and resets function->timer_revision = 0 when, at the instant of validation, either:
  - dc_function->revision > timer->revision (the function revision has advanced past the timer), or
  - the trigger is momentarily non-functional (TRIGGER_FUNCTIONAL_TRUE != trigger->functional). functional is set FALSE in dc_item_update_trigger_functional() (~line 7422) whenever the host is not monitored or the item is not active.
- Timer (re)creation happens in dc_update_function_timer() (~line 4678). On a running server it is only called for the changed functions of an incremental config sync: the changed-function list is built solely in DCsync_functions() from changed function rows (append at ~line 4906), and dc_schedule_trigger_timers() (~line 4781) is given that list. The full all-functions pass (function_timers == NULL) is performed only in ZBX_DBSYNC_INIT mode, i.e. at server start (call site ~line 8593; list created only in the non-INIT branch ~line 8185). In addition, dc_update_function_timer() returns without creating the timer if the trigger is not functional at that instant (~line 4704), leaving timer_revision at 0.
- Crucially, the functional-state recompute (dc_trigger_update_cache(), ~line 7458) does NOT add anything to the changed-function list. So a function whose row never changes again is never re-fed to the scheduler.
- Consequence: after a function's timer has been removed and timer_revision reset to 0, the timer is recreated only if that function later appears in a changed config sync. If the function row does not change again, the scheduler never revisits it, the timer is never recreated, and the time-based function is never recalculated. The trigger stays in its last value until the server is restarted (full INIT reschedule) or the expression is edited (new functionid).
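The lifecycle described above can be illustrated with a small toy model. This is only a sketch of the reported behaviour, not Zabbix code: the class and method names (Function, ToyCache, schedule, sync, pop_timers) are hypothetical, and the model keeps just the three rules from the analysis (no timer is created for a non-functional trigger, an invalid timer is dropped on validation, and an incremental sync only revisits changed functions).

```python
from dataclasses import dataclass


@dataclass
class Function:
    """Toy stand-in for a cached trigger function (not the Zabbix struct)."""
    functionid: int
    revision: int = 1
    timer_revision: int = 0   # 0 means "no timer scheduled"
    functional: bool = True   # mirrors the owning trigger's functional flag


class ToyCache:
    def __init__(self, functions):
        self.functions = {f.functionid: f for f in functions}
        self.timers = []  # queue of (functionid, revision) pairs

    def schedule(self, func):
        # As described for dc_update_function_timer(): no timer is created
        # while the trigger is not functional.
        if not func.functional:
            return
        self.timers.append((func.functionid, func.revision))
        func.timer_revision = func.revision

    def sync(self, changed_ids=(), init=False):
        # INIT mode reschedules every function; an incremental sync only
        # sees the functions whose rows changed.
        if init:
            self.timers.clear()
            targets = list(self.functions.values())
        else:
            targets = [self.functions[i] for i in changed_ids]
        for func in targets:
            self.schedule(func)

    def pop_timers(self):
        """Process all due timers and return the ids that were recalculated."""
        due, self.timers = self.timers, []
        recalculated = []
        for functionid, revision in due:
            func = self.functions[functionid]
            if func.revision > revision or not func.functional:
                func.timer_revision = 0   # timer dropped; nothing re-adds it
                continue
            recalculated.append(functionid)
            self.timers.append((functionid, revision))  # recurring timer
        return recalculated


def demo():
    f = Function(functionid=1)
    cache = ToyCache([f])
    results = []
    cache.sync(init=True)
    results.append(cache.pop_timers())   # healthy: recalculated
    f.functional = False                 # item disabled, function row unchanged
    cache.sync()                         # incremental sync, nothing changed
    results.append(cache.pop_timers())   # timer fails validation and is freed
    f.functional = True                  # item re-enabled
    cache.sync()                         # still nothing in the changed list
    results.append(cache.pop_timers())   # orphaned: never recalculated
    cache.sync(init=True)                # "server restart"
    results.append(cache.pop_timers())   # recovered
    return results
```

In this model demo() yields a recalculation before the item is disabled, none while the timer is orphaned, and one again only after the INIT-style full reschedule, which matches the recovery behaviour reported here.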
Steps to reproduce
A. Create an orphaned timer (deterministic)
1. On a running server, create a trapper item I on a monitored host H, and a trigger T: nodata(/H/I,2m)=1 (PROBLEM when no data for 2 min).
2. Send one value to I (e.g. zabbix_sender). Data is present, so T evaluates to OK and now has a working function timer.
3. Disable item I (frontend or item.update status=1). At the next config sync this sets T non-functional (item not active) but does NOT change the function row.
4. Wait for the server's time-based re-evaluation to pop T's timer at least once (~2-3 min). During this window trigger_timer_validate() fails (functional=false), the timer is freed, and function->timer_revision is reset to 0.
5. Re-enable item I (status=0). T becomes functional again, but its function row is unchanged, so the function is not included in the changed-function list passed to the timer scheduler -> the timer is not recreated.
B. Confirm the timer is orphaned
- Do NOT send any new data to I. Wait past the 2 min window.
- Expected: T -> PROBLEM (no data for > 2 min).
- Actual (bug): T is never recalculated and stays OK indefinitely; the "no data" alert silently never fires.
- zabbix_server -R config_cache_reload -> no effect (incremental sync feeds only changed functions).
- Disable, then re-enable the TRIGGER -> no effect (bumps the trigger row, not the function row).
- Edit T's expression so a new functionid is created (e.g. 2m -> 2m1s) -> T is recalculated and behaves correctly. A server restart has the same effect for all stuck triggers.
The only timing-sensitive step is A4 (the timer must be popped while the trigger is non-functional). Since the server pops due timers continuously and the nodata timer recurs on its own schedule, keeping the item disabled for a couple of recalculation periods makes this reliable.
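The item toggling in part A can also be scripted. The sketch below only builds the inputs and sends nothing: a JSON-RPC body for item.update and a zabbix_sender argument list. The helper names (item_status_request, sender_argv, reproduction_plan), the item ID, host name, key and server address are all hypothetical placeholders, and authentication against the API endpoint is left out.

```python
def item_status_request(itemid, status, request_id=1):
    """Build a JSON-RPC body for item.update (status 0 = enabled, 1 = disabled)."""
    if status not in (0, 1):
        raise ValueError("status must be 0 (enabled) or 1 (disabled)")
    return {
        "jsonrpc": "2.0",
        "method": "item.update",
        "params": {"itemid": str(itemid), "status": status},
        "id": request_id,
    }


def sender_argv(server, host, key, value):
    """Build a zabbix_sender command line that sends one trapper value."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", str(value)]


def reproduction_plan(itemid, server, host, key, wait_s=180):
    """Return the ordered actions of part A, then the wait from part B.

    wait_s is an assumption: long enough for the 2m nodata() timer to be
    popped at least once while the item is disabled.
    """
    return [
        ("send", sender_argv(server, host, key, "test")),    # T evaluates to OK
        ("api", item_status_request(itemid, 1, 1)),          # disable item I
        ("wait", wait_s),                                    # timer is popped and freed
        ("api", item_status_request(itemid, 0, 2)),          # re-enable item I
        ("wait", wait_s),                                    # send no data; T should go PROBLEM
    ]
```

Each "api" entry would be POSTed to the frontend's api_jsonrpc.php with a valid API token, and each "send" entry run as a subprocess; both are left to the caller so the plan can be inspected before anything is changed on a live server.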
Conditions observed in production
The same orphaning also arises without manual intervention through the dc_function->revision > timer->revision branch: a race between the configuration syncer (which reschedules a changed function and sets timer_revision=revision) and the history-syncer timer processing (which validates the old timer as revision-behind and resets timer_revision=0 after the syncer already passed the function). In our environment this occurred for nodata(...,"strict") triggers on proxy-monitored hosts, correlated with proxy data-flow / proxy-session churn and short connectivity disruptions. Only triggers actively in PROBLEM and receiving timer evaluations at that instant were affected (2-3 out of a 1405-trigger cohort with identical configuration).
Actual result
The nodata() trigger is never recalculated by the time-based scheduler and keeps its last value. config_cache_reload and trigger enable/disable do not restore it; only a server restart or an expression change does.
Expected result
After function->timer_revision is reset to 0, the function timer must always be (re)created so the trigger continues to be recalculated on schedule. config_cache_reload should be sufficient to recover an orphaned timer without a full server restart.
Diagnostic evidence
Trigger stuck in PROBLEM while the item had no data for ~1.5 h:
SELECT t.triggerid, t.value, t.state,
       to_timestamp(t.lastchange) AS lastchange,
       i.key_,
       to_timestamp(max(hl.clock)) AS last_data,
       extract(epoch from now())::int - max(hl.clock) AS age_s
FROM triggers t
JOIN functions f ON f.triggerid = t.triggerid AND f.name = 'nodata'
JOIN items i ON i.itemid = f.itemid
LEFT JOIN history_log hl ON hl.itemid = i.itemid
WHERE t.triggerid = <id>
GROUP BY t.triggerid, t.value, t.state, t.lastchange, i.key_;
-- value=1 (PROBLEM), state=0 (normal), no error, age_s ~5600 (no data for 1.5 h)
Clearing matrix (each tested):
| action | effect on stuck trigger |
|---|---|
| config_cache_reload | no change (incremental sync feeds only changed functions) |
| trigger disable -> enable | no change (bumps trigger row, not the function row) |
| edit expression (new functionid) | recovers (fresh timer guaranteed) |
| server restart (INIT) | recovers (full all-functions reschedule) |
Suggested fix direction
Ensure every reset of function->timer_revision = 0 is paired with the function being (re)scheduled. For example: include functions whose timer_revision was reset by trigger_timer_validate() / dc_remove_invalid_timer() in the set passed to dc_schedule_trigger_timers() on the next sync, or have config_cache_reload perform a full timer reschedule pass (not only the changed-function list).
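The first option can be sketched in a toy model, assuming a hypothetical needs_timer set that records every function whose timer was dropped during validation. None of these names exist in Zabbix; the point is only to show the invariant that a dropped timer is always queued for rescheduling on the next sync.

```python
class ToyCache:
    """Toy model of the proposed invariant (not Zabbix code)."""

    def __init__(self, function_ids):
        self.functional = {fid: True for fid in function_ids}
        self.timers = set()        # functions that currently have a timer
        self.needs_timer = set()   # hypothetical: timers dropped on validation

    def schedule_all(self):
        # INIT-style pass over every function.
        for fid, ok in self.functional.items():
            if ok:
                self.timers.add(fid)

    def pop_timers(self):
        """Validate due timers and return the function ids recalculated."""
        recalculated = []
        for fid in sorted(self.timers):
            if self.functional[fid]:
                recalculated.append(fid)
            else:
                self.timers.discard(fid)
                self.needs_timer.add(fid)   # remember the dropped timer
        return recalculated

    def incremental_sync(self, changed=()):
        # Revisit the changed functions AND every function whose timer was
        # dropped earlier; functions that are still non-functional stay queued.
        for fid in set(changed) | self.needs_timer:
            if self.functional[fid]:
                self.timers.add(fid)
                self.needs_timer.discard(fid)


def demo():
    cache = ToyCache([1])
    cache.schedule_all()
    cache.functional[1] = False      # item disabled
    dropped = cache.pop_timers()     # timer fails validation, is remembered
    cache.functional[1] = True       # item re-enabled, function row unchanged
    cache.incremental_sync()         # empty changed list, pending set is used
    return dropped, cache.pop_timers()
```

With this bookkeeping an ordinary incremental sync (and therefore config_cache_reload) is enough to restore the timer in the model, without a full reschedule pass.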
Possibly related (not the same issue)
ZBX-12251 (stuck PROBLEM from transaction error), ZBX-22580 (nodata false-positive after proxy disruption), ZBX-18418 (nodata + proxy + restart). No matching fix found after 7.0.27.
Environment
Zabbix server 7.0.27 (HA cluster, failover delay 60 s), PostgreSQL 17 + TimescaleDB 2.27.2. The DB backend is not on the failure path – this is about trigger-recalculation scheduling in the configuration cache, not history storage. ~2251 hosts, ~177k triggers, ~3708 NVPS. Item type: trapper (value_type log).