[ZBXNEXT-2452] "Multiple PROBLEM events generation" and Timer process Created: 2014 Sep 09 Updated: 2025 Apr 21 |
|
Status: | Open |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Server (S) |
Affects Version/s: | 2.0.12, 2.2.6, 3.0.13, 3.2.10, 3.4.4, 4.0.32, 5.0.14, 5.4.3 |
Fix Version/s: | None |
Type: | Change Request | Priority: | Major |
Reporter: | Constantin Oshmyan | Assignee: | Unassigned |
Resolution: | Unresolved | Votes: | 43 |
Labels: | multiple, timer, triggers, usability | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Trigger with "Multiple PROBLEM events generation" in combination with timer-related functions: nodata(), date(), dayofmonth(), dayofweek(), time(), now() |
Attachments: |
Issue Links: |
|
Description |
According to documentation:
It is good practice; however, there is a problem if a trigger has the "Multiple PROBLEM events generation" option turned ON. In this case the timer process can generate a "trigger goes to the PROBLEM state" event every 30 seconds (whenever all conditions are TRUE, even if no new data has arrived).
My suggestion is the following: all trigger calculations performed by the timer process should be made as if the "Multiple PROBLEM events generation" option were OFF, regardless of its actual setting. In other words: if all conditions are TRUE and the trigger is already in the PROBLEM state, a new event should not be generated. At the same time, if some condition becomes FALSE while the trigger is in the PROBLEM state, the problem should be closed (i.e. the "trigger goes to the OK state" event must be generated).
The combination of timer-related trigger functions and the "Multiple PROBLEM events generation" option is not very widespread. However, when this combination is used, it causes problems that are difficult to understand. The most typical one is double or multiple alerts (some examples: ZBX-8114, ZBX-4732, ZBX-6170). |
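The suggested rule can be sketched as a small decision function. This is only an illustration of the behaviour requested in the description, not Zabbix server code; the names `Source` and `next_event` are hypothetical.

```python
from enum import Enum


class Source(Enum):
    NEW_DATA = "new data"  # recalculation caused by a newly received item value
    TIMER = "timer"        # recalculation caused by the periodic timer process


def next_event(expression_true, in_problem, multiple_enabled, source):
    """Return "PROBLEM", "OK", or None (no event) for one trigger recalculation."""
    if not expression_true:
        # A trigger in the PROBLEM state is closed, whatever caused the recalculation.
        return "OK" if in_problem else None
    if not in_problem:
        # The first transition into the PROBLEM state always produces an event.
        return "PROBLEM"
    # Expression is TRUE and the trigger is already in the PROBLEM state:
    # only new data may repeat the event; the timer behaves as if the option were OFF.
    if multiple_enabled and source is Source.NEW_DATA:
        return "PROBLEM"
    return None
```

With this rule, a trigger that has the option ON still produces one event per new value, while timer recalculations can only open or close the problem.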
Comments |
Comment by Constantin Oshmyan [ 2014 Sep 09 ] |
Example 1: Using the "nodata()" function to close a trigger automatically by time-out
Task
It is necessary to have a trigger for the Windows Application log: if a new message with the severity "Error" or "Critical" appears, it should be forwarded to the administrator by e-mail with a delay of at most 1 minute.
Item (in the appropriate Template)
Key: eventlog[Application,,"Error|Critical",,,100,]
Trigger (in the same Template)
Name: New error message in Windows Application log
({Template OS Windows:eventlog[Application,,"Error|Critical",,,100,].logseverity(0)}=4 | {Template OS Windows:eventlog[Application,,"Error|Critical",,,100,].logseverity(0)}=9 ) & {Template OS Windows:eventlog[Application,,"Error|Critical",,,100,].nodata(30)}#1
Multiple PROBLEM events generation: ON
Action
Action has the following logic:
Results
If several error messages appear within the given interval (30 seconds), all of them are delivered successfully. However, the last message is delivered twice: the first time when the new data is received from the Agent, and the second time when the event is generated by the timer process. If the timeout for the nodata() function is longer, the last message is repeated every 30 seconds: for example, with a timeout of 10 minutes (to give the operator a chance to see it on the Web console) it is delivered 21 times. |
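The counts reported in this example can be reproduced with a little arithmetic. The sketch below is a simplified model, not a description of the real timer: it assumes the timer process recalculates the trigger exactly every 30 seconds after the last message and that every recalculation within the nodata() timeout produces one more event. The function name is hypothetical.

```python
TIMER_INTERVAL = 30  # seconds between timer recalculations, as stated in the description


def notifications_for_last_message(nodata_timeout, timer_interval=TIMER_INTERVAL):
    """Count deliveries of the last log message under the simplified model."""
    from_new_data = 1                             # event raised when the value arrives
    from_timer = nodata_timeout // timer_interval  # one event per timer run within the timeout
    return from_new_data + from_timer
```

Under these assumptions `notifications_for_last_message(30)` gives 2 and `notifications_for_last_message(600)` gives 21, matching the numbers above.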
Comment by Aleksandrs Saveljevs [ 2014 Sep 09 ] |
The first and second item references in your trigger seem to be identical. You might wish to simplify that. |
Comment by Constantin Oshmyan [ 2014 Sep 09 ] |
asaveljevs, thank you! I've fixed this example (the logseverity values are different: ERROR and CRITICAL).
Example 2: Using the "time()" function as an additional condition clause
Initial Task
It is necessary to monitor the log file of some application for error messages (lines containing the "ERROR" string) in order to notify the appropriate administrator.
Item
log[/var/log/myApp/myApp.log,ERROR,,,skip,]
Trigger
{Host:log[/var/log/myApp/myApp.log,ERROR,,,skip,].str(ERROR)}=1
As in example 1, "Multiple PROBLEM events generation" should be enabled to avoid missing some messages.
Result
It works OK. However, every night (between midnight and 02:00) the database performs an offline backup, which causes some error messages in the log that can be ignored.
Modified Task
It is necessary to monitor this log file for error messages only after 02:00 AM.
Modified Trigger
{Host:log[/var/log/myApp/myApp.log,ERROR,,,skip,].str(ERROR)}=1 & {Host:log[/var/log/myApp/myApp.log,ERROR,,,skip,].time(0)}>020000
Result
Despite the minimal change, the result differs greatly. If an error occurs in this log file after 02:00 AM, the event "Trigger goes to PROBLEM state" will be generated every 30 seconds by the timer process; in this example, for the rest of the day up to midnight... |
Comment by Oleg Ivanivskyi [ 2014 Sep 10 ] |
It looks like a regex that finds A and not B on a line could be a workaround for example 2. Of course, it will not help in the first example. |
Comment by Constantin Oshmyan [ 2015 Jan 26 ] |
I agree that in some cases it's possible. If the monitored log file includes a clearly formatted timestamp, as for example the log of Zabbix server does:
7294:20141227:100121.813 SNMP agent item "ifNumber" on host "CiscoV-SW1" failed: first network error, wait for 15 seconds
then the trigger expression could be reformulated to use regexp() instead of the time() function, something like the following:
{Host:log[/tmp/zabbix_server.log,error,,,skip,].str(error)}=1 & {Host:log[/tmp/zabbix_server.log,error,,,skip,].regexp([0-9]*:[0-9]{8}:0[01][0-9]{4}\.[0-9]{3})}#1
I.e. "if the timestamp in this log record has hour==00 or hour==01, then ignore it".
However, in other cases it is difficult or impossible to use just a regexp. For example, in Windows Event logs the timestamp is a separate field; many Java applications write multi-line error messages (where the timestamp and the message text are on different lines); some applications use a timestamp format that could also occur in the message text, etc. After all, using the time() function is simply easier to understand. |
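The regexp part of this workaround can be checked outside Zabbix. The sketch below applies the same pattern with Python's `re` module, on the assumption that the pattern behaves the same way in both regex engines for this simple syntax. The helper name `should_alert` is hypothetical, and the second log line used in the note is a made-up variant of the sample above with the hour changed to 01.

```python
import re

# Same pattern as in the trigger: matches timestamps whose hour is 00 or 01.
TIMESTAMP_00_01 = re.compile(r"[0-9]*:[0-9]{8}:0[01][0-9]{4}\.[0-9]{3}")


def should_alert(line):
    """Mimic str(error)=1 & regexp(...)#1 for a single log line."""
    return "error" in line and TIMESTAMP_00_01.search(line) is None


SAMPLE = ('7294:20141227:100121.813 SNMP agent item "ifNumber" on host '
          '"CiscoV-SW1" failed: first network error, wait for 15 seconds')
# Hypothetical line: the sample with its timestamp moved into the backup window.
NIGHT = SAMPLE.replace(":100121.813", ":010121.813")
```

Here `should_alert(SAMPLE)` is true because the hour is 10, while `should_alert(NIGHT)` is false because the hour is 01.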
Comment by Oleksii Zagorskyi [ 2016 Sep 19 ] |
|
Comment by Constantin Oshmyan [ 2017 Dec 05 ] |
Unfortunately, all newer versions still have this problem. |
Comment by Victor [ 2018 Nov 01 ] |
Agree, this is very useful feature! Voted. |
Comment by Constantin Oshmyan [ 2019 Oct 14 ] |
Just a reminder that this problem is still relevant. It still exists in v4.0.x as well (and, probably, in 4.2 and 4.4 too). |
Comment by Constantin Oshmyan [ 2021 Aug 06 ] |
Just one more reminder about the importance of this problem. It still exists in version 5.0 (LTS) and, probably, in 5.2 and 5.4 as well. |
Comment by Constantin Oshmyan [ 2022 Jun 02 ] |
Reminder: this problem is still relevant. |
Comment by Marcel Renner [ 2022 Jul 13 ] |
+1 Voted as well! For example, we would simply like to have the successful and failed logins (from each login) as a separate info event in Zabbix. Therefore only multiple event generation is practicable to not miss any events. But the event should close automatically after X minutes. Due to the mentioned issue this can't be implemented, which makes eventlog, log, logrt and snmptrap quite useless (at least for simple info events that don't send a resolved notification). With this workaround there is probably a way, but Zabbix should offer something more user friendly. FYI, 6.2.x is still affected. |
Comment by Dimitri Bellini [ 2022 Sep 16 ] |
Hi DevTeam, |
Comment by Constantin Oshmyan [ 2023 Oct 10 ] |
alexei, just a reminder, as discussed at Zabbix Summit 2023 |
Comment by Vladislavs Sokurenko [ 2023 Oct 11 ] |
Possible fix to explore as a starting point (not released yet): |
Comment by Constantin Oshmyan [ 2023 Oct 11 ] |
vso, which version of Zabbix is your possible fix for, please? |
Comment by Vladislavs Sokurenko [ 2023 Oct 11 ] |
constantin.oshmyan attached both for 6.4 and 7.0, the fix is as per your request in description. |
Comment by Constantin Oshmyan [ 2023 Oct 11 ] |
vso, great, thanks! Can this patch be applied also to the v6.0 (the current LTS version)? |
Comment by dimir [ 2023 Oct 11 ] |
It's only a quick research thing to try out, if you have a chance. For a proper solution this needs to be code-reviewed, tested, documented and so on. Another thing that needs to be considered here is a possible regression for those who need an alarm every 30 seconds. Maybe a checkbox should be added. Anyway, we wanted to quickly check how complicated the fix might be here, and if you can try it, we may become aware of possible side effects/regressions early. |
Comment by Vladislavs Sokurenko [ 2023 Oct 11 ] |
constantin.oshmyan sure, added patch for 6.0 ZBXNEXT-2452-6.0.diff |
Comment by Constantin Oshmyan [ 2023 Oct 11 ] |
Thank you, guys! |
Comment by Constantin Oshmyan [ 2023 Oct 20 ] |
vso, dimir I've tried this patch in the test environment with the current v6.0.22 release; it works great!
Yes, I understand your doubts. |
Comment by dimir [ 2023 Oct 21 ] |
Thanks constantin.oshmyan! What you're saying makes sense. We'll discuss it internally and let you know. |
Comment by Constantin Oshmyan [ 2024 Mar 26 ] |
dimir, what are the results of these discussions? Are there any chances this feature could be implemented in the v7.0 (nearest LTS version), or will it be postponed for a few more years? |
Comment by Constantin Oshmyan [ 2024 May 03 ] |
|
Comment by Constantin Oshmyan [ 2024 Oct 04 ] |
Oh, we can celebrate the 10-year anniversary of this ticket! |
Comment by Chintan Jain [ 2025 Apr 21 ] |
What is the solution for this? I am facing the same problem: once the alert is resolved, it sends multiple actions. Version 7. |