Incident report
Resolution: Fixed
Major
1.4.3
None
OK, so I am going to use exactly what is listed in the manual.
I wrote a shell script that writes nothing but "heartbeats" every 5 seconds until I stop it manually (i.e. an infinite loop).
#!/bin/sh
# Write one heartbeat line ("12") to stdout every 5 seconds until the script is killed.
while true
do
    echo 12
    sleep 5
done
I run the script by simply doing:
./loop.sh > /opt/zabbix/mytest.log
so my log file looks like this:
12
12
12
12
12
12
12
...and so on.
I have Zabbix configured to monitor the log with this item:
Type: Zabbix agent (active)
Key: log[/opt/zabbix/mytest.log]
Type of information: Log
Update interval: 1 sec.
My trigger is configured the way I understand it from the manual (release 009, p. 112):
{zabbixserver:log[/opt/zabbix/mytest.log].count(600,12)}<107
The data from the log file shows up correctly in the latest data and history.
One heartbeat every 5 seconds for 600 seconds gives 120 heartbeats, i.e. the script writes 12 to the log file 120 times. If fewer than 107 are found, we have not had a heartbeat in the past minute or so, and therefore the alert should fire.
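The threshold arithmetic can be checked with a few lines of sh. This is only a sketch: it assumes the heartbeats are evenly spaced and that count(600,12) counts the values from the last 600 seconds, as the manual describes; the variable names are made up for illustration.

```shell
#!/bin/sh
# Worked arithmetic for the trigger threshold (values taken from the post).
# Assumption: evenly spaced heartbeats, count() looks at the last 600 s.
window=600     # count() period in seconds
period=5       # one heartbeat every 5 s
threshold=107  # trigger is TRUE when count < threshold

expected=$((window / period))          # heartbeats in a full window
missing=$((expected - threshold + 1))  # fewest missing heartbeats that make count < threshold
silence=$((missing * period))          # seconds without a heartbeat needed to lose that many

echo "expected per window: $expected"
echo "fires after about ${silence} s without a heartbeat"
```

With these numbers it prints 120 heartbeats per window and about 70 seconds of silence before the trigger should fire, which is the "past minute or so" above.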
However, as soon as the trigger is enabled, it turns TRUE regardless of the state of the log. If there are already 130 12s present (more than 107), it turns TRUE. If there are 107 or fewer, or none, it also turns TRUE; those latter two cases make sense, because with fewer than 107 values the trigger should be TRUE.
Starting with a blank log file and waiting 15 minutes for it to fill with 12s (180 lines in total, and 120 within the last 600 seconds, both well above 107) does not turn the trigger FALSE as it should.
Stopping the script so that nothing more is added to the log file should fire the trigger, so I stopped the script and waited. After 15 minutes with no updates to the log at all, the trigger finally went FALSE. When the heartbeats started again, the trigger went TRUE. That makes no sense: with no new lines the count is 0, and 0 < 107 is TRUE. Why was the trigger off when it should have been on? If count() is supposed to return the number of values found in the past X seconds, why does it not trigger properly?
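To make the expected behaviour concrete, here is a small POSIX sh model of what the manual's definition of count(600,12) implies for this experiment. It is not Zabbix code. It assumes one heartbeat every 5 seconds starting at t=0 on a blank log, a script stopped just before t=900 (15 minutes, 180 lines), and that count() counts values whose timestamps fall in the last 600 seconds. The names count_at, stop, and so on are hypothetical.

```shell
#!/bin/sh
# Model (not Zabbix code) of the expected value of count(600,12) and of the
# trigger "count < 107" at a few points in time. Assumptions: one heartbeat
# every 5 s starting at t=0 on a blank log; the script is stopped just before
# t=900 s (15 minutes), so the last heartbeat is at t=895.
period=5
window=600
threshold=107
stop=900

# Print the number of heartbeats with a timestamp in (t - window, t].
count_at() {
    t=$1
    n=0
    ts=0
    while [ "$ts" -le "$t" ] && [ "$ts" -lt "$stop" ]; do
        if [ "$ts" -gt $((t - window)) ]; then
            n=$((n + 1))
        fi
        ts=$((ts + period))
    done
    echo "$n"
}

# count_at runs in a subshell via $(...), so it does not clobber the loop's t.
for t in 300 530 895 960 965 1500; do
    c=$(count_at "$t")
    if [ "$c" -lt "$threshold" ]; then
        state=TRUE
    else
        state=FALSE
    fi
    echo "t=${t}s count=$c trigger=$state"
done
```

Under these assumptions it prints:
t=300s count=61 trigger=TRUE
t=530s count=107 trigger=FALSE
t=895s count=120 trigger=FALSE
t=960s count=107 trigger=FALSE
t=965s count=106 trigger=TRUE
t=1500s count=0 trigger=TRUE
In other words, starting from a blank log the trigger should clear after about 530 seconds, and after the script stops it should fire again about 70 seconds after the last heartbeat and stay TRUE while the log is silent.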
Thinking that the pattern might need quotes (since this is a log item and the values are strings), I also tried the following trigger and got the same results, except that the trigger never turned off, even after the time limit:
{zabbixserver:log[/opt/zabbix/mytest.log].count(600,"12")}<107
According to the manual, and I quote: "count(600,12) will return exact number of values equal to '12' stored in the history." Why does it never seem to return the right value? Why does this expression not work? All I want is to track a heartbeat event and fire a trigger if the heartbeat has not been seen in X seconds. Maybe count() cannot be used this way, but surely someone out there monitors for missing heartbeat events.
Can anyone tell me what, if anything, I am doing wrong? If anyone has a different way of tracking a heartbeat from a log file, I would appreciate a pointer in the right direction. An example of my actual log file is a couple of posts up.
[EDIT] Even though it makes little logical sense, I tried something else: because the trigger does the opposite of what I want, I reversed the comparison:
{zabbixserver:log[/opt/zabbix/mytest.log].count(600,12)}>107
Now I cannot get it to fire at all, so that doesn't work either.