[ZBX-9939] wrong events (ok|problem) for item(trapper) sending via zabbix_sender in the same time Created: 2015 Oct 06 Updated: 2017 May 30 Resolved: 2016 Sep 30 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 2.4.5 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Blocker |
Reporter: | Natalia Kagan | Assignee: | Unassigned |
Resolution: | Duplicate | Votes: | 0 |
Labels: | events, triggers | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
zabbix 2.4.4 on centos 6 |
Attachments: |
Issue Links: |
|
Description |
Hi, we have a problem: when two text messages (the first "problem", the second "ok") are sent via zabbix_sender from the same host to a trapper item at the same time, Zabbix generates the trigger events in the wrong order (the last one is "problem"), although in Latest data the values appear in the correct order. |
Comments |
Comment by Oleksii Zagorskyi [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Custom trigger severity colors and a slightly different frontend style, supposedly customized. Taking into account that the values are multiline, it is not possible to send them from a single input file. Still, to be sure: we don't see other possibly existing triggers, and we don't see the full trigger name when it is expanded on the events page. Does this trigger have "multiple problem events generation" enabled? Please do not hide so much information next time, so we are not forced to guess. Could be related to ZBX-9432. |
Comment by Natalia Kagan [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sorry, I will try to provide more details next time. I am able to reproduce the issue: on host "server141" (CentOS 6.4) I run 2 zabbix_sender commands at the same time:

%/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:Warning; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:normal
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000040" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000036" sent: 1; skipped: 0; total: 1

key[LPTRAP_60_] is part of an LLD rule, item type "Zabbix trapper", triggers (with multiple PROBLEM events generation):
Warning: {server141:key[LPTRAP_60_].regexp(severity:Warning)}=1
Minor: {server141:key[LPTRAP_60_].regexp(severity:Minor)}=1
Major: {server141:key[LPTRAP_60_].regexp(severity:Major)}=1
Critical: {server141:key[LPTRAP_60_].regexp(severity:Critical)}=1

In Latest data I have: |
in Monitoring-> triggers :
in Monitoring -> Events (as you can see, the trigger status is "PROBLEM" but should be "OK"): |
If I run zabbix_sender separately (one after the other):

%/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:Warning
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000044" sent: 1; skipped: 0; total: 1
%/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:normal
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000030" sent: 1; skipped: 0; total: 1

then in Monitoring -> Events there is no problem (trigger status is "OK"): |
Let me know if you need more details Many Thanks for the help! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
How many DB syncers do you have? |
Comment by Natalia Kagan [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
20 DB syncers | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Do you mean that you sent those 2 values to a zabbix_proxy? Not sure it can be related to your case, just FYI. |
Comment by Natalia Kagan [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes, correct. I send it to a proxy. Not sure that it is relevant, since in Latest data I see the values in the correct order. |
Comment by Oleksii Zagorskyi [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Could you try to reproduce it by creating only one regular item and trigger, not LLD ones? |
Comment by Marc [ 2015 Oct 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I'm wondering whether it could be related to ZBX-6942. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Natalia Kagan [ 2015 Oct 08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
It does not happen for the regular item and trigger; the problem is only for the LLD one. It does not happen every time, but very often. I created the regular item and trigger on the same host and did the test.

Sending to the LLD one:
% /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:Warning; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:normal; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:Warning; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_]' -o severity:normal
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000049" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000040" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000057" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000023" sent: 1; skipped: 0; total: 1

Sending to the regular item:
%/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_test]' -o severity:Warning; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_test]' -o severity:normal; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_test]' -o severity:Warning; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_test]' -o severity:normal
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000053" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000146" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000036" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000040" sent: 1; skipped: 0; total: 1

in Monitoring -> Events: |
for the regular one it's OK: |
Thanks | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2015 Oct 08 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ok, please confirm as for LLD: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Natalia Kagan [ 2015 Oct 09 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Correct, only for LLD. Thanks |
Comment by Oleksii Zagorskyi [ 2015 Oct 09 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
When you tested with the regular item/trigger, did you also create 4 triggers for the regular item? OK, one more request for a test: |
Comment by Natalia Kagan [ 2015 Oct 12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I created a new template with LLD (4 triggers) and a regular item (4 triggers), assigned it to a new host (reporting to the same proxy as the first host), and now I am not able to reproduce the issue (it happened only once). Thanks |
Comment by Oleksii Zagorskyi [ 2015 Oct 12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Looking at your previous attachments, I believe that the Zabbix server really did something wrong. And I don't like that. Older Zabbix server versions had some issues with LLD object creation (like The bad thing is that this issue is randomly reproducible, so most likely it cannot be related to possibly wrong data in the database, but who knows... All said is IMO. I'd be happy to reproduce it, but I'm afraid I'd spend a lot of time without success. |
Comment by Oleksii Zagorskyi [ 2015 Oct 12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
If you can reproduce it at least once (even if only randomly) for newly created items/triggers, that is also a good result for us, which forces us to consider it seriously. |
Comment by Natalia Kagan [ 2015 Oct 12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ok, I have bad/good news. I did the tests again with the attached template (Template app trap test.xml), which includes an LLD rule (4 triggers) and a regular item (4 triggers). The results are (see the attached file test_results.jpg):

LLD: out of 211 tests, the trigger was wrong 3 times.

In order to use my template: for LLD, run
/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k external.lptraps.discovery.test -o "{\"data\":[{\"{#TRAPKEY}\":\"LPTRAP_60_\"}]}"
then run:
/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_1]' -o severity:Warning; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_1]' -o severity:normal; (the same Warning/normal pair repeated 4 times in total)

For the regular item:
/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_test]' -o severity:Warning; /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k 'key[LPTRAP_60_test]' -o severity:normal; (the same Warning/normal pair repeated 4 times in total)

Let me know if you want me to do more tests. By the way, I started to work with Zabbix in v2.0, then 2.2, 2.4, 2.4.4. Thanks! |
Comment by Oleksii Zagorskyi [ 2015 Oct 12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Please try other tests: |
Comment by Natalia Kagan [ 2015 Oct 12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
With one trigger, during 200 tests, the LLD one didn't fail. It looks like it fails from time to time regardless of whether it is an LLD or a regular item, and regardless of how many triggers are defined. |
Comment by Natalia Kagan [ 2015 Oct 12 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
With 10 triggers: LLD failed 9 times. |
Comment by Oleg Ivanivskyi [ 2016 Apr 30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
You can reproduce this issue by sending several values via Zabbix sender at the same time, using the trigger "{Zabbix server:test.strlen()}>0" with "Multiple PROBLEM events generation" enabled. Commands executed in one go:

zabbix_sender -z 127.0.0.1 -s "Zabbix server" -k test -o 'bal bla'
zabbix_sender -z 127.0.0.1 -s "Zabbix server" -k test -o ""
zabbix_sender -z 127.0.0.1 -s "Zabbix server" -k test -o "bal bla"
zabbix_sender -z 127.0.0.1 -s "Zabbix server" -k test -o ""

My environment: Zabbix server 2.4.7, no proxies. |
Comment by Oleg Ivanivskyi [ 2016 Apr 30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
In the DB:

+----+--------+----------------------+---------+----------+
| id | itemid | from_unixtime(clock) | value   | ns       |
+----+--------+----------------------+---------+----------+
|  9 |  23660 | 2016-04-29 21:00:43  | bal bla | 31162306 |
| 10 |  23660 | 2016-04-29 21:00:43  |         | 32888625 |
| 11 |  23660 | 2016-04-29 21:00:43  | bal bla | 34332592 |
| 12 |  23660 | 2016-04-29 21:00:43  |         | 35673574 |
+----+--------+----------------------+---------+----------+ |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 23 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This may happen simply because it's the internet: there is no guarantee that values sent with independent zabbix_sender commands will arrive and be processed by the server in the same order. Things that might help:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Natalia Kagan [ 2016 May 25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hi, it is a problem to provide a server log with DebugLevel=4 and a TCP dump, since I have ~6000 NVPS on the Zabbix server. Thanks! |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Dear Natalia, since there is nothing specific about your installation and the issue can be reproduced, there is no need to increase the debug log level or capture a TCP dump on a 6000 NVPS installation. Thank you for the information you've already provided. My comment was addressed to Oleg, because so far I have failed to reproduce the issue myself (but I was using trunk, with the new history cache implemented in 3.0). I apologise for misleading you. |
Comment by Natalia Kagan [ 2016 May 25 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Thanks for the update! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 26 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Natalia, a few questions for you. As I understand, this issue is randomly reproducible with quite low probability (0-9 failures out of ~200 tests), for ordinary, templated and LLD items/triggers alike. This makes me think that trigger evaluation must be correct, but sometimes the pre-last value is used instead of the last one to evaluate the trigger.
I will be very glad to see listings from the database (ids, clocks and ns from history and events) like the one Oleg provided. I think ZBX-6942 is not related: my initial guess about TCP connections not arriving in order can explain the OK/PROBLEM order swapping in ZBX-6942, but here we have a miscalculated trigger. And the DB listing from Oleg shows that the values arrived (and were written to the DB) in order. So minor clock adjustments or discontinuities and/or a race condition between history syncers are my best guesses at the moment. |
Comment by Natalia Kagan [ 2016 May 31 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hi, sorry for the delay. 1. We don't send values to the server directly, only via a proxy. We have a very complex structure: zabbix agent -> proxy VIP (F5) -> proxy server -> zabbix server VIP (F5) -> zabbix server -> zabbix DB VIP (F5) -> zabbix DB server. Could you provide me with the query (MySQL 5.6) for the ids, clocks and ns from history and events? Oleg was able to reproduce the problem in your environment, so maybe it is not related to my environment and configuration... Thanks for your help |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 31 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
No worries! Thank you for the information. Unfortunately, Oleg dropped the environment where he reproduced the issue, and his attempts to recreate it were unsuccessful. Can I assume that you know the problematic event clock and the item/host which generated it? |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 31 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Just noticed a weird thing in | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Natalia Kagan [ 2016 May 31 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Yes, the description (item.value) is correct, but the status and severity are wrong. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jun 01 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hi! Sorry for the delay. The query should look like this (you need to choose the appropriate history table and replace HOSTNAME, ITEM_NAME and TIMESTAMP):

select e.*, hi.*
from events as e,
     history/history_log/history_str/history_text/history_uint as hi,
     triggers as t, functions as f, items as i, hosts as ho
where ho.name = HOSTNAME
  and i.name = ITEM_NAME
  and e.clock = TIMESTAMP
  and hi.itemid = i.itemid
  and e.clock = hi.clock
  and e.ns = hi.ns
  and e.source = 0
  and e.object = 0
  and e.objectid = t.triggerid
  and f.triggerid = t.triggerid
  and f.itemid = i.itemid;

This will return the events of all triggers for the item generated at the given clock, so you might want to massage the output afterwards. Looking forward to your reply! |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jun 01 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
There is a scenario in which it may happen:
For this to happen, the order must change twice as A and B progress along the pipeline. This is unlikely, but not impossible. However, this contradicts Oleg's DB contents, where the timestamps are in perfect order. It's a pity we can't double-check this!

With a small addition to process_hist_data() which adds an extra 2 ms to non-empty values, it is possible to simulate the aforementioned scenario:

if ('\0' != *tmp)
{
	av->ts.ns += 2000000;

	if (1000000000 < av->ts.ns)
	{
		av->ts.ns -= 1000000000;
		av->ts.sec++;
	}
}

I send:

$ src/zabbix_sender/zabbix_sender -z localhost -p 10051 -s Dummy -k test.trapper -o test && src/zabbix_sender/zabbix_sender -z localhost -p 10051 -s Dummy -k test.trapper -o ''
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000138" sent: 1; skipped: 0; total: 1
info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000060" sent: 1; skipped: 0; total: 1

This generates two PROBLEM events:

select e.*, hi.* from events as e, history_text as hi, triggers as t, functions as f, items as i, hosts as ho where ho.name = 'Dummy' and i.name = 'Test' and hi.itemid = i.itemid and e.source = 0 and e.object = 0 and e.objectid = t.triggerid and f.triggerid = t.triggerid and f.itemid = i.itemid and e.clock = hi.clock and e.ns = hi.ns and e.clock = 1464778041;

 eventid | source | object | objectid | clock      | value | acknowledged | ns        | id  | itemid | clock      | value | ns
---------+--------+--------+----------+------------+-------+--------------+-----------+-----+--------+------------+-------+-----------
     302 |      0 |      0 |    13609 | 1464778041 |     1 |            0 | 960187415 | 236 |  23770 | 1464778041 | test  | 960187415
     303 |      0 |      0 |    13609 | 1464778041 |     1 |            0 | 959718399 | 237 |  23770 | 1464778041 |       | 959718399
(2 rows) |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jun 06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Dear Natalia, the contents of your DB will still be much appreciated, as they can prove or disprove my hypothesis. Also, I would like to ask about your NTP settings and how NTP behaves in your setup. Does it adjust the clock ticking speed or perform jumps (and how big are they)? Does the "hardware clock" of the proxy's virtual environment drift away from the host machine's clock? |
Comment by Natalia Kagan [ 2016 Jun 07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Hi, regarding NTP, we have the same ntp.conf on all hosts:

%less /etc/ntp.conf | grep -v "^#" | grep -v "^$"
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
fudge 127.127.1.0 stratum 10
driftfile /var/lib/ntp/drift
keys /etc/ntp/keys
server xxx.xxx.xxx.xxx
...

I don't have more information; let me know if you need more details. Regarding the DB data, I am trying to run the test now and will update. Thanks |
Comment by Natalia Kagan [ 2016 Jun 13 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Good morning, I am not able to reproduce the problem with my test template anymore (:- I will try to reproduce it again later and will update. Thanks! |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jun 13 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Good morning! I'm glad that you are not experiencing the issue any more. I would recommend not using two sender calls when the order of values is crucial. To be on the safe side, it is better to put the values in a file and call zabbix_sender once. My thoughts on how timekeeping and NTP issues may lead to this situation, and ideas on how we can improve things, are registered as ZBXNEXT-3298. |
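As a sketch of the single-invocation approach recommended above: zabbix_sender accepts an input file (one "hostname key value" entry per line) via its -i option, so both values travel over one connection and are processed in order. The file name below is illustrative, using the reporter's host and keys:

```
server141 key[LPTRAP_60_] severity:Warning
server141 key[LPTRAP_60_] severity:normal
```

saved e.g. as values.txt and sent with a single call: /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -i values.txt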
Comment by Natalia Kagan [ 2016 Jun 13 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I am not sure that the problem does not exist anymore; for real alerts we implemented a workaround: we added a "sleep 5" between zabbix_sender calls. I am just not able to reproduce it. We have no control over how many values zabbix_sender sends, or when, since the events come from application code (we have a very complex solution that uses LLD trappers). |
Comment by dimir [ 2016 Sep 27 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Overview

Was able to reproduce the problem. Used a trapper item with trigger {Zabbix server:test.strlen()}<3 (with multiple PROBLEM events generation), monitored by a proxy, all components of the latest 2.2 (2.2.15rc1 r62671). In my case the problem happened when the values sent ended up in different parcels. In order to guarantee that, I had to apply the following change to the Zabbix proxy:
Index: include/proxy.h
===================================================================
--- include/proxy.h (revision 62691)
+++ include/proxy.h (working copy)
@@ -26,7 +26,8 @@
#define ZBX_PROXYMODE_ACTIVE 0
#define ZBX_PROXYMODE_PASSIVE 1
-#define ZBX_MAX_HRECORDS 1000
+//#define ZBX_MAX_HRECORDS 1000
+#define ZBX_MAX_HRECORDS 1
#define AGENT_VALUE struct zbx_agent_value_t
This guarantees that the history data parcels sent from the proxy to the server always contain 1 value. I was feeding PROBLEM, OK values one by one, in pairs, using zabbix_sender to the proxy:

while true; do for i in a bbb; do bin/zabbix_sender -z 192.168.1.100 -s "Zabbix server" -k test -o $i; done; sleep 3; done
The proxy sets value timestamps to the current timestamp. Then the server, when receiving history data, adjusts every value timestamp by the time difference between proxy and server (proxy_timediff). Here is what the server does for each history data parcel:
The time difference between proxy and server (proxy_timediff) is supposed to guarantee the order of the history data flow, but it appears that in some cases (network latency) it can, on the contrary, screw up the timestamps of values in separate parcels. In the following example, 3 pairs of PROBLEM, OK values were sent (that's 6 parcels of "history data", 1 value per parcel). The values of the second pair (3rd and 4th parcels) end up on the server with crossed timestamps, resulting in the 3rd value (which is PROBLEM) being processed 2 times and the 4th not being processed at all. This happens because, when the 4th value is processed, the 3rd is still the latest one (greater timestamp).

Simple view of the problem

What happens on the server

This is taken from zabbix_server.log
Detailed view of the problem

History values sent by proxy

zabbix_proxy.log:

-- parcel 1 (PROBLEM value)
29682:20160927:152434.271 In send_data_to_server() [{"request":"history data","host":"Zabbix proxy","data":[{"host":"Zabbix server","key":"test","clock":1474979070,"ns":631108357,"value":"a"}],"clock":1474979074,"ns":271853707}]
-- parcel 2 (OK value)
29682:20160927:152435.287 In send_data_to_server() [{"request":"history data","host":"Zabbix proxy","data":[{"host":"Zabbix server","key":"test","clock":1474979070,"ns":636563317,"value":"bbb"}],"clock":1474979075,"ns":287670248}]
-- parcel 3 (PROBLEM value)
29682:20160927:152437.296 In send_data_to_server() [{"request":"history data","host":"Zabbix proxy","data":[{"host":"Zabbix server","key":"test","clock":1474979073,"ns":651108838,"value":"a"}],"clock":1474979077,"ns":296742783}]
-- parcel 4 (OK value)
29682:20160927:152438.309 In send_data_to_server() [{"request":"history data","host":"Zabbix proxy","data":[{"host":"Zabbix server","key":"test","clock":1474979073,"ns":656584887,"value":"bbb"}],"clock":1474979078,"ns":309290537}]
-- parcel 5 (PROBLEM value)
29682:20160927:152442.319 In send_data_to_server() [{"request":"history data","host":"Zabbix proxy","data":[{"host":"Zabbix server","key":"test","clock":1474979076,"ns":681714069,"value":"a"}],"clock":1474979082,"ns":319267404}]
-- parcel 6 (OK value)
29682:20160927:152443.327 In send_data_to_server() [{"request":"history data","host":"Zabbix proxy","data":[{"host":"Zabbix server","key":"test","clock":1474979076,"ns":686862613,"value":"bbb"}],"clock":1474979083,"ns":327434212}]
--

This is the part of the log where the server was receiving the parcels. When receiving, the server adjusts the value timestamps according to proxy_timediff. Notice how, after adjustment, the timestamp of the 4th parcel ends up in the past relative to the 3rd.
History values received by server from proxy

zabbix_server.log:

-- parcel 1 (PROBLEM value)
"clock":1474979070, "ns":631108357, "value":"a"}],
29645:20160927:152434.280 DIMIR: proxy_timediff: 0.009 sec
29645:20160927:152434.280 DIMIR: process_hist_data() after proxy_timediff adjustment [ a] 1474979070.639617587
-- parcel 2 (OK value)
"clock":1474979070, "ns":636563317, "value":"bbb"}],
29644:20160927:152435.292 DIMIR: proxy_timediff: 0.005 sec
29644:20160927:152435.292 DIMIR: process_hist_data() after proxy_timediff adjustment [bbb] 1474979070.641515576 <-- all good: after proxy_timediff adjustment the second value is still in the future from the first
-- parcel 3 (PROBLEM value)
"clock":1474979073, "ns":651108838, "value":"a"}],
29642:20160927:152437.305 DIMIR: proxy_timediff: 0.009 sec
29642:20160927:152437.305 DIMIR: process_hist_data() after proxy_timediff adjustment [ a] 1474979073.659902111
-- parcel 4 (OK value, but after the server adjusted the value timestamp it ends up before the 3rd value!)
"clock":1474979073, "ns":656584887, "value":"bbb"}],
29641:20160927:152438.311 DIMIR: proxy_timediff: 0.002 sec <-- BUG: proxy_timediff of the second parcel is much smaller than that of the first (0.009); the difference is 0.007 seconds, while the difference of the value timestamps is just 0.005!
29641:20160927:152438.311 DIMIR: process_hist_data() after proxy_timediff adjustment [bbb] 1474979073.658875417 <-- BUG: after proxy_timediff adjustment the second value is in the past relative to the first!
-- parcel 5 (PROBLEM value)
"clock":1474979076, "ns":681714069, "value":"a"}],
29641:20160927:152442.324 DIMIR: proxy_timediff: 0.005 sec
29641:20160927:152442.324 DIMIR: process_hist_data() after proxy_timediff adjustment [ a] 1474979076.686639954
-- parcel 6 (OK value)
"clock":1474979076, "ns":686862613, "value":"bbb"}],
29641:20160927:152443.330 DIMIR: proxy_timediff: 0.003 sec
29641:20160927:152443.330 DIMIR: process_hist_data() after proxy_timediff adjustment [bbb] 1474979076.689605427 <-- all good: after proxy_timediff adjustment the second value is still in the future from the first
--

Proxy history table

mysql> select *,from_unixtime(clock) from proxy_history where clock between 1474979070 and 1474979076 order by clock,ns;
+------+--------+------------+-----------+--------+----------+-------+------------+-----------+-------+----------------------+
| id   | itemid | clock      | timestamp | source | severity | value | logeventid | ns        | state | from_unixtime(clock) |
+------+--------+------------+-----------+--------+----------+-------+------------+-----------+-------+----------------------+
| 1719 |  23661 | 1474979070 |         0 |        |        0 | a     |          0 | 631108357 |     0 | 2016-09-27 15:24:30  |
| 1720 |  23661 | 1474979070 |         0 |        |        0 | bbb   |          0 | 636563317 |     0 | 2016-09-27 15:24:30  |
| 1721 |  23661 | 1474979073 |         0 |        |        0 | a     |          0 | 651108838 |     0 | 2016-09-27 15:24:33  |
| 1722 |  23661 | 1474979073 |         0 |        |        0 | bbb   |          0 | 656584887 |     0 | 2016-09-27 15:24:33  |
| 1723 |  23661 | 1474979076 |         0 |        |        0 | a     |          0 | 681714069 |     0 | 2016-09-27 15:24:36  |
| 1724 |  23661 | 1474979076 |         0 |        |        0 | bbb   |          0 | 686862613 |     0 | 2016-09-27 15:24:36  |
+------+--------+------------+-----------+--------+----------+-------+------------+-----------+-------+----------------------+
6 rows in set (0.00 sec)

Server history table (order of 3rd and 4th values is screwed)

mysql> select * from history_str where clock between 1474979070 and 1474979076 order by clock,ns;
+--------+------------+-------+-----------+
| itemid | clock      | value | ns        |
+--------+------------+-------+-----------+
|  23661 | 1474979070 | a     | 639617587 |
|  23661 | 1474979070 | bbb   | 641515576 |
|  23661 | 1474979073 | bbb   | 658875417 |
|  23661 | 1474979073 | a     | 659902111 |
|  23661 | 1474979076 | a     | 686639954 |
|  23661 | 1474979076 | bbb   | 689605427 |
+--------+------------+-------+-----------+
6 rows in set (0.00 sec)

Event generation

29653:20160927:152435.560 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value) values (38638,0,0,13558,1474979070,639617587,1);
29652:20160927:152440.548 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value) values (38639,0,0,13558,1474979070,641515576,0);
29652:20160927:152440.553 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value) values (38640,0,0,13558,1474979073,659902111,1);
29652:20160927:152440.556 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value) values (38641,0,0,13558,1474979073,658875417,1); <-- BUG
29652:20160927:152445.558 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value) values (38642,0,0,13558,1474979076,686639954,1);
29652:20160927:152445.565 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value) values (38643,0,0,13558,1474979076,689605427,0);

Trigger changes

--
29653:20160927:152435.560 In process_trigger() triggerid:13558 value:0(0) new_value:1
29653:20160927:152435.560 query [txnlev:1] [update triggers set lastchange=1474979070,value=1 where triggerid=13558;
--
29652:20160927:152440.548 In process_trigger() triggerid:13558 value:1(0) new_value:0
29652:20160927:152440.548 query [txnlev:1] [update triggers set lastchange=1474979070,value=0 where triggerid=13558;
--
29652:20160927:152440.553 In process_trigger() triggerid:13558 value:0(0) new_value:1
29652:20160927:152440.553 query [txnlev:1] [update triggers set lastchange=1474979073,value=1 where triggerid=13558;
--
29652:20160927:152440.555 In process_trigger() triggerid:13558 value:1(0) new_value:1
29652:20160927:152440.555 query [txnlev:1] [update triggers set lastchange=1474979073 where triggerid=13558; <-- BUG, value is still 1
--
29652:20160927:152445.558 In process_trigger() triggerid:13558 value:1(0) new_value:1
29652:20160927:152445.558 query [txnlev:1] [update triggers set lastchange=1474979076 where triggerid=13558;
--
29652:20160927:152445.565 In process_trigger() triggerid:13558 value:1(0) new_value:0
29652:20160927:152445.565 query [txnlev:1] [update triggers set lastchange=1474979076,value=0 where triggerid=13558;
--

Trigger expression evaluation

--
29653:20160927:152435.559 End of zbx_vc_get_value_range():SUCCEED count:1 cached:1
29653:20160927:152435.560 End of evaluate_function():SUCCEED value:'1'
29653:20160927:152435.560 zbx_substitute_functions_results() expression[0]:'{13165}<3' => '1<3'
--
29652:20160927:152440.547 End of zbx_vc_get_value_range():SUCCEED count:1 cached:1
29652:20160927:152440.547 End of evaluate_function():SUCCEED value:'3'
29652:20160927:152440.547 zbx_substitute_functions_results() expression[0]:'{13165}<3' => '3<3'
--
29652:20160927:152440.553 End of zbx_vc_get_value_range():SUCCEED count:1 cached:1
29652:20160927:152440.553 End of evaluate_function():SUCCEED value:'1'
29652:20160927:152440.553 zbx_substitute_functions_results() expression[0]:'{13165}<3' => '1<3'
--
29652:20160927:152440.555 End of zbx_vc_get_value_range():SUCCEED count:1 cached:1
29652:20160927:152440.555 End of evaluate_function():SUCCEED value:'1'
29652:20160927:152440.555 zbx_substitute_functions_results() expression[0]:'{13165}<3' => '1<3'
--
29652:20160927:152445.558 End of zbx_vc_get_value_range():SUCCEED count:1 cached:1
29652:20160927:152445.558 End of evaluate_function():SUCCEED value:'1'
29652:20160927:152445.558 zbx_substitute_functions_results() expression[0]:'{13165}<3' => '1<3'
--
29652:20160927:152445.565 End of zbx_vc_get_value_range():SUCCEED count:1
cached:1 29652:20160927:152445.565 End of evaluate_function():SUCCEED value:'3' 29652:20160927:152445.565 zbx_substitute_functions_results() expression[0]:'{13165}<3' => '3<3' -- Monitoring ->Events | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
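The downstream effect shown in the trigger-change log can be condensed into a tiny sketch. This is not Zabbix source code: the function name and the matching rule (value "a" satisfies the PROBLEM regexp, "bbb" does not) are illustrative assumptions. It shows why a batch whose OK value was stored before the PROBLEM value leaves the trigger ending in PROBLEM state even though the host last reported OK:

```python
# Hypothetical sketch: the trigger is recalculated per history value,
# in the order the server stored the values.
def replay(values):
    """Return the final trigger state (1 = PROBLEM, 0 = OK)
    after processing the values in storage order."""
    state = 0
    for v in values:
        state = 1 if v == "a" else 0  # assumption: "a" matches PROBLEM
    return state

assert replay(["a", "bbb"]) == 0  # correct order: trigger ends OK
assert replay(["bbb", "a"]) == 1  # reordered batch: stuck in PROBLEM
```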
Comment by dimir [ 2016 Sep 28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The problem happens because of inefficient proxy/server time difference calculation; I believe this was already partly fixed. Of the latest versions, I could not reproduce my scenario on any other than 2.2. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
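The mechanism can be reproduced in a few lines. This is only a sketch of the effect, not the actual Zabbix implementation: `adjust` is a hypothetical stand-in for the per-parcel proxy_timediff correction, and the numbers are copied from parcels 3 and 4 in the log above. Because the time difference is measured independently for each parcel, a drop in the measured timediff (0.009 s to 0.002 s) that exceeds the gap between the value timestamps (0.005 s) inverts the adjusted order:

```python
def adjust(value_ts, proxy_timediff):
    """Shift a proxy-side value timestamp into server time (sketch)."""
    return value_ts + proxy_timediff

# Timestamps (seconds) from parcels 3 and 4 in the log above.
problem_ts = 1474979073.651108838  # PROBLEM value, sent first
ok_ts      = 1474979073.656584887  # OK value, sent 0.005 s later

adjusted_problem = adjust(problem_ts, 0.009)  # timediff at parcel 3
adjusted_ok      = adjust(ok_ts,      0.002)  # timediff at parcel 4

# The OK value now sorts *before* the PROBLEM value:
assert adjusted_ok < adjusted_problem
```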
Comment by Veeresh K [ 2016 Sep 29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I am probably hitting the same issue. I have two trapper items (numeric/unsigned) and I am sending data for them via protobix (Python library). The values do get updated in Zabbix, but the triggers do not fire (the triggers check whether the last value is non-zero). This is happening in a LAN environment with an NVPS of about 37. I am on 3.0.4. However, if I send the same data again, the triggers fire. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2016 Sep 29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Dear vkhanorkar, if it is a newly created trigger, you are probably not waiting long enough for the server to update its configuration from the database. I don't think your problem is related to this one. <dimir> Moreover, this issue is already fixed (at least partly) in 3.0.4. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Veeresh K [ 2016 Sep 29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
@Glebs, let me explain further. The first time the item value changes from non-zero to zero or vice versa, the trigger is not updated. But the next time such a value is sent, the trigger fires immediately. Regarding waiting: the trigger was observed for about 3-4 minutes but was not updated. However, once the same value was sent again, the trigger fired immediately. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2016 Sep 29 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Dear vkhanorkar, you didn't get my point. How much time passed between the creation of the trigger and the moment of sending the new value that was supposed to generate an event? If the server wasn't yet aware of the trigger's existence, that would explain why it passed the value through without generating an event. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Comment by Alexander Vladishev [ 2016 Sep 30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Closed as duplicate of |