[ZBX-7452] Event synchronization is broken in multinode DM case Created: 2013 Nov 28  Updated: 2017 May 30  Resolved: 2013 Dec 30

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.2.0
Fix Version/s: 2.2.2rc1, 2.3.0

Type: Incident report Priority: Blocker
Reporter: Oleg Korchagin Assignee: Unassigned
Resolution: Fixed Votes: 4
Labels: dm, regression, synchronization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Oracle Linux 6.4 (RHEL clone), Zabbix 2.2 (RPM from the official site), MySQL 5.6, vSphere 5.1


Attachments: File ZBX-7365_ZBX-7452_quick_and_dirty_fix.patch    
Issue Links:
Duplicate
is duplicated by ZBX-7365 Events won't be synced to master node... Closed
is duplicated by ZBX-7560 Synchronization between nodes with Za... Closed

 Description   

Zabbix 2.2, distributed monitoring, nodes. After the update from 2.0, event synchronization (from slave nodes to the master node) is completely broken.
Synchronization of other data (history_log, history_str_sync, history_uint_sync, history_sync) works fine.

As a result, the master node holds stale data. Trigger states are current and mail notifications are sent on time, but Monitoring->Dashboard and Monitoring->Events show stale information. So the multinode configuration becomes useless.

In a traffic dump I can see that:

1. the Zabbix master returns the same value for every "ZBX_GET_HISTORY_LAST_ID.202.202 events.eventid" request;
2. nevertheless, the Zabbix server responds "OK" to every synchronization request "History.202.202.events" carrying ~5k events.

I rebuilt Zabbix from source on the master node with extra debug messages (src/zabbix_server/events.c, zabbix_log(LOG_LEVEL_WARNING,"events num: %i", events_num);). With them I can see that events are successfully added to the array (add_event()) and that events_num grows with every sync. But the debug message in process_events() always shows "events_num: 0", so the events are never written to the database.
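The symptoms above (a constant "last ID" answer, "OK" replies, and an events_num that keeps growing) fit a buffer that is filled but never flushed. The toy model below is a minimal sketch of that reading, not Zabbix code: buffer_event() and flush_events() are hypothetical stand-ins for add_event() and process_events(), last_stored_id() stands in for the master's answer to ZBX_GET_HISTORY_LAST_ID, and the single-process, in-memory model is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the master side of DM event sync. All names are hypothetical:
 * buffer_event() plays the role of add_event(), flush_events() the role of
 * process_events(). */

#define MAX_EVENTS 64

typedef struct
{
    unsigned long long ids[MAX_EVENTS];
    size_t             num;
}
event_buf_t;

static event_buf_t pending; /* events parsed from sync requests, not yet written */
static event_buf_t stored;  /* stands in for the master's events table */

static void reset_model(void)
{
    pending.num = 0;
    stored.num = 0;
}

/* like add_event(): only appends to the in-memory buffer */
static void buffer_event(unsigned long long eventid)
{
    assert(pending.num < MAX_EVENTS); /* toy capacity limit */
    pending.ids[pending.num++] = eventid;
}

/* like process_events(): copies buffered events to "the table", empties the buffer */
static void flush_events(void)
{
    for (size_t i = 0; i < pending.num; i++)
    {
        assert(stored.num < MAX_EVENTS);
        stored.ids[stored.num++] = pending.ids[i];
    }
    pending.num = 0;
}

/* master's answer to the "last ID" request: highest stored eventid, 0 if none */
static unsigned long long last_stored_id(void)
{
    unsigned long long last = 0;

    for (size_t i = 0; i < stored.num; i++)
        if (stored.ids[i] > last)
            last = stored.ids[i];
    return last;
}

/* One sync cycle: the slave sends every event newer than the master's last ID,
 * the master buffers them and writes them out only if call_flush is set.
 * Returns the number of events sent; the master would reply "OK" either way. */
static size_t sync_cycle(const unsigned long long *slave_ids, size_t n, int call_flush)
{
    unsigned long long last = last_stored_id();
    size_t             sent = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (slave_ids[i] > last)
        {
            buffer_event(slave_ids[i]);
            sent++;
        }
    }
    if (call_flush)
        flush_events();
    return sent;
}
```

With call_flush set to 0, every cycle resends the same events, last_stored_id() stays at 0 and the buffer keeps growing, which mirrors the behaviour described above; with call_flush set to 1, the second cycle sends nothing.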

I created a test setup (2 nodes, clean install of Zabbix 2.2 and MySQL 5.6). It shows the same problem, i.e. this is not an upgrade issue but a bug in the Zabbix 2.2 branch.



 Comments   
Comment by Gerrit Fluck [ 2013 Nov 28 ]

Seems to be a related issue: ZBX-7365
We experience the same problems in 2.2

Comment by Dilson Tomé [ 2013 Dec 07 ]

Tested on Zabbix 2.2.1rc1: same problem.
Events are not synced.

NODE 2
mysql> select * from events;
+-----------------+--------+--------+-----------------+------------+-------+--------------+-----------+
| eventid         | source | object | objectid        | clock      | value | acknowledged | ns        |
+-----------------+--------+--------+-----------------+------------+-------+--------------+-----------+
| 200000000000028 |      3 |      0 | 200200000013575 | 1386427560 |     1 |            0 | 371474894 |
| 200000000000029 |      0 |      0 | 200200000013575 | 1386427611 |     0 |            0 | 173221504 |
| 200000000000030 |      3 |      0 | 200200000013575 | 1386427611 |     0 |            0 | 173221504 |
| 200000000000031 |      0 |      0 | 200200000013579 | 1386427615 |     0 |            0 | 185215493 |
| 200000000000032 |      0 |      0 | 200200000013580 | 1386427616 |     0 |            0 | 196416635 |
| 200000000000033 |      0 |      0 | 200200000013581 | 1386427620 |     0 |            0 | 206510097 |
| 200000000000034 |      0 |      0 | 200200000013582 | 1386427625 |     0 |            0 | 324965089 |
| 200000000000035 |      0 |      0 | 200200000013584 | 1386427634 |     0 |            0 | 392126704 |
| 200000000000036 |      0 |      0 | 200200000013588 | 1386427640 |     0 |            0 | 421204147 |
| 200000000000037 |      3 |      0 | 200200000013586 | 1386427937 |     1 |            0 | 205579069 |
| 200000000000038 |      3 |      0 | 200200000013574 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000039 |      3 |      0 | 200200000013576 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000040 |      3 |      0 | 200200000013577 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000041 |      3 |      0 | 200200000013578 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000042 |      3 |      0 | 200200000013579 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000043 |      3 |      0 | 200200000013580 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000044 |      3 |      0 | 200200000013581 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000045 |      3 |      0 | 200200000013582 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000046 |      3 |      0 | 200200000013583 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000047 |      3 |      0 | 200200000013584 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000048 |      3 |      0 | 200200000013585 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000049 |      3 |      0 | 200200000013587 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000050 |      3 |      0 | 200200000013588 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000051 |      3 |      0 | 200200000013589 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000052 |      3 |      0 | 200200000013590 | 1386428020 |     1 |            0 | 344295607 |
| 200000000000053 |      0 |      0 | 200200000013575 | 1386428040 |     1 |            0 | 574789153 |
+-----------------+--------+--------+-----------------+------------+-------+--------------+-----------+
26 rows in set (0.00 sec)

NODE 1
mysql> select * from events;
Empty set (0.00 sec)
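The eventid and objectid values in the node 2 output carry the number of the node that generated them. Assuming the Zabbix 2.x DM ID layout (history-type IDs such as eventid = nodeid * 10^14 + local ID; configuration IDs such as trigger IDs = nodeid * 10^14 + nodeid * 10^11 + local ID), the sketch below decodes them. The constant and function names are hypothetical, and the layout should be checked against the sources of the version in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Zabbix 2.x DM ID layout (hypothetical constant names):
 *   history-type IDs (e.g. events.eventid):      nodeid * 10^14 + local
 *   configuration IDs (e.g. triggers.triggerid): nodeid * 10^14 + nodeid * 10^11 + local */
#define DM_HISTORY_SPAN UINT64_C(100000000000000) /* 10^14 */
#define DM_CONFIG_SPAN  UINT64_C(100000000000)    /* 10^11 */

/* node that generated the ID (both ID kinds) */
static uint64_t id_node(uint64_t id)
{
    return id / DM_HISTORY_SPAN;
}

/* local part of a history-type ID such as an eventid */
static uint64_t history_local(uint64_t id)
{
    return id % DM_HISTORY_SPAN;
}

/* node number repeated inside a configuration ID */
static uint64_t config_inner_node(uint64_t id)
{
    return id % DM_HISTORY_SPAN / DM_CONFIG_SPAN;
}

/* local part of a configuration ID such as a triggerid */
static uint64_t config_local(uint64_t id)
{
    assert(config_inner_node(id) == id_node(id)); /* both node fields should agree */
    return id % DM_HISTORY_SPAN % DM_CONFIG_SPAN;
}
```

Under that assumption, eventid 200000000000028 decodes to node 2, local ID 28, and objectid 200200000013575 to node 2, local trigger ID 13575; every row above decodes to node 2 in the same way.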

Comment by Oleg Korchagin [ 2013 Dec 17 ]

The bug is still present in 2.2.1.

Quick & dirty fix:

see the attached file "ZBX-7365_ZBX-7452_quick_and_dirty_fix.patch"

Comment by Oleksii Zagorskyi [ 2013 Dec 28 ]

Issue CONFIRMED.
It is a regression caused by the changes made for ZBXNEXT-1575 in rev 34766.
The call to the "process_events()" function disappeared in that revision.
https://www.zabbix.org/websvn/wsvn/zabbix.com/trunk/src/zabbix_server/trapper/nodehistory.c?op=diff&rev=34766&peg=34766

Comment by Alexander Vladishev [ 2013 Dec 30 ]

Oleg,

Thank you for the patch! It will be integrated in version 2.2.2 with minor changes.

Comment by Alexander Vladishev [ 2013 Dec 30 ]

Fixed in the development branch svn://svn.zabbix.com/branches/dev/ZBX-7452

Comment by Andris Zeila [ 2014 Jan 02 ]

Successfully tested

Comment by Alexander Vladishev [ 2014 Jan 02 ]

Fixed in pre-2.2.2 r41210 and pre-2.3.0 (trunk) r41211.

Comment by Andreas Franke [ 2014 Feb 07 ]

Hello,
Today I updated my 2 servers to 2.2.2rc2, but the problem still exists. Is it possible to clean the master database, or to reset the sync status of the events table?

Comment by Oleksii Zagorskyi [ 2014 Feb 07 ]

Andreas, could you restart the master node with DebugLevel=4, let it run for 3 minutes and attach the compressed log file here?

Comment by Andreas Franke [ 2014 Feb 07 ]

Hello Oleksiy, I have found the cause. There were some bad entries in the events table on my child node: some entries had an objectid pointing to triggers that don't exist.

Comment by Oleksii Zagorskyi [ 2014 Feb 07 ]

Andreas, good to know.
Then the question is how they could have appeared,
and that belongs to ZBX-3996.

Comment by Andreas Franke [ 2014 Feb 07 ]

Yes, I think this is the same problem.

Comment by Giovanni Lovato [ 2014 May 02 ]

I updated to 2.2.2, but I still get:

4046:20140502:094156.452 NODE 3: sending events of node 3 to node 1 datalen 557097
4046:20140502:094156.467 NOT OK

Maybe because the DB was left unclean by 2.2.1? How can I fix that?

Generated at Fri Apr 19 21:10:08 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.