[ZBX-6154] Events and Alerts not replicated properly in a Distributed Monitoring setup Created: 2013 Jan 19  Updated: 2017 May 30  Resolved: 2015 Feb 02

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Yaroslav Zhavoronkov Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: dm, events, patch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File zabbix-fix-history_last_id.patch    

 Description   

In a Distributed Monitoring setup of several nodes, where Node 1 is a master and Node 2 is a child of Node 1:

When a Trigger for an Item located at Node 2 is created via the web interface at Node 1, an "unknown event" for that Trigger is created in the `events' table of the database at Node 1. The eventid of this "unknown event" is derived from the local and remote node IDs and (for Node 1 id=1 and Node 2 id=2) falls in the range 200100xxxxxxxxxxx.
Meanwhile, events occurring locally at Node 2 are stored in its `events' table with ids in the range 200000xxxxxxxxxxx.
When Node 2 requests ZBX_GET_HISTORY_LAST_ID from its master Node 1 in order to replicate these events, it receives a send_history_last_id() response containing the id of that "unknown event", because the DBnode() function looks up IDs across the node's whole range, disregarding the source_node part, i.e. up to 29999999999999999. The id of the "unknown event" associated with the newly created Trigger therefore overrides the id of the last event received by Node 1 from Node 2, since the former starts with 200100xxx, the latter with 200000xxx, and the former is numerically larger.
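
The lookup described above can be illustrated with a minimal Python sketch. It assumes IDs are laid out as node ID, then a 3-digit source-node ID, then an 11-digit local counter (the widths suggested by the NNN000... pattern in the proposed solution). The constants, function names and sample values are hypothetical and are not taken from the Zabbix source.

```python
# Hypothetical ID layout (an assumption, not taken from Zabbix source):
# <node id><3-digit source node id><11-digit local counter>
SOURCE_SPAN = 10**11               # size of one source-node sub-range
NODE_SPAN = 10**3 * SOURCE_SPAN    # size of one node's whole range


def make_id(node: int, source: int, counter: int) -> int:
    """Compose an ID from its node, source-node and counter parts."""
    return node * NODE_SPAN + source * SOURCE_SPAN + counter


def last_id_whole_node_range(ids, node):
    """Largest ID anywhere in the node's range, ignoring the source part."""
    low, high = node * NODE_SPAN, (node + 1) * NODE_SPAN - 1
    return max((i for i in ids if low <= i <= high), default=0)


# Rows in the `events' table at Node 1 (illustrative values only).
replicated = [make_id(2, 0, n) for n in (1, 2, 3)]  # events received from Node 2
unknown_event = make_id(2, 1, 1)   # created at Node 1 for a Node 2 trigger
events_at_node1 = replicated + [unknown_event]

reported = last_id_whole_node_range(events_at_node1, node=2)
print(reported == unknown_event)    # the "unknown event" is reported as last id
print(reported > make_id(2, 0, 4))  # Node 2's next local event sorts below it
```

Under these assumptions every new local event at Node 2 has an id below the reported last id, so Node 2 finds nothing newer to send.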

So, once a new Trigger is created and the "unknown event" for its creation appears in the database of Node 1, the replication of all events occurring at Node 2 to Node 1 STOPS completely.
Now, if that Trigger "fires" at Node 2 and a related Alert is produced, then when Node 2 tries to replicate the `alerts' table to Node 1 it triggers a foreign key violation in the database at Node 1, because the replicated alerts reference eventids that were never replicated to Node 1.
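
The foreign key failure can be reproduced in miniature. The sketch below uses an in-memory SQLite database as a stand-in for the real backend and a heavily simplified, hypothetical two-column schema. Only the `events' and `alerts' table names and the eventid reference come from this report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE events (eventid INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE alerts ("
    " alertid INTEGER PRIMARY KEY,"
    " eventid INTEGER NOT NULL REFERENCES events (eventid))"
)

# Only the "unknown event" exists at the master; the real event never arrived.
conn.execute("INSERT INTO events (eventid) VALUES (200100000000001)")


def replicate_alert(alertid, eventid):
    """Try to store a replicated alert; report whether the database accepts it."""
    try:
        conn.execute(
            "INSERT INTO alerts (alertid, eventid) VALUES (?, ?)",
            (alertid, eventid),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# The alert references an event that exists only at the child node.
result = replicate_alert(200000000000001, 200000000000005)
print(result)
```

An alert whose eventid is present in `events' is accepted, which is why the problem appears only after event replication has stalled.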

Steps to reproduce:
1) Set up a clean Zabbix 2.0.4 installation on two machines.
2) Set up the first machine as a Node 1 in a Distributed Monitoring configuration, Node 2 being its child.
3) Set up the second machine as a Node 2 in a Distributed Monitoring configuration, Node 1 being its master.
4) With the web frontend at Node 1, add Node 2's machine as a Host at Node 2.
5) Set up any Item for that host at Node 2 (for example, proc.num[atd])
6) Set up any Trigger for that host at Node 2 (for example, {node2:proc.num[atd].last(0)}<1)
7) Make the Trigger fire (for example, stop atd at Node 2).
Now observe that `events' data is not replicated to Node 1 due to conflicting eventids.

Possible solution: change the send_history_last_id() function to select only node-local IDs (up to NNN00099999999999 instead of NNN99999999999999) for the specified node ID (see the attached patch).
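
The effect of narrowing the upper bound can be sketched as follows. This is not the attached patch. It is a hypothetical Python illustration that assumes a 3-digit source-node part followed by an 11-digit local counter, and the names id_bounds, last_id and local_only are invented for the example.

```python
# Assumed layout: <node id><3-digit source node id><11-digit local counter>
SOURCE_SPAN = 10**11
NODE_SPAN = 10**3 * SOURCE_SPAN


def id_bounds(node, local_only):
    """Inclusive ID range for a node: whole range, or source part 000 only."""
    low = node * NODE_SPAN
    span = SOURCE_SPAN if local_only else NODE_SPAN
    return low, low + span - 1


def last_id(ids, node, local_only):
    """Largest stored ID inside the chosen range, or 0 if there is none."""
    low, high = id_bounds(node, local_only)
    return max((i for i in ids if low <= i <= high), default=0)


events_at_node1 = [
    2 * NODE_SPAN + 3,                    # last event replicated from Node 2
    2 * NODE_SPAN + 1 * SOURCE_SPAN + 1,  # "unknown event" created at Node 1
]

broad = last_id(events_at_node1, node=2, local_only=False)
narrow = last_id(events_at_node1, node=2, local_only=True)
print(broad, narrow)
```

With the narrowed range the lookup returns the last event actually received from Node 2, so the "unknown event" no longer masks newer local events.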

The same issue is addressed in the ticket https://support.zabbix.com/browse/ZBX-5929.



 Comments   
Comment by richlv [ 2015 Feb 02 ]

with nodes being removed since 2.4, this issue is unlikely to be looked into
