Incident report
Resolution: Won't fix
Major
None
2.0.4
In a Distributed Monitoring setup of several nodes, where Node 1 is a master and Node 2 is a child of Node 1:
When a Trigger for an Item located at Node 2 is created via the web interface at Node 1, an "unknown event" for that Trigger is created in the `events' table of the database at Node 1. The eventid of this "unknown event" is set according to the local and remote node IDs and is (for Node 1 id=1 and Node 2 id=2) in the range of 200100xxxxxxxxxxx.
Meanwhile, at Node 2, events occurring locally are stored in the `events' table with corresponding ids in the range of 200000xxxxxxxxxxx.
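To make the two ID ranges concrete, here is a minimal Python sketch of how such an ID could be composed. The multipliers (10^14 for the owning node, 10^11 for the source node) are an assumption chosen to reproduce the 200100 and 200000 prefixes quoted above; compose_id() and split_id() are hypothetical helpers, not Zabbix functions.

```python
# Assumed ID layout: node * 10^14 + source_node * 10^11 + local counter.
# These factors are an assumption that reproduces the prefixes in the report.
NODE_FACTOR = 10**14
SOURCE_FACTOR = 10**11


def compose_id(node, source_node, local_id):
    """Build an ID from the owning node, the source node and a local counter."""
    return node * NODE_FACTOR + source_node * SOURCE_FACTOR + local_id


def split_id(value):
    """Split an ID back into (node, source_node, local_id)."""
    node, rest = divmod(value, NODE_FACTOR)
    source_node, local_id = divmod(rest, SOURCE_FACTOR)
    return node, source_node, local_id


# "Unknown event" created through the frontend at Node 1 for a Node 2 Trigger.
unknown_event = compose_id(2, 1, 42)
# Event generated locally at Node 2 (source node part is zero).
local_event = compose_id(2, 0, 42)

print(unknown_event, split_id(unknown_event))
print(local_event, split_id(local_event))
```

Under this layout every ID created at Node 1 on behalf of Node 2 is numerically larger than any ID generated locally at Node 2, which is what makes the lookup described next go wrong.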
When Node 2 requests ZBX_GET_HISTORY_LAST_ID from its master Node 1 in order to replicate these events, the send_history_last_id() response it receives contains the id of that "unknown event", because the DBnode() function looks up IDs disregarding the source_node part, i.e. up to 29999999999999999. The id of the "unknown event" associated with the newly created Trigger therefore overrides the id of the last event received by Node 1 from Node 2, since the former starts with 200100xxx and the latter with 200000xxx.
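The effect of that lookup can be simulated in a few lines of Python. This is only a sketch of the behaviour described above, not the Zabbix code: the ID layout (node * 10^14 + source_node * 10^11 + counter), the helper last_id_whole_node_range() and the rule that the child sends only events with ids above the reported one are all assumptions made for illustration.

```python
# Assumed ID layout: node * 10^14 + source_node * 10^11 + local counter.
NODE_FACTOR = 10**14


def last_id_whole_node_range(event_ids, node):
    """Return the highest ID anywhere in the node's range, ignoring source_node."""
    low = node * NODE_FACTOR
    high = low + NODE_FACTOR - 1
    return max((e for e in event_ids if low <= e <= high), default=0)


# Contents of the events table at Node 1 (hypothetical sample data).
events_at_node1 = [
    200000000000001,  # replicated from Node 2
    200000000000002,  # replicated from Node 2
    200100000000001,  # "unknown event" created at Node 1 for the new Trigger
]

reported_last_id = last_id_whole_node_range(events_at_node1, 2)

# Events waiting at Node 2; assumed to be sent only if newer than the reported id.
pending_at_node2 = [200000000000003, 200000000000004]
to_send = [e for e in pending_at_node2 if e > reported_last_id]

print(reported_last_id)
print(to_send)
```

Because the reported id falls in the 200100 range, no locally generated event at Node 2 can ever exceed it, so the list of events to send stays empty.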
So, once a new Trigger has been created and the "unknown event" of its creation has appeared in the database of Node 1, the replication of all events occurring at Node 2 to Node 1 STOPS completely.
Now, if that Trigger "fires" at Node 2 and a related Alert is produced, the attempt by Node 2 to replicate the `alerts' table to Node 1 generates a foreign key violation in the database at Node 1, because the replicated alerts reference eventids that were never replicated to Node 1.
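The foreign key failure can be reproduced in isolation with an in-memory SQLite database. The two-column schema below is a deliberately reduced stand-in for the real `events' and `alerts' tables, and replicate_alert() is a hypothetical helper; the sample ids assume the layout node * 10^14 + source_node * 10^11 + counter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE events (eventid INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE alerts ("
    " alertid INTEGER PRIMARY KEY,"
    " eventid INTEGER NOT NULL REFERENCES events (eventid))"
)

# Node 1 holds only the "unknown event"; the events from Node 2 never arrived.
conn.execute("INSERT INTO events (eventid) VALUES (200100000000001)")
conn.commit()


def replicate_alert(alertid, eventid):
    """Try to insert a replicated alert; return False on a foreign key violation."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO alerts (alertid, eventid) VALUES (?, ?)",
                (alertid, eventid),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The alert references an event that exists only at Node 2.
print(replicate_alert(1, 200000000000003))
# An alert referencing an event present at Node 1 is accepted.
print(replicate_alert(2, 200100000000001))
```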
Steps to reproduce:
1) Set up a clean Zabbix 2.0.4 installation on two machines.
2) Set up the first machine as a Node 1 in a Distributed Monitoring configuration, Node 2 being its child.
3) Set up the second machine as a Node 2 in a Distributed Monitoring configuration, Node 1 being its master.
4) With the web frontend at Node 1, add Node 2's machine as a Host at Node 2.
5) Set up any Item for that host at Node 2 (for example, proc.num[atd])
6) Set up any Trigger for that host at Node 2 (for example, one that fires when the value of that Item is <1)
7) Make the Trigger fire (for example, stop atd at Node 2).
Now see that `events' data is not replicated to Node 1 due to conflicting eventid-s.
Possible solution: change send_history_last_id() function to select only node-local IDs (up to NNN00099999999999 instead of NNN99999999999999) for specified node ID (see patch attached).
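The idea behind the proposed change can be sketched as follows. This is not the attached patch, only an illustration in Python under the same assumed ID layout (node * 10^14 + source_node * 10^11 + counter); last_id_node_local_range() is a hypothetical name.

```python
# Assumed ID layout: node * 10^14 + source_node * 10^11 + local counter.
NODE_FACTOR = 10**14
SOURCE_FACTOR = 10**11


def last_id_node_local_range(event_ids, node):
    """Return the highest ID whose source_node part is zero (node-local IDs only)."""
    low = node * NODE_FACTOR
    high = low + SOURCE_FACTOR - 1  # stop before the first non-zero source_node
    return max((e for e in event_ids if low <= e <= high), default=0)


# Contents of the events table at Node 1 (hypothetical sample data).
events_at_node1 = [
    200000000000001,  # replicated from Node 2
    200000000000002,  # replicated from Node 2
    200100000000001,  # "unknown event" created at Node 1
]

reported_last_id = last_id_node_local_range(events_at_node1, 2)

# Events waiting at Node 2; assumed to be sent only if newer than the reported id.
pending_at_node2 = [200000000000003, 200000000000004]
to_send = [e for e in pending_at_node2 if e > reported_last_id]

print(reported_last_id)
print(to_send)
```

With the upper bound restricted to node-local IDs, the "unknown event" no longer shadows the last replicated event, and the pending events at Node 2 qualify for replication again.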
The same issue is addressed in the ticket https://support.zabbix.com/browse/ZBX-5929.