  1. ZABBIX BUGS AND ISSUES
  2. ZBX-6154

Events and Alerts not replicated properly in a Distributed Monitoring setup


    • Type: Incident report
    • Resolution: Won't fix
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 2.0.4
    • Component/s: Server (S)

      In a Distributed Monitoring setup of several nodes, where Node 1 is a master and Node 2 is a child of Node 1:

      When a Trigger for an Item located at Node 2 is created via the web interface at Node 1, an "unknown event" for that Trigger is created in the `events' table of the database at Node 1. The eventid of this "unknown event" is set according to the local and remote node IDs and is (for Node 1 id=1 and Node 2 id=2) in the range of 200100xxxxxxxxxxx.
      Meanwhile, at Node 2, events occurring locally are stored in the `events' table with corresponding ids in the range of 200000xxxxxxxxxxx.
      When Node 2 requests ZBX_GET_HISTORY_LAST_ID from its master Node 1 in order to replicate these events, it receives a send_history_last_id() response containing the id of that "unknown event", because the DBnode() function looks up IDs disregarding the source_node part, i.e. up to 29999999999999999. The id of the "unknown event" associated with the newly created Trigger therefore overrides the id of the last event received by Node 1 from Node 2, since the former starts with 200100xxx and the latter with 200000xxx.
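
      The effect can be illustrated with a minimal sketch. It assumes the ID layout implied by the limits quoted in the proposed solution (three digits of node ID, three digits of source node ID, then an eleven-digit local counter), and it assumes that Node 2 only sends events whose id is greater than the last id reported by the master. The names make_id and last_id_whole_node are hypothetical and are not Zabbix functions.

```python
# Assumed ID layout: NNN (node) + SSS (source node) + 11-digit local counter.
NODE_FACTOR = 10**14    # assumption: weight of the NNN part
SOURCE_FACTOR = 10**11  # assumption: weight of the SSS part


def make_id(node, source_node, local):
    """Compose an ID owned by `node` whose source-node part is `source_node`."""
    return node * NODE_FACTOR + source_node * SOURCE_FACTOR + local


def last_id_whole_node(ids, node):
    """Largest ID in the whole range of `node`, ignoring the source-node part."""
    lo = node * NODE_FACTOR
    hi = lo + NODE_FACTOR - 1
    return max((i for i in ids if lo <= i <= hi), default=0)


# Events already replicated from Node 2 to Node 1 (source-node part 000).
master_events = [make_id(2, 0, n) for n in (1, 2, 3)]

# The "unknown event" created at Node 1 for the new Trigger (source-node part 001).
unknown_event = make_id(2, 1, 1)
master_events.append(unknown_event)

# A new local event at Node 2 that still has to be replicated.
new_local_event = make_id(2, 0, 4)

reported = last_id_whole_node(master_events, 2)
print(reported == unknown_event)   # True: the "unknown event" wins the lookup
print(new_local_event > reported)  # False: Node 2 finds nothing newer to send
```

      Any id with a non-zero source-node part is larger than every node-local id of the same node, so a single such row is enough to mask all later local events.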

      So, once a new Trigger has been created and the "unknown event" for its creation has appeared in the database of Node 1, the replication of all events occurring at Node 2 to Node 1 STOPS completely.
      Now, if that Trigger "fires" at Node 2 and a related Alert is produced, then when Node 2 tries to replicate the `alerts' table to Node 1, it causes a foreign key violation in the database at Node 1, because the replicated alerts reference eventids that were never replicated to Node 1.
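
      The foreign key failure can be reproduced in isolation with a small sketch. It uses an in-memory SQLite database as a stand-in (the report does not say which database backend was used) and a heavily simplified, hypothetical two-column schema rather than the real Zabbix tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Simplified stand-ins for the `events' and `alerts' tables at Node 1.
conn.execute("CREATE TABLE events (eventid INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE alerts ("
    " alertid INTEGER PRIMARY KEY,"
    " eventid INTEGER NOT NULL REFERENCES events (eventid))"
)

# Illustrative id of an event that exists only at Node 2 and was never replicated.
missing_eventid = 2 * 10**14 + 4

try:
    # Replicating the alert without its event breaks the reference.
    conn.execute(
        "INSERT INTO alerts (alertid, eventid) VALUES (?, ?)",
        (1, missing_eventid),
    )
    fk_violation = False
except sqlite3.IntegrityError:
    fk_violation = True

print(fk_violation)  # True
```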

      Steps to reproduce:
      1) Set up a clean Zabbix 2.0.4 installation on two machines.
      2) Set up the first machine as a Node 1 in a Distributed Monitoring configuration, Node 2 being its child.
      3) Set up the second machine as a Node 2 in a Distributed Monitoring configuration, Node 1 being its master.
      4) With the web frontend at Node 1, add Node 2's machine as a Host at Node 2.
      5) Set up any Item for that host at Node 2 (for example, proc.num[atd])
      6) Set up any Trigger for that host at Node 2 (for example, {node2:proc.num[atd].last(0)}<1)
      7) Make the Trigger fire (for example, stop atd at Node 2).
      Now observe that `events' data is no longer replicated to Node 1 because of the conflicting eventids.

      Possible solution: change the send_history_last_id() function to select only node-local IDs (up to NNN00099999999999 instead of NNN99999999999999) for the specified node ID (see patch attached).
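
      The idea behind the proposed change can be sketched as follows. This is not the attached patch, only an illustration of the narrower upper bound; the NNN + SSS + 11-digit layout is inferred from the limits quoted above, and last_id_node_local is a hypothetical name.

```python
# Assumed ID layout: NNN (node) + SSS (source node) + 11-digit local counter.
NODE_FACTOR = 10**14    # assumption: weight of the NNN part
SOURCE_FACTOR = 10**11  # assumption: weight of the SSS part


def last_id_node_local(ids, node):
    """Largest ID of `node` whose source-node part is 000 (node-local IDs only)."""
    lo = node * NODE_FACTOR
    hi = lo + SOURCE_FACTOR - 1  # NNN00099999999999 instead of NNN99999999999999
    return max((i for i in ids if lo <= i <= hi), default=0)


# Contents of the `events' table at Node 1 for Node 2:
last_replicated = 2 * NODE_FACTOR + 3                  # last event received from Node 2
unknown_event = 2 * NODE_FACTOR + 1 * SOURCE_FACTOR + 1  # created locally at Node 1
master_events = [last_replicated, unknown_event]

reported = last_id_node_local(master_events, 2)
new_local_event = 2 * NODE_FACTOR + 4

print(reported == last_replicated)  # True: the "unknown event" is ignored
print(new_local_event > reported)   # True: Node 2 still has an event to send
```

      With the narrower bound, rows created at the master on behalf of Node 2 no longer influence the last id reported back to Node 2.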

      The same issue is addressed in the ticket https://support.zabbix.com/browse/ZBX-5929.

            Assignee: Unassigned
            Reporter: Yaroslav Zhavoronkov (yaroslav.zh)