ZABBIX BUGS AND ISSUES / ZBX-6154

Events and Alerts not replicated properly in a Distributed Monitoring setup

    Details

    • Type: Incident report
    • Status: Closed
    • Priority: Major
    • Resolution: Won't fix
    • Affects Version/s: 2.0.4
    • Fix Version/s: None
    • Component/s: Server (S)

      Description

      In a Distributed Monitoring setup of several nodes, where Node 1 is a master and Node 2 is a child of Node 1:

      When a Trigger for an Item located at Node 2 is created via the web interface at Node 1, an "unknown event" for that Trigger is created in the 'events' table of the database at Node 1. The eventid of this "unknown event" is derived from both the remote node ID (Node 2, where the Trigger lives) and the local node ID (Node 1, where it was created), so for Node 1 (id=1) and Node 2 (id=2) it falls in the range 200100xxxxxxxxx.
      Meanwhile, events occurring locally at Node 2 are stored in its 'events' table with IDs in the range 200000xxxxxxxxx.
      When Node 2 sends ZBX_GET_HISTORY_LAST_ID to its master Node 1 to find out which events still need to be replicated, Node 1 answers via send_history_last_id() with the ID of that "unknown event". This happens because the DBnode() function restricts the lookup to the whole node range (up to 299999999999999 for Node 2) and ignores the source-node part of the ID. Since the "unknown event" ID starts with 200100 and the events actually received from Node 2 start with 200000, the former is always larger, so it is returned instead of the ID of the last event Node 1 actually received from Node 2.

      So, once a new Trigger has been created and its "unknown event" has appeared in the database of Node 1, replication of all events occurring at Node 2 to Node 1 STOPS completely, because every new event ID at Node 2 is still lower than the last ID reported by Node 1.
      If that Trigger then fires at Node 2 and an Alert is produced, Node 2's attempt to replicate the 'alerts' table to Node 1 causes a foreign key violation in the database at Node 1, because the replicated alerts reference eventids that were never replicated to Node 1.

      Steps to reproduce:
      1) Set up a clean Zabbix 2.0.4 installation on two machines.
      2) Set up the first machine as a Node 1 in a Distributed Monitoring configuration, Node 2 being its child.
      3) Set up the second machine as a Node 2 in a Distributed Monitoring configuration, Node 1 being its master.
      4) With the web frontend at Node 1, add Node 2's machine as a Host at Node 2.
      5) Set up any Item for that host at Node 2 (for example, proc.num[atd])
      6) Set up any Trigger for that host at Node 2, for example:
         {node2:proc.num[atd].last(0)}<1
      7) Make the Trigger fire (for example, stop atd at Node 2).
      Observe that 'events' data is no longer replicated to Node 1 because of the conflicting eventids.

      Possible solution: change send_history_last_id() so that, for the specified node ID, it selects only node-local IDs (up to NNN00099999999999 instead of NNN99999999999999); see the attached patch.

      The same issue is addressed in the ticket https://support.zabbix.com/browse/ZBX-5929.

            People

            • Assignee: Unassigned
            • Reporter: Yaroslav Zhavoronkov (yaroslav.zh)
            • Votes: 0
            • Watchers: 2