  1. ZABBIX BUGS AND ISSUES
  2. ZBX-2651

Eventlog processing on Agent side. Serious experiments about performance and sequence sending.


    • Type: Incident report
    • Resolution: Won't fix
    • Priority: Major
    • None
    • 1.8.2
    • Agent (G)

      I ran several careful experiments on how the agent processes the Windows Eventlog.
      To some this may look like madness, but I found it interesting and useful.
      I'll describe how I did it, in case it is useful to someone, and then give my opinions and suggestions for improvement.
      I created a custom Eventlog named «Alarm» and filled it with events following a fixed pattern.
      To create a custom Eventlog, add a registry key named after the eventlog under this path:
      [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog\Alarm]
      "Retention"=dword:00000000
      "MaxSize"=dword:07ff0000
      These two values set the maximum size and the rotation mode of the eventlog.
      I used a custom eventlog to rule out the influence of other factors and to have it available for future tests.

      Then I filled the eventlog using the bat file. As a result, the eventlog holds 100200 events in a strictly defined sequence (filling it took 18 minutes).
      Here is the structure of the resulting eventlog, which shows how it was built:

      2010.Jun.28 22:26:20 FireSRC Information 555 100000 range END. Text for ALARM with ID 555
      2010.Jun.28 22:26:20 FireSRC Error 333 1000 - in 100000 range. Text for ALARM with ID 333
      2010.Jun.28 22:26:20 FireSRC Error 333 999 - in 100000 range. Text for ALARM with ID 333
      2010.Jun.28 22:26:20 FireSRC Error 333 998 - in 100000 range. Text for ALARM with ID 333
      ..................................................................
      2010.Jun.28 22:08:16 FireSRC Error 333 2 - in 2000 range. Text for ALARM with ID 333
      2010.Jun.28 22:08:16 FireSRC Error 333 1 - in 2000 range. Text for ALARM with ID 333
      2010.Jun.28 22:08:16 FireSRC Warning 444 2000 range START. Text for ALARM with ID 444
      2010.Jun.28 22:08:16 FireSRC Information 555 1000 range END. Text for ALARM with ID 555
      2010.Jun.28 22:08:16 FireSRC Error 333 1000 - in 1000 range. Text for ALARM with ID 333
      2010.Jun.28 22:08:15 FireSRC Error 333 999 - in 1000 range. Text for ALARM with ID 333
      ...........................................................
      2010.Jun.28 22:08:02 FireSRC Error 333 2 - in 1000 range. Text for ALARM with ID 333
      2010.Jun.28 22:08:02 FireSRC Error 333 1 - in 1000 range. Text for ALARM with ID 333
      2010.Jun.28 22:08:02 FireSRC Warning 444 1000 range START. Text for ALARM with ID 444

      The basic principle: each block of one thousand ordinary events (EventID 333) is bracketed by two distinctive events (EventID 444 at the start, EventID 555 at the end), which we will filter on the Zabbix agent side.
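
The pattern above can be modelled in a few lines of Python. This is only a sketch of the sequence, not the attached bat file: the function and constant names are assumptions, and nothing is written to a real Windows Eventlog. Note that it yields events in the order they were written, while the listing above shows the newest first.

```python
from collections import Counter

# Hypothetical model of the test log; names and layout are assumptions,
# not the contents of the attached bat file.
START_ID, ORDINARY_ID, END_ID = 444, 333, 555


def generate_events(ranges=100, per_range=1000):
    """Yield (event_id, message) tuples in the order they are written."""
    for r in range(1, ranges + 1):
        label = r * per_range  # 1000, 2000, ..., 100000
        yield START_ID, f"{label} range START. Text for ALARM with ID {START_ID}"
        for n in range(1, per_range + 1):
            yield ORDINARY_ID, f"{n} - in {label} range. Text for ALARM with ID {ORDINARY_ID}"
        yield END_ID, f"{label} range END. Text for ALARM with ID {END_ID}"


events = list(generate_events())
counts = Counter(event_id for event_id, _ in events)

# Total events, ordinary events, distinctive (444 + 555) events.
print(len(events), counts[ORDINARY_ID], counts[START_ID] + counts[END_ID])
# prints: 100200 100000 200
```

The totals match the figures in this report: 100200 events in the log, of which 200 are the distinctive ones the agent is expected to send.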

      The bat file and the reg file are in the attached file.
      I think this little how-to may be useful to anyone who needs to verify an Item key and complex Triggers under realistic conditions, by deliberately creating events in the event logs and checking whether Zabbix handles them correctly.

      Then I created a few different Item keys and ran the experiments.
      So:
      The first measurement: the performance (speed) of reading the eventlog with a few Items that filter by EventID on the agent side.

      Note first that if DebugLevel=4 is set in the agent config, eventlog processing speed drops catastrophically, so speed must be measured with debug logging off!

      All agent parameters that may affect performance were left at their defaults, with one exception: MaxLinesPerSecond=1000. I set this to make the difference in agent speed more visible.

      All Items have Update interval (in sec) = 1.

      First experiment: two Items with the keys:
      eventlog[Alarm,,,,444]
      eventlog[Alarm,,,,555]
      The agent processed 100200 events and sent 200 events to the server in 1 min. 30 sec.
      picture "Two_Item_ID444+555.png"

      Second experiment: a single Item with the key:
      eventlog[Alarm,,,,444|555]
      The agent processed 100200 events and sent 200 events to the server in 1 min. 10 sec.
      picture "One_Item_ID444_or_555.png"

      That is 20 seconds (about 22%) faster than in the previous example. The load on the processor (Core2Duo 3.0 GHz) is also lower (see picture "processor_load_differenses.png").
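
For comparison, the two timings can be turned into rough rates. This is plain arithmetic on the figures reported above. The rate is log size divided by wall-clock time, so it is an assumption-laden summary, not a measurement of how many records the agent actually read internally.

```python
TOTAL_EVENTS = 100200   # events in the test log
two_items_sec = 90      # 1 min. 30 sec., two Items (444 and 555)
one_item_sec = 70       # 1 min. 10 sec., one Item (444|555)

rate_two = TOTAL_EVENTS / two_items_sec
rate_one = TOTAL_EVENTS / one_item_sec
saved = (two_items_sec - one_item_sec) / two_items_sec

print(f"two items: {rate_two:.0f} log events/s")   # two items: 1113 log events/s
print(f"one item:  {rate_one:.0f} log events/s")   # one item:  1431 log events/s
print(f"time saved: {saved:.0%}")                  # time saved: 22%
```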

      The second measurement: the order in which the agent builds and sends events.

      As the picture "Two_Item_ID444+555.png" shows, the events were generated and sent to the server in a different order from the one in which they were created on the Windows host. They are generated in the order the agent reads them (separately for the two Items) and sent to the server in that order. This is not quite right!

      My suggestion: when the agent requests and receives the list of active checks from the server, it should group all Items that refer to the same Eventlog, and when parsing that eventlog it should check each new event against every Item in the group in a single pass. This would preserve the real sequence of events on the agent side within each log. It should also improve performance.
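
The difference between the two approaches can be illustrated with a toy model. This is a deliberately simplified sketch and not the agent's real code: the log, the item names and both functions are hypothetical, and the real agent reads in batches rather than scanning the whole log per Item.

```python
import re

# Toy log: (record_number, event_id). Assumption: record numbers grow with time.
LOG = [(1, 444), (2, 333), (3, 333), (4, 555), (5, 444), (6, 333), (7, 555)]

# Hypothetical Items; each regex stands in for the EventID filter of one Item.
ITEMS = {"item_444": re.compile(r"^444$"), "item_555": re.compile(r"^555$")}


def per_item_passes(log, items):
    """One full pass over the log per Item (simplified model of the observed behaviour)."""
    sent = []
    for name, pattern in items.items():
        for record, event_id in log:
            if pattern.match(str(event_id)):
                sent.append((record, name))
    return sent


def single_pass(log, items):
    """Suggested behaviour: one pass, each record is checked against every Item in the group."""
    sent = []
    for record, event_id in log:
        for name, pattern in items.items():
            if pattern.match(str(event_id)):
                sent.append((record, name))
    return sent


print([r for r, _ in per_item_passes(LOG, ITEMS)])  # [1, 5, 4, 7]
print([r for r, _ in single_pass(LOG, ITEMS)])      # [1, 4, 5, 7]
```

In the single-pass variant the log is read once and the matched records leave in their original order, whichever Item they belong to.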

      A realistic example of Items for a single Windows host:
      eventlog[System,,,,26|6009]
      eventlog[System,,,Warning,1007|3019|24]
      eventlog[System,,,Error]
      These should be formed into one group.
      The only open question is what to do with the Items' Update interval (in sec) attribute. Perhaps it should also be a criterion for forming groups. In that case the documentation should recommend setting the same Update interval (in sec) for all Items that read the same log.
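
A minimal sketch of how such groups could be formed, assuming the grouping criterion is the pair (log name, Update interval). The function name and the data layout are hypothetical. Only the first key parameter (the log name) is interpreted; the remaining parameters are left untouched.

```python
import re
from collections import defaultdict

# The first parameter of an eventlog[] key is the log name.
KEY_RE = re.compile(r"^eventlog\[([^,\]]+)")


def group_items(items):
    """Group (key, update_interval) pairs by (log name, update interval)."""
    groups = defaultdict(list)
    for key, interval in items:
        match = KEY_RE.match(key)
        if match is None:
            continue  # not an eventlog[] Item
        groups[(match.group(1), interval)].append(key)
    return dict(groups)


# Hypothetical list of active checks: (Item key, Update interval in sec).
active_checks = [
    ("eventlog[System,,,,26|6009]", 1),
    ("eventlog[System,,,Warning,1007|3019|24]", 1),
    ("eventlog[System,,,Error]", 1),
    ("eventlog[Alarm,,,,444|555]", 1),
]

for group, keys in group_items(active_checks).items():
    print(group, len(keys))
# prints:
# ('System', 1) 3
# ('Alarm', 1) 1
```

If one of the System Items had a different Update interval, it would fall into a group of its own under this criterion, which is why the same interval would have to be recommended for all Items on one log.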

      I also want to show that the problem I described can occur in a real environment.
      For example, the Zabbix agent runs for a long time without a connection to the server and cannot send events. Once the connection is restored, it processes the log and sends the events at high speed (as in my first experiment).
      Or the host is set to Not monitored in the web interface for a significant amount of time and is then switched back to Monitored, and so on. If the Zabbix administrator then wants to see the list of events for several Items, they do not get a reliable list in the order in which the Windows events actually occurred.
      In other words, the administrator should not have to consider that events may have arrived in the wrong sequence, even in very rare situations; they should always be able to rely on the events arriving in the right sequence!
      As for how triggers behave in these circumstances, I'd rather say nothing; that is another story.

      Although the experiment used the eventlog[] Item key, all of this probably applies to the log[] and logrt[] keys as well.
      Thank you for your attention.

      Sorry for my English (the original Russian text is attached).

            Assignee: Unassigned
            Reporter: zalex_ua Oleksii Zagorskyi