  1. ZABBIX BUGS AND ISSUES
  2. ZBX-16543

Service with algorithm "if all children have PROBLEM", and problems with negative duration

Details

    • Sprint 57 (Oct 2019), Sprint 58 (Nov 2019), Sprint 59 (Dec 2019), Sprint 60 (Jan 2020), Sprint 61 (Feb 2020), Sprint 62 (Mar 2020), Sprint 63 (Apr 2020), Sprint 64 (May 2020), Sprint 65 (Jun 2020), Sprint 66 (Jul 2020), Sprint 67 (Aug 2020)
    • 0.5

    Description

      A complex service with at least two dependencies reports a false problem if:

      1. The calculation algorithm is "Problem, if all children have problems".
      2. One dependency has a false problem (negative duration).
      3. All other dependencies are already in problem (positive duration), prior to the false one.

      When the recovery for the false problem occurs (the recovery is recorded before the problem itself) while all other triggers are in problem, Zabbix treats that recovery as proof of a preceding problem and therefore concludes that all dependencies were in problem simultaneously.
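
      The inference described above can be illustrated with a minimal Python sketch. It is a model of the reported behavior, not Zabbix code: the `Problem` class, the `trust_recovery` flag and all timestamps are hypothetical, and the "faulty" branch only encodes the assumption that a recovery is taken as evidence of a preceding problem.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Problem:
    problem_clock: int                    # timestamp of the PROBLEM event
    recovery_clock: Optional[int] = None  # timestamp of the recovery, None while open

    def is_false(self) -> bool:
        # Negative duration: the recovery is stamped before the problem.
        return self.recovery_clock is not None and self.recovery_clock < self.problem_clock


def in_problem_before(p: Optional[Problem], t: int, trust_recovery: bool) -> bool:
    """Was this child in PROBLEM immediately before time t?"""
    if p is None:
        return False
    if p.recovery_clock is None:
        return p.problem_clock < t
    if trust_recovery:
        # Assumed faulty inference: a recovery at or after t implies a preceding problem.
        return p.recovery_clock >= t
    # Strict check: the problem must really have started before t.
    return p.problem_clock < t <= p.recovery_clock


def parent_in_problem(children: List[Optional[Problem]], t: int, trust_recovery: bool) -> bool:
    # "Problem, if all children have problems"
    return all(in_problem_before(c, t, trust_recovery) for c in children)


real = Problem(problem_clock=50)                        # real problem, still open
false = Problem(problem_clock=200, recovery_clock=100)  # negative duration

two_children = [real, false]
four_children = [real, false, None, None]               # two children without problems
```

      Evaluated at the false recovery (`t=100`), `parent_in_problem(two_children, 100, trust_recovery=True)` reports a problem, while the strict check does not. With `four_children` both variants report no problem, which matches the service with four dependencies described in this report.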

      This behavior was observed in three services, with two dependencies each.

      The correct behavior was observed in a service with four dependencies. In this instance, one dependency reported a real problem, and another reported a false one. The remaining two didn't report any problem.

      The result was a correct calculation.

      Had the remaining two reported problems before the false recovery, an incorrect calculation would have been observed.

      In all of these cases, the algorithm is "Problem, if all children have problems".

       

      Steps to reproduce (if possible):

      1. Configure at least two simple services, each linked to its own trigger.
      2. Configure a complex service, with at least two dependencies, and calculation algorithm "Problem, if all children have problems".
      3. Create a real PROBLEM in all dependencies, except one.
      4. Create a false PROBLEM in the last dependency.

      Result:

      Unfortunately, I corrected the database before taking a screenshot, so this is all I can show.

      Example with two dependencies.

      A problem was reported with a duration of 38:45 minutes.

      Simple service 1: a real problem reported at 00:09:54.

      Simple service 2: a false problem with a recovery at 00:48:49.
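
      As a quick check of the timeline, the interval between the two timestamps can be computed as below. This is only arithmetic on the values quoted in this example, assuming both fall on the same day. The result is close to the reported duration, which is consistent with the complex service's problem spanning from the real problem to the false recovery.

```python
from datetime import datetime

# Timestamps quoted in the example (same day assumed).
real_problem = datetime.strptime("00:09:54", "%H:%M:%S")
false_recovery = datetime.strptime("00:48:49", "%H:%M:%S")

elapsed = false_recovery - real_problem
minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
print(f"{minutes}:{seconds:02d}")  # prints 38:55
```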

      Expected:
      Only one of the dependencies had a real problem.

      The complex service should've ignored the false one.

       

      Suggestion:

      It appears the complex service calculation is querying the events of the dependencies' triggers.

      It should instead query the dependencies' service alarms, since that table appears to ignore the triggers' false problems correctly.

      This seems simpler than having the calculation itself ignore the triggers' false problems.
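
      A rough sketch of this suggestion, assuming each dependency's service alarms can be read as a list of `(clock, value)` pairs where a value of 0 means OK and anything greater means PROBLEM. The function name and the in-memory dictionary are hypothetical stand-ins for a real query, not Zabbix internals.

```python
from typing import Dict, List, Optional, Tuple

Alarm = Tuple[int, int]  # (clock, value): value 0 means OK, > 0 means PROBLEM


def parent_problem_intervals(
    child_alarms: Dict[str, List[Alarm]],
) -> List[Tuple[int, Optional[int]]]:
    """Intervals during which every child is in PROBLEM at the same time."""
    events = sorted(
        (clock, child, value)
        for child, alarms in child_alarms.items()
        for clock, value in alarms
    )
    state = {child: 0 for child in child_alarms}  # every child starts as OK
    intervals: List[Tuple[int, Optional[int]]] = []
    start: Optional[int] = None
    i = 0
    while i < len(events):
        clock = events[i][0]
        # Apply all status changes that share this timestamp before evaluating.
        while i < len(events) and events[i][0] == clock:
            _, child, value = events[i]
            state[child] = value
            i += 1
        all_bad = bool(state) and all(v > 0 for v in state.values())
        if all_bad and start is None:
            start = clock
        elif not all_bad and start is not None:
            intervals.append((start, clock))
            start = None
    if start is not None:
        intervals.append((start, None))  # still in problem
    return intervals


# The false problem never reaches the alarms of "service2", so the parent stays OK.
with_false_problem = parent_problem_intervals({"service1": [(594, 3)], "service2": []})

# Two real, overlapping problems produce one parent interval.
with_real_overlap = parent_problem_intervals(
    {"service1": [(594, 3)], "service2": [(1000, 3), (2000, 0)]}
)
```

      Here `with_false_problem` is empty, while `with_real_overlap` contains the single interval from 1000 to 2000.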

      Attachments

        1. image-2019-08-20-12-37-30-470.png (13 kB)
        2. image-2019-08-20-12-40-02-750.png (14 kB)
        3. missing_ok_record_parent.gif (6.39 MB)
        4. sender.py (2 kB)
        5. TimeDiff.png (40 kB)
        6. Timeline.jpg (35 kB)
        7. zbx-16543.jpeg (2.06 MB)
        8. ZBX-16543.pdf (66 kB)

            People

              zabbix.dev Zabbix Development Team
              joao.g.carvalho João Carvalho
              Votes: 1
              Watchers: 11
