SLA: add minimum severity threshold for downtime classification


    • Type: Change Request
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: None

      Problem

      The current SLA SLI calculation in CSla::calculateSli() uses a binary classification: a service is either OK (ZBX_SEVERITY_OK) or down (any other value). This means a service in Warning state is treated identically to one in Disaster state for SLA reporting purposes.

      In practice, many organizations define SLA compliance against a severity threshold — for example, only High and Disaster should count as downtime, while Warning and Average represent degraded operation that does not violate the SLA. The current implementation requires operators to work around this by adjusting service status propagation rules, which conflates monitoring policy with SLA policy.

      Proposal

      Add a min_down_severity field to the SLA object, specifying the minimum trigger severity at which a service is considered down for SLA purposes.

      • Type: integer, range 0–5 (mapping to existing TRIGGER_SEVERITY_* constants)
      • Default: 0 (TRIGGER_SEVERITY_NOT_CLASSIFIED), preserving current behavior exactly
      • Exposed via the sla.create, sla.update, and sla.get API methods
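
      For illustration only, a request that sets the proposed field might be built as follows. This is a sketch under assumptions: min_down_severity does not exist in the current API, the slaid value is a placeholder, and authentication is omitted.

```php
<?php
// Hypothetical JSON-RPC payload for sla.update carrying the proposed field.
// 'min_down_severity' is the field proposed in this ticket, not an existing one.
$request = [
    'jsonrpc' => '2.0',
    'method' => 'sla.update',
    'params' => [
        'slaid' => '5',           // placeholder SLA ID
        'min_down_severity' => 4, // High: only High and Disaster count as downtime
    ],
    'id' => 1,
];

// Encode the request body; sending it (and authenticating) is left out.
$payload = json_encode($request, JSON_PRETTY_PRINT);
echo $payload, PHP_EOL;
```

      Omitting min_down_severity from the request would leave the SLA at the default of 0.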

      The required change to calculateSli() is minimal. The status timeline already carries per-alarm severity values; they are simply not compared against a threshold today. The core logic change is:

      // Current (CSla.php, calculateSli):
      if ($prev_value == ZBX_SEVERITY_OK) {

      // Proposed:
      if ($prev_value == ZBX_SEVERITY_OK || $prev_value < $db_sla['min_down_severity']) {
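
      To make the effect of the threshold concrete, here is a minimal standalone sketch, not the actual CSla::calculateSli() code. The helpers isDowntime() and downtimeSeconds() and the timeline layout are hypothetical, and the constant values (OK = -1, severities 0 to 5) are assumptions defined locally so the example runs on its own.

```php
<?php
// Assumed constant values, defined locally so the sketch runs standalone.
const SEVERITY_OK = -1;      // stands in for ZBX_SEVERITY_OK
const SEVERITY_WARNING = 2;
const SEVERITY_HIGH = 4;

// Hypothetical helper: true when a status counts as downtime for the SLA.
// This is the negation of the proposed uptime condition.
function isDowntime(int $status, int $min_down_severity): bool {
    return $status != SEVERITY_OK && $status >= $min_down_severity;
}

// Hypothetical helper: sums downtime over [$period_from, $period_to).
// $timeline is a list of ['clock' => int, 'value' => int] sorted by clock,
// all within the period; $initial_value is the status at $period_from.
function downtimeSeconds(array $timeline, int $initial_value, int $period_from,
        int $period_to, int $min_down_severity): int {
    $downtime = 0;
    $prev_clock = $period_from;
    $prev_value = $initial_value;

    foreach ($timeline as $alarm) {
        // Attribute the elapsed interval to the status that was in effect.
        if (isDowntime($prev_value, $min_down_severity)) {
            $downtime += $alarm['clock'] - $prev_clock;
        }
        $prev_clock = $alarm['clock'];
        $prev_value = $alarm['value'];
    }

    // Close the final interval at the end of the period.
    if (isDowntime($prev_value, $min_down_severity)) {
        $downtime += $period_to - $prev_clock;
    }

    return $downtime;
}

// Sample timeline: Warning at t=100, High at t=200, back to OK at t=260.
$timeline = [
    ['clock' => 100, 'value' => SEVERITY_WARNING],
    ['clock' => 200, 'value' => SEVERITY_HIGH],
    ['clock' => 260, 'value' => SEVERITY_OK],
];
```

      For this sample timeline over the period 0 to 300, a threshold of 0 yields 160 seconds of downtime (Warning plus High), while a threshold of 4 (High) yields 60 seconds, because the Warning interval is then treated as uptime.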

      Scope

      • Schema: one new column on the sla table (min_down_severity INT DEFAULT 0 NOT NULL)
      • API: validation rules in CSla::create() and CSla::update(); output field in CSla::get()
      • Calculation: one conditional change in CSla::calculateSli()
      • UI: severity dropdown on the SLA edit form
      • Migration: ALTER TABLE sla ADD min_down_severity INT DEFAULT 0 NOT NULL
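
      The validation item above could look roughly like the following standalone check. This is only a sketch: the function names are hypothetical, and the real API would express the rule through its own validation framework rather than a hand-written function.

```php
<?php
// Hypothetical standalone check for the proposed field (integer, 0 to 5).
function validateMinDownSeverity($value): int {
    if (!is_int($value) || $value < 0 || $value > 5) {
        throw new InvalidArgumentException(
            'Invalid value for "min_down_severity": expected an integer from 0 to 5.'
        );
    }

    return $value;
}

// Hypothetical helper: an omitted field falls back to the default of 0,
// which keeps the current "any non-OK state is downtime" behavior.
function resolveMinDownSeverity(array $sla): int {
    return validateMinDownSeverity($sla['min_down_severity'] ?? 0);
}
```

      Rejecting out-of-range values at the API layer keeps the comparison in the calculation free of extra guards.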

      Backward compatibility

      The default value of 0 means all non-OK states count as downtime, which is identical to current behavior. Existing SLA definitions, API integrations, and reports are unaffected.

            Assignee:
            Alexander Vladishev
            Reporter:
            Frederick Loucks