Async SNMP poller responses cross-contaminate between hosts when StartSNMPPollers >= 5


    • Type: Problem report
    • Resolution: Unresolved
    • Priority: Trivial
    • Affects Version/s: 7.0.25
    • Component/s: Server (S)
    • Environment:

        1. Steps to Reproduce

      1. Configure a Zabbix 7.0.25 server with *5 or more SNMP pollers* (`StartSNMPPollers` set to 5 or higher).
      2. Add ≥ 20 SNMP-monitored hosts in the same subnet, each with the `system.name[sysName.0]` item from `Generic by SNMP` (or any `cpqHe*` / vendor-specific SNMP item from vendor templates).
      3. Use either `bulk=1` (default) or `bulk=0` on the SNMP interfaces; we tested both and saw no difference.
      4. Use either `useip=1` (fixed IP) or `useip=0` (DNS resolution); we tested both and saw no difference.
      5. Allow the pollers to run for 5+ minutes.

        1. Expected Behaviour

      Each host's SNMP item is populated with the SNMP response from *that host's* IP/DNS endpoint.

        1. Actual Behaviour

      SNMP responses are intermittently routed to the *wrong host's items*. Concrete observations:

      | Polled host (technical) | Item `system.name[sysName.0]` value | Expected |
      |---|---|---|
      | `AP05` | `AP15` | `AP05` |
      | `AP04` | `AP16` | `AP04` |
      | `AP02` | `AP14` | `AP02` |
      | `AP01` | `AP12` | `AP01` |
      | `PLL01` | `PWEA2` | `PLL01` |
      | `SWBZ09` | `SWSW01` | `SWBZ09` |
      | `SWBZ05` | `SWBZ06` | `SWBZ05` |

      The "incorrect" sysName values are always those of *other hosts in the same poll batch*, never random or garbage strings. This argues against network-level corruption and points to a response-to-host mapping race in the async poller layer.
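The "always another monitored host, never garbage" observation can be checked mechanically. A minimal sketch, assuming the observed values are available as (polled host, reported sysName) pairs: `classify_mismatches` is an illustrative helper, not part of Zabbix, and `known_hosts` is a stand-in for the real host inventory, filled here only with the names from the table above.

```python
def classify_mismatches(observations, known_hosts):
    """Split wrong sysName values into those that belong to another
    monitored host (cross-contamination) and those that match no host."""
    known = set(known_hosts)
    cross, unknown = [], []
    for polled, reported in observations:
        if reported == polled:
            continue  # correct value, nothing to classify
        (cross if reported in known else unknown).append((polled, reported))
    return cross, unknown


# (polled host, value stored in system.name[sysName.0]) from the table above
observations = [
    ("AP05", "AP15"), ("AP04", "AP16"), ("AP02", "AP14"), ("AP01", "AP12"),
    ("PLL01", "PWEA2"), ("SWBZ09", "SWSW01"), ("SWBZ05", "SWBZ06"),
]

# Assumption: stand-in for the full list of SNMP-monitored host names.
known_hosts = [
    "AP01", "AP02", "AP04", "AP05", "AP12", "AP14", "AP15", "AP16",
    "PLL01", "PWEA2", "SWBZ05", "SWBZ06", "SWBZ09", "SWSW01",
]

cross, unknown = classify_mismatches(observations, known_hosts)
print(f"cross-contaminated: {len(cross)}, unknown values: {len(unknown)}")
```

If every mismatch lands in `cross` and none in `unknown`, that is consistent with misrouted responses rather than corrupted packets.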

      Visible effect: the stock template trigger `"System name has changed"` fires repeatedly and incorrectly.

      Manual `snmpget -v2c -c <community> <ip> .1.3.6.1.2.1.1.5.0` from the Zabbix server host always returns the *correct* sysName for the queried IP — the underlying SNMP agents and the network are fine.
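The manual check above can be repeated across many hosts with a small wrapper around the same `snmpget` command. A sketch under stated assumptions: net-snmp's `snmpget` is on the `PATH`, the `-Oqv` output option is used so only the value is printed, and `verify_sysnames` plus the IP-to-name mapping passed to it are hypothetical (the real IPs are redacted in this report).

```python
import subprocess

SYSNAME_OID = ".1.3.6.1.2.1.1.5.0"  # sysName.0, same OID as the manual check


def run_snmpget(ip, community, timeout=5):
    """Query sysName.0 with net-snmp's snmpget and return the bare value."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", ip, SYSNAME_OID],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    # Depending on MIB availability the string may be printed in quotes.
    return result.stdout.strip().strip('"')


def verify_sysnames(expected_by_ip, community, getter=run_snmpget):
    """Return {ip: (expected, actual)} for every agent whose sysName differs."""
    mismatches = {}
    for ip, expected in expected_by_ip.items():
        actual = getter(ip, community)
        if actual != expected:
            mismatches[ip] = (expected, actual)
    return mismatches


# Example call (placeholder address and community string):
# verify_sysnames({"192.0.2.5": "AP05"}, "<community>")
```

An empty result from `verify_sysnames` while Zabbix item history still shows foreign sysName values would support the conclusion that the agents and network are fine.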

        1. Workaround

      Setting `StartSNMPPollers=1` in `zabbix_server.conf` and restarting the server *completely eliminates* the cross-contamination. All SNMP items immediately start showing correct values from their own hosts. Trade-off: a single poller is slower at clearing the queue but works correctly.

      We tried (without success):

      • `bulk=0` on all SNMP interfaces
      • Switching to `useip=0` (DNS mode) on all interfaces
      • `config_cache_reload`, `snmp_cache_reload`, full server restart
      • Reducing item update intervals

      Only `StartSNMPPollers=1` resolved the issue.
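To confirm which `StartSNMPPollers` value is actually set before and after applying the workaround, the config file can be scanned for uncommented occurrences. A minimal sketch: `find_snmp_poller_settings` is an illustrative helper, the path `/etc/zabbix/zabbix_server.conf` is an assumption about the install layout, and `Include` directives are not followed.

```python
import re

# Matches uncommented "StartSNMPPollers=<number>" lines; deliberately lenient
# about spaces around "=" (an assumption, not a statement about Zabbix's parser).
_PATTERN = re.compile(r"^[ \t]*StartSNMPPollers[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def find_snmp_poller_settings(conf_text):
    """Return every uncommented StartSNMPPollers value, in file order."""
    return [int(m.group(1)) for m in _PATTERN.finditer(conf_text)]


sample = """\
# StartSNMPPollers=5
StartSNMPPollers=1
"""
print(find_snmp_poller_settings(sample))  # commented-out line is ignored

# Against a real file (path is an assumption):
# with open("/etc/zabbix/zabbix_server.conf") as fh:
#     print(find_snmp_poller_settings(fh.read()))
```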

        1. Hypothesis

      The async SNMP poller infrastructure dispatches multiple in-flight SNMP requests in parallel (one per poller worker), and the response-to-item mapping appears to attribute responses to the wrong request when:

      • multiple in-flight requests are outstanding simultaneously, AND
      • response packets arrive in close timing windows (likely on the same UDP socket or a shared completion queue).
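To illustrate the class of bug this hypothesis describes, the toy model below shows how a pending-request table keyed only by request ID can hand a response to the wrong host when two in-flight requests share an ID, and how including the peer address in the key avoids that. This is not Zabbix's code and makes no claim about its internals; `PendingTable`, `simulate`, and the `192.0.2.x` addresses are purely hypothetical.

```python
class PendingTable:
    """Maps in-flight requests to the host that should receive the response."""

    def __init__(self, key_includes_peer):
        self.key_includes_peer = key_includes_peer
        self.pending = {}

    def _key(self, peer, request_id):
        return (peer, request_id) if self.key_includes_peer else request_id

    def register(self, peer, request_id, host):
        self.pending[self._key(peer, request_id)] = host

    def resolve(self, peer, request_id):
        return self.pending.pop(self._key(peer, request_id), None)


def simulate(key_includes_peer):
    """Return {host: sysName stored for that host} after two overlapping polls."""
    table = PendingTable(key_includes_peer)
    # Two requests are in flight at once and happen to share request ID 1.
    table.register("192.0.2.15", 1, "AP15")
    table.register("192.0.2.5", 1, "AP05")
    stored = {}
    # Each response carries the sender's address, the request ID and sysName.
    for peer, request_id, sysname in [
        ("192.0.2.15", 1, "AP15"),
        ("192.0.2.5", 1, "AP05"),
    ]:
        host = table.resolve(peer, request_id)
        if host is not None:
            stored[host] = sysname
    return stored


print(simulate(key_includes_peer=False))  # {'AP05': 'AP15'}
print(simulate(key_includes_peer=True))   # {'AP15': 'AP15', 'AP05': 'AP05'}
```

In the ID-only variant the second registration overwrites the first, so the first response is credited to `AP05` and the second is dropped. The wrong value is a valid sysName from another host in the same batch, which matches the symptom reported above.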

        1. Evidence / Logs

      (Collected on 2026-05-07. Excerpts redacted — IPs / hostnames available on request.)

      • Item history shows the exact moment when responses got mismapped (timestamps correlate with poller process activity).
      • After `StartSNMPPollers=1` + server restart, *all subsequent polls* (verified for AP01-08, SWBZ09 etc.) show correct sysName values.
      • No firewall / network changes occurred at the boundaries of the affected time window.
      • Reproducer is robust: pre-restart, contamination resumes within ~5 minutes; post-restart with 1 poller, never reoccurs.

        1. Severity (Suggested)

      *Major* — silent data corruption (items receive values from the wrong source), causing false-positive alerts on stock-template triggers (e.g. `"System name has changed"`, `"Disk media type changed"`, `"Sensor name changed"`). Affects any monitoring setup with mid-to-large SNMP fleets.

        1. Submitter Notes

      • The issue did NOT exist on Zabbix 6.x with traditional synchronous SNMP pollers.
      • The issue ONLY appeared after enabling 5 parallel async SNMP pollers in 7.0.25.
      • We have not tested 7.0.26+ — the bug may already be fixed upstream. Filing for visibility.

            Assignee:
            Andrejs Poddubnaks
            Reporter:
            Sebastian Geisler