  1. ZABBIX BUGS AND ISSUES
  2. ZBX-15578

IPMI times out and fails to read values when polls aren't frequent enough


    • Type: Problem report
    • Resolution: Fixed
    • Priority: Trivial
    • Fix Version/s: 4.0.6rc2, 4.2.0rc2, 4.2 (plan)
    • Affects Version/s: 4.0.4rc2, 4.2.0alpha3
    • Component/s: Proxy (P), Server (S)
    • Environment: Ubuntu 18.04, OpenIPMI 2.0.22 from repository, OpenIPMI 2.0.25 from sources
    • Team: Team C
    • Sprint: Sprint 49 (Feb 2019), Sprint 50 (Mar 2019)
    • 1

      Steps to reproduce:

      1. Set DebugLevel=4 in zabbix_server.conf
      2. Set StartIPMIPollers=1 in zabbix_server.conf
      3. Create host "IPMI timeouts" with a single IPMI interface
      4. Create item "Power", set update interval to 30s
      5. Start monitoring the server's log by running
        tail -f /var/log/zabbix/zabbix_server.log | grep -i -E "(ipmi (poller|manager)|ipmi_lan\.c|network error|restored|power)"
      6. Run Zabbix server

      Result:
      Check the log file. A pattern repeats between "Connection 0 to the BMC is up" lines: the connection comes up, a value is received, and then network errors follow, with "Received IPMI error: c3" and "Received IPMI error: ff" log entries.

      0xC3 is IPMI_TIMEOUT_CC in OpenIPMI. 0xFF is IPMI_UNKNOWN_ERR_CC.
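
      To make the two codes easier to spot while reading the log, here is a minimal C sketch that pulls the hex code out of a "Received IPMI error: ..." line and names it. The helper names ipmi_cc_name and parse_ipmi_error are hypothetical, and the two constants are redefined locally with the values given above rather than taken from OpenIPMI's headers. Only the quoted substring is matched, since the rest of the log line format is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values as stated in this report; redefined locally so the sketch
 * builds without the OpenIPMI headers. */
#define IPMI_TIMEOUT_CC      0xc3
#define IPMI_UNKNOWN_ERR_CC  0xff

/* Hypothetical helper: map a completion code to a readable name. */
static const char *ipmi_cc_name(unsigned int cc)
{
    switch (cc) {
    case IPMI_TIMEOUT_CC:
        return "IPMI_TIMEOUT_CC";
    case IPMI_UNKNOWN_ERR_CC:
        return "IPMI_UNKNOWN_ERR_CC";
    default:
        return "other";
    }
}

/* Hypothetical helper: find "Received IPMI error: <hex>" in a log line.
 * Returns the completion code, or -1 if the line has no such entry. */
static int parse_ipmi_error(const char *line)
{
    const char *tag = "Received IPMI error: ";
    const char *p;
    unsigned int cc;

    assert(line != NULL);
    p = strstr(line, tag);
    if (p == NULL || sscanf(p + strlen(tag), "%x", &cc) != 1)
        return -1;
    return (int)cc;
}
```

      Feeding each grep'ed line through parse_ipmi_error and then ipmi_cc_name would show at a glance whether a given error is the timeout (c3) or the unknown error (ff).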

      There are also gaps in "Latest data" when these errors occur.

      Expected:

      1. No "Received IPMI error" in log file
      2. No gaps in collected data

      Proposed fix:
      There are calls to os_hnd->perform_one_op(os_hnd) in some places. If I understand correctly, these calls allow OpenIPMI to do its internal processing. If they aren't made frequently enough, OpenIPMI cannot do its work and timeouts happen.

      To fix this, the IPMI poller process should call os_hnd->perform_one_op(os_hnd) while it is waiting for messages from the IPMI manager.
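
      The idea can be sketched as a wait loop that gives OpenIPMI a short time slice on every iteration instead of blocking until a request arrives. This is only an illustration under assumptions, not the actual patch: mock_os_handler_t is a stand-in for OpenIPMI's os_handler_t, the two-argument shape of perform_one_op (handler plus timeout) is my assumption about the real callback, and wait_for_request, request_ready_fn and ready_after are hypothetical names. The 1 ms slice is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Mock stand-in for OpenIPMI's os_handler_t; only the one callback
 * discussed in this report is modelled. */
typedef struct mock_os_handler mock_os_handler_t;
struct mock_os_handler {
    int (*perform_one_op)(mock_os_handler_t *h, struct timeval *timeout);
    int ops_performed;  /* counts calls, for illustration only */
};

/* Mock callback: the real one would run OpenIPMI's internal processing. */
static int mock_perform_one_op(mock_os_handler_t *h, struct timeval *timeout)
{
    (void)timeout;
    h->ops_performed++;
    return 0;
}

/* Hypothetical predicate: non-zero once a request from the IPMI manager
 * is pending. */
typedef int (*request_ready_fn)(void *ctx);

/* Wait for a request, but keep servicing OpenIPMI between checks.
 * Returns 1 if a request arrived, 0 if max_iterations was reached. */
static int wait_for_request(mock_os_handler_t *os_hnd, request_ready_fn ready,
                            void *ctx, int max_iterations)
{
    int i;

    assert(os_hnd != NULL && ready != NULL);
    for (i = 0; i < max_iterations; i++) {
        struct timeval tv = {0, 1000};  /* 1 ms slice (assumption) */

        if (ready(ctx))
            return 1;
        os_hnd->perform_one_op(os_hnd, &tv);
    }
    return 0;
}

/* Demo predicate: reports "ready" after *ctx unsuccessful checks. */
static int ready_after(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

      The point of the design is that the poller never sits idle for longer than one slice without letting OpenIPMI run, so a 30s item interval no longer starves the library.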

            Assignee: Unassigned
            Reporter: Andrejs Sitals (asitals)
            Team C
            Votes: 1
            Watchers: 5
