[ZBX-15578] IPMI times out and fails to read values when polls aren't frequent enough Created: 2019 Feb 04  Updated: 2024 Apr 10  Resolved: 2019 Mar 22

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 4.0.4rc2, 4.2.0alpha3
Fix Version/s: 4.0.6rc2, 4.2.0rc2, 4.2 (plan)

Type: Problem report Priority: Trivial
Reporter: Andrejs Sitals (Inactive) Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: ipmi
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 18.04, OpenIPMI 2.0.22 from repository, OpenIPMI 2.0.25 from sources


Attachments: File zabbix_server.log    
Issue Links:
Causes
causes ZBX-15935 zbx_perform_all_openipmi_ops can ente... Closed
causes ZBX-16058 Zabbix server queue (IPMI agent) has ... Closed
Duplicate
Sub-task
Team: Team C
Sprint: Sprint 49 (Feb 2019), Sprint 50 (Mar 2019)
Story Points: 1

 Description   

Steps to reproduce:

  1. Set DebugLevel=4 in zabbix_server.conf
  2. Set StartIPMIPollers=1 in zabbix_server.conf
  3. Create host "IPMI timeouts" with a single IPMI interface
  4. Create item "Power", set update interval to 30s
  5. Start monitoring the server's log by running:
    tail -f /var/log/zabbix/zabbix_server.log | grep -i -E "(ipmi (poller|manager)|ipmi_lan\.c|network error|restored|power)"
  6. Run Zabbix server

Result:
See the attached log file. A pattern repeats between the "Connection 0 to the BMC is up" lines: the connection comes up, one value is read, and then network errors follow with "Received IPMI error: c3" and "Received IPMI error: ff" log entries.

0xC3 is IPMI_TIMEOUT_CC in OpenIPMI. 0xFF is IPMI_UNKNOWN_ERR_CC.
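
To make these log entries easier to read, here is a minimal sketch that pulls the hex code out of a "Received IPMI error: xx" line and maps it to the two names mentioned above. The helper names parse_ipmi_error and ipmi_cc_name are hypothetical and are not part of Zabbix or OpenIPMI. The two macros are redefined locally with the values stated in this report so that the snippet builds without the OpenIPMI headers.

```c
#include <stdio.h>
#include <string.h>

/* Values as stated in this report; redefined here only so the sketch
 * compiles without the OpenIPMI headers. */
#define IPMI_TIMEOUT_CC     0xc3
#define IPMI_UNKNOWN_ERR_CC 0xff

/* Hypothetical helper: find "Received IPMI error: xx" in a log line and
 * store the hex code in *cc. Returns 1 on success, 0 if the line does
 * not contain such an entry. */
static int parse_ipmi_error(const char *line, unsigned int *cc)
{
    static const char marker[] = "Received IPMI error: ";
    const char *p = strstr(line, marker);

    if (p == NULL)
        return 0;

    return sscanf(p + sizeof(marker) - 1, "%x", cc) == 1;
}

/* Hypothetical helper: name the two completion codes seen in this report. */
static const char *ipmi_cc_name(unsigned int cc)
{
    switch (cc) {
    case IPMI_TIMEOUT_CC:
        return "IPMI_TIMEOUT_CC";
    case IPMI_UNKNOWN_ERR_CC:
        return "IPMI_UNKNOWN_ERR_CC";
    default:
        return "other";
    }
}
```

Passing each line of the filtered log through parse_ipmi_error and then ipmi_cc_name shows at a glance whether a gap lines up with a timeout (c3) or an unspecified error (ff).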

There are also gaps in "Latest data" when these errors occur.

Expected:

  1. No "Received IPMI error" in log file
  2. No gaps in collected data

Proposed fix:
The code calls os_hnd->perform_one_op(os_hnd) in several places. If I understand correctly, these calls let OpenIPMI do its internal processing. If they aren't made frequently enough, then OpenIPMI cannot do its work and timeouts occur.

To fix this, the IPMI poller process should call os_hnd->perform_one_op(os_hnd) while it is waiting for messages from the IPMI manager.
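
The idea can be sketched as a wait loop that keeps servicing OpenIPMI between checks for a new request. This is an illustration under stated assumptions, not the code that was committed. sketch_os_handler_t is a stand-in that models only the perform_one_op callback; the timeout argument follows my recollection of OpenIPMI's os_handler_t and should be checked against its header. wait_for_request, request_ready_fn, and the sim_* helpers are hypothetical names, and the 100 ms timeout and the max_ops limit are arbitrary.

```c
#include <stddef.h>
#include <sys/time.h>

/* Stand-in for OpenIPMI's OS handler, reduced to the one callback this
 * sketch needs. The real structure has many more members. */
typedef struct sketch_os_handler sketch_os_handler_t;
struct sketch_os_handler {
    int (*perform_one_op)(sketch_os_handler_t *handler, struct timeval *timeout);
    void *data;
};

/* Hypothetical non-blocking check: is a request from the IPMI manager pending? */
typedef int (*request_ready_fn)(void *ctx);

/* Wait for a request while letting the IPMI library run its internal
 * processing. Returns the number of operations performed before a request
 * arrived, or -1 if none arrived within max_ops operations. */
static int wait_for_request(sketch_os_handler_t *os_hnd, request_ready_fn ready,
                            void *ctx, int max_ops)
{
    int ops;

    for (ops = 0; ops < max_ops; ops++) {
        struct timeval tv = { 0, 100000 }; /* arbitrary 100 ms per operation */

        if (ready(ctx))
            return ops;

        /* Give the library a chance to handle timers and I/O. */
        os_hnd->perform_one_op(os_hnd, &tv);
    }

    return ready(ctx) ? ops : -1;
}

/* Simulation helpers (hypothetical): count operations and report a request
 * as pending once ready_after operations have been performed. */
typedef struct {
    int ops_done;
    int ready_after;
} sim_state_t;

static int sim_perform_one_op(sketch_os_handler_t *handler, struct timeval *timeout)
{
    sim_state_t *state = handler->data;

    (void)timeout; /* the simulation does not actually wait */
    state->ops_done++;
    return 0;
}

static int sim_request_ready(void *ctx)
{
    sim_state_t *state = ctx;

    return state->ops_done >= state->ready_after;
}
```

The point of the sketch is that the poller never sits idle for a whole update interval without servicing the library; with a 30s item interval and a single poller, that idle period is where the c3 timeouts in this report appear.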



 Comments   
Comment by Andrejs Sitals (Inactive) [ 2019 Feb 04 ]

This issue was discovered while trying to reproduce ZBX-15152.

Comment by Andrejs Sitals (Inactive) [ 2019 Mar 19 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-15578

Comment by Andris Mednis [ 2019 Mar 22 ]

Available in versions:

  • pre-4.0.6rc2 r91473
  • pre-4.2.0rc2 (trunk) r91475