Type: Problem report
Resolution: Fixed
Priority: Critical
Affects versions: 4.0.6, 4.2.0
Environment: OpenIPMI >= 2.0.26
Sprints: Sprint 51 (Apr 2019), Sprint 52 (May 2019)
Story points: 0.5
Steps to reproduce:
- Have IPMI checks that enter zbx_perform_all_openipmi_ops() and whose perform_one_op() calls return before the timeout expires. perform_one_op() updates the remaining timeout, which gets driven down to 0.0, at which point the loop just sits and spins.
Result:
100% CPU usage on IPMI thread
Expected:
Normal CPU usage; the loop should exit once the timeout expires.
Further discussion:
Once perform_one_op() returns before the timeout, we never break out of the loop: start_time is reset on every cycle, but the elapsed duration keeps being compared against the original timeout rather than the remaining timeout that perform_one_op() has already updated.
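For illustration, here is a rough sketch of how the pre-fix loop reads per the description above (reconstructed, not taken verbatim from checks_ipmi.c; the start_time/duration bookkeeping and the zbx_time() helper are assumptions made for clarity):

/* sketch of the problematic pattern described above (reconstruction, not the actual source) */
void	zbx_perform_all_openipmi_ops(int timeout)
{
	struct timeval	tv = {timeout, 0};
	double		start_time, duration;

	do
	{
		int	res;

		start_time = zbx_time();	/* reset on every cycle, so elapsed time never accumulates */

		/* once perform_one_op() has driven tv down to {0, 0}, it returns immediately */
		if (0 != (res = os_hnd->perform_one_op(os_hnd, &tv)))
		{
			zabbix_log(LOG_LEVEL_DEBUG, "IPMI error: %s", zbx_strerror(res));
			break;
		}

		duration = zbx_time() - start_time;	/* near zero once tv is {0, 0} */
	}
	while (duration < timeout);	/* compared against the original timeout, so the loop never exits */
}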
Since perform_one_op() updates the remaining timeout internally (and returns a timeout of {0,0} if it did time out), drop the start_time tracking entirely: loop while (tv.tv_sec + tv.tv_usec > 0) and reset tv to the timeout at the start of each iteration:
void	zbx_perform_all_openipmi_ops(int timeout)
{
	struct timeval	tv = {1, 0};

	while (tv.tv_sec + tv.tv_usec > 0)
	{
		int	res;

		tv.tv_sec = timeout;
		tv.tv_usec = 0;

		/* perform_one_op() returns 0 on success, errno on failure (timeout means success) */
		res = os_hnd->perform_one_op(os_hnd, &tv);

		if (0 != res)
		{
			zabbix_log(LOG_LEVEL_DEBUG, "IPMI error: %s", zbx_strerror(res));
			break;
		}
	}
}
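A note on this pattern (my reading of the proposed code, not stated in the report itself): because tv is reset to the full timeout at the top of every iteration, the function no longer tracks total elapsed time at all; the loop ends either when a single perform_one_op() call actually times out and hands back {0, 0}, or when it reports an error, so there is nothing left to spin on.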
Caused by:
- ZBX-15578 IPMI times out and fails to read values when polls aren't frequent enough (Closed)