[ZBX-15935] zbx_perform_all_openipmi_ops can enter infinite loop Created: 2019 Apr 04 Updated: 2024 Apr 10 Resolved: 2019 May 19 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 4.0.6, 4.2.0 |
Fix Version/s: | 4.0.8rc1, 4.2.2rc1, 4.4.0alpha1, 4.4 (plan) |
Type: | Problem report | Priority: | Critical |
Reporter: | Eric A. Borisch | Assignee: | Andrejs Sitals (Inactive) |
Resolution: | Fixed | Votes: | 0 |
Labels: | bug, ipmi | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
OpenIPMI >= 2.0.26 |
Issue Links: |
|
||||||||
Team: | Team I | ||||||||
Sprint: | Sprint 51 (Apr 2019), Sprint 52 (May 2019) | ||||||||
Story Points: | 0.5 |
Description |
Steps to reproduce:
Result: 100% CPU usage on IPMI thread
Expected: Not this.
Further discussion: Once perform_one_op() returns before timeout, we never break out of the loop, since we reset start_time each cycle, but we keep comparing duration against the original timeout, not the (updated by perform_one_op()) remaining timeout.

Since perform_one_op() updates the remaining timeout internally (and returns a timeout of {0,0} if it did timeout), skip the start_time tracking completely, and just loop while (tv.tv_sec + tv.tv_usec > 0) and reset the tv to the timeout at the start of each loop:

    void zbx_perform_all_openipmi_ops(int timeout)
    {
        struct timeval tv = {1, 0};

        while (tv.tv_sec + tv.tv_usec > 0)
        {
            int res;

            tv.tv_sec = timeout;
            tv.tv_usec = 0;

            res = os_hnd->perform_one_op(os_hnd, &tv);

            /* perform_one_op() returns 0 on success, errno on failure (timeout means success) */
            if (0 != res)
            {
                zabbix_log(LOG_LEVEL_DEBUG, "IPMI error: %s", zbx_strerror(res));
                break;
            }
        }
    }
|
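The termination condition of the loop proposed above can be checked without OpenIPMI by swapping in a stub. The sketch below is an illustration only: `fake_perform_one_op()`, `events_left` and `run_proposed_loop()` are hypothetical names, and the stub merely assumes the behaviour described in this report (the remaining timeout is written back into `tv`, and `{0,0}` is written on a timeout).

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical stand-in for os_hnd->perform_one_op(): number of calls that
 * will "see an event" and return before the timeout expires. */
static int events_left;

/* Assumed behaviour (per the report): on an early return the remaining time
 * is written back into *tv; on a timeout *tv becomes {0, 0}. Returns 0. */
static int fake_perform_one_op(struct timeval *tv)
{
    if (events_left > 0)
    {
        long long half_us = ((long long)tv->tv_sec * 1000000LL + tv->tv_usec) / 2;

        events_left--;
        tv->tv_sec = (time_t)(half_us / 1000000LL);
        tv->tv_usec = (suseconds_t)(half_us % 1000000LL);
        return 0;
    }

    tv->tv_sec = 0;
    tv->tv_usec = 0;
    return 0;
}

/* Same loop shape as the proposal: reset tv each cycle, stop once a call
 * comes back with nothing left (i.e. it timed out). Returns the call count. */
static int run_proposed_loop(int timeout, int pending_events)
{
    struct timeval tv = {1, 0};
    int calls = 0;

    assert(timeout > 0);
    events_left = pending_events;

    while (tv.tv_sec + tv.tv_usec > 0)
    {
        tv.tv_sec = timeout;
        tv.tv_usec = 0;

        if (0 != fake_perform_one_op(&tv))
            break;

        calls++;
    }

    return calls;
}
```

Under these assumptions, `run_proposed_loop(1, 3)` makes four calls: three that return early with time left over, and one final call that times out and ends the loop.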
Comments |
Comment by Andrejs Sitals (Inactive) [ 2019 Apr 04 ] |
Thanks for your report, eborisch. perform_one_op() just passes timeout to sel_select() which is defined in selector.c. It doesn't do anything else with timeout. sel_select() started updating timeout in version 2.0.26 which was released on 2018-12-14. It didn't modify timeout in 2.0.25 and older versions. |
Comment by Eric A. Borisch [ 2019 Apr 04 ] |
Aha; yes I am running 2.0.27, so that explains it. Perhaps just resetting tv after every call, then. Thanks for digging into this! |
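One way to read "resetting tv after every call" is to keep the overall deadline outside of `tv` and recompute `tv` before each call, so the loop behaves the same whether or not the library modifies the timeout. The sketch below is a guess at that idea, not the change that was committed for this issue; the simulated clock `now_us`, `stub_perform_one_op()` and `perform_ops_until_deadline()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical simulated clock in microseconds, so the sketch is deterministic. */
static long long now_us;

/* Stub that mimics a call which may modify *tv: it consumes up to step_us of
 * the given timeout, advances the clock, and writes the remainder back. */
static int stub_perform_one_op(struct timeval *tv, long long step_us)
{
    long long budget = (long long)tv->tv_sec * 1000000LL + tv->tv_usec;
    long long used = step_us < budget ? step_us : budget;

    now_us += used;
    budget -= used;
    tv->tv_sec = (time_t)(budget / 1000000LL);
    tv->tv_usec = (suseconds_t)(budget % 1000000LL);
    return 0;
}

/* Deadline-based loop: tv is rebuilt from the time left before every call,
 * so whatever the callee leaves in tv is ignored. Returns the call count. */
static int perform_ops_until_deadline(int timeout, long long step_us)
{
    long long deadline = now_us + (long long)timeout * 1000000LL;
    int calls = 0;

    assert(timeout > 0 && step_us > 0);

    while (now_us < deadline)
    {
        struct timeval tv;
        long long left = deadline - now_us;

        tv.tv_sec = (time_t)(left / 1000000LL);
        tv.tv_usec = (suseconds_t)(left % 1000000LL);

        if (0 != stub_perform_one_op(&tv, step_us))
            break;

        calls++;
    }

    return calls;
}
```

With a 1 second timeout and calls that each take 250 ms of simulated time, the loop makes four calls and then stops at the deadline instead of spinning.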
Comment by Andrejs Sitals (Inactive) [ 2019 Apr 08 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-15935 |
Comment by Eric A. Borisch [ 2019 Apr 23 ] |
Doesn't appear to have made the window for 4.0.7... |
Comment by Andrejs Sitals (Inactive) [ 2019 Apr 24 ] |
Available in versions:
|
Comment by Eric A. Borisch [ 2019 Apr 26 ] |
No surprise, but also encountered on FreeBSD with zabbix_proxy performing IPMI checks – pegged the CPU. Current FreeBSD ports versions of openipmi and zabbix4 are 2.0.27 and 4.0.7, respectively. |