[ZBX-10983] zabbix server 3.0 crash - ipmi problem Created: 2016 Jul 11 Updated: 2017 Jul 14 Resolved: 2016 Dec 17 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 3.0.3 |
Fix Version/s: | 3.0.7rc1, 3.2.3rc1, 3.4.0alpha1 |
Type: | Incident report | Priority: | Major |
Reporter: | Krzysztof P | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
CentOS 6.7 x86_64 |
Attachments: | backtrace.txt objdump.zip | ||||||||||||||||
Issue Links: |
|
Description |
Zabbix Server 3.0 crashes on IPMI. |
Comments |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jul 11 ] |
Can you please attach the results of objdump -DSswx zabbix_server too? |
Comment by Krzysztof P [ 2016 Jul 11 ] |
Hi, I added an attachment with the output of objdump -DSswx zabbix_server. |
Comment by Aleksandrs Saveljevs [ 2016 Jul 11 ] |
Potentially related issue: |
Comment by Krzysztof P [ 2016 Jul 11 ] |
Hi Aleksandrs, the package versions in my environment are the latest available in the official CentOS repository. Only ipmitool has a newer version, ipmitool-1.8.15-2. If needed, I can compile OpenIPMI 2.0.22 (the latest version) from source. What do you recommend? |
Comment by Aleksandrs Saveljevs [ 2016 Jul 11 ] |
The backtrace is too short and there is no information on what the server was doing prior to the crash. Theoretically, you could enable DebugLevel=4 for IPMI pollers, although it might generate quite a bit of logs if you have 17 IPMI pollers and it crashes once a month... |
Comment by Krzysztof P [ 2016 Jul 12 ] |
Compiled and installed the latest version of OpenIPMI (2.0.22). |
Comment by Aleksandrs Saveljevs [ 2016 Sep 20 ] |
Any new crashes since July? |
Comment by Krzysztof P [ 2016 Nov 10 ] |
After updating OpenIPMI to 2.0.22, the problem no longer occurs. |
Comment by Aleksandrs Saveljevs [ 2016 Nov 11 ] |
Thank you! Closing as "Cannot reproduce" then. If the issue manifests again, please reopen. |
Comment by Andris Mednis [ 2016 Nov 25 ] |
This small change helped to prevent 'ipmi poller' crash:

Index: src/zabbix_server/poller/checks_ipmi.c
===================================================================
--- src/zabbix_server/poller/checks_ipmi.c	(revision 64040)
+++ src/zabbix_server/poller/checks_ipmi.c	(working copy)
@@ -1037,8 +1037,6 @@
 {
 	int	i;
 
-	h->con->close_connection(h->con);
-
 	for (i = 0; i < h->control_count; i++)
 	{
 		zbx_free(h->controls[i].c_name);

(Note: this change is not yet reviewed and approved).

There is one more place where modified zbx_free_ipmi_connection() is called - in zbx_free_ipmi_handler(). But zbx_free_ipmi_handler() is called only from zbx_on_exit() - we can leave connection open, it will be closed by OS. |
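One possible way to picture why dropping the explicit close_connection() call could help is a toy ownership model. This is only a sketch under a stated assumption: that the library's domain object owns the connection and closes it itself when the domain is closed. The ticket does not confirm this mechanism, and every name below (toy_con_t, toy_domain_t, toy_con_close(), toy_domain_close(), toy_release_host()) is hypothetical and unrelated to the real OpenIPMI or Zabbix code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection: only counts how often it has been closed. */
typedef struct
{
	int	close_calls;
}
toy_con_t;

/* Hypothetical domain that ASSUMES ownership of its connection. */
typedef struct
{
	toy_con_t	*con;
	int		is_open;
}
toy_domain_t;

static void	toy_con_close(toy_con_t *con)
{
	con->close_calls++;
}

/* Closing the domain also closes the connection it owns. */
/* Returns 0 on success, -1 if the domain was already closed. */
static int	toy_domain_close(toy_domain_t *domain)
{
	assert(NULL != domain && NULL != domain->con);

	if (0 == domain->is_open)
		return -1;

	toy_con_close(domain->con);
	domain->is_open = 0;

	return 0;
}

/* Release a host and return how many times its connection was closed. */
/* close_con_first != 0 models calling the connection close by hand    */
/* before the domain close; 0 models leaving it to the domain.          */
static int	toy_release_host(toy_domain_t *domain, int close_con_first)
{
	if (0 != close_con_first)
		toy_con_close(domain->con);

	toy_domain_close(domain);

	return domain->con->close_calls;
}
```

In this model the manual close results in the connection being closed twice, while leaving it to the domain closes it once. In real code a second close of an already released resource is the kind of thing that can end in a crash, but whether that is what happened here is an assumption.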
Comment by Andris Mednis [ 2016 Nov 25 ] |
Seems like |
Comment by Andris Mednis [ 2016 Nov 28 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10983 . |
Comment by Aleksandrs Saveljevs [ 2016 Dec 07 ] |
Took a look at OpenHPI source code (http://openhpi.org/), which also uses OpenIPMI library. It does not seem to call close_connection() either - only ipmi_domain_close(). |
Comment by Aleksandrs Saveljevs [ 2016 Dec 14 ] |
The fix looks good. I could reproduce the crash and Valgrind complaints using Andris' scenario before the fix, but not with the fix applied. The only suspicious place is zbx_close_inactive_host(), where we call zbx_domain_close_cb(), which calls ipmi_domain_close(). That last function can fail, in which case we do not free the host in zbx_close_inactive_host(). That leads to a question: if we fail to close a domain on first attempt, what are our chances of succeeding on following attempts? However, while I could reproduce ipmi_domain_close() failing in zbx_connection_change_cb(), which happens in zbx_init_ipmi_host(), I have not managed to reproduce it failing inside zbx_close_inactive_host(). Still, it may be a topic for further consideration. |
Comment by Andris Mednis [ 2016 Dec 14 ] |
I decided to keep a host in our list in case ipmi_domain_close() fails. This seems a safer choice, as we do not know what data the OpenIPMI library might keep about that host or which of our callbacks might be called later. |
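The "keep the host if closing fails" decision can be sketched roughly as follows. This is an illustrative mock, not the actual Zabbix code: demo_host_t, demo_domain_close(), demo_add_host() and demo_close_hosts() are hypothetical names, and the close_fails flag merely simulates ipmi_domain_close() returning an error.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a monitored IPMI host (not the Zabbix struct). */
typedef struct demo_host
{
	int			id;
	int			close_fails;	/* simulates a failing domain close */
	struct demo_host	*next;
}
demo_host_t;

/* Mock domain close: returns 0 on success, non-zero on failure. */
static int	demo_domain_close(const demo_host_t *h)
{
	return h->close_fails;
}

/* Prepend a new host to the list and return the new head. */
static demo_host_t	*demo_add_host(demo_host_t *head, int id, int close_fails)
{
	demo_host_t	*h = malloc(sizeof(demo_host_t));

	assert(NULL != h);
	h->id = id;
	h->close_fails = close_fails;
	h->next = head;

	return h;
}

/* Try to close every host. Hosts that close cleanly are unlinked and  */
/* freed; hosts whose close fails stay in the list, because the library */
/* might still hold data about them. Returns the number of hosts kept.  */
static int	demo_close_hosts(demo_host_t **head)
{
	demo_host_t	**pp = head;
	int		kept = 0;

	while (NULL != *pp)
	{
		demo_host_t	*h = *pp;

		if (0 == demo_domain_close(h))
		{
			*pp = h->next;	/* unlink first, then free */
			free(h);
		}
		else
		{
			kept++;
			pp = &h->next;
		}
	}

	return kept;
}
```

The trade-off in this sketch is a retained list entry instead of a possible use-after-free if a callback fires later for a host that was already freed.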
Comment by Andris Mednis [ 2016 Dec 16 ] |
Fixed in versions:
|
Comment by Andris Mednis [ 2016 Dec 16 ] |
Documented in:
martins-v Reviewed, with minor wording changes. CLOSED. |