[ZBX-10983] zabbix server 3.0 crash - ipmi problem Created: 2016 Jul 11  Updated: 2017 Jul 14  Resolved: 2016 Dec 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.0.3
Fix Version/s: 3.0.7rc1, 3.2.3rc1, 3.4.0alpha1

Type: Incident report Priority: Major
Reporter: Krzysztof P Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.7 x86_64


Attachments: Text File backtrace.txt, Zip Archive objdump.zip
Issue Links:
Duplicate
is duplicated by ZBX-11169 Zabbix server 3.0 keeps crashing Closed
Sub-task
depends on ZBX-4823 unreachable pollers may "hang" when a... Closed

 Description   

Zabbix Server 3.0 crashes on IPMI.
The server is in a production environment. This is the first crash since the update from version 2.2.11; the server had been working properly from 17.06.2016 until last night.
Environment:
CentOS release 6.7
MySQL server version 5.6.27-2.el6.x86_64



 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jul 11 ]

Can you please attach the results of objdump -DSswx zabbix_server too?

Comment by Krzysztof P [ 2016 Jul 11 ]

Hi,

I have added an attachment with the output of objdump -DSswx zabbix_server.

Comment by Aleksandrs Saveljevs [ 2016 Jul 11 ]

Potentially related issue: ZBX-10940.

Comment by Krzysztof P [ 2016 Jul 11 ]

Hi Aleksandrs,

In my environment I have the following package versions:
OpenIPMI-devel-2.0.16-14.el6.x86_64
OpenIPMI-libs-2.0.16-14.el6.x86_64
OpenIPMI-2.0.16-14.el6.x86_64
ipmitool-1.8.11-29.el6_7.x86_64

These are the latest versions of these packages available in the official CentOS repository. Only ipmitool has a newer version, ipmitool-1.8.15-2. If needed, I can compile OpenIPMI from source at version 2.0.22 (the latest). What do you recommend?

Comment by Aleksandrs Saveljevs [ 2016 Jul 11 ]

The backtrace is too short and there is no information on what the server was doing prior to the crash. Theoretically, you could enable DebugLevel=4 for IPMI pollers, although it might generate quite a bit of logs if you have 17 IPMI pollers and it crashes once a month...

Comment by Krzysztof P [ 2016 Jul 12 ]

I compiled and installed the latest version of OpenIPMI (2.0.22).

Comment by Aleksandrs Saveljevs [ 2016 Sep 20 ]

Any new crashes since July?

Comment by Krzysztof P [ 2016 Nov 10 ]

After updating OpenIPMI to 2.0.22, the problem no longer occurs.

Comment by Aleksandrs Saveljevs [ 2016 Nov 11 ]

Thank you! Closing as "Cannot reproduce" then. If the issue manifests again, please reopen.

Comment by Andris Mednis [ 2016 Nov 25 ]

This small change helped to prevent 'ipmi poller' crash:

Index: src/zabbix_server/poller/checks_ipmi.c
===================================================================
--- src/zabbix_server/poller/checks_ipmi.c      (revision 64040)
+++ src/zabbix_server/poller/checks_ipmi.c      (working copy)
@@ -1037,8 +1037,6 @@
 {
        int     i;
 
-       h->con->close_connection(h->con);
-
        for (i = 0; i < h->control_count; i++)
        {
                zbx_free(h->controls[i].c_name);

(Note: this change is not yet reviewed and approved).

There is one more place where the modified zbx_free_ipmi_connection() is called: zbx_free_ipmi_handler(). But zbx_free_ipmi_handler() is called only from zbx_on_exit(), so we can leave the connection open; it will be closed by the OS.
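
The thread does not state why the explicit close_connection() call led to a crash. One possible reading, offered here purely as an assumption, is that the library closes the connection itself when the domain is closed, so the extra call touches the same connection twice. The sketch below models only that idea. All names in it (conn_t, domain_t, conn_close, domain_close, teardown_old, teardown_new) are hypothetical and are not OpenIPMI or Zabbix APIs.

```c
#include <stddef.h>

/* Hypothetical connection: only counts how often it was closed. */
typedef struct
{
	int	close_calls;
}
conn_t;

/* Hypothetical domain that owns one connection. */
typedef struct
{
	conn_t	*con;
	int	open;
}
domain_t;

static void	conn_close(conn_t *c)
{
	c->close_calls++;
}

/* ASSUMPTION: closing the domain also closes the connection it owns. */
static void	domain_close(domain_t *d)
{
	if (0 != d->open)
	{
		conn_close(d->con);
		d->open = 0;
	}
}

/* Old-style teardown: closes the connection by hand, then closes the domain. */
static void	teardown_old(domain_t *d)
{
	conn_close(d->con);
	domain_close(d);
}

/* New-style teardown: leaves the connection to its owner. */
static void	teardown_new(domain_t *d)
{
	domain_close(d);
}
```

In this model teardown_old() closes the connection twice and teardown_new() closes it once. With a real library, a second close on an already released connection could be a use-after-free, which would be consistent with the Valgrind complaints mentioned later in the thread, but that link is not confirmed by the source.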

Comment by Andris Mednis [ 2016 Nov 25 ]

It seems that ZBX-10940 is not related to this ZBX-10983 crash. ZBX-10940 is a crash in the 'unreachable poller' process, and zbx_free_ipmi_connection() does not run there as part of normal operation (except at termination in zbx_on_exit()).

Comment by Andris Mednis [ 2016 Nov 28 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10983 .

Comment by Aleksandrs Saveljevs [ 2016 Dec 07 ]

Took a look at OpenHPI source code (http://openhpi.org/), which also uses OpenIPMI library. It does not seem to call close_connection() either - only ipmi_domain_close().

Comment by Aleksandrs Saveljevs [ 2016 Dec 14 ]

The fix looks good. I could reproduce the crash and Valgrind complaints using Andris' scenario before the fix, but not with the fix applied.

The only suspicious place is zbx_close_inactive_host(), where we call zbx_domain_close_cb(), which calls ipmi_domain_close(). That last function can fail, in which case we do not free the host in zbx_close_inactive_host(). That leads to a question: if we fail to close a domain on first attempt, what are our chances of succeeding on following attempts? However, while I could reproduce ipmi_domain_close() failing in zbx_connection_change_cb(), which happens in zbx_init_ipmi_host(), I have not managed to reproduce it failing inside zbx_close_inactive_host(). Still, it may be a topic for further consideration.

Comment by Andris Mednis [ 2016 Dec 14 ]

I decided to keep a host in our list in case ipmi_domain_close() fails. This seems a safer choice, as we do not know what data the OpenIPMI library might keep about that host or which of our callbacks might be called later.
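
The decision above can be illustrated with a small self-contained sketch: a host is unlinked and freed only after the close call reports success, and otherwise stays in the list so that a later attempt can retry. This is not the actual Zabbix code. host_t, mock_domain_close, add_host, close_host and count_hosts are hypothetical names, and the mock stands in for ipmi_domain_close(), whose real signature is not reproduced here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-host state; not the real Zabbix structure. */
typedef struct host
{
	int		id;
	int		close_fails;	/* test knob: non-zero makes the mock close fail */
	struct host	*next;
}
host_t;

/* Mock of a library close call: 0 on success, non-zero on failure. */
static int	mock_domain_close(const host_t *h)
{
	return 0 != h->close_fails ? -1 : 0;
}

/* Prepend a host; returns the new head (or the old head if malloc fails). */
static host_t	*add_host(host_t *head, int id, int close_fails)
{
	host_t	*h = malloc(sizeof(*h));

	if (NULL == h)
		return head;

	h->id = id;
	h->close_fails = close_fails;
	h->next = head;

	return h;
}

/* Try to close host 'id'. Unlink and free it only if the close succeeded. */
/* Returns 1 if the host was freed, 0 if it was kept or not found.         */
static int	close_host(host_t **head, int id)
{
	host_t	**pp;

	for (pp = head; NULL != *pp; pp = &(*pp)->next)
	{
		host_t	*h = *pp;

		if (h->id != id)
			continue;

		if (0 != mock_domain_close(h))
			return 0;	/* close failed: keep the host for a later retry */

		*pp = h->next;
		free(h);

		return 1;
	}

	return 0;
}

/* Count the hosts currently in the list. */
static int	count_hosts(const host_t *head)
{
	int	n = 0;

	for (; NULL != head; head = head->next)
		n++;

	return n;
}
```

Keeping the entry costs a little memory if the close never succeeds, but it avoids freeing state that a library callback might still reference, which is the concern raised in the comment.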

Comment by Andris Mednis [ 2016 Dec 16 ]

Fixed in versions:

  • pre-3.0.7rc1 r64510,
  • pre-3.2.3rc1 r64511,
  • pre-3.3.0 (trunk) r64512.

Comment by Andris Mednis [ 2016 Dec 16 ]

Documented in:

martins-v Reviewed, with minor wording changes. CLOSED.
