[ZBX-633] zabbix_server crash on disconnected IPMI devices Created: 2008 Dec 18  Updated: 2017 May 30  Resolved: 2010 Aug 23

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.6, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.6.6
Fix Version/s: 1.8.4, 1.9.0 (alpha)

Type: Incident report Priority: Major
Reporter: Jaroslaw Tabor Assignee: Unassigned
Resolution: Fixed Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian Etch/Stable with all updates, libopenipmi version 2.0.7-1


Attachments: Text File slavik_zbx_log.txt    
Issue Links:
Duplicate
is duplicated by ZBX-2898 Zabbix server crash - ipmi problem Closed

 Description   

zabbix_server crashes when an IPMI device becomes unavailable. The problem exists with 1.6.1 and pre-1.6.1 (which I think should be 1.6.2).

3627:20081218:013331 In substitute_simple_macros (data:"host1.ipmi.abc.local")
3627:20081218:013331 End substitute_simple_macros (result:host1.ipmi.abc.local)
3627:20081218:013331 In int_in_list(list:,value:10062)
3627:20081218:013331 End int_in_list(ret:FAIL)
3627:20081218:013331 In get_value(key
3627:20081218:013331 In get_value_ipmi(key
3627:20081218:013331 In init_ipmi_host([host1.ipmi.abc.local]:623)
3627:20081218:013331 In get_ipmi_host([host1.ipmi.abc.local]:623)
3627:20081218:013331 In get_ipmi_sensor_by_name() Fan 1 Tach@[host1.ipmi.abc.local]:623
3627:20081218:013331 In read_ipmi_sensor() Fan 1 Tach@[host1.ipmi.abc.local]:623
3627:20081218:013331 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
3627:20081218:013331 In setup_done() [host2.ipmi.abc.local]:623
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 1 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 1.5V@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 1.8V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 12V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 5V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 3.3V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor RSA II Detect0@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Ambient Temp@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor CPU OverTemp0@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor CPU PFA0@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 5 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor SEL Fullness@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor CPU Vtt@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 2 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 3 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 4 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In control_change()
3627:20081218:013331 In delete_ipmi_control()
3627:20081218:013331 Control power@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In control_change()
3627:20081218:013331 In delete_ipmi_control()
3627:20081218:013331 Control reset@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In domain_closed() [host2.ipmi.abc.local]:623
3627:20081218:013331 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
3578:20081218:013331 One child process died. Exiting ...
3587:20081218:013331 Got signal. Exiting ...
3589:20081218:013331 Got signal. Exiting ...
3593:20081218:013331 Got signal. Exiting ...
3590:20081218:013331 Got signal. Exiting ...
3599:20081218:013331 Got signal. Exiting ...
3595:20081218:013331 Got signal. Exiting ...
3619:20081218:013331 Got signal. Exiting ...
3597:20081218:013331 Got signal. Exiting ...
3601:20081218:013331 Got signal. Exiting ...
3628:20081218:013331 Got signal. Exiting ...
3604:20081218:013331 Got signal. Exiting ...
3585:20081218:013331 Got signal. Exiting ...
3615:20081218:013331 Got signal. Exiting ...
3581:20081218:013331 Got signal. Exiting ...
3610:20081218:013331 Got signal. Exiting ...
3580:20081218:013331 Got signal. Exiting ...
3613:20081218:013331 Got signal. Exiting ...
3617:20081218:013331 Got signal. Exiting ...
3583:20081218:013331 Got signal. Exiting ...
3623:20081218:013331 Got signal. Exiting ...
3621:20081218:013331 Got signal. Exiting ...
3607:20081218:013331 Got signal. Exiting ...
3611:20081218:013331 Got signal. Exiting ...
3602:20081218:013331 Got signal. Exiting ...
3625:20081218:013331 Got signal. Exiting ...
3579:20081218:013331 Got signal. Exiting ...
3631:20081218:013331 Got signal. Exiting ...
3578:20081218:013333 Query [SET CHARACTER SET utf8]
3578:20081218:013333 In free_ipmi_handler()
3578:20081218:013333 ZABBIX Server stopped. ZABBIX 1.6.1.
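
A log this long is easier to read once the dying process is isolated. The following is a minimal sketch, not part of Zabbix: it assumes the `PID:YYYYMMDD:HHMMSS message` line layout seen above, and the function name `find_silent_children` and the `SAMPLE` excerpt are made up for illustration. It reports every process that logged something before the parent's "One child process died" message but never logged "Got signal. Exiting ...", together with its last message.

```python
import re

# Assumed layout, taken from the excerpt above: "PID:YYYYMMDD:HHMMSS message".
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}) (.*)$")


def find_silent_children(lines):
    """Return {pid: last message} for processes that logged before the
    parent noticed a dead child and never logged a clean signal exit."""
    last_message = {}
    exited_on_signal = set()
    parent = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or truncated lines
        pid, message = int(match.group(1)), match.group(4)
        if message.startswith("One child process died"):
            parent = pid
        elif message.startswith("Got signal. Exiting"):
            exited_on_signal.add(pid)
        elif parent is None:
            # Only activity before the parent's notice is of interest.
            last_message[pid] = message
    if parent is None:
        return {}
    return {
        pid: msg
        for pid, msg in last_message.items()
        if pid != parent and pid not in exited_on_signal
    }


# Hypothetical shortened excerpt of the log above.
SAMPLE = """\
3627:20081218:013331 In read_ipmi_sensor() Fan 1 Tach@[host1.ipmi.abc.local]:623
3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
3578:20081218:013331 One child process died. Exiting ...
3587:20081218:013331 Got signal. Exiting ...
3578:20081218:013333 ZABBIX Server stopped. ZABBIX 1.6.1.
""".splitlines()

if __name__ == "__main__":
    for pid, msg in find_silent_children(SAMPLE).items():
        print(pid, msg)
```

On the excerpt it singles out process 3627, whose last line is the OpenIPMI `lost_connection` message. This only narrows down which child went silent; it says nothing about why it died.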



 Comments   
Comment by Igor Danoshaites (Inactive) [ 2009 Mar 13 ]

One more similar problem:

Originally posted on the forum: http://www.zabbix.com/forum/showthread.php?p=43327#post43327
The log file is attached.

Current Zabbix server configuration:

############ GENERAL PARAMETERS #################

NodeID=651

StartPollers=10

StartIPMIPollers=10

#StartPollersUnreachable=1

#StartTrappers=5

#StartPingers=1

#StartDiscoverers=1

#StartHTTPPollers=1

#ListenPort=10051

#SourceIP=

#ListenIP=127.0.0.1

#HousekeepingFrequency=1

SenderFrequency=30

#DisableHousekeeping=1

# Specifies debug level
# 0 - debug is not created
# 1 - critical information
# 2 - error information
# 3 - warnings (default)
# 4 - for debugging (produces lots of information)

DebugLevel=4
Timeout=5

#TrapperTimeout=5

#UnreachablePeriod=45

#UnavailableDelay=60

PidFile=/var/tmp/zabbix_server.pid

LogFile=/tmp/zabbix_server.log

#LogFileSize=1

AlertScriptsPath=/home/zabbix/bin/

#ExternalScripts=/etc/zabbix/externalscripts

#FpingLocation=/usr/sbin/fping

#Fping6Location=/usr/sbin/fping6

#TmpDir=/tmp

#PingerFrequency=60
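
Most of the lines in the configuration above are commented out, so only a few settings are actually in effect. The following is a rough sketch for listing them. It assumes the plain `Key=value` layout shown above with `#` marking comments; the helper name `active_settings` and the `SAMPLE` text are hypothetical and not part of Zabbix.

```python
def active_settings(text):
    """Collect uncommented Key=value pairs from a config dump."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out parameter
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not Key=value pairs
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical shortened copy of the configuration above.
SAMPLE = """\
StartPollers=10
StartIPMIPollers=10
#StartTrappers=5
DebugLevel=4
Timeout=5
"""

if __name__ == "__main__":
    for key, value in active_settings(SAMPLE).items():
        print(f"{key}={value}")
```

For this report the relevant active values are `StartIPMIPollers=10`, `DebugLevel=4` and `Timeout=5`.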

Comment by Roman Sozinov [ 2009 Aug 26 ]

I have the same issue. Zabbix 1.6.5, CentOS 5.3; all system and OpenIPMI libraries are updated from the base repository.

Comment by Roman Sozinov [ 2009 Aug 27 ]

Same with Zabbix 1.6.6.

Comment by macindy [ 2009 Nov 30 ]

Same issue with Zabbix 1.6.7, gentoo, openipmi-2.0.11

26858:20091130:172744 Poller spent 0.064024 seconds while updating 19 values. Sleeping for 1 seconds
26868:20091130:172744 In get_values()
26868:20091130:172744 Query [select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.d$
26868:20091130:172744 In substitute_simple_macros (data:"xxx")
26868:20091130:172744 In int_in_list(list:,value:10049)
26868:20091130:172744 End int_in_list(ret:FAIL)
26868:20091130:172744 In get_value(key:FAN2 PSU1)
26868:20091130:172744 In get_value_ipmi(key:FAN2 PSU1)
26868:20091130:172744 In init_ipmi_host([xxx]:623)
26868:20091130:172744 In get_ipmi_host([xxx]:623)
26868:20091130:172744 In get_ipmi_sensor_by_name() FAN2 PSU1@[xxx]:623
26868:20091130:172744 In read_ipmi_sensor() FAN2 PSU1@[xxx]:623
26868:20091130:172744 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
26868:20091130:172744 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
26854:20091130:172744 One child process died. Exiting ...
26858:20091130:172744 Got signal. Exiting ...
26860:20091130:172744 Got signal. Exiting ...
26862:20091130:172744 Got signal. Exiting ...
26864:20091130:172744 Got signal. Exiting ...
26865:20091130:172744 Got signal. Exiting ...
26861:20091130:172744 Got signal. Exiting ...
26867:20091130:172744 Got signal. Exiting ...
26863:20091130:172744 Got signal. Exiting ...
26866:20091130:172744 Got signal. Exiting ...
26859:20091130:172744 Got signal. Exiting ...
26854:20091130:172746 Query [SET CHARACTER SET utf8]
26854:20091130:172746 In free_database_cache()
26854:20091130:172746 End of free_database_cache()
26854:20091130:172746 In free_ipmi_handler()
26854:20091130:172746 ZABBIX Server stopped. ZABBIX 1.6.7 (revision 8252).

Comment by Alexei Vladishev [ 2010 Apr 01 ]

We confirm this problem with 1.8.2 as well. To be fixed soon.

Comment by Alexei Vladishev [ 2010 Apr 22 ]

We spent nearly two weeks trying to debug this issue. It is very likely that the problem is related to the OpenIPMI library.

For now, we have added more debug information for IPMI checks and removed the linkage of the IPMI library to Zabbix agents.

All testing was performed with OpenIPMI 2.0.14 and 2.0.16. The symptoms differ, but it fails to work normally with all versions.

Alexei

Comment by Vyacheslav [ 2010 Jun 29 ]

I have the same issue in 1.8.3 (revision 13099): 100% failure rate when IPMI is enabled for a Sun xf4100 or xf4600.

Comment by Vyacheslav [ 2010 Jun 29 ]

Zabbix 1.8.3 (revision 13099) crash log

Comment by ufocek [ 2010 Aug 19 ]

Any idea when this problem will be fixed? I have a lot of important items based on IPMI.

Comment by ufocek [ 2010 Aug 20 ]

Which Zabbix 1.8.4 build has this problem fixed? I downloaded the latest nightly build, Zabbix 1.8.4rc1, and the problem still occurs.

Comment by richlv [ 2010 Aug 20 ]

The problem is believed to be resolved, but the fix has not been reviewed and merged back to the main branches yet. When that happens, this issue will be closed.

Comment by Aleksandrs Saveljevs [ 2010 Aug 20 ]

ufocek, if you wish to do some testing, feel free to svn checkout svn://svn.zabbix.com/branches/dev/zbx-633-ipmi-crash development branch.

Comment by Aleksandrs Saveljevs [ 2010 Aug 23 ]

Hopefully fixed in pre-1.8.4 in r14009.
