ZABBIX BUGS AND ISSUES

zabbix_server crash on disconnected IPMI devices

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.6, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.6.6
  • Fix Version/s: 1.8.4, 1.9.0 (alpha)
  • Component/s: Server (S)
  • Labels:
    None
  • Environment:
    Debian Etch/Stable with all updates, libopenipmi Version: 2.0.7-1

Description

zabbix_server crashes when an IPMI device becomes unavailable. The problem exists in 1.6.1 and in pre-1.6.1 (I think it should be 1.6.2).

3627:20081218:013331 In substitute_simple_macros (data:"host1.ipmi.abc.local")
3627:20081218:013331 End substitute_simple_macros (result:host1.ipmi.abc.local)
3627:20081218:013331 In int_in_list(list:,value:10062)
3627:20081218:013331 End int_in_list(ret:FAIL)
3627:20081218:013331 In get_value(key:)
3627:20081218:013331 In get_value_ipmi(key:)
3627:20081218:013331 In init_ipmi_host([host1.ipmi.abc.local]:623)
3627:20081218:013331 In get_ipmi_host([host1.ipmi.abc.local]:623)
3627:20081218:013331 In get_ipmi_sensor_by_name() Fan 1 Tach@[host1.ipmi.abc.local]:623
3627:20081218:013331 In read_ipmi_sensor() Fan 1 Tach@[host1.ipmi.abc.local]:623
3627:20081218:013331 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
3627:20081218:013331 In setup_done() [host2.ipmi.abc.local]:623
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 1 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 1.5V@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 1.8V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 12V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 5V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Planar 3.3V @[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor RSA II Detect0@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Ambient Temp@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor CPU OverTemp0@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor CPU PFA0@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 5 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor SEL Fullness@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor CPU Vtt@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 2 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 3 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In sensor_change()
3627:20081218:013331 In delete_ipmi_sensor()
3627:20081218:013331 Sensor Fan 4 Tach@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In entity_change()
3627:20081218:013331 In control_change()
3627:20081218:013331 In delete_ipmi_control()
3627:20081218:013331 Control power@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In control_change()
3627:20081218:013331 In delete_ipmi_control()
3627:20081218:013331 Control reset@[host2.ipmi.abc.local]:623 deleted
3627:20081218:013331 In entity_change()
3627:20081218:013331 In domain_closed() [host2.ipmi.abc.local]:623
3627:20081218:013331 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
3578:20081218:013331 One child process died. Exiting ...
3587:20081218:013331 Got signal. Exiting ...
3589:20081218:013331 Got signal. Exiting ...
3593:20081218:013331 Got signal. Exiting ...
3590:20081218:013331 Got signal. Exiting ...
3599:20081218:013331 Got signal. Exiting ...
3595:20081218:013331 Got signal. Exiting ...
3619:20081218:013331 Got signal. Exiting ...
3597:20081218:013331 Got signal. Exiting ...
3601:20081218:013331 Got signal. Exiting ...
3628:20081218:013331 Got signal. Exiting ...
3604:20081218:013331 Got signal. Exiting ...
3585:20081218:013331 Got signal. Exiting ...
3615:20081218:013331 Got signal. Exiting ...
3581:20081218:013331 Got signal. Exiting ...
3610:20081218:013331 Got signal. Exiting ...
3580:20081218:013331 Got signal. Exiting ...
3613:20081218:013331 Got signal. Exiting ...
3617:20081218:013331 Got signal. Exiting ...
3583:20081218:013331 Got signal. Exiting ...
3623:20081218:013331 Got signal. Exiting ...
3621:20081218:013331 Got signal. Exiting ...
3607:20081218:013331 Got signal. Exiting ...
3611:20081218:013331 Got signal. Exiting ...
3602:20081218:013331 Got signal. Exiting ...
3625:20081218:013331 Got signal. Exiting ...
3579:20081218:013331 Got signal. Exiting ...
3631:20081218:013331 Got signal. Exiting ...
3578:20081218:013333 Query [SET CHARACTER SET utf8]
3578:20081218:013333 In free_ipmi_handler()
3578:20081218:013333 ZABBIX Server stopped. ZABBIX 1.6.1.
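For readers triaging similar logs, here is a minimal sketch of how the dying child could be picked out of such a file. It assumes only the line layout visible in the excerpt above (PID:YYYYMMDD:HHMMSS message); the function name find_crashed_children is hypothetical and not part of Zabbix.

```python
import re

# Line layout assumed from the excerpt above: PID:YYYYMMDD:HHMMSS message
LOG_LINE = re.compile(r"^(\d+):(\d{8}):(\d{6}) (.*)$")


def find_crashed_children(lines):
    """Return {pid: last_message} for processes that logged something but
    never logged 'Got signal. Exiting', once the parent reported a dead child.
    Hypothetical helper, not part of Zabbix."""
    last_message = {}
    exited_cleanly = set()
    parent = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        pid, _date, _time, message = match.groups()
        if message.startswith("One child process died"):
            parent = pid
        elif message.startswith("Got signal. Exiting"):
            exited_cleanly.add(pid)
        else:
            last_message[pid] = message
    if parent is None:
        # No crash was reported in this log
        return {}
    return {pid: msg for pid, msg in last_message.items()
            if pid != parent and pid not in exited_cleanly}


sample = [
    "3627:20081218:013331 In read_ipmi_sensor() Fan 1 Tach@[host1.ipmi.abc.local]:623",
    "3627:20081218:013331 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down",
    "3578:20081218:013331 One child process died. Exiting ...",
    "3587:20081218:013331 Got signal. Exiting ...",
    "3578:20081218:013333 ZABBIX Server stopped. ZABBIX 1.6.1.",
]
print(find_crashed_children(sample))
```

On the log above this would point at PID 3627, whose last line is the OpenIPMI "All connections to the BMC are down" message. On a busy server, pollers that were merely idle could also show up, so the result is a hint rather than proof.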

Activity

Alexei Vladishev made changes -
Field Original Value New Value
Assignee Alexei Vladishev [ alexei ] Alexander Vladishev [ sasha ]
Igor Danoshaites added a comment -

One more similar problem, originally posted on the forum: http://www.zabbix.com/forum/showthread.php?p=43327#post43327
The log file is attached.

Current Zabbix server configuration:

############ GENERAL PARAMETERS #################

NodeID=651

StartPollers=10

StartIPMIPollers=10

#StartPollersUnreachable=1

#StartTrappers=5

#StartPingers=1

#StartDiscoverers=1

#StartHTTPPollers=1

#ListenPort=10051

#SourceIP=

#ListenIP=127.0.0.1

#HousekeepingFrequency=1

SenderFrequency=30

#DisableHousekeeping=1

# Specifies debug level
# 0 - debug is not created
# 1 - critical information
# 2 - error information
# 3 - warnings (default)
# 4 - for debugging (produces lots of information)

DebugLevel=4
Timeout=5

#TrapperTimeout=5

#UnreachablePeriod=45

#UnavailableDelay=60

PidFile=/var/tmp/zabbix_server.pid

LogFile=/tmp/zabbix_server.log

#LogFileSize=1

AlertScriptsPath=/home/zabbix/bin/

#ExternalScripts=/etc/zabbix/externalscripts

#FpingLocation=/usr/sbin/fping

#Fping6Location=/usr/sbin/fping6

#TmpDir=/tmp

#PingerFrequency=60
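To see at a glance which parameters in a configuration like this are actually set, as opposed to commented out and left at their defaults, a small parser can help. This is a sketch that assumes the Key=Value layout shown above, with "#" starting a comment; the function name active_settings is hypothetical.

```python
def active_settings(text):
    """Collect Key=Value pairs from lines that are not commented out.
    Hypothetical helper; assumes '#' starts a comment line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not Key=Value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


sample = """\
NodeID=651
StartPollers=10
StartIPMIPollers=10
#StartTrappers=5
DebugLevel=4
Timeout=5
"""
for key, value in active_settings(sample).items():
    print(f"{key} = {value}")
```

Applied to the configuration above, this would list, among others, StartIPMIPollers=10, DebugLevel=4 and Timeout=5, which are the settings most relevant to the IPMI crash being reported.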

Igor Danoshaites made changes -
Attachment zabbix_server.log [ 10715 ]
Roman Sozinov added a comment -

I have the same issue with Zabbix 1.6.5 on CentOS 5.3; all system and OpenIPMI libraries are updated from the base repository.

Roman Sozinov added a comment -

The same happens with Zabbix 1.6.6.

Alexei Vladishev made changes -
Workflow jira [ 11284 ] Zabbix workflow [ 12421 ]
richlv made changes -
Affects Version/s 1.6.1 [ 10040 ]
Affects Version/s 1.6.2 [ 10041 ]
Affects Version/s 1.6.3 [ 10042 ]
Affects Version/s 1.6.4 [ 10043 ]
Affects Version/s 1.6.5 [ 10044 ]
Affects Version/s 1.6.6 [ 10045 ]
Alexei Vladishev made changes -
Workflow Zabbix workflow [ 12421 ] Zabbix workflow2 [ 13452 ]
macindy added a comment -

Same issue with Zabbix 1.6.7 on Gentoo with openipmi-2.0.11:

26858:20091130:172744 Poller spent 0.064024 seconds while updating 19 values. Sleeping for 1 seconds
26868:20091130:172744 In get_values()
26868:20091130:172744 Query [select i.itemid,i.key_,h.host,h.port,i.delay,i.description,i.nextcheck,i.type,i.snmp_community,i.snmp_oid,h.useip,h.ip,i.history,i.lastvalue,i.prevvalue,i.hostid,h.status,i.value_type,h.errors_from,i.snmp_port,i.d$
26868:20091130:172744 In substitute_simple_macros (data:"xxx")
26868:20091130:172744 In int_in_list(list:,value:10049)
26868:20091130:172744 End int_in_list(ret:FAIL)
26868:20091130:172744 In get_value(key:FAN2 PSU1)
26868:20091130:172744 In get_value_ipmi(key:FAN2 PSU1)
26868:20091130:172744 In init_ipmi_host([xxx]:623)
26868:20091130:172744 In get_ipmi_host([xxx]:623)
26868:20091130:172744 In get_ipmi_sensor_by_name() FAN2 PSU1@[xxx]:623
26868:20091130:172744 In read_ipmi_sensor() FAN2 PSU1@[xxx]:623
26868:20091130:172744 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down
26868:20091130:172744 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down
26854:20091130:172744 One child process died. Exiting ...
26858:20091130:172744 Got signal. Exiting ...
26860:20091130:172744 Got signal. Exiting ...
26862:20091130:172744 Got signal. Exiting ...
26864:20091130:172744 Got signal. Exiting ...
26865:20091130:172744 Got signal. Exiting ...
26861:20091130:172744 Got signal. Exiting ...
26867:20091130:172744 Got signal. Exiting ...
26863:20091130:172744 Got signal. Exiting ...
26866:20091130:172744 Got signal. Exiting ...
26859:20091130:172744 Got signal. Exiting ...
26854:20091130:172746 Query [SET CHARACTER SET utf8]
26854:20091130:172746 In free_database_cache()
26854:20091130:172746 End of free_database_cache()
26854:20091130:172746 In free_ipmi_handler()
26854:20091130:172746 ZABBIX Server stopped. ZABBIX 1.6.7 (revision 8252).
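Both logs in this report show the same pattern: a read_ipmi_sensor() call followed, in the same process, by the OpenIPMI "All connections to the BMC are down" message. The sketch below pairs the two so the sensor and host being polled at that moment can be listed. It assumes the message wording seen in these excerpts, and the function name reads_interrupted_by_disconnect is hypothetical.

```python
import re

# Patterns assumed from the log excerpts in this report
READ = re.compile(
    r"^(\d+):\d{8}:\d{6} In read_ipmi_sensor\(\) (.+)@\[(.+?)\]:(\d+)$")
DOWN = re.compile(
    r"^(\d+):\d{8}:\d{6} SEVR: .*All connections to the BMC are down")


def reads_interrupted_by_disconnect(lines):
    """Return (pid, sensor, host, port) for each sensor read that was followed
    by a 'connections down' message from the same process.
    Hypothetical helper, not part of Zabbix."""
    pending = {}  # pid -> (sensor, host, port) of the most recent read
    hits = []
    for line in lines:
        line = line.strip()
        read = READ.match(line)
        if read:
            pid, sensor, host, port = read.groups()
            pending[pid] = (sensor, host, int(port))
            continue
        down = DOWN.match(line)
        if down and down.group(1) in pending:
            hits.append((down.group(1),) + pending.pop(down.group(1)))
    return hits


sample = [
    "26868:20091130:172744 In read_ipmi_sensor() FAN2 PSU1@[xxx]:623",
    "26868:20091130:172744 WARN: 0 ipmi_lan.c(lost_connection): Connection 0 to the BMC is down",
    "26868:20091130:172744 SEVR: 0 ipmi_lan.c(lost_connection): All connections to the BMC are down",
    "26854:20091130:172744 One child process died. Exiting ...",
]
print(reads_interrupted_by_disconnect(sample))
```

The pairing is only a heuristic: the "down" message does not name a host, and in the first log of this report the callbacks that follow the read on host1 refer to host2, so the host being read is not necessarily the one that dropped off.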

Alexei Vladishev made changes -
Workflow Zabbix workflow2 [ 13452 ] Zabbix workflow [ 15134 ]
Alexei Vladishev made changes -
Assignee Alexander Vladishev [ sasha ]
Alexei Vladishev added a comment -

We confirm this problem with 1.8.2 as well. To be fixed soon.

Alexei Vladishev made changes -
Status Open [ 1 ] Confirmed [ 10000 ]
Assignee Alexander Vladishev [ sasha ]
Alexei Vladishev made changes -
Fix Version/s 1.8.3 [ 10063 ]
Alexander Vladishev made changes -
Status Confirmed [ 10000 ] In Progress [ 3 ]
Alexander Vladishev made changes -
Assignee Alexander Vladishev [ sasha ] Aleksandrs Saveljevs [ asaveljevs ]
Aleksandrs Saveljevs made changes -
Assignee Aleksandrs Saveljevs [ asaveljevs ] Alexander Vladishev [ sasha ]
Alexei Vladishev made changes -
Fix Version/s 1.8.4 [ 10081 ]
Fix Version/s 1.8.3 [ 10063 ]
Alexei Vladishev added a comment -

We spent nearly two weeks trying to debug this issue. It is very likely that the problem is related to the OpenIPMI library.

For now, we have added more debug information for IPMI checks and removed the linkage of the IPMI library to Zabbix agents.

All testing was performed with OpenIPMI 2.0.14 and 2.0.16. The symptoms differ, but it fails to work normally with all versions.

Alexei

Alexander Vladishev made changes -
Status In Progress [ 3 ] Open [ 1 ]
Vyacheslav added a comment -

I have the same issue in 1.8.3 (revision 13099): a 100% failure rate when IPMI is enabled for a Sun xf4100 or xf4600.

Vyacheslav added a comment -

Zabbix 1.8.3 (revision 13099) crash log

Vyacheslav made changes -
Attachment slavik_zbx_log.txt [ 13263 ]
Aleksandrs Saveljevs made changes -
Link This issue is duplicated by ZBX-2898 [ ZBX-2898 ]
ufocek added a comment -

Any idea when this problem will be fixed? I have a lot of important items based on IPMI.

Aleksandrs Saveljevs made changes -
Assignee Alexander Vladishev [ sasha ] Aleksandrs Saveljevs [ asaveljevs ]
Aleksandrs Saveljevs made changes -
Status Open [ 1 ] In Progress [ 3 ]
Aleksandrs Saveljevs made changes -
Status In Progress [ 3 ] Resolved [ 5 ]
Assignee Aleksandrs Saveljevs [ asaveljevs ] Alexander Vladishev [ sasha ]
Fix Version/s 1.9 (trunk) [ 10046 ]
Resolution Fixed [ 1 ]
ufocek added a comment -

Which Zabbix 1.8.4 build has this problem fixed? I downloaded the latest nightly build, Zabbix 1.8.4rc1, and the problem still occurs.

richlv added a comment -

The problem is believed to be resolved, but the fix has not been reviewed and merged back to the main branches yet. When that happens, this issue will be closed.

Aleksandrs Saveljevs added a comment -

ufocek, if you wish to do some testing, feel free to check out the development branch: svn checkout svn://svn.zabbix.com/branches/dev/zbx-633-ipmi-crash

Alexander Vladishev made changes -
Status Resolved [ 5 ] Tested [ 10002 ]
Assignee Alexander Vladishev [ sasha ] Aleksandrs Saveljevs [ asaveljevs ]
Aleksandrs Saveljevs added a comment -

Hopefully fixed in pre-1.8.4 in r14009.

Aleksandrs Saveljevs made changes -
Status Tested [ 10002 ] Closed [ 6 ]
