-
Problem report
-
Resolution: Unresolved
-
Trivial
-
None
-
6.0.9
-
None
Steps to reproduce:
- Set DebugLevel=4
- Add zabbix[java,,ping] and/or zabbix[java,,version] items
- Start zabbix-server
Result:
zabbix-server sometimes hangs.
The output in zabbix_server.log stops at:
20229:20221101:150551.868 In get_values()
20229:20221101:150551.868 In DCconfig_get_poller_items() poller_type:5
20229:20221101:150551.868 End of DCconfig_get_poller_items():1
20229:20221101:150551.868 In substitute_key_macros_impl() data:'zabbix[java,,ping]'
20229:20221101:150551.868 End of substitute_key_macros_impl():SUCCEED data:'zabbix[java,,ping]'
20229:20221101:150551.868 In get_value() key:'zabbix[java,,ping]'
After that, zabbix-server cannot be stopped with "systemctl stop zabbix-server".
Expected:
zabbix-server runs without hanging.
Cause:
I attached gdb to the hung process. The backtrace is:
(gdb) bt
#0  0x00007f15f3aa481d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f15f3a9dac9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x000056096e046cab in __zbx_mutex_lock (filename=filename@entry=0x56096e15305a "log.c", line=line@entry=264, mutex=<optimized out>) at mutexs.c:441
#3  0x000056096dfc7391 in lock_log () at log.c:264
#4  0x000056096dfc8225 in __zbx_zabbix_log (level=1, fmt=0x56096e160cb0 "Got signal [signal:%d(%s),reason:%d,refaddr:%p]. Crashing ...") at log.c:434
#5  0x000056096e03b70d in fatal_signal_handler (sig=<optimized out>, siginfo=<optimized out>, context=0x7ffc7a49e280) at sighandler.c:58
#6  <signal handler called>
#7  0x00007f15f194ec75 in __strlen_avx2 () from /lib64/libc.so.6
#8  0x00007f15f18eac6f in vfprintf () from /lib64/libc.so.6
#9  0x00007f15f19bf1ac in __vfprintf_chk () from /lib64/libc.so.6
#10 0x000056096dfc8324 in vfprintf (__ap=0x7ffc7a49ee38, __fmt=0x56096e128af8 "In %s() jmx_endpoint:'%s' num:%d", __stream=0x56096f9ca520) at /usr/include/bits/stdio2.h:130
#11 __zbx_zabbix_log (level=level@entry=4, fmt=fmt@entry=0x56096e128af8 "In %s() jmx_endpoint:'%s' num:%d") at log.c:459
#12 0x000056096deebce1 in get_values_java (request=request@entry=0 '\000', items=items@entry=0x7ffc7a4b4a40, results=results@entry=0x7ffc7a4b2a40, errcodes=errcodes@entry=0x7ffc7a4b1634, num=num@entry=1) at checks_java.c:133
#13 0x000056096deec264 in get_value_java (request=request@entry=0 '\000', item=item@entry=0x7ffc7a4b4a40, result=result@entry=0x7ffc7a4b2a40) at checks_java.c:121
#14 0x000056096dee9df1 in get_value_internal (item=item@entry=0x7ffc7a4b4a40, result=result@entry=0x7ffc7a4b2a40) at checks_internal.c:420
#15 0x000056096dee6da3 in get_value (add_results=<optimized out>, result=0x7ffc7a4b2a40, item=0x7ffc7a4b4a40) at poller.c:287
#16 zbx_check_items (items=0x7ffc7a4b4a40, errcodes=0x7ffc7a4b2840, num=1, results=0x7ffc7a4b2a40, add_results=<optimized out>, poller_type=<optimized out>) at poller.c:704
#17 0x000056096dee7343 in get_values (poller_type=poller_type@entry=5 '\005', nextcheck=nextcheck@entry=0x7ffc7a4bb9c0) at poller.c:807
#18 0x000056096dee78dd in poller_thread (args=args@entry=0x7ffc7a4bbaa0) at poller.c:973
#19 0x000056096e046f4b in zbx_thread_start (handler=0x56096dee7790 <poller_thread>, thread_args=0x7ffc7a4bbaa0, thread=0x56096f9f0940) at threads.c:124
#20 0x000056096dece4a9 in server_startup (listen_sock=0x7ffc7a4bbc50, rtc=0x7ffc7a4bbb90, ha_failover=0x56096e4bc110 <ha_failover_delay>, ha_stat=0x56096e4bc114 <ha_status>) at server.c:1783
#21 0x000056096decff6b in MAIN_ZABBIX_ENTRY (flags=flags@entry=0) at server.c:2111
#22 0x000056096e03a875 in daemon_start (allow_root=<optimized out>, user=<optimized out>, flags=0) at daemon.c:463
#23 0x000056096dec66bb in main (argc=3, argv=0x56096f9c0b40) at server.c:1149
The SIGSEGV occurs in get_values_java() in src/zabbix_server/poller/checks_java.c, at this debug log call:
zabbix_log(LOG_LEVEL_DEBUG, "In %s() jmx_endpoint:'%s' num:%d", __func__, items[0].jmx_endpoint, num);
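The SIGSEGV handler is then called and tries to lock the log file mutex in order to write its own message, but that mutex is already held by the interrupted (non-handler) code, which was in the middle of writing the debug line (frames #11 and #3 in the backtrace).
The handler waits forever for a lock that its own process cannot release, so the process deadlocks.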
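The self-deadlock described above can be modelled in a few lines of C. This is only a sketch, not Zabbix code: the names demo_log_lock, demo_init_lock and demo_relock_from_same_thread are hypothetical, and the signal handler is modelled as a second lock attempt from the same thread, since a handler runs on top of the code it interrupted. An error-checking mutex is used so that the second attempt returns EDEADLK instead of hanging the way a default mutex would.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the log file mutex (not the Zabbix one). */
static pthread_mutex_t	demo_log_lock;

/* Create the mutex as PTHREAD_MUTEX_ERRORCHECK so that a relock by the */
/* owning thread is reported as an error instead of blocking forever.   */
static int	demo_init_lock(void)
{
	pthread_mutexattr_t	attr;
	int			rc;

	if (0 != (rc = pthread_mutexattr_init(&attr)))
		return rc;

	if (0 == (rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK)))
		rc = pthread_mutex_init(&demo_log_lock, &attr);

	pthread_mutexattr_destroy(&attr);

	return rc;
}

/* Models the sequence in the backtrace: the logging code takes the lock, */
/* then the "handler" (same thread) tries to take it again to log.        */
static int	demo_relock_from_same_thread(void)
{
	int	rc_handler;

	if (0 != pthread_mutex_lock(&demo_log_lock))	/* normal logging path */
		return -1;

	rc_handler = pthread_mutex_lock(&demo_log_lock);	/* what the handler attempts */

	pthread_mutex_unlock(&demo_log_lock);		/* release the first lock */

	return rc_handler;
}
```

Calling demo_init_lock() and then demo_relock_from_same_thread() should yield EDEADLK on a POSIX system. With a default (non-error-checking) mutex the second lock call would typically never return, which matches frames #0 and #1 of the backtrace.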
I looked into the code and found that the "jmx_endpoint" member of the "items" variable is read without having been initialized when the item is an internal Java gateway check (zabbix[java,,ping] or zabbix[java,,version]).
In addition, zabbix-server sometimes writes garbage to the log, for example:
19878:20221101:150535.905 In get_values_java() jmx_endpoint:'<F3>^O^^<FA>H<8B>G^XH<8B>8<E9><A0><FF><FF><FF><F3>^O^^<FA>AWI<89><CF>AVA<89><D6>AUI<89><F5>ATUSH<89><FB>H<83><EC>xL<89>DdH<8B>^D%(' num:1
This confirms that "jmx_endpoint" holds uninitialized data at that point.
Fix:
To fix the problem, the "item" variable in get_values() should be zeroed before it is used:
diff --git a/src/zabbix_server/poller/poller.c b/src/zabbix_server/poller/poller.c
index 58338b6d91..9439348179 100644
--- a/src/zabbix_server/poller/poller.c
+++ b/src/zabbix_server/poller/poller.c
@@ -791,6 +791,7 @@ static int get_values(unsigned char poller_type, int *nextcheck)
 	zabbix_log(LOG_LEVEL_DEBUG, "In %s()", __func__);
+	memset(&item, 0, sizeof(item));
 	items = &item;
 	num = DCconfig_get_poller_items(poller_type, &items);
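The idea behind the memset() call can be illustrated with a reduced example. This is a sketch under assumptions, not the Zabbix structures: demo_item_t and the demo_* functions are hypothetical, and the endpoint field is assumed to be a pointer, as the use of '%s' in the log call suggests. It also assumes that an all-zero bit pattern is a null pointer, which holds on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for the item structure. */
typedef struct
{
	char	key[64];
	char	*endpoint;	/* plays the role of jmx_endpoint */
}
demo_item_t;

/* Fills only the key, the way an internal check has no endpoint to set. */
static void	demo_fill_internal_item(demo_item_t *item, const char *key)
{
	strncpy(item->key, key, sizeof(item->key) - 1);
	item->key[sizeof(item->key) - 1] = '\0';
}

/* Returns something that is always safe to pass to a '%s' conversion. */
static const char	*demo_endpoint_for_log(const demo_item_t *item)
{
	return NULL != item->endpoint ? item->endpoint : "(unset)";
}

/* Zero the whole structure first, then fill in what is known. Without   */
/* the memset(), 'endpoint' would keep whatever bytes were on the stack. */
static const char	*demo_prepare_and_describe(demo_item_t *item, const char *key)
{
	memset(item, 0, sizeof(*item));
	demo_fill_internal_item(item, key);

	return demo_endpoint_for_log(item);
}
```

After demo_prepare_and_describe(&item, "zabbix[java,,ping]") the endpoint field is a null pointer that can be tested for, rather than a stray address that strlen() would follow into unrelated memory as in frame #7 of the backtrace.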