-
Incident report
-
Resolution: Duplicate
-
Major
-
None
-
4.0 (plan)
-
None
Steps to reproduce:
Unknown. Possibly happens after manually deleting some LLD items linked to triggers.
Result:
14219:20181030:151309.599 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
This happens inside the escalator thread, followed by a "graceful" server shutdown. It repeated every time I tried to restart the server.
I've traced the problem to substitute_simple_macros() in libs/zbxserver/expression.c. It accepts a large number of pointer parameters and never seems to check all of them for NULL.
I have no time to dig further into the ugly C code, but this dirty hack at least solved my problem. It lets the escalator skip the problematic item, which gives the garbage collector time to clear the escalations table:
@@ -2779,8 +2780,20 @@
                 res = FAIL;
                 continue;
             }
+            zabbix_log(LOG_LEVEL_DEBUG, "after newer happen %s() data:'%s'", __function_name, *data);
             ret = SUCCEED;
+            if ( strncmp(*data,"Unknown: {TRIGGER.NAME}",strlen("Unknown: {TRIGGER.NAME}")) )
+            {
+                zabbix_log(LOG_LEVEL_DEBUG, "unknow happen %s() data:'%s'", __function_name, *data);
+
+                zbx_free(expression);
+                zbx_vector_uint64_destroy(&hostids);
+
+                zabbix_log(LOG_LEVEL_DEBUG, "End %s() data:'%s'", __function_name, *data);
+
+                return res;
+            }
             if (0 != (macro_type & (MACRO_TYPE_MESSAGE_NORMAL | MACRO_TYPE_MESSAGE_RECOVERY | MACRO_TYPE_MESSAGE_ACK)))
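A note on the condition in the patch: strncmp() returns 0 when the compared prefixes are equal, so `if ( strncmp(...) )` as written is taken when *data does not start with "Unknown: {TRIGGER.NAME}". The report does not say whether that direction was intended. The sketch below is only an illustration of how a small helper can make the direction of such a test explicit. The name demo_has_prefix() is hypothetical and is not part of the Zabbix sources.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a Zabbix function).
 * Returns 1 when str begins with prefix, 0 otherwise.
 * NULL arguments are treated as "no match" instead of being dereferenced.
 */
int demo_has_prefix(const char *str, const char *prefix)
{
    if (NULL == str || NULL == prefix)
        return 0;

    /* strncmp() yields 0 when the first strlen(prefix) bytes are equal */
    return 0 == strncmp(str, prefix, strlen(prefix));
}
```

With a helper like this, "skip when the data starts with the marker" reads as `if (demo_has_prefix(*data, "Unknown: {TRIGGER.NAME}"))`, and the opposite case needs an explicit negation.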
I never bothered to run "objdump", because the source of the segfault is clearly somewhere in the nested `if` that follows, which is about 20 screens long. I've kept both the binary and the 1.7G database dump in case someone wants more details.
To me it seems clear enough: it fails when expanding macro values for items that no longer exist in a trigger that no longer exists.
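The kind of guard that seems to be missing can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Zabbix code: the types demo_trigger_t and demo_event_t, the function demo_expand_trigger_name(), the DEMO_* return codes and the "*UNKNOWN*" placeholder string are all hypothetical stand-ins chosen for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real structures (assumption, not Zabbix types). */
typedef struct { const char *name; } demo_trigger_t;
typedef struct { const demo_trigger_t *trigger; } demo_event_t;

#define DEMO_SUCCEED 0
#define DEMO_FAIL    (-1)

/*
 * Copies the trigger name into out.
 * If the event, its trigger or the trigger name is missing (for example
 * because the object was deleted), writes a placeholder and returns
 * DEMO_FAIL instead of dereferencing a NULL pointer.
 */
int demo_expand_trigger_name(const demo_event_t *event, char *out, size_t out_size)
{
    /* nothing can be written without a valid output buffer */
    if (NULL == out || 0 == out_size)
        return DEMO_FAIL;

    /* check every pointer on the path before following it */
    if (NULL == event || NULL == event->trigger || NULL == event->trigger->name)
    {
        snprintf(out, out_size, "%s", "*UNKNOWN*");
        return DEMO_FAIL;
    }

    snprintf(out, out_size, "%s", event->trigger->name);
    return DEMO_SUCCEED;
}
```

A caller such as the escalator could then treat DEMO_FAIL as "skip this escalation" and continue with the next one, which is roughly what the hack above achieves in a cruder way.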
I hope this attracts the attention of someone who knows the internals of this function.
Duplicates: ZBX-14908 Crashing escalator process zabbix_server 4.0 (Closed)