[ZBX-14500] zabbix_server crash on macro expansion with invalid value Created: 2018 Jun 19 Updated: 2018 Jul 10 Resolved: 2018 Jun 29 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 3.4.11, 4.0.0alpha8 |
Fix Version/s: | 3.4.12rc1, 4.0.0alpha9, 4.0 (plan) |
Type: | Problem report | Priority: | Minor |
Reporter: | Andy Bertier | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 0 |
Labels: | crash, macros, triggers | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
"zabbix appliance image (Ubuntu)" on HyperV |
Attachments: | objdump_DSswx.gz zabbix_server_backtrace.log zabbix_server_logAtCrash.log | ||||||||||||
Issue Links: |
|
||||||||||||
Team: | Team A | ||||||||||||
Sprint: | Sprint 37 | ||||||||||||
Story Points: | 0.125 |
Description |
Steps to reproduce:
I didn't test these exact steps, but I came across this problem. suddenly zabbix_server crashed and after restart it immediately crashes again. After thinking and double-checking the changes I made, and the 'DCexpression_expand_user_macros' in the backtrace, I noticed this typo. Removing the double 'G' fixed the issue, the server started again without issues. Result: Expected: |
Comments |
Comment by Glebs Ivanovskis [ 2018 Jun 21 ] |
Could you please attach the result of objdump -DSswx zabbix_server as well? |
Comment by Glebs Ivanovskis [ 2018 Jun 25 ] |
If you still have the log file with a crash, could you please show more of it? Starting with line that says "Crashing...", not just the backtrace, but also program counter, registries, etc. |
Comment by Andy Bertier [ 2018 Jun 25 ] |
Added extract from log starting from "Crashing..." till "server stopped" |
Comment by Glebs Ivanovskis [ 2018 Jun 25 ] |
Great! Thank you! 1221:20180611:095108.711 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ... 1221:20180611:095108.711 ====== Fatal information: ====== 1221:20180611:095108.711 Program counter: 0x561dc9d13ed6 1221:20180611:095108.711 === Registers: === 1221:20180611:095108.711 r8 = 100 = 256 = 256 1221:20180611:095108.711 r9 = 0 = 0 = 0 1221:20180611:095108.711 r10 = 2810 = 10256 = 10256 1221:20180611:095108.711 r11 = 217 = 535 = 535 1221:20180611:095108.711 r12 = 1 = 1 = 1 1221:20180611:095108.711 r13 = 561dc9de9d1c = 94685940849948 = 94685940849948 1221:20180611:095108.711 r14 = 0 = 0 = 0 1221:20180611:095108.711 r15 = 0 = 0 = 0 1221:20180611:095108.711 rdi = 561dcb340140 = 94685963223360 = 94685963223360 1221:20180611:095108.711 rsi = 561dc9de6e03 = 94685940837891 = 94685940837891 1221:20180611:095108.711 rbp = 7ffe417cf9d0 = 140729997130192 = 140729997130192 1221:20180611:095108.711 rbx = 1 = 1 = 1 1221:20180611:095108.711 rdx = 0 = 0 = 0 1221:20180611:095108.711 rax = 0 = 0 = 0 1221:20180611:095108.711 rcx = 0 = 0 = 0 1221:20180611:095108.712 rsp = 7ffe417cf950 = 140729997130064 = 140729997130064 1221:20180611:095108.712 rip = 561dc9d13ed6 = 94685939973846 = 94685939973846 1221:20180611:095108.712 efl = 10206 = 66054 = 66054 1221:20180611:095108.712 csgsfs = 33 = 51 = 51 1221:20180611:095108.712 err = 4 = 4 = 4 1221:20180611:095108.712 trapno = e = 14 = 14 1221:20180611:095108.712 oldmask = 0 = 0 = 0 1221:20180611:095108.712 cr2 = 0 = 0 = 0 1221:20180611:095108.712 === Backtrace: === 1221:20180611:095109.278 19: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](zbx_log_fatal_info+0x176) [0x561dc9d2c076] 1221:20180611:095109.278 18: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0xf7505) [0x561dc9d2c505] 1221:20180611:095109.278 17: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f731142d4b0] 1221:20180611:095109.278 16: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0xdeed6) [0x561dc9d13ed6] 1221:20180611:095109.278 15: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](DCexpression_expand_user_macros+0x45) [0x561dc9d13f90] 1221:20180611:095109.278 14: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0xb27de) [0x561dc9ce77de] 1221:20180611:095109.278 13: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](substitute_simple_macros+0x4337) [0x561dc9cf0197] 1221:20180611:095109.278 12: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](substitute_simple_macros+0x2369) [0x561dc9cee1c9] 1221:20180611:095109.278 11: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7a089) [0x561dc9caf089] 1221:20180611:095109.278 10: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7cb26) [0x561dc9cb1b26] 1221:20180611:095109.278 9: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7e012) [0x561dc9cb3012] 1221:20180611:095109.278 8: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7eade) [0x561dc9cb3ade] 1221:20180611:095109.278 7: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7f70d) [0x561dc9cb470d] 1221:20180611:095109.278 6: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](escalator_thread+0x192) [0x561dc9cb4921] 1221:20180611:095109.278 5: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](zbx_thread_start+0x37) [0x561dc9d39b15] 1221:20180611:095109.278 4: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](MAIN_ZABBIX_ENTRY+0xa94) [0x561dc9c75679] 1221:20180611:095109.278 3: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](daemon_start+0x31f) [0x561dc9d2b6ce] 1221:20180611:095109.278 2: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](main+0x30a) [0x561dc9c74bcf] 1221:20180611:095109.278 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f7311418830] 1221:20180611:095109.278 0: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](_start+0x29) [0x561dc9c67e99] |
Comment by Glebs Ivanovskis [ 2018 Jun 26 ] |
Crash happens in this function, when someone passes NULL in error: static char *dc_expression_expand_user_macros(const char *expression, char **error) { zbx_vector_uint64_t functionids, hostids; char *out; zbx_vector_uint64_create(&functionids); zbx_vector_uint64_create(&hostids); get_functionids(&functionids, expression); zbx_dc_get_hostids_by_functionids(functionids.values, functionids.values_num, &hostids); out = zbx_dc_expand_user_macros(expression, hostids.values, hostids.values_num, dc_expression_user_macro_validator); if (NULL != strstr(out, "{$")) { *error = zbx_strdup(*error, "cannot evaluate expression: invalid macro value"); /* line 8778 in dbconfig.c */ zbx_free(out); } zbx_vector_uint64_destroy(&hostids); zbx_vector_uint64_destroy(&functionids); return out; } Here is the relevant piece of objdump with few annotations (according to System V AMD64 ABI calling conventions second parameter error is passed in %rsi register): dee3d: 55 push %rbp dee3e: 48 89 e5 mov %rsp,%rbp dee41: 48 83 c4 80 add $0xffffffffffffff80,%rsp dee45: 48 89 7d 88 mov %rdi,-0x78(%rbp) dee49: 48 89 75 80 mov %rsi,-0x80(%rbp) <--- %rsi contains <error>, store it into -0x80(%rbp) dee4d: 64 48 8b 04 25 28 00 00 00 mov %fs:0x28,%rax dee56: 48 89 45 f8 mov %rax,-0x8(%rbp) dee5a: 31 c0 xor %eax,%eax dee5c: 48 8d 45 a0 lea -0x60(%rbp),%rax dee60: 48 89 c7 mov %rax,%rdi dee63: e8 ad 26 02 00 callq 101515 <zbx_vector_uint64_create@@Base> dee68: 48 8d 45 d0 lea -0x30(%rbp),%rax dee6c: 48 89 c7 mov %rax,%rdi dee6f: e8 a1 26 02 00 callq 101515 <zbx_vector_uint64_create@@Base> dee74: 48 8b 55 88 mov -0x78(%rbp),%rdx dee78: 48 8d 45 a0 lea -0x60(%rbp),%rax dee7c: 48 89 d6 mov %rdx,%rsi dee7f: 48 89 c7 mov %rax,%rdi dee82: e8 15 37 fd ff callq b259c <get_functionids@@Base> dee87: 8b 4d a8 mov -0x58(%rbp),%ecx dee8a: 48 8b 45 a0 mov -0x60(%rbp),%rax dee8e: 48 8d 55 d0 lea -0x30(%rbp),%rdx dee92: 89 ce mov %ecx,%esi dee94: 48 89 c7 mov %rax,%rdi dee97: e8 64 17 00 00 callq e0600 <zbx_dc_get_hostids_by_functionids@@Base> dee9c: 8b 55 d8 mov -0x28(%rbp),%edx dee9f: 48 8b 75 d0 mov -0x30(%rbp),%rsi deea3: 48 8b 45 88 mov -0x78(%rbp),%rax deea7: 48 8d 0d 6c fc ff ff lea -0x394(%rip),%rcx # deb1a <DCget_user_macro@@Base+0x159> deeae: 48 89 c7 mov %rax,%rdi deeb1: e8 93 fc ff ff callq deb49 <zbx_dc_expand_user_macros@@Base> deeb6: 48 89 45 98 mov %rax,-0x68(%rbp) deeba: 48 8b 45 98 mov -0x68(%rbp),%rax deebe: 48 8d 35 3e 2f 0d 00 lea 0xd2f3e(%rip),%rsi # 1b1e03 <syslog_app_name@@Base+0x1c6a3> deec5: 48 89 c7 mov %rax,%rdi deec8: e8 a3 37 f5 ff callq 32670 <time@plt+0x480> deecd: 48 85 c0 test %rax,%rax deed0: 74 47 je def19 <zbx_dc_expand_user_macros@@Base+0x3d0> deed2: 48 8b 45 80 mov -0x80(%rbp),%rax <--- -0x80(%rbp) contains <error>, store it into %rax deed6: 48 8b 00 mov (%rax),%rax <--- crashing here while dereferencing <error> deed9: 48 8d 0d 28 2f 0d 00 lea 0xd2f28(%rip),%rcx # 1b1e08 <syslog_app_name@@Base+0x1c6a8> deee0: 48 89 c2 mov %rax,%rdx deee3: be 4a 22 00 00 mov $0x224a,%esi <--- LINE equals 8778 deee8: 48 8d 3d 21 16 0d 00 lea 0xd1621(%rip),%rdi # 1b0510 <syslog_app_name@@Base+0x1adb0> <--- FILE equals "dbconfig.c" deeef: e8 fb d5 02 00 callq 10c4ef <zbx_strdup2@@Base> deef4: 48 89 c2 mov %rax,%rdx deef7: 48 8b 45 80 mov -0x80(%rbp),%rax deefb: 48 89 10 mov %rdx,(%rax) deefe: 48 83 7d 98 00 cmpq $0x0,-0x68(%rbp) def03: 74 14 je def19 <zbx_dc_expand_user_macros@@Base+0x3d0> def05: 48 8b 45 98 mov -0x68(%rbp),%rax def09: 48 89 c7 mov %rax,%rdi def0c: e8 77 34 f5 ff callq 32388 <time@plt+0x198> def11: 48 c7 45 98 00 00 00 00 movq $0x0,-0x68(%rbp) def19: 48 8d 45 d0 lea -0x30(%rbp),%rax def1d: 48 89 c7 mov %rax,%rdi def20: e8 7f 26 02 00 callq 1015a4 <zbx_vector_uint64_destroy@@Base> def25: 48 8d 45 a0 lea -0x60(%rbp),%rax def29: 48 89 c7 mov %rax,%rdi def2c: e8 73 26 02 00 callq 1015a4 <zbx_vector_uint64_destroy@@Base> def31: 48 8b 45 98 mov -0x68(%rbp),%rax def35: 48 8b 4d f8 mov -0x8(%rbp),%rcx def39: 64 48 33 0c 25 28 00 00 00 xor %fs:0x28,%rcx def42: 74 05 je def49 <zbx_dc_expand_user_macros@@Base+0x400> def44: e8 87 35 f5 ff callq 324d0 <time@plt+0x2e0> def49: c9 leaveq def4a: c3 retq From here we need to find out who and when passes NULL instead of valid error and whether crash is reproducible in latest version (so far I've been looking into 3.4.2 sources). |
Comment by Glebs Ivanovskis [ 2018 Jun 26 ] |
Hm, it is always called like this...
if (NULL != (expression_ex = DCexpression_expand_user_macros(expression, NULL)))
Seems that latest |
Comment by Andy Bertier [ 2018 Jun 27 ] |
Wow, nice crash analysis (you read asm like I watch movies ) If it's always called like this, crashes should be more frequent on invalid user macro's I guess. I see the usage is still different from the function itself, but maybe it's just an unimportant 'wrapping'. DCexpression_expand_user_macros vs dc_expression_expand_user_macros Another thing I see: the 'out' variable is returned, but in case of valid error there was a call 'zbx_free(out)'. I hope zbx_free is also setting it's argument (passed as reference?) to NULL. Otherwise it can result in another access violation. (I don't have the zabbix sources with me, so cannot directly validate my assumptions) |
Comment by Glebs Ivanovskis [ 2018 Jun 27 ] |
Yes, this is a bit surprising for me that no one has complained about this for so long. To my understanding Zabbix server will crash whenever it needs to expand trigger expression containing non-numeric user macro. But if there is such a trigger, but Zabbix server does not need to recalculate it and does not need it for notifications, there will be no crashes - this could explain why no crashes were reported so far. Or everyone just gets their trigger expressions spot on!
Yes, DCexpression_expand_user_macros() is just a "public" wrapper of dc_expression_expand_user_macros() which needs configuration cache to be locked, but this can only be done from within dbconfig.c file.
It does so (thanks to some macro magic, not a reference). |
Comment by Glebs Ivanovskis [ 2018 Jun 27 ] |
Dear vso, I believe |
Comment by Vladislavs Sokurenko [ 2018 Jun 27 ] |
Yes, it was broken by 3.0.20rc1 is not affected as it has all the necessary NULL checks. if (NULL != error) *error = zbx_dsprintf(*error, "macro '%.*s' is not found", len, ptr); if (NULL != error) *error = zbx_dsprintf(*error, "macro '%s' value is not numeric", macro); Also it looks like only affect references for example $1 etc when used in internal notifications for triggers going to unsupported state. |
Comment by Vladislavs Sokurenko [ 2018 Jun 27 ] |
Fixed in development branch: |
Comment by Vladislavs Sokurenko [ 2018 Jun 29 ] |
Fixed in:
|