[ZBX-14500] zabbix_server crash on macro expansion with invalid value Created: 2018 Jun 19  Updated: 2024 Apr 10  Resolved: 2018 Jun 29

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.4.11, 4.0.0alpha8
Fix Version/s: 3.4.12rc1, 4.0.0alpha9, 4.0 (plan)

Type: Problem report Priority: Minor
Reporter: Andy Bertier Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: crash, macros, triggers
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

"zabbix appliance image (Ubuntu)" on HyperV
/usr/sbin/zabbix_server --version
zabbix_server (Zabbix) 3.4.2
Revision 72885 25 September 2017, compilation time: Sep 26 2017 11:11:28


Attachments: File objdump_DSswx.gz     Text File zabbix_server_backtrace.log     Text File zabbix_server_logAtCrash.log    
Issue Links:
Causes
caused by ZBXNEXT-3659 More efficient sync of configuration ... Closed
caused by ZBXNEXT-3210 Replace user macro cache with caching... Closed
Team: Team A
Sprint: Sprint 37
Story Points: 0.125

 Description   

Steps to reproduce:

  1. Have some process discovery and item prototypes for monitoring memory usage
  2. Have a macro {$PROC_MAX_MEMORY:someProcessName} with value '1G'
  3. Have some trigger(prototype) using an expression containing the context-macro ie. '{discovered item} > {$PROC_MAX_MEMORY:"{#PROCNAME}"}'
  4. Change the amount of allowed memoryUsage and make some typo by changing the value to '1GG'

I didn't test these exact steps, but I came across this problem. suddenly zabbix_server crashed and after restart it immediately crashes again. After thinking and double-checking the changes I made, and the 'DCexpression_expand_user_macros' in the backtrace, I noticed this typo. Removing the double 'G' fixed the issue, the server started again without issues.

Result:
zabbix_server crashes
See backtrace (attachment)

Expected:
Some 'Not Supported' state with some indication there is an issue with a trigger expression



 Comments   
Comment by Glebs Ivanovskis [ 2018 Jun 21 ]

Could you please attach the result of

objdump -DSswx zabbix_server

as well?

Comment by Glebs Ivanovskis [ 2018 Jun 25 ]

If you still have the log file with a crash, could you please show more of it? Starting with line that says "Crashing...", not just the backtrace, but also program counter, registries, etc.

Comment by Andy Bertier [ 2018 Jun 25 ]

Added extract from log starting from "Crashing..." till "server stopped"

Comment by Glebs Ivanovskis [ 2018 Jun 25 ]

Great! Thank you!
Putting it here for easier searching:

  1221:20180611:095108.711 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
  1221:20180611:095108.711 ====== Fatal information: ======
  1221:20180611:095108.711 Program counter: 0x561dc9d13ed6
  1221:20180611:095108.711 === Registers: ===
  1221:20180611:095108.711 r8      =              100 =                  256 =                  256
  1221:20180611:095108.711 r9      =                0 =                    0 =                    0
  1221:20180611:095108.711 r10     =             2810 =                10256 =                10256
  1221:20180611:095108.711 r11     =              217 =                  535 =                  535
  1221:20180611:095108.711 r12     =                1 =                    1 =                    1
  1221:20180611:095108.711 r13     =     561dc9de9d1c =       94685940849948 =       94685940849948
  1221:20180611:095108.711 r14     =                0 =                    0 =                    0
  1221:20180611:095108.711 r15     =                0 =                    0 =                    0
  1221:20180611:095108.711 rdi     =     561dcb340140 =       94685963223360 =       94685963223360
  1221:20180611:095108.711 rsi     =     561dc9de6e03 =       94685940837891 =       94685940837891
  1221:20180611:095108.711 rbp     =     7ffe417cf9d0 =      140729997130192 =      140729997130192
  1221:20180611:095108.711 rbx     =                1 =                    1 =                    1
  1221:20180611:095108.711 rdx     =                0 =                    0 =                    0
  1221:20180611:095108.711 rax     =                0 =                    0 =                    0
  1221:20180611:095108.711 rcx     =                0 =                    0 =                    0
  1221:20180611:095108.712 rsp     =     7ffe417cf950 =      140729997130064 =      140729997130064
  1221:20180611:095108.712 rip     =     561dc9d13ed6 =       94685939973846 =       94685939973846
  1221:20180611:095108.712 efl     =            10206 =                66054 =                66054
  1221:20180611:095108.712 csgsfs  =               33 =                   51 =                   51
  1221:20180611:095108.712 err     =                4 =                    4 =                    4
  1221:20180611:095108.712 trapno  =                e =                   14 =                   14
  1221:20180611:095108.712 oldmask =                0 =                    0 =                    0
  1221:20180611:095108.712 cr2     =                0 =                    0 =                    0
  1221:20180611:095108.712 === Backtrace: ===
  1221:20180611:095109.278 19: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](zbx_log_fatal_info+0x176) [0x561dc9d2c076]
  1221:20180611:095109.278 18: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0xf7505) [0x561dc9d2c505]
  1221:20180611:095109.278 17: /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f731142d4b0]
  1221:20180611:095109.278 16: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0xdeed6) [0x561dc9d13ed6]
  1221:20180611:095109.278 15: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](DCexpression_expand_user_macros+0x45) [0x561dc9d13f90]
  1221:20180611:095109.278 14: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0xb27de) [0x561dc9ce77de]
  1221:20180611:095109.278 13: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](substitute_simple_macros+0x4337) [0x561dc9cf0197]
  1221:20180611:095109.278 12: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](substitute_simple_macros+0x2369) [0x561dc9cee1c9]
  1221:20180611:095109.278 11: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7a089) [0x561dc9caf089]
  1221:20180611:095109.278 10: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7cb26) [0x561dc9cb1b26]
  1221:20180611:095109.278 9: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7e012) [0x561dc9cb3012]
  1221:20180611:095109.278 8: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7eade) [0x561dc9cb3ade]
  1221:20180611:095109.278 7: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](+0x7f70d) [0x561dc9cb470d]
  1221:20180611:095109.278 6: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](escalator_thread+0x192) [0x561dc9cb4921]
  1221:20180611:095109.278 5: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](zbx_thread_start+0x37) [0x561dc9d39b15]
  1221:20180611:095109.278 4: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](MAIN_ZABBIX_ENTRY+0xa94) [0x561dc9c75679]
  1221:20180611:095109.278 3: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](daemon_start+0x31f) [0x561dc9d2b6ce]
  1221:20180611:095109.278 2: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](main+0x30a) [0x561dc9c74bcf]
  1221:20180611:095109.278 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f7311418830]
  1221:20180611:095109.278 0: /usr/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.010709 sec, processing escalations](_start+0x29) [0x561dc9c67e99]
Comment by Glebs Ivanovskis [ 2018 Jun 26 ]

Crash happens in this function, when someone passes NULL in error:

static char	*dc_expression_expand_user_macros(const char *expression, char **error)
{
	zbx_vector_uint64_t	functionids, hostids;
	char			*out;

	zbx_vector_uint64_create(&functionids);
	zbx_vector_uint64_create(&hostids);

	get_functionids(&functionids, expression);
	zbx_dc_get_hostids_by_functionids(functionids.values, functionids.values_num, &hostids);

	out = zbx_dc_expand_user_macros(expression, hostids.values, hostids.values_num,
			dc_expression_user_macro_validator);

	if (NULL != strstr(out, "{$"))
	{
		*error = zbx_strdup(*error, "cannot evaluate expression: invalid macro value");	/* line 8778 in dbconfig.c */
		zbx_free(out);
	}

	zbx_vector_uint64_destroy(&hostids);
	zbx_vector_uint64_destroy(&functionids);

	return out;
}

Here is the relevant piece of objdump with few annotations (according to System V AMD64 ABI calling conventions second parameter error is passed in %rsi register):

   dee3d:	55                   	push   %rbp
   dee3e:	48 89 e5             	mov    %rsp,%rbp
   dee41:	48 83 c4 80          	add    $0xffffffffffffff80,%rsp
   dee45:	48 89 7d 88          	mov    %rdi,-0x78(%rbp)
   dee49:	48 89 75 80          	mov    %rsi,-0x80(%rbp)								<--- %rsi contains <error>, store it into -0x80(%rbp)
   dee4d:	64 48 8b 04 25 28 00 00 00 	mov    %fs:0x28,%rax
   dee56:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
   dee5a:	31 c0                	xor    %eax,%eax
   dee5c:	48 8d 45 a0          	lea    -0x60(%rbp),%rax
   dee60:	48 89 c7             	mov    %rax,%rdi
   dee63:	e8 ad 26 02 00       	callq  101515 <zbx_vector_uint64_create@@Base>
   dee68:	48 8d 45 d0          	lea    -0x30(%rbp),%rax
   dee6c:	48 89 c7             	mov    %rax,%rdi
   dee6f:	e8 a1 26 02 00       	callq  101515 <zbx_vector_uint64_create@@Base>
   dee74:	48 8b 55 88          	mov    -0x78(%rbp),%rdx
   dee78:	48 8d 45 a0          	lea    -0x60(%rbp),%rax
   dee7c:	48 89 d6             	mov    %rdx,%rsi
   dee7f:	48 89 c7             	mov    %rax,%rdi
   dee82:	e8 15 37 fd ff       	callq  b259c <get_functionids@@Base>
   dee87:	8b 4d a8             	mov    -0x58(%rbp),%ecx
   dee8a:	48 8b 45 a0          	mov    -0x60(%rbp),%rax
   dee8e:	48 8d 55 d0          	lea    -0x30(%rbp),%rdx
   dee92:	89 ce                	mov    %ecx,%esi
   dee94:	48 89 c7             	mov    %rax,%rdi
   dee97:	e8 64 17 00 00       	callq  e0600 <zbx_dc_get_hostids_by_functionids@@Base>
   dee9c:	8b 55 d8             	mov    -0x28(%rbp),%edx
   dee9f:	48 8b 75 d0          	mov    -0x30(%rbp),%rsi
   deea3:	48 8b 45 88          	mov    -0x78(%rbp),%rax
   deea7:	48 8d 0d 6c fc ff ff 	lea    -0x394(%rip),%rcx        # deb1a <DCget_user_macro@@Base+0x159>
   deeae:	48 89 c7             	mov    %rax,%rdi
   deeb1:	e8 93 fc ff ff       	callq  deb49 <zbx_dc_expand_user_macros@@Base>
   deeb6:	48 89 45 98          	mov    %rax,-0x68(%rbp)
   deeba:	48 8b 45 98          	mov    -0x68(%rbp),%rax
   deebe:	48 8d 35 3e 2f 0d 00 	lea    0xd2f3e(%rip),%rsi        # 1b1e03 <syslog_app_name@@Base+0x1c6a3>
   deec5:	48 89 c7             	mov    %rax,%rdi
   deec8:	e8 a3 37 f5 ff       	callq  32670 <time@plt+0x480>
   deecd:	48 85 c0             	test   %rax,%rax
   deed0:	74 47                	je     def19 <zbx_dc_expand_user_macros@@Base+0x3d0>
   deed2:	48 8b 45 80          	mov    -0x80(%rbp),%rax								<--- -0x80(%rbp) contains <error>, store it into %rax
   deed6:	48 8b 00             	mov    (%rax),%rax								<--- crashing here while dereferencing <error>
   deed9:	48 8d 0d 28 2f 0d 00 	lea    0xd2f28(%rip),%rcx        # 1b1e08 <syslog_app_name@@Base+0x1c6a8>
   deee0:	48 89 c2             	mov    %rax,%rdx
   deee3:	be 4a 22 00 00       	mov    $0x224a,%esi								<--- LINE equals 8778
   deee8:	48 8d 3d 21 16 0d 00 	lea    0xd1621(%rip),%rdi        # 1b0510 <syslog_app_name@@Base+0x1adb0>	<--- FILE equals "dbconfig.c"
   deeef:	e8 fb d5 02 00       	callq  10c4ef <zbx_strdup2@@Base>
   deef4:	48 89 c2             	mov    %rax,%rdx
   deef7:	48 8b 45 80          	mov    -0x80(%rbp),%rax
   deefb:	48 89 10             	mov    %rdx,(%rax)
   deefe:	48 83 7d 98 00       	cmpq   $0x0,-0x68(%rbp)
   def03:	74 14                	je     def19 <zbx_dc_expand_user_macros@@Base+0x3d0>
   def05:	48 8b 45 98          	mov    -0x68(%rbp),%rax
   def09:	48 89 c7             	mov    %rax,%rdi
   def0c:	e8 77 34 f5 ff       	callq  32388 <time@plt+0x198>
   def11:	48 c7 45 98 00 00 00 00 	movq   $0x0,-0x68(%rbp)
   def19:	48 8d 45 d0          	lea    -0x30(%rbp),%rax
   def1d:	48 89 c7             	mov    %rax,%rdi
   def20:	e8 7f 26 02 00       	callq  1015a4 <zbx_vector_uint64_destroy@@Base>
   def25:	48 8d 45 a0          	lea    -0x60(%rbp),%rax
   def29:	48 89 c7             	mov    %rax,%rdi
   def2c:	e8 73 26 02 00       	callq  1015a4 <zbx_vector_uint64_destroy@@Base>
   def31:	48 8b 45 98          	mov    -0x68(%rbp),%rax
   def35:	48 8b 4d f8          	mov    -0x8(%rbp),%rcx
   def39:	64 48 33 0c 25 28 00 00 00 	xor    %fs:0x28,%rcx
   def42:	74 05                	je     def49 <zbx_dc_expand_user_macros@@Base+0x400>
   def44:	e8 87 35 f5 ff       	callq  324d0 <time@plt+0x2e0>
   def49:	c9                   	leaveq 
   def4a:	c3                   	retq   

From here we need to find out who and when passes NULL instead of valid error and whether crash is reproducible in latest version (so far I've been looking into 3.4.2 sources).

Comment by Glebs Ivanovskis [ 2018 Jun 26 ]

Hm, it is always called like this...

	if (NULL != (expression_ex = DCexpression_expand_user_macros(expression, NULL)))

Seems that latest 3.0, 3.4 and trunk are all affected.

Comment by Andy Bertier [ 2018 Jun 27 ]

Wow, nice crash analysis (you read asm like I watch movies )

If it's always called like this, crashes should be more frequent on invalid user macro's I guess. I see the usage is still different from the function itself, but maybe it's just an unimportant 'wrapping'.

DCexpression_expand_user_macros vs dc_expression_expand_user_macros

Another thing I see: the 'out' variable is returned, but in case of valid error there was a call 'zbx_free(out)'. I hope zbx_free is also setting it's argument (passed as reference?) to NULL. Otherwise it can result in another access violation. (I don't have the zabbix sources with me, so cannot directly validate my assumptions)

Comment by Glebs Ivanovskis [ 2018 Jun 27 ]

If it's always called like this, crashes should be more frequent on invalid user macro's I guess.

Yes, this is a bit surprising for me that no one has complained about this for so long. To my understanding Zabbix server will crash whenever it needs to expand trigger expression containing non-numeric user macro. But if there is such a trigger, but Zabbix server does not need to recalculate it and does not need it for notifications, there will be no crashes - this could explain why no crashes were reported so far. Or everyone just gets their trigger expressions spot on!

maybe it's just an unimportant 'wrapping'.

Yes, DCexpression_expand_user_macros() is just a "public" wrapper of dc_expression_expand_user_macros() which needs configuration cache to be locked, but this can only be done from within dbconfig.c file.

I hope zbx_free() is also setting it's argument (passed as reference?) to NULL.

It does so (thanks to some macro magic, not a reference).

Comment by Glebs Ivanovskis [ 2018 Jun 27 ]

Dear vso, I believe ZBXNEXT-3210 is to blame.

Comment by Vladislavs Sokurenko [ 2018 Jun 27 ]

Yes, it was broken by ZBXNEXT-3210 but later fixed and now broken again in ZBXNEXT-3659.

3.0.20rc1 is not affected as it has all the necessary NULL checks.

	if (NULL != error)
		*error = zbx_dsprintf(*error, "macro '%.*s' is not found", len, ptr);
	if (NULL != error)
		*error = zbx_dsprintf(*error, "macro '%s' value is not numeric", macro);

Also it looks like only affect references for example $1 etc when used in internal notifications for triggers going to unsupported state.

Comment by Vladislavs Sokurenko [ 2018 Jun 27 ]

Fixed in development branch:
svn://svn.zabbix.com/branches/dev/ZBX-14500

Comment by Vladislavs Sokurenko [ 2018 Jun 29 ]

Fixed in:

  • pre-3.4.12rc1 r82357
  • pre-4.0.0alpha9 (trunk) r82358
Generated at Fri Apr 26 08:36:58 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.