[ZBX-10239] agent collector crashes in zbx_proc_get_matching_pids() Created: 2016 Jan 07  Updated: 2017 May 30  Resolved: 2016 Jan 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.0.0alpha5
Fix Version/s: 3.0.0beta1

Type: Incident report Priority: Blocker
Reporter: Aleksandrs Saveljevs Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: collector, crash, proc.cpu.util
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File zabbix_agentd.txt.xz    

 Description   

The current trunk revision, r57474, crashes with the following backtrace:

  9024:20160107:182705.365 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
  9024:20160107:182705.365 ====== Fatal information: ======
  9024:20160107:182705.365 Program counter: 0x7eff3af8b684
  9024:20160107:182705.365 === Registers: ===
  9024:20160107:182705.365 r8      =           442900 =              4466944 =              4466944
  9024:20160107:182705.365 r9      =          1558ed0 =             22384336 =             22384336
  9024:20160107:182705.365 r10     =               10 =                   16 =                   16
  9024:20160107:182705.365 r11     =              206 =                  518 =                  518
  9024:20160107:182705.365 r12     =          1530510 =             22218000 =             22218000
  9024:20160107:182705.365 r13     =          155b520 =             22394144 =             22394144
  9024:20160107:182705.365 r14     =     7eff3bb2afb8 =      139634683326392 =      139634683326392
  9024:20160107:182705.365 r15     =                0 =                    0 =                    0
  9024:20160107:182705.365 rdi     =     7eff3b2b2620 =      139634674443808 =      139634674443808
  9024:20160107:182705.365 rsi     =          1552620 =             22357536 =             22357536
  9024:20160107:182705.365 rbp     =               22 =                   34 =                   34
  9024:20160107:182705.365 rbx     =               23 =                   35 =                   35
  9024:20160107:182705.365 rdx     =              220 =                  544 =                  544
  9024:20160107:182705.365 rax     =                0 =                    0 =                    0
  9024:20160107:182705.365 rcx     =          155b0e0 =             22393056 =             22393056
  9024:20160107:182705.365 rsp     =     7fffb1ae1068 =      140736174362728 =      140736174362728
  9024:20160107:182705.365 rip     =     7eff3af8b684 =      139634671138436 =      139634671138436
  9024:20160107:182705.365 efl     =            10246 =                66118 =                66118
  9024:20160107:182705.365 csgsfs  =               33 =                   51 =                   51
  9024:20160107:182705.365 err     =                4 =                    4 =                    4
  9024:20160107:182705.365 trapno  =                e =                   14 =                   14
  9024:20160107:182705.365 oldmask =                0 =                    0 =                    0
  9024:20160107:182705.365 cr2     =                0 =                    0 =                    0
  9024:20160107:182705.365 === Backtrace: ===
  9024:20160107:182705.366 15: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](print_fatal_info+0x9e) [0x42879e]
  9024:20160107:182705.366 14: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]() [0x4289d2]
  9024:20160107:182705.366 13: /lib/x86_64-linux-gnu/libc.so.6(+0x35180) [0x7eff3af44180]
  9024:20160107:182705.366 12: /lib/x86_64-linux-gnu/libc.so.6(cfree+0x34) [0x7eff3af8b684]
  9024:20160107:182705.366 11: /lib/x86_64-linux-gnu/libc.so.6(+0xc1100) [0x7eff3afd0100]
  9024:20160107:182705.366 10: /lib/x86_64-linux-gnu/libc.so.6(regfree+0x11) [0x7eff3afdc421]
  9024:20160107:182705.366 9: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]() [0x424672]
  9024:20160107:182705.366 8: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](zbx_proc_get_matching_pids+0x127) [0x41a2f7]
  9024:20160107:182705.366 7: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](zbx_procstat_collect+0x3a9) [0x412a69]
  9024:20160107:182705.366 6: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](collector_thread+0x98) [0x40dea8]
  9024:20160107:182705.366 5: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](zbx_thread_start+0x45) [0x427535]
  9024:20160107:182705.366 4: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](MAIN_ZABBIX_ENTRY+0x1a6) [0x414146]
  9024:20160107:182705.366 3: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](daemon_start+0x185) [0x428115]
  9024:20160107:182705.366 2: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](main+0x91) [0x40b161]
  9024:20160107:182705.366 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7eff3af30b45]
  9024:20160107:182705.366 0: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]() [0x40b266]
  9024:20160107:182705.366 === Memory map: ===
  9024:20160107:182705.366 00400000-00450000 r-xp 00000000 08:02 689264                             /home/zabbix/zabbix-bin/sbin/zabbix_agentd
  9024:20160107:182705.366 0064f000-00651000 rw-p 0004f000 08:02 689264                             /home/zabbix/zabbix-bin/sbin/zabbix_agentd
  9024:20160107:182705.366 00651000-00657000 rw-p 00000000 00:00 0 
  9024:20160107:182705.366 01512000-01533000 rw-p 00000000 00:00 0                                  [heap]
  9024:20160107:182705.366 01533000-01586000 rw-p 00000000 00:00 0                                  [heap]
  9024:20160107:182705.366 7eff3acf9000-7eff3ad0f000 r-xp 00000000 08:02 6293377                    /lib/x86_64-linux-gnu/libgcc_s.so.1
  9024:20160107:182705.366 7eff3ad0f000-7eff3af0e000 ---p 00016000 08:02 6293377                    /lib/x86_64-linux-gnu/libgcc_s.so.1
  9024:20160107:182705.366 7eff3af0e000-7eff3af0f000 rw-p 00015000 08:02 6293377                    /lib/x86_64-linux-gnu/libgcc_s.so.1
  9024:20160107:182705.366 7eff3af0f000-7eff3b0ae000 r-xp 00000000 08:02 6293657                    /lib/x86_64-linux-gnu/libc-2.19.so
  9024:20160107:182705.366 7eff3b0ae000-7eff3b2ae000 ---p 0019f000 08:02 6293657                    /lib/x86_64-linux-gnu/libc-2.19.so
  9024:20160107:182705.366 7eff3b2ae000-7eff3b2b2000 r--p 0019f000 08:02 6293657                    /lib/x86_64-linux-gnu/libc-2.19.so
  9024:20160107:182705.366 7eff3b2b2000-7eff3b2b4000 rw-p 001a3000 08:02 6293657                    /lib/x86_64-linux-gnu/libc-2.19.so
  9024:20160107:182705.366 7eff3b2b4000-7eff3b2b8000 rw-p 00000000 00:00 0 
  9024:20160107:182705.366 7eff3b2b8000-7eff3b2cc000 r-xp 00000000 08:02 6298741                    /lib/x86_64-linux-gnu/libresolv-2.19.so
  9024:20160107:182705.366 7eff3b2cc000-7eff3b4cb000 ---p 00014000 08:02 6298741                    /lib/x86_64-linux-gnu/libresolv-2.19.so
  9024:20160107:182705.366 7eff3b4cb000-7eff3b4cc000 r--p 00013000 08:02 6298741                    /lib/x86_64-linux-gnu/libresolv-2.19.so
  9024:20160107:182705.366 7eff3b4cc000-7eff3b4cd000 rw-p 00014000 08:02 6298741                    /lib/x86_64-linux-gnu/libresolv-2.19.so
  9024:20160107:182705.366 7eff3b4cd000-7eff3b4cf000 rw-p 00000000 00:00 0 
  9024:20160107:182705.366 7eff3b4cf000-7eff3b4d2000 r-xp 00000000 08:02 6296407                    /lib/x86_64-linux-gnu/libdl-2.19.so
  9024:20160107:182705.366 7eff3b4d2000-7eff3b6d1000 ---p 00003000 08:02 6296407                    /lib/x86_64-linux-gnu/libdl-2.19.so
  9024:20160107:182705.367 7eff3b6d1000-7eff3b6d2000 r--p 00002000 08:02 6296407                    /lib/x86_64-linux-gnu/libdl-2.19.so
  9024:20160107:182705.367 7eff3b6d2000-7eff3b6d3000 rw-p 00003000 08:02 6296407                    /lib/x86_64-linux-gnu/libdl-2.19.so
  9024:20160107:182705.367 7eff3b6d3000-7eff3b7d3000 r-xp 00000000 08:02 6297828                    /lib/x86_64-linux-gnu/libm-2.19.so
  9024:20160107:182705.367 7eff3b7d3000-7eff3b9d2000 ---p 00100000 08:02 6297828                    /lib/x86_64-linux-gnu/libm-2.19.so
  9024:20160107:182705.367 7eff3b9d2000-7eff3b9d3000 r--p 000ff000 08:02 6297828                    /lib/x86_64-linux-gnu/libm-2.19.so
  9024:20160107:182705.367 7eff3b9d3000-7eff3b9d4000 rw-p 00100000 08:02 6297828                    /lib/x86_64-linux-gnu/libm-2.19.so
  9024:20160107:182705.367 7eff3b9d4000-7eff3b9f4000 r-xp 00000000 08:02 6292412                    /lib/x86_64-linux-gnu/ld-2.19.so
  9024:20160107:182705.367 7eff3bad6000-7eff3bb2b000 rw-s 00000000 00:04 12222524                   /SYSV70026aa5 (deleted)
  9024:20160107:182705.367 7eff3bb2b000-7eff3bbcc000 rw-s 00000000 00:04 10453006                   /SYSV6c026aa5 (deleted)
  9024:20160107:182705.367 7eff3bbcc000-7eff3bbd0000 rw-p 00000000 00:00 0 
  9024:20160107:182705.367 7eff3bbf0000-7eff3bbf1000 rw-p 00000000 00:00 0 
  9024:20160107:182705.367 7eff3bbf1000-7eff3bbf2000 rw-p 00000000 00:00 0 
  9024:20160107:182705.367 7eff3bbf2000-7eff3bbf4000 rw-p 00000000 00:00 0 
  9024:20160107:182705.367 7eff3bbf4000-7eff3bbf5000 r--p 00020000 08:02 6292412                    /lib/x86_64-linux-gnu/ld-2.19.so
  9024:20160107:182705.367 7eff3bbf5000-7eff3bbf6000 rw-p 00021000 08:02 6292412                    /lib/x86_64-linux-gnu/ld-2.19.so
  9024:20160107:182705.367 7eff3bbf6000-7eff3bbf7000 rw-p 00000000 00:00 0 
  9024:20160107:182705.367 7fffb1ac2000-7fffb1ae3000 rw-p 00000000 00:00 0                          [stack]
  9024:20160107:182705.367 7fffb1b4e000-7fffb1b50000 r-xp 00000000 00:00 0                          [vdso]
  9024:20160107:182705.367 7fffb1b50000-7fffb1b52000 r--p 00000000 00:00 0                          [vvar]
  9024:20160107:182705.367 ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
  9024:20160107:182705.367 ================================
  9024:20160107:182705.367 Please consider attaching a disassembly listing to your bug report.
  9024:20160107:182705.367 This listing can be produced with, e.g., objdump -DSswx zabbix_agentd.
  9024:20160107:182705.367 ================================
  9023:20160107:182705.367 One child process died (PID:9024,exitcode/signal:1). Exiting ...
zabbix_agentd [9023]: Error on thread waiting.
  9023:20160107:182705.368 Zabbix Agent stopped. Zabbix 3.0.0alpha6 (revision {ZABBIX_REVISION}).


 Comments   
Comment by Aleksandrs Saveljevs [ 2016 Jan 08 ]

Today it crashed with the following error:

 28055:20160108:100131.762 In update_cpustats()
 28055:20160108:100131.762 End of update_cpustats()
 28055:20160108:100131.767 __zbx_zbx_setproctitle() title:'collector [idle 1 sec]'
 28055:20160108:100132.767 __zbx_zbx_setproctitle() title:'collector [processing data]'
 28055:20160108:100132.767 In update_cpustats()
 28055:20160108:100132.768 End of update_cpustats()
*** Error in `/home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]': free(): invalid next size (normal): 0x00000000008ed350 ***
 28054:20160108:100132.772 One child process died (PID:28055,exitcode/signal:6). Exiting ...
 28054:20160108:100132.772 Zabbix Agent stopped. Zabbix 3.0.0alpha6 (revision {ZABBIX_REVISION}).
Comment by Andris Zeila [ 2016 Jan 08 ]

(1) While not related to this crash, a Valgrind check discovered another problem:

==24311== Conditional jump or move depends on uninitialised value(s)
==24311==    at 0x580E67A: vfprintf (vfprintf.c:1641)
==24311==    by 0x5815D46: fprintf (fprintf.c:32)
==24311==    by 0x42FB3A: __zbx_zabbix_log (log.c:436)
==24311==    by 0x40BCA2: parse_list_of_checks (active.c:297)
==24311==    by 0x40C845: refresh_active_checks (active.c:597)
==24311==    by 0x40E3EC: active_checks_thread (active.c:1602)
==24311==    by 0x434154: zbx_thread_start (threads.c:127)
==24311==    by 0x417E6D: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:917)
==24311==    by 0x43516F: daemon_start (daemon.c:383)
==24311==    by 0x418299: main (zabbix_agentd.c:1149)
==24311==  Uninitialised value was created by a stack allocation
==24311==    at 0x42FAA3: __zbx_zabbix_log (log.c:436)

This happens because log.c:get_time(struct tm **tm, long *milliseconds) passes a stack variable to localtime_r() and then returns a reference to that variable, which is no longer valid once get_time() returns.

Compiler options:

  Compiler:              gcc
  Compiler flags:         -g -Wall -Wextra -Wno-missing-field-initializers 		 -Wno-unused-parameter -Wdeclaration-after-statement -Wpointer-arith -Wempty-body 		 -Wno-error=sign-compare -Wno-error=unused-variable 		 -Wno-error=pointer-sign -Wno-error=uninitialized 		 -I/home/wiper/git/zabbix  -I/usr/include/mysql -DBIG_JOINS=1  -fno-strict-aliasing    -g -DNDEBUG     -I/usr/include/libxml2      
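
The get_time() problem described in (1) can be avoided by letting the caller own the struct tm storage. The following is only a minimal sketch of that idea, not the change made in the Zabbix sources: demo_get_time() and its signature are hypothetical, and it assumes a POSIX system that provides clock_gettime() and localtime_r().

```c
#define _POSIX_C_SOURCE 200809L	/* for localtime_r() and clock_gettime() */

#include <assert.h>
#include <time.h>

/* Hypothetical helper, not the Zabbix get_time(): the caller owns *tm_out, */
/* so nothing on this function's stack is referenced after it returns.      */
/* Returns 0 on success, -1 on failure.                                     */
int	demo_get_time(struct tm *tm_out, long *milliseconds)
{
	struct timespec	ts;

	assert(NULL != tm_out && NULL != milliseconds);

	if (0 != clock_gettime(CLOCK_REALTIME, &ts))
		return -1;

	/* localtime_r() writes the broken-down time into caller-provided storage */
	if (NULL == localtime_r(&ts.tv_sec, tm_out))
		return -1;

	*milliseconds = ts.tv_nsec / 1000000L;

	return 0;
}
```

A caller would declare "struct tm tm; long ms;" on its own stack and pass "&tm, &ms", so the broken-down time stays valid for as long as the caller needs it.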

sandis.neilands This issue was introduced with the correction for ZBX-6028.

sandis.neilands RESOLVED in r57543.

wiper CLOSED

Comment by Sandis Neilands (Inactive) [ 2016 Jan 08 ]

In proc_get_process_cmdline(), when reading the command line from /proc/<pid>/cmdline, we must add a terminating '\0' to *cmdline. It turns out that the command line in the cmdline file is not always terminated by '\0'.
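
The idea of always terminating the buffer can be sketched as follows. This is an illustration only, not the code from proc.c: demo_read_terminated() is a hypothetical helper that reads a whole stream into a heap buffer and reserves one extra byte so that a '\0' can be appended regardless of what the file contains.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole stream into a malloc()ed buffer and */
/* always appends '\0'. The byte count (without the terminator) is stored   */
/* in *len_out if it is not NULL. Returns NULL if allocation fails.         */
char	*demo_read_terminated(FILE *f, size_t *len_out)
{
	size_t	cap = 128, len = 0, n;
	char	*buf, *tmp;

	assert(NULL != f);

	/* one extra byte is reserved for the terminator */
	if (NULL == (buf = malloc(cap + 1)))
		return NULL;

	while (0 < (n = fread(buf + len, 1, cap - len, f)))
	{
		len += n;

		if (len == cap)
		{
			/* grow the buffer, again keeping room for the terminator */
			if (NULL == (tmp = realloc(buf, cap * 2 + 1)))
			{
				free(buf);
				return NULL;
			}

			buf = tmp;
			cap *= 2;
		}
	}

	buf[len] = '\0';	/* terminate even if the data did not end with '\0' */

	if (NULL != len_out)
		*len_out = len;

	return buf;
}
```

Without the final assignment, later string functions would read past the bytes actually written, which matches the uninitialised-value reports from Valgrind.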

The relevant output from Valgrind:

==20319== Conditional jump or move depends on uninitialised value(s)
==20319==    at 0x4C2FD7E: __GI_memcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20319==    by 0x44A869: zbx_strdup2 (misc.c:369)
==20319==    by 0x427048: proc_create (proc.c:1173)
==20319==    by 0x426E4D: zbx_proc_get_processes (proc.c:1243)
==20319==    by 0x419856: zbx_procstat_collect (procstat.c:1158)
==20319==    by 0x411847: collector_thread (stats.c:457)
==20319==    by 0x4404C0: zbx_thread_start (threads.c:127)
==20319==    by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319==    by 0x440EDC: daemon_start (daemon.c:383)
==20319==    by 0x41B3CF: main (zabbix_agentd.c:1149)
==20319==  Uninitialised value was created by a heap allocation
==20319==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20319==    by 0x44A69A: zbx_malloc2 (misc.c:318)
==20319==    by 0x427DA8: proc_get_process_cmdline (proc.c:891)
==20319==    by 0x426F61: proc_create (proc.c:1156)
==20319==    by 0x426E4D: zbx_proc_get_processes (proc.c:1243)
==20319==    by 0x419856: zbx_procstat_collect (procstat.c:1158)
==20319==    by 0x411847: collector_thread (stats.c:457)
==20319==    by 0x4404C0: zbx_thread_start (threads.c:127)
==20319==    by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319==    by 0x440EDC: daemon_start (daemon.c:383)
==20319==    by 0x41B3CF: main (zabbix_agentd.c:1149)

And the Valgrind output related to the second case reported by asaveljevs:

==20319== Conditional jump or move depends on uninitialised value(s)
==20319==    at 0x65962B2: re_string_context_at (regex_internal.c:958)
==20319==    by 0x65962B2: re_string_reconstruct (regex_internal.c:671)
==20319==    by 0x659BBA2: re_search_internal (regexec.c:815)
==20319==    by 0x65A2D84: regexec@@GLIBC_2.3.4 (regexec.c:253)
==20319==    by 0x43B42C: zbx_regexp (zbxregexp.c:68)
==20319==    by 0x43B2A9: zbx_regexp_match (zbxregexp.c:81)
==20319==    by 0x4275B2: proc_match_cmdline (proc.c:1103)
==20319==    by 0x4273C7: zbx_proc_get_matching_pids (proc.c:1320)
==20319==    by 0x419DB5: procstat_scan_query_pids (procstat.c:731)
==20319==    by 0x419875: zbx_procstat_collect (procstat.c:1161)
==20319==    by 0x411847: collector_thread (stats.c:457)
==20319==    by 0x4404C0: zbx_thread_start (threads.c:127)
==20319==    by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319==  Uninitialised value was created by a heap allocation
==20319==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20319==    by 0x44A69A: zbx_malloc2 (misc.c:318)
==20319==    by 0x427DA8: proc_get_process_cmdline (proc.c:891)
==20319==    by 0x426F61: proc_create (proc.c:1156)
==20319==    by 0x426E4D: zbx_proc_get_processes (proc.c:1243)
==20319==    by 0x419856: zbx_procstat_collect (procstat.c:1158)
==20319==    by 0x411847: collector_thread (stats.c:457)
==20319==    by 0x4404C0: zbx_thread_start (threads.c:127)
==20319==    by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319==    by 0x440EDC: daemon_start (daemon.c:383)
==20319==    by 0x41B3CF: main (zabbix_agentd.c:1149)
Comment by Sandis Neilands (Inactive) [ 2016 Jan 11 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10239

Comment by Sandis Neilands (Inactive) [ 2016 Jan 12 ]

(2) While checking for similar issues on Solaris, I found that we are not including <zone.h>:

...
#include <procfs.h>
 
/* HAVE_ZONE_H not defined yet so header is not included */
#ifdef HAVE_ZONE_H
#	include <zone.h>
#endif

#include "common.h" /* HAVE_ZONE_H defined in sysinc.h --> config.h */
#include "sysinfo.h"
#include "zbxregexp.h"

...
RESOLVED in r57598, r57610.

wiper CLOSED
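
The include-order effect from (2) can be reproduced in isolation. The sketch below is not Zabbix code: DEMO_HAVE_ZONE_H and the demo_* names are hypothetical, and a plain #define stands in for the definition that normally arrives through "common.h". It shows that an #ifdef evaluated before the macro is defined silently skips its branch, while the same check placed after the definition takes it.

```c
#include <assert.h>

/* Checked BEFORE the macro is defined: the branch is skipped, which is */
/* what happened to the <zone.h> include.                               */
#ifdef DEMO_HAVE_ZONE_H
#	define DEMO_SEEN_BEFORE	1
#else
#	define DEMO_SEEN_BEFORE	0
#endif

/* stands in for "common.h" pulling in the configure-generated definition */
#define DEMO_HAVE_ZONE_H	1

/* Checked AFTER the definition: the branch is taken. */
#ifdef DEMO_HAVE_ZONE_H
#	define DEMO_SEEN_AFTER	1
#else
#	define DEMO_SEEN_AFTER	0
#endif

int	demo_seen_before(void)
{
	return DEMO_SEEN_BEFORE;
}

int	demo_seen_after(void)
{
	return DEMO_SEEN_AFTER;
}

/* self-check: only the second test sees the macro */
void	demo_check(void)
{
	assert(0 == demo_seen_before());
	assert(1 == demo_seen_after());
}
```

The general remedy is to include the header that defines the feature macros before any conditional include that depends on them.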

Comment by Sandis Neilands (Inactive) [ 2016 Jan 13 ]

(3) Our documentation was slightly incomplete regarding examining process titles on Solaris 11. Thanks to andris for fixing that! CLOSED.

Comment by Sandis Neilands (Inactive) [ 2016 Jan 18 ]

Released in:

  • pre-3.0.0beta1 r57745.