[ZBX-10239] agent collector crashes in zbx_proc_get_matching_pids() Created: 2016 Jan 07 Updated: 2017 May 30 Resolved: 2016 Jan 19 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 3.0.0alpha5 |
Fix Version/s: | 3.0.0beta1 |
Type: | Incident report | Priority: | Blocker |
Reporter: | Aleksandrs Saveljevs | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | collector, crash, proc.cpu.util | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | zabbix_agentd.txt.xz |
Description |
Current r57474 of trunk crashes with the following backtrace:

9024:20160107:182705.365 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
9024:20160107:182705.365 ====== Fatal information: ======
9024:20160107:182705.365 Program counter: 0x7eff3af8b684
9024:20160107:182705.365 === Registers: ===
9024:20160107:182705.365 r8 = 442900 = 4466944 = 4466944
9024:20160107:182705.365 r9 = 1558ed0 = 22384336 = 22384336
9024:20160107:182705.365 r10 = 10 = 16 = 16
9024:20160107:182705.365 r11 = 206 = 518 = 518
9024:20160107:182705.365 r12 = 1530510 = 22218000 = 22218000
9024:20160107:182705.365 r13 = 155b520 = 22394144 = 22394144
9024:20160107:182705.365 r14 = 7eff3bb2afb8 = 139634683326392 = 139634683326392
9024:20160107:182705.365 r15 = 0 = 0 = 0
9024:20160107:182705.365 rdi = 7eff3b2b2620 = 139634674443808 = 139634674443808
9024:20160107:182705.365 rsi = 1552620 = 22357536 = 22357536
9024:20160107:182705.365 rbp = 22 = 34 = 34
9024:20160107:182705.365 rbx = 23 = 35 = 35
9024:20160107:182705.365 rdx = 220 = 544 = 544
9024:20160107:182705.365 rax = 0 = 0 = 0
9024:20160107:182705.365 rcx = 155b0e0 = 22393056 = 22393056
9024:20160107:182705.365 rsp = 7fffb1ae1068 = 140736174362728 = 140736174362728
9024:20160107:182705.365 rip = 7eff3af8b684 = 139634671138436 = 139634671138436
9024:20160107:182705.365 efl = 10246 = 66118 = 66118
9024:20160107:182705.365 csgsfs = 33 = 51 = 51
9024:20160107:182705.365 err = 4 = 4 = 4
9024:20160107:182705.365 trapno = e = 14 = 14
9024:20160107:182705.365 oldmask = 0 = 0 = 0
9024:20160107:182705.365 cr2 = 0 = 0 = 0
9024:20160107:182705.365 === Backtrace: ===
9024:20160107:182705.366 15: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](print_fatal_info+0x9e) [0x42879e]
9024:20160107:182705.366 14: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]() [0x4289d2]
9024:20160107:182705.366 13: /lib/x86_64-linux-gnu/libc.so.6(+0x35180) [0x7eff3af44180]
9024:20160107:182705.366 12: /lib/x86_64-linux-gnu/libc.so.6(cfree+0x34) [0x7eff3af8b684]
9024:20160107:182705.366 11: /lib/x86_64-linux-gnu/libc.so.6(+0xc1100) [0x7eff3afd0100]
9024:20160107:182705.366 10: /lib/x86_64-linux-gnu/libc.so.6(regfree+0x11) [0x7eff3afdc421]
9024:20160107:182705.366 9: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]() [0x424672]
9024:20160107:182705.366 8: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](zbx_proc_get_matching_pids+0x127) [0x41a2f7]
9024:20160107:182705.366 7: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](zbx_procstat_collect+0x3a9) [0x412a69]
9024:20160107:182705.366 6: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](collector_thread+0x98) [0x40dea8]
9024:20160107:182705.366 5: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](zbx_thread_start+0x45) [0x427535]
9024:20160107:182705.366 4: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](MAIN_ZABBIX_ENTRY+0x1a6) [0x414146]
9024:20160107:182705.366 3: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](daemon_start+0x185) [0x428115]
9024:20160107:182705.366 2: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data](main+0x91) [0x40b161]
9024:20160107:182705.366 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7eff3af30b45]
9024:20160107:182705.366 0: /home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]() [0x40b266]
9024:20160107:182705.366 === Memory map: ===
9024:20160107:182705.366 00400000-00450000 r-xp 00000000 08:02 689264 /home/zabbix/zabbix-bin/sbin/zabbix_agentd
9024:20160107:182705.366 0064f000-00651000 rw-p 0004f000 08:02 689264 /home/zabbix/zabbix-bin/sbin/zabbix_agentd
9024:20160107:182705.366 00651000-00657000 rw-p 00000000 00:00 0
9024:20160107:182705.366 01512000-01533000 rw-p 00000000 00:00 0 [heap]
9024:20160107:182705.366 01533000-01586000 rw-p 00000000 00:00 0 [heap]
9024:20160107:182705.366 7eff3acf9000-7eff3ad0f000 r-xp 00000000 08:02 6293377 /lib/x86_64-linux-gnu/libgcc_s.so.1
9024:20160107:182705.366 7eff3ad0f000-7eff3af0e000 ---p 00016000 08:02 6293377 /lib/x86_64-linux-gnu/libgcc_s.so.1
9024:20160107:182705.366 7eff3af0e000-7eff3af0f000 rw-p 00015000 08:02 6293377 /lib/x86_64-linux-gnu/libgcc_s.so.1
9024:20160107:182705.366 7eff3af0f000-7eff3b0ae000 r-xp 00000000 08:02 6293657 /lib/x86_64-linux-gnu/libc-2.19.so
9024:20160107:182705.366 7eff3b0ae000-7eff3b2ae000 ---p 0019f000 08:02 6293657 /lib/x86_64-linux-gnu/libc-2.19.so
9024:20160107:182705.366 7eff3b2ae000-7eff3b2b2000 r--p 0019f000 08:02 6293657 /lib/x86_64-linux-gnu/libc-2.19.so
9024:20160107:182705.366 7eff3b2b2000-7eff3b2b4000 rw-p 001a3000 08:02 6293657 /lib/x86_64-linux-gnu/libc-2.19.so
9024:20160107:182705.366 7eff3b2b4000-7eff3b2b8000 rw-p 00000000 00:00 0
9024:20160107:182705.366 7eff3b2b8000-7eff3b2cc000 r-xp 00000000 08:02 6298741 /lib/x86_64-linux-gnu/libresolv-2.19.so
9024:20160107:182705.366 7eff3b2cc000-7eff3b4cb000 ---p 00014000 08:02 6298741 /lib/x86_64-linux-gnu/libresolv-2.19.so
9024:20160107:182705.366 7eff3b4cb000-7eff3b4cc000 r--p 00013000 08:02 6298741 /lib/x86_64-linux-gnu/libresolv-2.19.so
9024:20160107:182705.366 7eff3b4cc000-7eff3b4cd000 rw-p 00014000 08:02 6298741 /lib/x86_64-linux-gnu/libresolv-2.19.so
9024:20160107:182705.366 7eff3b4cd000-7eff3b4cf000 rw-p 00000000 00:00 0
9024:20160107:182705.366 7eff3b4cf000-7eff3b4d2000 r-xp 00000000 08:02 6296407 /lib/x86_64-linux-gnu/libdl-2.19.so
9024:20160107:182705.366 7eff3b4d2000-7eff3b6d1000 ---p 00003000 08:02 6296407 /lib/x86_64-linux-gnu/libdl-2.19.so
9024:20160107:182705.367 7eff3b6d1000-7eff3b6d2000 r--p 00002000 08:02 6296407 /lib/x86_64-linux-gnu/libdl-2.19.so
9024:20160107:182705.367 7eff3b6d2000-7eff3b6d3000 rw-p 00003000 08:02 6296407 /lib/x86_64-linux-gnu/libdl-2.19.so
9024:20160107:182705.367 7eff3b6d3000-7eff3b7d3000 r-xp 00000000 08:02 6297828 /lib/x86_64-linux-gnu/libm-2.19.so
9024:20160107:182705.367 7eff3b7d3000-7eff3b9d2000 ---p 00100000 08:02 6297828 /lib/x86_64-linux-gnu/libm-2.19.so
9024:20160107:182705.367 7eff3b9d2000-7eff3b9d3000 r--p 000ff000 08:02 6297828 /lib/x86_64-linux-gnu/libm-2.19.so
9024:20160107:182705.367 7eff3b9d3000-7eff3b9d4000 rw-p 00100000 08:02 6297828 /lib/x86_64-linux-gnu/libm-2.19.so
9024:20160107:182705.367 7eff3b9d4000-7eff3b9f4000 r-xp 00000000 08:02 6292412 /lib/x86_64-linux-gnu/ld-2.19.so
9024:20160107:182705.367 7eff3bad6000-7eff3bb2b000 rw-s 00000000 00:04 12222524 /SYSV70026aa5 (deleted)
9024:20160107:182705.367 7eff3bb2b000-7eff3bbcc000 rw-s 00000000 00:04 10453006 /SYSV6c026aa5 (deleted)
9024:20160107:182705.367 7eff3bbcc000-7eff3bbd0000 rw-p 00000000 00:00 0
9024:20160107:182705.367 7eff3bbf0000-7eff3bbf1000 rw-p 00000000 00:00 0
9024:20160107:182705.367 7eff3bbf1000-7eff3bbf2000 rw-p 00000000 00:00 0
9024:20160107:182705.367 7eff3bbf2000-7eff3bbf4000 rw-p 00000000 00:00 0
9024:20160107:182705.367 7eff3bbf4000-7eff3bbf5000 r--p 00020000 08:02 6292412 /lib/x86_64-linux-gnu/ld-2.19.so
9024:20160107:182705.367 7eff3bbf5000-7eff3bbf6000 rw-p 00021000 08:02 6292412 /lib/x86_64-linux-gnu/ld-2.19.so
9024:20160107:182705.367 7eff3bbf6000-7eff3bbf7000 rw-p 00000000 00:00 0
9024:20160107:182705.367 7fffb1ac2000-7fffb1ae3000 rw-p 00000000 00:00 0 [stack]
9024:20160107:182705.367 7fffb1b4e000-7fffb1b50000 r-xp 00000000 00:00 0 [vdso]
9024:20160107:182705.367 7fffb1b50000-7fffb1b52000 r--p 00000000 00:00 0 [vvar]
9024:20160107:182705.367 ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
9024:20160107:182705.367 ================================
9024:20160107:182705.367 Please consider attaching a disassembly listing to your bug report.
9024:20160107:182705.367 This listing can be produced with, e.g., objdump -DSswx zabbix_agentd.
9024:20160107:182705.367 ================================
9023:20160107:182705.367 One child process died (PID:9024,exitcode/signal:1). Exiting ...
zabbix_agentd [9023]: Error on thread waiting.
9023:20160107:182705.368 Zabbix Agent stopped. Zabbix 3.0.0alpha6 (revision {ZABBIX_REVISION}). |
Comments |
Comment by Aleksandrs Saveljevs [ 2016 Jan 08 ] |
Today it crashed with the following error:

28055:20160108:100131.762 In update_cpustats()
28055:20160108:100131.762 End of update_cpustats()
28055:20160108:100131.767 __zbx_zbx_setproctitle() title:'collector [idle 1 sec]'
28055:20160108:100132.767 __zbx_zbx_setproctitle() title:'collector [processing data]'
28055:20160108:100132.767 In update_cpustats()
28055:20160108:100132.768 End of update_cpustats()
*** Error in `/home/zabbix/zabbix-bin/sbin/zabbix_agentd: collector [processing data]': free(): invalid next size (normal): 0x00000000008ed350 ***
28054:20160108:100132.772 One child process died (PID:28055,exitcode/signal:6). Exiting ...
28054:20160108:100132.772 Zabbix Agent stopped. Zabbix 3.0.0alpha6 (revision {ZABBIX_REVISION}). |
Comment by Andris Zeila [ 2016 Jan 08 ] |
(1) While not related to this crash, a Valgrind check discovered another problem:

==24311== Conditional jump or move depends on uninitialised value(s)
==24311== at 0x580E67A: vfprintf (vfprintf.c:1641)
==24311== by 0x5815D46: fprintf (fprintf.c:32)
==24311== by 0x42FB3A: __zbx_zabbix_log (log.c:436)
==24311== by 0x40BCA2: parse_list_of_checks (active.c:297)
==24311== by 0x40C845: refresh_active_checks (active.c:597)
==24311== by 0x40E3EC: active_checks_thread (active.c:1602)
==24311== by 0x434154: zbx_thread_start (threads.c:127)
==24311== by 0x417E6D: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:917)
==24311== by 0x43516F: daemon_start (daemon.c:383)
==24311== by 0x418299: main (zabbix_agentd.c:1149)
==24311== Uninitialised value was created by a stack allocation
==24311== at 0x42FAA3: __zbx_zabbix_log (log.c:436)

This happens because log.c:get_time(struct tm **tm, long *milliseconds) passes a stack variable to localtime_r() and then returns a reference to that variable, which no longer exists once get_time() has returned.

Compiler options:
Compiler: gcc
Compiler flags: -g -Wall -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wdeclaration-after-statement -Wpointer-arith -Wempty-body -Wno-error=sign-compare -Wno-error=unused-variable -Wno-error=pointer-sign -Wno-error=uninitialized -I/home/wiper/git/zabbix -I/usr/include/mysql -DBIG_JOINS=1 -fno-strict-aliasing -g -DNDEBUG -I/usr/include/libxml2

sandis.neilands This issue was introduced with correction for
sandis.neilands RESOLVED in r57543.
wiper CLOSED |
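The get_time() problem in (1) can be illustrated with a minimal sketch. This is not the change made in r57543; get_time_sketch() is a hypothetical name, and the caller-owned struct tm is just one common way to avoid returning the address of a local variable. It assumes a POSIX system (clock_gettime(), localtime_r()).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <time.h>

/*
 * Buggy pattern, shown only as a comment: 'local' lives on the stack of the
 * function, so the pointer handed back through *tm dangles after return.
 *
 *	struct tm	local;
 *	*tm = localtime_r(&sec, &local);
 */

/* Hypothetical replacement: the caller provides the storage for the result. */
void	get_time_sketch(struct tm *tm, long *milliseconds)
{
	struct timespec	ts;

	assert(NULL != tm && NULL != milliseconds);

	clock_gettime(CLOCK_REALTIME, &ts);
	localtime_r(&ts.tv_sec, tm);	/* fills the caller's struct tm */
	*milliseconds = ts.tv_nsec / 1000000;
}
```

A caller declares `struct tm tm; long ms;` and passes `&tm` and `&ms`, so the broken-down time stays valid for as long as the caller's variables do.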
Comment by Sandis Neilands (Inactive) [ 2016 Jan 08 ] |
In proc_get_process_cmdline(), when reading the command line from /proc/<pid>/cmdline, we must append a terminating '\0' to *cmdline. It turns out that the command line is not always terminated by '\0' in the cmdline file. The relevant Valgrind output:

==20319== Conditional jump or move depends on uninitialised value(s)
==20319== at 0x4C2FD7E: __GI_memcpy (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20319== by 0x44A869: zbx_strdup2 (misc.c:369)
==20319== by 0x427048: proc_create (proc.c:1173)
==20319== by 0x426E4D: zbx_proc_get_processes (proc.c:1243)
==20319== by 0x419856: zbx_procstat_collect (procstat.c:1158)
==20319== by 0x411847: collector_thread (stats.c:457)
==20319== by 0x4404C0: zbx_thread_start (threads.c:127)
==20319== by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319== by 0x440EDC: daemon_start (daemon.c:383)
==20319== by 0x41B3CF: main (zabbix_agentd.c:1149)
==20319== Uninitialised value was created by a heap allocation
==20319== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20319== by 0x44A69A: zbx_malloc2 (misc.c:318)
==20319== by 0x427DA8: proc_get_process_cmdline (proc.c:891)
==20319== by 0x426F61: proc_create (proc.c:1156)
==20319== by 0x426E4D: zbx_proc_get_processes (proc.c:1243)
==20319== by 0x419856: zbx_procstat_collect (procstat.c:1158)
==20319== by 0x411847: collector_thread (stats.c:457)
==20319== by 0x4404C0: zbx_thread_start (threads.c:127)
==20319== by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319== by 0x440EDC: daemon_start (daemon.c:383)
==20319== by 0x41B3CF: main (zabbix_agentd.c:1149)

And the Valgrind output related to the second case reported by asaveljevs:

==20319== Conditional jump or move depends on uninitialised value(s)
==20319== at 0x65962B2: re_string_context_at (regex_internal.c:958)
==20319== by 0x65962B2: re_string_reconstruct (regex_internal.c:671)
==20319== by 0x659BBA2: re_search_internal (regexec.c:815)
==20319== by 0x65A2D84: regexec@@GLIBC_2.3.4 (regexec.c:253)
==20319== by 0x43B42C: zbx_regexp (zbxregexp.c:68)
==20319== by 0x43B2A9: zbx_regexp_match (zbxregexp.c:81)
==20319== by 0x4275B2: proc_match_cmdline (proc.c:1103)
==20319== by 0x4273C7: zbx_proc_get_matching_pids (proc.c:1320)
==20319== by 0x419DB5: procstat_scan_query_pids (procstat.c:731)
==20319== by 0x419875: zbx_procstat_collect (procstat.c:1161)
==20319== by 0x411847: collector_thread (stats.c:457)
==20319== by 0x4404C0: zbx_thread_start (threads.c:127)
==20319== by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319== Uninitialised value was created by a heap allocation
==20319== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20319== by 0x44A69A: zbx_malloc2 (misc.c:318)
==20319== by 0x427DA8: proc_get_process_cmdline (proc.c:891)
==20319== by 0x426F61: proc_create (proc.c:1156)
==20319== by 0x426E4D: zbx_proc_get_processes (proc.c:1243)
==20319== by 0x419856: zbx_procstat_collect (procstat.c:1158)
==20319== by 0x411847: collector_thread (stats.c:457)
==20319== by 0x4404C0: zbx_thread_start (threads.c:127)
==20319== by 0x41AE39: MAIN_ZABBIX_ENTRY (zabbix_agentd.c:909)
==20319== by 0x440EDC: daemon_start (daemon.c:383)
==20319== by 0x41B3CF: main (zabbix_agentd.c:1149) |
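A minimal sketch of the termination requirement described above, not the code from the development branch. read_cmdline_sketch() is a hypothetical helper; it takes a file path rather than a PID so it can be tried on any file. The point is only that the buffer always keeps one spare byte and that '\0' is written after the last byte read, whatever the file contains.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: reads the whole file at 'path' into a newly allocated
 * buffer and always appends a terminating '\0'.
 * Returns 0 on success (caller frees *cmdline), -1 on failure.
 */
int	read_cmdline_sketch(const char *path, char **cmdline, size_t *cmdline_len)
{
	FILE	*f;
	char	*buf = NULL, *tmp;
	size_t	alloc = 0, len = 0, n;

	assert(NULL != path && NULL != cmdline && NULL != cmdline_len);

	if (NULL == (f = fopen(path, "rb")))
		return -1;

	for (;;)
	{
		/* keep room for at least one data byte plus the terminator */
		if (alloc - len < 2)
		{
			alloc = (0 == alloc ? 128 : alloc * 2);

			if (NULL == (tmp = realloc(buf, alloc)))
			{
				free(buf);
				fclose(f);
				return -1;
			}
			buf = tmp;
		}

		/* never read into the last byte, it is reserved for '\0' */
		if (0 == (n = fread(buf + len, 1, alloc - len - 1, f)))
			break;

		len += n;
	}

	if (0 != ferror(f))
	{
		free(buf);
		fclose(f);
		return -1;
	}

	fclose(f);

	buf[len] = '\0';	/* the file itself may not end with '\0' */
	*cmdline = buf;
	*cmdline_len = len;

	return 0;
}
```

Because arguments inside the cmdline file are separated by '\0', the returned length is reported separately; string functions applied to the buffer see only the first argument, but they can no longer run past the end of the allocation.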
Comment by Sandis Neilands (Inactive) [ 2016 Jan 11 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10239 |
Comment by Sandis Neilands (Inactive) [ 2016 Jan 12 ] |
(2) While checking for similar issues on Solaris, found that we are not including <zone.h>:

...
#include <procfs.h>
/* HAVE_ZONE_H not defined yet so header is not included */
#ifdef HAVE_ZONE_H
# include <zone.h>
#endif
#include "common.h" /* HAVE_ZONE_H defined in sysinc.h --> config.h */
#include "sysinfo.h"
#include "zbxregexp.h"
...

wiper CLOSED |
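The ordering effect in (2) can be reproduced without Solaris headers. In the sketch below, HAVE_DEMO_H is a hypothetical stand-in for HAVE_ZONE_H, and the plain #define stands in for the include of "common.h" that eventually defines the macro; it only demonstrates that an #ifdef evaluated before the definition silently takes the "not available" branch.

```c
#include <assert.h>	/* static_assert (C11) */

/* checked too early: the macro has not been defined yet */
#ifdef HAVE_DEMO_H
#	define EARLY_CHECK	1
#else
#	define EARLY_CHECK	0
#endif

/* stand-in for #include "common.h", which would define the macro */
#define HAVE_DEMO_H	1

/* the same check after the definition takes the other branch */
#ifdef HAVE_DEMO_H
#	define LATE_CHECK	1
#else
#	define LATE_CHECK	0
#endif

static_assert(0 == EARLY_CHECK, "early check ran before the macro existed");
static_assert(1 == LATE_CHECK, "late check sees the macro");
```

Under that reading, moving the conditional include below the header that defines the macro is enough for the check to take effect.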
Comment by Sandis Neilands (Inactive) [ 2016 Jan 13 ] |
(3) Our documentation was slightly incomplete regarding examining process' titles on Solaris 11. Thanks andris for fixing that! CLOSED. |
Comment by Sandis Neilands (Inactive) [ 2016 Jan 18 ] |
Released in:
|