[ZBX-10497] zabbix_proxy 3.0.1 crashing when the log monitoring Created: 2016 Mar 04 Updated: 2017 May 30 Resolved: 2016 Mar 16 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P) |
Affects Version/s: | 3.0.1 |
Fix Version/s: | 3.0.2rc1, 3.2.0alpha1 |
Type: | Incident report | Priority: | Blocker |
Reporter: | Yoshihisa Wakui | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | crash, dbsyncer, performance | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
CentOS 6.7 |
Attachments: |
![]() |
Description |
when log file monitoring 1000 items on a monitored server by zabbix proxy, zabbix proxy crashes. |
Comments |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 04 ] |
Hello, Thanks for the error report! 1. Does the issue occur regularly? BR, |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 04 ] |
Also, if possible, save the relevant logs from the time of crash. They might be useful later in the investigation. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Mar 04 ] |
From objdump we see that there is comparison to NULL, then jump, then dereference operation. Why it crashes due to NULL dereference when it should have jumped? 00000000004ca550 <zbx_db_dyn_escape_string_len>: ... 4ca552: 48 85 ff test %rdi,%rdi ... 4ca55a: 74 7c je 4ca5d8 <zbx_db_dyn_escape_string_len+0x88> ... 4ca560: 0f b6 07 movzbl (%rdi),%eax And from log we see that it happened two times. Is your CPU healthy at all? |
Comment by Oleksii Zagorskyi [ 2016 Mar 05 ] |
For easing searching, backtraces from attached log: 2518:20160303:165221.040 Got signal [signal:11(SIGSEGV),reason:128,refaddr:(nil)]. Crashing ... ... 2518:20160303:165221.041 === Backtrace: === 2518:20160303:165221.042 13: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](print_fatal_info+0x280) [0x47a0a0] 2518:20160303:165221.042 12: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history]() [0x47a85c] 2518:20160303:165221.042 11: /lib64/libc.so.6(+0x326a0) [0x7f456ce8e6a0] 2518:20160303:165221.042 10: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_db_dyn_escape_string_len+0x10) [0x4ca560] 2518:20160303:165221.042 9: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_db_insert_add_values_dyn+0xa3) [0x49d843] 2518:20160303:165221.042 8: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_db_insert_add_values+0x15a) [0x49da0a] 2518:20160303:165221.042 7: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](DCsync_history+0x141d) [0x45abdd] 2518:20160303:165221.042 6: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](dbsyncer_thread+0xe7) [0x418697] 2518:20160303:165221.042 5: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_thread_start+0x64) [0x47ae14] 2518:20160303:165221.042 4: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](MAIN_ZABBIX_ENTRY+0x3c2) [0x417642] 2518:20160303:165221.042 3: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](daemon_start+0x1c1) [0x4790c1] 2518:20160303:165221.042 2: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](main+0x456) [0x417e16] 2518:20160303:165221.042 1: /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f456ce7ad5d] 2518:20160303:165221.042 0: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history]() [0x4169f9] 2493:20160303:165221.046 One child process died (PID:2518,exitcode/signal:1). Exiting ... 2493:20160303:165223.047 syncing history data... 2493:20160303:165223.048 Got signal [signal:11(SIGSEGV),reason:128,refaddr:(nil)]. Crashing ... ... 2493:20160303:165223.049 === Backtrace: === 2493:20160303:165223.049 15: /usr/sbin/zabbix_proxy(print_fatal_info+0x280) [0x47a0a0] 2493:20160303:165223.049 14: /usr/sbin/zabbix_proxy() [0x47a85c] 2493:20160303:165223.049 13: /lib64/libc.so.6(+0x326a0) [0x7f456ce8e6a0] 2493:20160303:165223.049 12: /usr/sbin/zabbix_proxy(zbx_db_dyn_escape_string_len+0x10) [0x4ca560] 2493:20160303:165223.049 11: /usr/sbin/zabbix_proxy(zbx_db_insert_add_values_dyn+0xa3) [0x49d843] 2493:20160303:165223.049 10: /usr/sbin/zabbix_proxy(zbx_db_insert_add_values+0x15a) [0x49da0a] 2493:20160303:165223.049 9: /usr/sbin/zabbix_proxy(DCsync_history+0x141d) [0x45abdd] 2493:20160303:165223.049 8: /usr/sbin/zabbix_proxy(free_database_cache+0x48) [0x45b478] 2493:20160303:165223.049 7: /usr/sbin/zabbix_proxy(zbx_on_exit+0xdb) [0x41720b] 2493:20160303:165223.049 6: /lib64/libc.so.6(+0x326a0) [0x7f456ce8e6a0] 2493:20160303:165223.049 5: /lib64/libc.so.6(wait+0x24) [0x7f456cf085c4] 2493:20160303:165223.049 4: /usr/sbin/zabbix_proxy(MAIN_ZABBIX_ENTRY+0x3e8) [0x417668] 2493:20160303:165223.049 3: /usr/sbin/zabbix_proxy(daemon_start+0x1c1) [0x4790c1] 2493:20160303:165223.050 2: /usr/sbin/zabbix_proxy(main+0x456) [0x417e16] 2493:20160303:165223.050 1: /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f456ce7ad5d] 2493:20160303:165223.050 0: /usr/sbin/zabbix_proxy() [0x4169f9] |
Comment by Oleksii Zagorskyi [ 2016 Mar 05 ] |
2nd crash has happened in a main process when it tried to sync write cache. |
Comment by Yoshihisa Wakui [ 2016 Mar 07 ] |
Thank you for your reply. The cause was a mismatch of character code. Monitoring items on Zabbix Server had been set in UTF8. 1. Does the issue occur regularly? 2. Could you provide the build configuration (./configure script parameters) that you used to build Zabbix (proxy)? Best regards, |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 07 ] |
Hello stereoType, Thanks you for the clarifications. While we are trying to reproduce the issue, could you provide some additional information? 1. We are assuming that the monitored files contain lines in Japanese? BR, |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 07 ] |
Also, if possible, please attach the log file in question. If you do not want to disclose it on public site you can send it to me personally at [email protected]. |
Comment by Yoshihisa Wakui [ 2016 Mar 08 ] |
Hello Sandis Neilands, I will send the log file to you. 1. We are assuming that the monitored files contain lines in Japanese? 2. Do the crashes occur after some specific line in log file or just at random? 3. How many items per a single log file do you have? E.g. do you have 1 log file and 1000 different items on that file or you have 1000 different log files and each is monitored by an item? 4. You had no such issue before 3.0.1? Best regards, |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 09 ] |
The crash is a regression since Some fields of history object are no longer initialized in hc_copy_history_data() when ZBX_DC_FLAG_NOVALUE flag is set. Later, in dc_add_proxy_history_log(), pvalue is set to empty string in case the flag is set however the uninitialized h->value.str is passed to zbx_db_insert_add_values() as is in case it was not NULL. The value of h->value.str can be either NULL (no crash), address out of bounds (crash) or address within bounds (possible memory corruption). Valgrind report: ==16181== Use of uninitialised value of size 8 ==16181== at 0x4CA820: zbx_db_dyn_escape_string_len (db.c:2106) ==16181== by 0x49DB02: zbx_db_insert_add_values_dyn (db.c:2080) ==16181== by 0x49DCC9: zbx_db_insert_add_values (db.c:2156) ==16181== by 0x45AD6C: DCsync_history (dbcache.c:1931) ==16181== by 0x4186D6: dbsyncer_thread (dbsyncer.c:77) ==16181== by 0x47B0C3: zbx_thread_start (threads.c:128) ==16181== by 0x417681: MAIN_ZABBIX_ENTRY (proxy.c:1016) ==16181== by 0x479370: daemon_start (daemon.c:392) ==16181== by 0x417E55: main (proxy.c:804) ==16181== Uninitialised value was created by a heap allocation ==16181== at 0x4A06A2E: malloc (vg_replace_malloc.c:270) ==16181== by 0x47DF58: zbx_malloc2 (misc.c:320) ==16181== by 0x45B4E0: DCsync_history (dbcache.c:2084) ==16181== by 0x4186D6: dbsyncer_thread (dbsyncer.c:77) ==16181== by 0x47B0C3: zbx_thread_start (threads.c:128) ==16181== by 0x417681: MAIN_ZABBIX_ENTRY (proxy.c:1016) ==16181== by 0x479370: daemon_start (daemon.c:392) ==16181== by 0x417E55: main (proxy.c:804) |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 09 ] |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 10 ] |
In case there was no crash the proxy can send other uninitialized values to the server: timestamp, severity, logeventid. Such log entries can be identified in the proxy DB by the flags field of proxy_history table. |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 10 ] |
The 3.0 server will ignore the possibly corrupt timestamp, severity, logeventid values if the proxy doesn't send lastlogsize (it will not in case of PROXY_HISTORY_FLAG_NOVALUE. According to the documentation (https://www.zabbix.com/documentation/3.0/manual/appendix/compatibility) proxy 3.0 should only be used with server 3.0. |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 10 ] |
Resolved in development branch svn://svn.zabbix.com/branches/dev/ZBX-10497 . The correction is conservative to reduce the risk of further regressions in a minor release. Decided against zeroing out the history objects before use in order to be able to spot with Valgrind if something else is used unintentionally. Testing scenario
Expected: proxy doesn't crash, Valgrind doesn't complain about use of uninitialized variables. |
Comment by richlv [ 2016 Mar 10 ] |
support for 2.4 ends, but is there something to fix for 2.2 - for example, if somebody accidentally points 3.0 proxy at a 2.2 server, can they get corrupt entries ? |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 11 ] |
The 2.2 and 2.4 servers ignore received history record if it doesn't contain a value. See process_hist_data() for details. |
Comment by Aleksandrs Saveljevs [ 2016 Mar 15 ] |
(1) Please review r59015 - it removes "static" from "unset_if_novalue" constant. sandis.neilands CLOSED, thanks! |
Comment by Aleksandrs Saveljevs [ 2016 Mar 15 ] |
Successfully tested. |
Comment by Sandis Neilands (Inactive) [ 2016 Mar 16 ] |
Released in:
|