[ZBX-10497] zabbix_proxy 3.0.1 crashing when the log monitoring Created: 2016 Mar 04  Updated: 2017 May 30  Resolved: 2016 Mar 16

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P)
Affects Version/s: 3.0.1
Fix Version/s: 3.0.2rc1, 3.2.0alpha1

Type: Incident report Priority: Blocker
Reporter: Yoshihisa Wakui Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: crash, dbsyncer, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.7


Attachments: File bugreport.tar.gz    

 Description   

when log file monitoring 1000 items on a monitored server by zabbix proxy, zabbix proxy crashes.
Each log, describes the required content pattern.



 Comments   
Comment by Sandis Neilands (Inactive) [ 2016 Mar 04 ]

Hello,

Thanks for the error report!

1. Does the issue occur regularly?
2. Could you provide the build configuration (./configure script parameters) that you used to build Zabbix (proxy)?

BR,
Sandis

Comment by Sandis Neilands (Inactive) [ 2016 Mar 04 ]

Also, if possible, save the relevant logs from the time of crash. They might be useful later in the investigation.

Comment by Glebs Ivanovskis (Inactive) [ 2016 Mar 04 ]

From objdump we see that there is comparison to NULL, then jump, then dereference operation. Why it crashes due to NULL dereference when it should have jumped?

00000000004ca550 <zbx_db_dyn_escape_string_len>:
...
  4ca552:	48 85 ff             	test   %rdi,%rdi
...
  4ca55a:	74 7c                	je     4ca5d8 <zbx_db_dyn_escape_string_len+0x88>
...
  4ca560:	0f b6 07             	movzbl (%rdi),%eax

And from log we see that it happened two times. Is your CPU healthy at all?

Comment by Oleksii Zagorskyi [ 2016 Mar 05 ]

For easing searching, backtraces from attached log:

  2518:20160303:165221.040 Got signal [signal:11(SIGSEGV),reason:128,refaddr:(nil)]. Crashing ...
...
  2518:20160303:165221.041 === Backtrace: ===
  2518:20160303:165221.042 13: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](print_fatal_info+0x280) [0x47a0a0]
  2518:20160303:165221.042 12: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history]() [0x47a85c]
  2518:20160303:165221.042 11: /lib64/libc.so.6(+0x326a0) [0x7f456ce8e6a0]
  2518:20160303:165221.042 10: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_db_dyn_escape_string_len+0x10) [0x4ca560]
  2518:20160303:165221.042 9: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_db_insert_add_values_dyn+0xa3) [0x49d843]
  2518:20160303:165221.042 8: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_db_insert_add_values+0x15a) [0x49da0a]
  2518:20160303:165221.042 7: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](DCsync_history+0x141d) [0x45abdd]
  2518:20160303:165221.042 6: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](dbsyncer_thread+0xe7) [0x418697]
  2518:20160303:165221.042 5: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](zbx_thread_start+0x64) [0x47ae14]
  2518:20160303:165221.042 4: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](MAIN_ZABBIX_ENTRY+0x3c2) [0x417642]
  2518:20160303:165221.042 3: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](daemon_start+0x1c1) [0x4790c1]
  2518:20160303:165221.042 2: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history](main+0x456) [0x417e16]
  2518:20160303:165221.042 1: /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f456ce7ad5d]
  2518:20160303:165221.042 0: /usr/sbin/zabbix_proxy: history syncer #2 [synced 0 items in 0.000001 sec, syncing history]() [0x4169f9]
  2493:20160303:165221.046 One child process died (PID:2518,exitcode/signal:1). Exiting ...
  2493:20160303:165223.047 syncing history data...
  2493:20160303:165223.048 Got signal [signal:11(SIGSEGV),reason:128,refaddr:(nil)]. Crashing ...
...
  2493:20160303:165223.049 === Backtrace: ===
  2493:20160303:165223.049 15: /usr/sbin/zabbix_proxy(print_fatal_info+0x280) [0x47a0a0]
  2493:20160303:165223.049 14: /usr/sbin/zabbix_proxy() [0x47a85c]
  2493:20160303:165223.049 13: /lib64/libc.so.6(+0x326a0) [0x7f456ce8e6a0]
  2493:20160303:165223.049 12: /usr/sbin/zabbix_proxy(zbx_db_dyn_escape_string_len+0x10) [0x4ca560]
  2493:20160303:165223.049 11: /usr/sbin/zabbix_proxy(zbx_db_insert_add_values_dyn+0xa3) [0x49d843]
  2493:20160303:165223.049 10: /usr/sbin/zabbix_proxy(zbx_db_insert_add_values+0x15a) [0x49da0a]
  2493:20160303:165223.049 9: /usr/sbin/zabbix_proxy(DCsync_history+0x141d) [0x45abdd]
  2493:20160303:165223.049 8: /usr/sbin/zabbix_proxy(free_database_cache+0x48) [0x45b478]
  2493:20160303:165223.049 7: /usr/sbin/zabbix_proxy(zbx_on_exit+0xdb) [0x41720b]
  2493:20160303:165223.049 6: /lib64/libc.so.6(+0x326a0) [0x7f456ce8e6a0]
  2493:20160303:165223.049 5: /lib64/libc.so.6(wait+0x24) [0x7f456cf085c4]
  2493:20160303:165223.049 4: /usr/sbin/zabbix_proxy(MAIN_ZABBIX_ENTRY+0x3e8) [0x417668]
  2493:20160303:165223.049 3: /usr/sbin/zabbix_proxy(daemon_start+0x1c1) [0x4790c1]
  2493:20160303:165223.050 2: /usr/sbin/zabbix_proxy(main+0x456) [0x417e16]
  2493:20160303:165223.050 1: /lib64/libc.so.6(__libc_start_main+0xfd) [0x7f456ce7ad5d]
  2493:20160303:165223.050 0: /usr/sbin/zabbix_proxy() [0x4169f9]
Comment by Oleksii Zagorskyi [ 2016 Mar 05 ]

2nd crash has happened in a main process when it tried to sync write cache.
I seem to recall similar cases in other ZBXses.

Comment by Yoshihisa Wakui [ 2016 Mar 07 ]

Thank you for your reply.

The cause was a mismatch of character code.

Monitoring items on Zabbix Server had been set in UTF8.
but , monitored file was using the UTF16.
When monitoring items and monitored file are set in UTF8, Symptoms were improved. - no crash

1. Does the issue occur regularly?
If monitoring items is set UTF8 , monitored file is set in UTF16 and Proxy is monitoring 1000 LOG items, the issue occur regularly.
but monitoring items is set UTF8 , monitored file is set in UTF16 and Proxy is monitoring 1 LOG items, the issue doesn't occur.

2. Could you provide the build configuration (./configure script parameters) that you used to build Zabbix (proxy)?
I'm using the RPM package.
http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/
zabbix-proxy-mysql-3.0.1-1.el6.x86_64.rpm

Best regards,
Yoshihisa

Comment by Sandis Neilands (Inactive) [ 2016 Mar 07 ]

Hello stereoType,

Thanks you for the clarifications. While we are trying to reproduce the issue, could you provide some additional information?

1. We are assuming that the monitored files contain lines in Japanese?
2. Do the crashes occur after some specific line in log file or just at random?
3. How many items per a single log file do you have? E.g. do you have 1 log file and 1000 different items on that file or you have 1000 different log files and each is monitored by an item?
4. You had no such issue before 3.0.1?

BR,
Sandis

Comment by Sandis Neilands (Inactive) [ 2016 Mar 07 ]

Also, if possible, please attach the log file in question. If you do not want to disclose it on public site you can send it to me personally at [email protected].

Comment by Yoshihisa Wakui [ 2016 Mar 08 ]

Hello Sandis Neilands,

I will send the log file to you.

1. We are assuming that the monitored files contain lines in Japanese?
The monitored files don't contain lines in Japanese.
See the attached documents. Agent/monitoredFile/perfX.log

2. Do the crashes occur after some specific line in log file or just at random?
Maybe random.
When the crashes occur, character string "Error 3" and "Error 4" were described.

3. How many items per a single log file do you have? E.g. do you have 1 log file and 1000 different items on that file or you have 1000 different log files and each is monitored by an item?
I have 1000 different log files and each is monitored by an item.
See the attached pictures. - items.png.

4. You had no such issue before 3.0.1?
I don't know because I have not tested before 3.0.1.

Best regards,
Yoshihisa

Comment by Sandis Neilands (Inactive) [ 2016 Mar 09 ]

The crash is a regression since ZBX-10405.

Some fields of history object are no longer initialized in hc_copy_history_data() when ZBX_DC_FLAG_NOVALUE flag is set. Later, in dc_add_proxy_history_log(), pvalue is set to empty string in case the flag is set however the uninitialized h->value.str is passed to zbx_db_insert_add_values() as is in case it was not NULL. The value of h->value.str can be either NULL (no crash), address out of bounds (crash) or address within bounds (possible memory corruption).

Valgrind report:

==16181== Use of uninitialised value of size 8
==16181==    at 0x4CA820: zbx_db_dyn_escape_string_len (db.c:2106)
==16181==    by 0x49DB02: zbx_db_insert_add_values_dyn (db.c:2080)
==16181==    by 0x49DCC9: zbx_db_insert_add_values (db.c:2156)
==16181==    by 0x45AD6C: DCsync_history (dbcache.c:1931)
==16181==    by 0x4186D6: dbsyncer_thread (dbsyncer.c:77)
==16181==    by 0x47B0C3: zbx_thread_start (threads.c:128)
==16181==    by 0x417681: MAIN_ZABBIX_ENTRY (proxy.c:1016)
==16181==    by 0x479370: daemon_start (daemon.c:392)
==16181==    by 0x417E55: main (proxy.c:804)
==16181==  Uninitialised value was created by a heap allocation
==16181==    at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
==16181==    by 0x47DF58: zbx_malloc2 (misc.c:320)
==16181==    by 0x45B4E0: DCsync_history (dbcache.c:2084)
==16181==    by 0x4186D6: dbsyncer_thread (dbsyncer.c:77)
==16181==    by 0x47B0C3: zbx_thread_start (threads.c:128)
==16181==    by 0x417681: MAIN_ZABBIX_ENTRY (proxy.c:1016)
==16181==    by 0x479370: daemon_start (daemon.c:392)
==16181==    by 0x417E55: main (proxy.c:804)
Comment by Sandis Neilands (Inactive) [ 2016 Mar 09 ]

Related issues: ZBX-10102 and ZBX-7821.

Comment by Sandis Neilands (Inactive) [ 2016 Mar 10 ]

In case there was no crash the proxy can send other uninitialized values to the server: timestamp, severity, logeventid.

Such log entries can be identified in the proxy DB by the flags field of proxy_history table.

Comment by Sandis Neilands (Inactive) [ 2016 Mar 10 ]

The 3.0 server will ignore the possibly corrupt timestamp, severity, logeventid values if the proxy doesn't send lastlogsize (it will not in case of PROXY_HISTORY_FLAG_NOVALUE.

According to the documentation (https://www.zabbix.com/documentation/3.0/manual/appendix/compatibility) proxy 3.0 should only be used with server 3.0.

Comment by Sandis Neilands (Inactive) [ 2016 Mar 10 ]

Resolved in development branch svn://svn.zabbix.com/branches/dev/ZBX-10497 .

The correction is conservative to reduce the risk of further regressions in a minor release. Decided against zeroing out the history objects before use in order to be able to spot with Valgrind if something else is used unintentionally.

Testing scenario

  • Set up and start server and proxy (active) on Linux;
  • Set up 10 or more items like this:
    log[C:\Users\Zabbix\Desktop\agentd\log\perf1.log,Error|Critical]
    log[C:\Users\Zabbix\Desktop\agentd\log\perf2.log,Error|Critical]
    log[C:\Users\Zabbix\Desktop\agentd\log\perf3.log,Error|Critical]
    etc...
    
  • On Windows create the corresponding files in UTF-16 (yes, that is mismatch to what is configured above) encoding;
  • Set up 3.0.* agent (active) on Windows;
  • Modify the files so that metadata updates are sent to proxy (verify with Wireshark or by debug logs).

Expected: proxy doesn't crash, Valgrind doesn't complain about use of uninitialized variables.

Comment by richlv [ 2016 Mar 10 ]

support for 2.4 ends, but is there something to fix for 2.2 - for example, if somebody accidentally points 3.0 proxy at a 2.2 server, can they get corrupt entries ?

Comment by Sandis Neilands (Inactive) [ 2016 Mar 11 ]

The 2.2 and 2.4 servers ignore received history record if it doesn't contain a value. See process_hist_data() for details.

Comment by Aleksandrs Saveljevs [ 2016 Mar 15 ]

(1) Please review r59015 - it removes "static" from "unset_if_novalue" constant.

sandis.neilands CLOSED, thanks!

Comment by Aleksandrs Saveljevs [ 2016 Mar 15 ]

Successfully tested.

Comment by Sandis Neilands (Inactive) [ 2016 Mar 16 ]

Released in:

  • pre-3.0.2.rc1 r59023;
  • pre-3.1.0 (trunk) r59025.
Generated at Fri Mar 29 08:57:21 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.