[ZBX-4980] Trappers can hang (futex) or crash on decoding big base64 encoded values Created: 2012 May 10  Updated: 2017 May 30  Resolved: 2012 Jun 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: 1.8.14rc1, 2.0.0rc5, 2.0.1rc1, 2.1.0

Type: Incident report Priority: Blocker
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: crash, futex
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

i686, x86_64
affected any version of zabbix server


Attachments: File 185_ltrace_1901_24K_wrong_decoded_value.out     File 185_ltrace_1901_48K_sucessfull+correct_value.out     File 200rc3_ltrace_681_futex.out     File 200rc3_strace_342_futex.out     Text File incorrect_encoded_value.txt     File r27613_ltrace_17825_wrong_decoded_value.out     File script_to_reproduce.tar.gz     PNG File trappers_hang.png     Text File zabbix_server_trappers_crash_centos_i686.log    
Issue Links:
Duplicate
duplicates ZBX-4478 trapper crashes when receiving large ... Closed
is duplicated by ZBX-4444 Got signal [signal:11(SIGSEGV),reason... Closed

 Description   

This case observed because of Orabbix.

Orabbix's keys: "audit" (I'm 100% sure), "locks" (almost 100% sure) are potential killers.

As we know Orabbix send data in the old ZBX protocol where XML format used and all values (hostname, key, value) encoded in base64.

When value is very big (I always consider clear text values, i.e. not encoded to base64) - there two different behavior observed, both are critical.

1) When trappers convert big base64 encoded values it can returns wrong decoded value. In my test lab I can reproduce it when real DBforBix (Orabbix) sends two small values and one "24K" in the row.
You can see it in the attached file "185_ltrace_1901_24K_wrong_decoded_value.out". The value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest" comes from previous decoded key "DBforBIX.Version"
In the same time, more big value "48K" decoded and returned correctly - see another attached "185_ltrace_1901_48K_sucessfull+correct_value.out"
And very short excerpt of this comparison you can see in attached "incorrect_encoded_value.txt".

In another production environment (where is constant flow of different values from real Orabbix instance) I was able to reproduce this case with even smaller values, as I recall 12-30K

2) trapper hangs (for x86_64) or dies (for i686) in function "str_base64_decode"

For i686 see attached "zabbix_server_trappers_crash_centos_i686.log"
On this 32bit CentOS 6.2 zabbix_server dies with 48K, and works ok with 36K.

To help find similar cases over the Jira here is small log's excerpt:

      • stack smashing detected ***: /usr/sbin/zabbix_server terminated
        ======= Backtrace: =========
        /lib/libc.so.6(__fortify_fail+0x4d)[0xb8e59d]
        /lib/libc.so.6(+0xf754a)[0xb8e54a]
        /usr/sbin/zabbix_server[0x80dd0f4]
        /usr/sbin/zabbix_server(str_base64_decode+0x51a)[0x80acdba]
        /usr/sbin/zabbix_server(comms_parse_response+0x25b)[0x80a6b5b]
        /usr/sbin/zabbix_server[0x806e69e]
        /usr/sbin/zabbix_server(main_trapper_loop+0x12f)[0x806f41f]
        /usr/sbin/zabbix_server(MAIN_ZABBIX_ENTRY+0x816)[0x8059416]
        /usr/sbin/zabbix_server(daemon_start+0x2af)[0x809dcbf]
        /usr/sbin/zabbix_server(main+0x2d0)[0x8059c40]
        /lib/libc.so.6(__libc_start_main+0xe6)[0xaadce6]
        /usr/sbin/zabbix_server[0x8053de1]
        ...
        14314:20120503:115247.714 Zabbix Server stopped. Zabbix 2.0.0rc4 (revision 27142).

More detailed investigated "x86_64" platform on Debian 6.0.4, zabbix_server 2.0.0rc3 (latest revisions)
If sent values is 52K and more - trapper will get "futex", sent value 48K - server was able to decode it.
In production environment I saw 90-120K sent values.

See attached "200rc3_ltrace_681_futex.out" and just in case small Strace "200rc3_strace_342_futex.out"
DBforBix (Orabbix) sends two small values and one "52K" in the row.

It's not possible to see anything in the zabbix log because the trapper hang.
The same will happen with all available trappers during some period of time. Frontend will show that zabbix server is not running.

In the attached archive "script_to_reproduce" you can find a ready script and dummy files different size to easily reproduce this case, you even don'y need to create any hosts/items to reproduce the crash/futex.



 Comments   
Comment by jchegedus [ 2012 May 10 ]

I also reported that at: https://support.zabbix.com/browse/ZBX-4478

Comment by Oleksii Zagorskyi [ 2012 May 10 ]

Yes, many thanks! it's identical case, I'll link it here.
I don't know why I don't remember about it

Comment by Oleksii Zagorskyi [ 2012 May 10 ]

ZBX-4444 could be similar, find there my comment.

Comment by Andris Mednis [ 2012 May 16 ]

For stable version fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980.

Comment by Alexander Vladishev [ 2012 May 16 ]

Great work! Tested. Please review my changes in r27557:27560

Comment by Andris Mednis [ 2012 May 16 ]

Fixed in versions pre-1.8.14 rev. 27565 and pre-2.0.0 rev. 27567.

Comment by Oleksii Zagorskyi [ 2012 May 17 ]

Tested latest revision.
I cannot reproduce case with futex, thanks.

But with the scenario I described above (three values in the row) I have the same as I've described in the summary.

  1. zabbix_server -V
    Zabbix server v2.0.0rc4 (revision 27613) (12 May 2012)
    Compilation time: May 17 2012 19:29:41

17809:20120517:203437.569 Starting Zabbix Server. Zabbix 2.0.0rc4 (revision 27613).
17809:20120517:203437.569 ****** Enabled features ******
...
17825:20120517:203437.610 server #9 started trapper #1
...

See attached "r27613_ltrace_17825_wrong_decoded_value.out"

You can see wrong 3rd decoded value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest"
the save value as for 2nd value with key "DBforBIX.Version"

Once I was able to get this wrong value "DBforBIX Version 0.6" after sending manually from script the 52K-size file after some pause, i.e. the case where three values comes in the row is not 100% relevant.

So, there is stil some problem with memory managing.
I used the same Debian 6.0.4 x86_64

REOPENED

<Andris> RESOLVED in r27850

Comment by Andris Mednis [ 2012 May 18 ]

Thanks for noticing the problem! I focused only on repairing and testing str_base64_decode() to solve "hang on futex or crash", but missed wrong decoding.

Comment by Andris Mednis [ 2012 May 28 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980

Comment by Alexander Vladishev [ 2012 May 31 ]

(1) call of strstr() function is very expensive. It should be rewritten.
src/libs/zbxcomms/comms.c:1034
src/libs/zbxcomms/comms.c:1079

<Andris> fixed in r27983. strstr() is still used, but now on very short substrings.

<Sasha> CLOSED

Comment by Alexander Vladishev [ 2012 May 31 ]

(2) we should verify XML opening "<reg>" tag
src/libs/zbxcomms/comms.c:1031
src/libs/zbxcomms/comms.c:1076

<Andris> fixed in r27983.

<Sasha> CLOSED

Comment by Alexander Vladishev [ 2012 May 31 ]

(3) please review my changes in r27967

<Andris> reviewed and included in r27983.

<Sasha> CLOSED

Comment by Oleksii Zagorskyi [ 2012 May 31 ]

We should not remove values from Fix Version/s field if they ready included in some release (or branch). I mean 2.0.0rc5
1.8.14rc1 and 2.0.0rc5 has been re-added again.
Before closing 1.8.14rc1 can be replaced with 1.8.14rcX, depends when it will be merged.
Appropriate 2.0.x and 2.1.x also should be added before closing

Comment by Alexander Vladishev [ 2012 Jun 01 ]

Great! Successfully tested.

Comment by Andris Mednis [ 2012 Jun 04 ]

Fixed in versions pre-1.8.14 rev. 28037, pre-2.0.1 rev. 28031 and pre-2.1.0 rev. 28032.

Comment by Oleksii Zagorskyi [ 2012 Jun 04 ]

Just small final note: one part of this fix (for crash/futex) has been included a bit early - in the 2.0.0 release.

Comment by Oleksii Zagorskyi [ 2013 May 10 ]

I've checked it once more with DebugLevel=4.
A bit clarified version on my last comment: zabbix server inserts in to DB just 64KB of data (counted from a SQL statement in zabbix_server.log debug) disregarding that original sent value (later decoded to base64) was 104 KB.

Generated at Fri Mar 29 00:02:46 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.