ZABBIX BUGS AND ISSUES

Trappers can hang (futex) or crash on decoding big base64 encoded values

Details

  • Zabbix ID:
    Reviewed 2.0

Description

This case observed because of Orabbix.

Orabbix's keys: "audit" (I'm 100% sure), "locks" (almost 100% sure) are potential killers.

As we know Orabbix send data in the old ZBX protocol where XML format used and all values (hostname, key, value) encoded in base64.

When value is very big (I always consider clear text values, i.e. not encoded to base64) - there two different behavior observed, both are critical.

1) When trappers convert big base64 encoded values it can returns wrong decoded value. In my test lab I can reproduce it when real DBforBix (Orabbix) sends two small values and one "24K" in the row.
You can see it in the attached file "185_ltrace_1901_24K_wrong_decoded_value.out". The value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest" comes from previous decoded key "DBforBIX.Version"
In the same time, more big value "48K" decoded and returned correctly - see another attached "185_ltrace_1901_48K_sucessfull+correct_value.out"
And very short excerpt of this comparison you can see in attached "incorrect_encoded_value.txt".

In another production environment (where is constant flow of different values from real Orabbix instance) I was able to reproduce this case with even smaller values, as I recall 12-30K



2) trapper hangs (for x86_64) or dies (for i686) in function "str_base64_decode"

For i686 see attached "zabbix_server_trappers_crash_centos_i686.log"
On this 32bit CentOS 6.2 zabbix_server dies with 48K, and works ok with 36K.

To help find similar cases over the Jira here is small log's excerpt:
*** stack smashing detected ***: /usr/sbin/zabbix_server terminated
======= Backtrace: =========
/lib/libc.so.6(__fortify_fail+0x4d)[0xb8e59d]
/lib/libc.so.6(+0xf754a)[0xb8e54a]
/usr/sbin/zabbix_server[0x80dd0f4]
/usr/sbin/zabbix_server(str_base64_decode+0x51a)[0x80acdba]
/usr/sbin/zabbix_server(comms_parse_response+0x25b)[0x80a6b5b]
/usr/sbin/zabbix_server[0x806e69e]
/usr/sbin/zabbix_server(main_trapper_loop+0x12f)[0x806f41f]
/usr/sbin/zabbix_server(MAIN_ZABBIX_ENTRY+0x816)[0x8059416]
/usr/sbin/zabbix_server(daemon_start+0x2af)[0x809dcbf]
/usr/sbin/zabbix_server(main+0x2d0)[0x8059c40]
/lib/libc.so.6(__libc_start_main+0xe6)[0xaadce6]
/usr/sbin/zabbix_server[0x8053de1]
...
 14314:20120503:115247.714 Zabbix Server stopped. Zabbix 2.0.0rc4 (revision 27142).



More detailed investigated "x86_64" platform on Debian 6.0.4, zabbix_server 2.0.0rc3 (latest revisions)
If sent values is 52K and more - trapper will get "futex", sent value 48K - server was able to decode it.
In production environment I saw 90-120K sent values.

See attached "200rc3_ltrace_681_futex.out" and just in case small Strace "200rc3_strace_342_futex.out"
DBforBix (Orabbix) sends two small values and one "52K" in the row.

It's not possible to see anything in the zabbix log because the trapper hang.
The same will happen with all available trappers during some period of time. Frontend will show that zabbix server is not running.

In the attached archive "script_to_reproduce" you can find a ready script and dummy files different size to easily reproduce this case, you even don'y need to create any hosts/items to reproduce the crash/futex.
  1. 185_ltrace_1901_24K_wrong_decoded_value.out
    2012 May 10 14:16
    40 kB
    Oleksiy Zagorskyi
  2. 185_ltrace_1901_48K_sucessfull+correct_value.out
    2012 May 10 14:16
    48 kB
    Oleksiy Zagorskyi
  3. 200rc3_ltrace_681_futex.out
    2012 May 10 14:16
    40 kB
    Oleksiy Zagorskyi
  4. 200rc3_strace_342_futex.out
    2012 May 10 14:16
    5 kB
    Oleksiy Zagorskyi
  5. incorrect_encoded_value.txt
    2012 May 10 14:16
    4 kB
    Oleksiy Zagorskyi
  6. r27613_ltrace_17825_wrong_decoded_value.out
    2012 May 17 20:57
    39 kB
    Oleksiy Zagorskyi
  7. script_to_reproduce.tar.gz
    2012 May 10 15:07
    2 kB
    Oleksiy Zagorskyi
  8. zabbix_server_trappers_crash_centos_i686.log
    2012 May 10 14:16
    22 kB
    Oleksiy Zagorskyi
  1. trappers_hang.png
    65 kB
    2012 May 10 14:16

Issue Links

Activity

Hide
j c hegedus added a comment -
Show
j c hegedus added a comment - I also reported that at: https://support.zabbix.com/browse/ZBX-4478
Hide
Oleksiy Zagorskyi added a comment -

Yes, many thanks! it's identical case, I'll link it here.
I don't know why I don't remember about it

Show
Oleksiy Zagorskyi added a comment - Yes, many thanks! it's identical case, I'll link it here. I don't know why I don't remember about it
Hide
Oleksiy Zagorskyi added a comment -

ZBX-4444 could be similar, find there my comment.

Show
Oleksiy Zagorskyi added a comment - ZBX-4444 could be similar, find there my comment.
Hide
Andris Mednis added a comment -

For stable version fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980.

Show
Andris Mednis added a comment - For stable version fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980.
Hide
Alexander Vladishev added a comment -

Great work! Tested. Please review my changes in r27557:27560

Show
Alexander Vladishev added a comment - Great work! Tested. Please review my changes in r27557:27560
Hide
Andris Mednis added a comment -

Fixed in versions pre-1.8.14 rev. 27565 and pre-2.0.0 rev. 27567.

Show
Andris Mednis added a comment - Fixed in versions pre-1.8.14 rev. 27565 and pre-2.0.0 rev. 27567.
Hide
Oleksiy Zagorskyi added a comment - - edited

Tested latest revision.
I cannot reproduce case with futex, thanks.

But with the scenario I described above (three values in the row) I have the same as I've described in the summary.

  1. zabbix_server -V
    Zabbix server v2.0.0rc4 (revision 27613) (12 May 2012)
    Compilation time: May 17 2012 19:29:41

17809:20120517:203437.569 Starting Zabbix Server. Zabbix 2.0.0rc4 (revision 27613).
17809:20120517:203437.569 ****** Enabled features ******
...
17825:20120517:203437.610 server #9 started trapper #1
...

See attached "r27613_ltrace_17825_wrong_decoded_value.out"

You can see wrong 3rd decoded value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest"
the save value as for 2nd value with key "DBforBIX.Version"

Once I was able to get this wrong value "DBforBIX Version 0.6" after sending manually from script the 52K-size file after some pause, i.e. the case where three values comes in the row is not 100% relevant.

So, there is stil some problem with memory managing.
I used the same Debian 6.0.4 x86_64

REOPENED

<Andris> RESOLVED in r27850

Show
Oleksiy Zagorskyi added a comment - - edited Tested latest revision. I cannot reproduce case with futex, thanks. But with the scenario I described above (three values in the row) I have the same as I've described in the summary.
  1. zabbix_server -V Zabbix server v2.0.0rc4 (revision 27613) (12 May 2012) Compilation time: May 17 2012 19:29:41
17809:20120517:203437.569 Starting Zabbix Server. Zabbix 2.0.0rc4 (revision 27613). 17809:20120517:203437.569 ****** Enabled features ****** ... 17825:20120517:203437.610 server #9 started trapper #1 ... See attached "r27613_ltrace_17825_wrong_decoded_value.out" You can see wrong 3rd decoded value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest" the save value as for 2nd value with key "DBforBIX.Version" Once I was able to get this wrong value "DBforBIX Version 0.6" after sending manually from script the 52K-size file after some pause, i.e. the case where three values comes in the row is not 100% relevant. So, there is stil some problem with memory managing. I used the same Debian 6.0.4 x86_64 REOPENED <Andris> RESOLVED in r27850
Hide
Andris Mednis added a comment -

Thanks for noticing the problem! I focused only on repairing and testing str_base64_decode() to solve "hang on futex or crash", but missed wrong decoding.

Show
Andris Mednis added a comment - Thanks for noticing the problem! I focused only on repairing and testing str_base64_decode() to solve "hang on futex or crash", but missed wrong decoding.
Hide
Andris Mednis added a comment -

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980

Show
Andris Mednis added a comment - Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980
Hide
Alexander Vladishev added a comment - - edited

(1) call of strstr() function is very expensive. It should be rewritten.
src/libs/zbxcomms/comms.c:1034
src/libs/zbxcomms/comms.c:1079

<Andris> fixed in r27983. strstr() is still used, but now on very short substrings.

<Sasha> CLOSED

Show
Alexander Vladishev added a comment - - edited (1) call of strstr() function is very expensive. It should be rewritten. src/libs/zbxcomms/comms.c:1034 src/libs/zbxcomms/comms.c:1079 <Andris> fixed in r27983. strstr() is still used, but now on very short substrings. <Sasha> CLOSED
Hide
Alexander Vladishev added a comment - - edited

(2) we should verify XML opening "<reg>" tag
src/libs/zbxcomms/comms.c:1031
src/libs/zbxcomms/comms.c:1076

<Andris> fixed in r27983.

<Sasha> CLOSED

Show
Alexander Vladishev added a comment - - edited (2) we should verify XML opening "<reg>" tag src/libs/zbxcomms/comms.c:1031 src/libs/zbxcomms/comms.c:1076 <Andris> fixed in r27983. <Sasha> CLOSED
Hide
Alexander Vladishev added a comment - - edited

(3) please review my changes in r27967

<Andris> reviewed and included in r27983.

<Sasha> CLOSED

Show
Alexander Vladishev added a comment - - edited (3) please review my changes in r27967 <Andris> reviewed and included in r27983. <Sasha> CLOSED
Hide
Oleksiy Zagorskyi added a comment -

We should not remove values from Fix Version/s field if they ready included in some release (or branch). I mean 2.0.0rc5
1.8.14rc1 and 2.0.0rc5 has been re-added again.
Before closing 1.8.14rc1 can be replaced with 1.8.14rcX, depends when it will be merged.
Appropriate 2.0.x and 2.1.x also should be added before closing

Show
Oleksiy Zagorskyi added a comment - We should not remove values from Fix Version/s field if they ready included in some release (or branch). I mean 2.0.0rc5 1.8.14rc1 and 2.0.0rc5 has been re-added again. Before closing 1.8.14rc1 can be replaced with 1.8.14rcX, depends when it will be merged. Appropriate 2.0.x and 2.1.x also should be added before closing
Hide
Alexander Vladishev added a comment -

Great! Successfully tested.

Show
Alexander Vladishev added a comment - Great! Successfully tested.
Hide
Andris Mednis added a comment -

Fixed in versions pre-1.8.14 rev. 28037, pre-2.0.1 rev. 28031 and pre-2.1.0 rev. 28032.

Show
Andris Mednis added a comment - Fixed in versions pre-1.8.14 rev. 28037, pre-2.0.1 rev. 28031 and pre-2.1.0 rev. 28032.
Hide
Oleksiy Zagorskyi added a comment -

Just small final note: one part of this fix (for crash/futex) has been included a bit early - in the 2.0.0 release.

Show
Oleksiy Zagorskyi added a comment - Just small final note: one part of this fix (for crash/futex) has been included a bit early - in the 2.0.0 release.
Hide
Oleksiy Zagorskyi added a comment -

I've checked it once more with DebugLevel=4.
A bit clarified version on my last comment: zabbix server inserts in to DB just 64KB of data (counted from a SQL statement in zabbix_server.log debug) disregarding that original sent value (later decoded to base64) was 104 KB.

Show
Oleksiy Zagorskyi added a comment - I've checked it once more with DebugLevel=4. A bit clarified version on my last comment: zabbix server inserts in to DB just 64KB of data (counted from a SQL statement in zabbix_server.log debug) disregarding that original sent value (later decoded to base64) was 104 KB.

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: