[ZBX-4980] Trappers can hang (futex) or crash on decoding big base64 encoded values Created: 2012 May 10 Updated: 2017 May 30 Resolved: 2012 Jun 04 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | None |
Fix Version/s: | 1.8.14rc1, 2.0.0rc5, 2.0.1rc1, 2.1.0 |
Type: | Incident report | Priority: | Blocker |
Reporter: | Oleksii Zagorskyi | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | crash, futex | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
i686, x86_64 |
Attachments: | 185_ltrace_1901_24K_wrong_decoded_value.out 185_ltrace_1901_48K_sucessfull+correct_value.out 200rc3_ltrace_681_futex.out 200rc3_strace_342_futex.out incorrect_encoded_value.txt r27613_ltrace_17825_wrong_decoded_value.out script_to_reproduce.tar.gz trappers_hang.png zabbix_server_trappers_crash_centos_i686.log | ||||||||||||
Issue Links: |
|
Description |
This case was observed with Orabbix. Orabbix's keys "audit" (I'm 100% sure) and "locks" (almost 100% sure) are potential killers. As we know, Orabbix sends data in the old ZBX protocol, which uses an XML format with all values (hostname, key, value) encoded in base64. When a value is very big (sizes here always refer to the clear-text value, i.e. before base64 encoding), two different behaviors are observed, both critical.
1) When trappers decode big base64-encoded values, they can return a wrong decoded value. In my test lab I can reproduce it when a real DBforBix (Orabbix) sends two small values and one "24K" value in a row. In another, production environment (with a constant flow of different values from a real Orabbix instance) I was able to reproduce this case with even smaller values, as I recall 12-30K.
2) The trapper hangs (on x86_64) or dies (on i686) in the function "str_base64_decode". For i686 see the attached "zabbix_server_trappers_crash_centos_i686.log". To help find similar cases in Jira, here is a small log excerpt:
The x86_64 platform was investigated in more detail on Debian 6.0.4 with zabbix_server 2.0.0rc3 (latest revisions). See the attached "200rc3_ltrace_681_futex.out" and, just in case, a small strace "200rc3_strace_342_futex.out". Nothing can be seen in the Zabbix log because the trapper hangs. The attached archive "script_to_reproduce" contains a ready script and dummy files of different sizes to easily reproduce this case; you don't even need to create any hosts/items to reproduce the crash/futex. |
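For reference on the decoding step discussed above, here is a minimal sketch of a base64 decoder that never writes past a caller-supplied buffer and reports the decoded length explicitly. It is not Zabbix's str_base64_decode(); the names b64_value and b64_decode_bounded and their signatures are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/*
 * Hypothetical helper (an assumption, not Zabbix code): decode the
 * NUL-terminated base64 string `in` into `out`, writing at most `out_size`
 * bytes. Returns 0 and stores the decoded length in *out_len on success,
 * or -1 if the output buffer is too small. Characters outside the alphabet
 * (e.g. line breaks) are skipped; '=' ends the data.
 */
int b64_decode_bounded(const char *in, unsigned char *out, size_t out_size, size_t *out_len)
{
    unsigned int acc = 0;   /* bit accumulator */
    int bits = 0;           /* number of valid bits currently in acc */
    size_t n = 0;           /* bytes written so far */

    for (; *in != '\0' && *in != '='; in++) {
        int v = b64_value((unsigned char)*in);

        if (v < 0)
            continue;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n >= out_size)
                return -1;  /* refuse to write past the buffer */
            out[n++] = (unsigned char)((acc >> bits) & 0xFF);
            acc &= (1u << bits) - 1;  /* keep only the leftover bits */
        }
    }
    *out_len = n;
    return 0;
}
```

Because the length is returned separately and the buffer is never assumed to be NUL-terminated, a caller can compare the decoded size against what it expected before using the value.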
Comments |
Comment by jchegedus [ 2012 May 10 ] |
I also reported that at: https://support.zabbix.com/browse/ZBX-4478 |
Comment by Oleksii Zagorskyi [ 2012 May 10 ] |
Yes, many thanks! It's an identical case, I'll link it here. |
Comment by Oleksii Zagorskyi [ 2012 May 10 ] |
|
Comment by Andris Mednis [ 2012 May 16 ] |
For stable version fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980. |
Comment by Alexander Vladishev [ 2012 May 16 ] |
Great work! Tested. Please review my changes in r27557:27560 |
Comment by Andris Mednis [ 2012 May 16 ] |
Fixed in versions pre-1.8.14 rev. 27565 and pre-2.0.0 rev. 27567. |
Comment by Oleksii Zagorskyi [ 2012 May 17 ] |
Tested the latest revision. But with the scenario I described above (three values in a row) I get the same result as I described in the summary.
17809:20120517:203437.569 Starting Zabbix Server. Zabbix 2.0.0rc4 (revision 27613).
See the attached "r27613_ltrace_17825_wrong_decoded_value.out". You can see the wrong 3rd decoded value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest". Once I was also able to get this wrong value "DBforBIX Version 0.6" after manually sending the 52K-size file from the script after some pause, i.e. the scenario where three values come in a row is not strictly required. So there is still some problem with memory management. REOPENED <Andris> RESOLVED in r27850 |
Comment by Andris Mednis [ 2012 May 18 ] |
Thanks for noticing the problem! I focused only on repairing and testing str_base64_decode() to solve "hang on futex or crash", but missed wrong decoding. |
Comment by Andris Mednis [ 2012 May 28 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980 |
Comment by Alexander Vladishev [ 2012 May 31 ] |
(1) The call to strstr() is very expensive. It should be rewritten. <Andris> fixed in r27983. strstr() is still used, but now only on very short substrings. <Sasha> CLOSED |
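One possible way to avoid repeated long substring searches over a large payload is sketched below. This is not the change made in r27983. It relies on the assumption that '<' never occurs inside base64 data, so the scan can jump from one '<' to the next with strchr() and compare only the short tag text with strncmp(). The helper name next_element and its cursor-based signature are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption, not Zabbix code): return a pointer to
 * the content between "<tag>" and "</tag>", searching from *cursor onward,
 * store the content length in *len and advance *cursor past the closing tag.
 * Returns NULL if the element is missing or the tag name is too long.
 */
const char *next_element(const char **cursor, const char *tag, size_t *len)
{
    char open[32], close[32];
    int open_len = snprintf(open, sizeof(open), "<%s>", tag);
    int close_len = snprintf(close, sizeof(close), "</%s>", tag);
    const char *p, *start;

    /* close is the longer of the two, so this also covers truncation of open */
    if (open_len < 0 || close_len < 0 || (size_t)close_len >= sizeof(close))
        return NULL;

    /* jump from '<' to '<' and compare only the short opening tag */
    for (p = strchr(*cursor, '<'); p != NULL; p = strchr(p + 1, '<'))
        if (strncmp(p, open, (size_t)open_len) == 0)
            break;
    if (p == NULL)
        return NULL;
    start = p + open_len;

    /* same idea for the closing tag, starting right after the opening tag */
    for (p = strchr(start, '<'); p != NULL; p = strchr(p + 1, '<'))
        if (strncmp(p, close, (size_t)close_len) == 0)
            break;
    if (p == NULL)
        return NULL;

    *len = (size_t)(p - start);
    *cursor = p + close_len;
    return start;
}
```

Carrying a cursor between calls means each byte of the message is scanned once when the elements are requested in the order they appear, instead of restarting from the beginning of the buffer for every tag.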
Comment by Alexander Vladishev [ 2012 May 31 ] |
(2) We should verify the XML opening "<reg>" tag. <Andris> fixed in r27983. <Sasha> CLOSED |
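A check of this kind could look roughly like the sketch below, which tests whether a buffer begins with a given opening tag before any further parsing is attempted. The function name starts_with_tag is hypothetical, tolerating leading whitespace is an assumption, and the sketch does not reproduce what r27983 actually does.

```c
#include <string.h>

/*
 * Hypothetical check (an assumption, not Zabbix code): return 1 if `buf`,
 * after optional leading whitespace, begins with "<tag>", otherwise 0.
 */
int starts_with_tag(const char *buf, const char *tag)
{
    size_t tag_len = strlen(tag);

    while (*buf == ' ' || *buf == '\t' || *buf == '\r' || *buf == '\n')
        buf++;

    /* short-circuit evaluation keeps every read inside the NUL-terminated string */
    return buf[0] == '<' &&
           strncmp(buf + 1, tag, tag_len) == 0 &&
           buf[1 + tag_len] == '>';
}
```

Rejecting a message that lacks the expected opening tag up front keeps malformed input away from the element extraction and base64 decoding steps.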
Comment by Alexander Vladishev [ 2012 May 31 ] |
(3) please review my changes in r27967 <Andris> reviewed and included in r27983. <Sasha> CLOSED |
Comment by Oleksii Zagorskyi [ 2012 May 31 ] |
We should not remove values from the Fix Version/s field if they are already included in some release (or branch). I mean 2.0.0rc5. |
Comment by Alexander Vladishev [ 2012 Jun 01 ] |
Great! Successfully tested. |
Comment by Andris Mednis [ 2012 Jun 04 ] |
Fixed in versions pre-1.8.14 rev. 28037, pre-2.0.1 rev. 28031 and pre-2.1.0 rev. 28032. |
Comment by Oleksii Zagorskyi [ 2012 Jun 04 ] |
Just a small final note: one part of this fix (for the crash/futex) was included a bit earlier, in the 2.0.0 release. |
Comment by Oleksii Zagorskyi [ 2013 May 10 ] |
I've checked it once more with DebugLevel=4. |