ZABBIX BUGS AND ISSUES
  1. ZABBIX BUGS AND ISSUES
  2. ZBX-4980

Trappers can hang (futex) or crash on decoding big base64 encoded values

    Details

      Description

      This case observed because of Orabbix.

      Orabbix's keys: "audit" (I'm 100% sure), "locks" (almost 100% sure) are potential killers.

      As we know Orabbix send data in the old ZBX protocol where XML format used and all values (hostname, key, value) encoded in base64.

      When value is very big (I always consider clear text values, i.e. not encoded to base64) - there two different behavior observed, both are critical.

      1) When trappers convert big base64 encoded values it can returns wrong decoded value. In my test lab I can reproduce it when real DBforBix (Orabbix) sends two small values and one "24K" in the row.
      You can see it in the attached file "185_ltrace_1901_24K_wrong_decoded_value.out". The value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest" comes from previous decoded key "DBforBIX.Version"
      In the same time, more big value "48K" decoded and returned correctly - see another attached "185_ltrace_1901_48K_sucessfull+correct_value.out"
      And very short excerpt of this comparison you can see in attached "incorrect_encoded_value.txt".

      In another production environment (where is constant flow of different values from real Orabbix instance) I was able to reproduce this case with even smaller values, as I recall 12-30K

      2) trapper hangs (for x86_64) or dies (for i686) in function "str_base64_decode"

      For i686 see attached "zabbix_server_trappers_crash_centos_i686.log"
      On this 32bit CentOS 6.2 zabbix_server dies with 48K, and works ok with 36K.

      To help find similar cases over the Jira here is small log's excerpt:

          • stack smashing detected ***: /usr/sbin/zabbix_server terminated
            ======= Backtrace: =========
            /lib/libc.so.6(__fortify_fail+0x4d)[0xb8e59d]
            /lib/libc.so.6(+0xf754a)[0xb8e54a]
            /usr/sbin/zabbix_server[0x80dd0f4]
            /usr/sbin/zabbix_server(str_base64_decode+0x51a)[0x80acdba]
            /usr/sbin/zabbix_server(comms_parse_response+0x25b)[0x80a6b5b]
            /usr/sbin/zabbix_server[0x806e69e]
            /usr/sbin/zabbix_server(main_trapper_loop+0x12f)[0x806f41f]
            /usr/sbin/zabbix_server(MAIN_ZABBIX_ENTRY+0x816)[0x8059416]
            /usr/sbin/zabbix_server(daemon_start+0x2af)[0x809dcbf]
            /usr/sbin/zabbix_server(main+0x2d0)[0x8059c40]
            /lib/libc.so.6(__libc_start_main+0xe6)[0xaadce6]
            /usr/sbin/zabbix_server[0x8053de1]
            ...
            14314:20120503:115247.714 Zabbix Server stopped. Zabbix 2.0.0rc4 (revision 27142).

      More detailed investigated "x86_64" platform on Debian 6.0.4, zabbix_server 2.0.0rc3 (latest revisions)
      If sent values is 52K and more - trapper will get "futex", sent value 48K - server was able to decode it.
      In production environment I saw 90-120K sent values.

      See attached "200rc3_ltrace_681_futex.out" and just in case small Strace "200rc3_strace_342_futex.out"
      DBforBix (Orabbix) sends two small values and one "52K" in the row.

      It's not possible to see anything in the zabbix log because the trapper hang.
      The same will happen with all available trappers during some period of time. Frontend will show that zabbix server is not running.

      In the attached archive "script_to_reproduce" you can find a ready script and dummy files different size to easily reproduce this case, you even don'y need to create any hosts/items to reproduce the crash/futex.

      1. 185_ltrace_1901_24K_wrong_decoded_value.out
        40 kB
        Oleksiy Zagorskyi
      2. 185_ltrace_1901_48K_sucessfull+correct_value.out
        48 kB
        Oleksiy Zagorskyi
      3. 200rc3_ltrace_681_futex.out
        40 kB
        Oleksiy Zagorskyi
      4. 200rc3_strace_342_futex.out
        5 kB
        Oleksiy Zagorskyi
      5. incorrect_encoded_value.txt
        4 kB
        Oleksiy Zagorskyi
      6. r27613_ltrace_17825_wrong_decoded_value.out
        39 kB
        Oleksiy Zagorskyi
      7. script_to_reproduce.tar.gz
        2 kB
        Oleksiy Zagorskyi
      8. zabbix_server_trappers_crash_centos_i686.log
        22 kB
        Oleksiy Zagorskyi
      1. trappers_hang.png
        65 kB

        Issue Links

          Activity

          Hide
          j c hegedus added a comment -
          Show
          j c hegedus added a comment - I also reported that at: https://support.zabbix.com/browse/ZBX-4478
          Hide
          Oleksiy Zagorskyi added a comment -

          Yes, many thanks! it's identical case, I'll link it here.
          I don't know why I don't remember about it

          Show
          Oleksiy Zagorskyi added a comment - Yes, many thanks! it's identical case, I'll link it here. I don't know why I don't remember about it
          Hide
          Oleksiy Zagorskyi added a comment -

          ZBX-4444 could be similar, find there my comment.

          Show
          Oleksiy Zagorskyi added a comment - ZBX-4444 could be similar, find there my comment.
          Hide
          Andris Mednis added a comment -

          For stable version fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980.

          Show
          Andris Mednis added a comment - For stable version fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980.
          Hide
          Alexander Vladishev added a comment -

          Great work! Tested. Please review my changes in r27557:27560

          Show
          Alexander Vladishev added a comment - Great work! Tested. Please review my changes in r27557:27560
          Hide
          Andris Mednis added a comment -

          Fixed in versions pre-1.8.14 rev. 27565 and pre-2.0.0 rev. 27567.

          Show
          Andris Mednis added a comment - Fixed in versions pre-1.8.14 rev. 27565 and pre-2.0.0 rev. 27567.
          Hide
          Oleksiy Zagorskyi added a comment - - edited

          Tested latest revision.
          I cannot reproduce case with futex, thanks.

          But with the scenario I described above (three values in the row) I have the same as I've described in the summary.

          1. zabbix_server -V
            Zabbix server v2.0.0rc4 (revision 27613) (12 May 2012)
            Compilation time: May 17 2012 19:29:41

          17809:20120517:203437.569 Starting Zabbix Server. Zabbix 2.0.0rc4 (revision 27613).
          17809:20120517:203437.569 ****** Enabled features ******
          ...
          17825:20120517:203437.610 server #9 started trapper #1
          ...

          See attached "r27613_ltrace_17825_wrong_decoded_value.out"

          You can see wrong 3rd decoded value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest"
          the save value as for 2nd value with key "DBforBIX.Version"

          Once I was able to get this wrong value "DBforBIX Version 0.6" after sending manually from script the 52K-size file after some pause, i.e. the case where three values comes in the row is not 100% relevant.

          So, there is stil some problem with memory managing.
          I used the same Debian 6.0.4 x86_64

          REOPENED

          <Andris> RESOLVED in r27850

          Show
          Oleksiy Zagorskyi added a comment - - edited Tested latest revision. I cannot reproduce case with futex, thanks. But with the scenario I described above (three values in the row) I have the same as I've described in the summary. zabbix_server -V Zabbix server v2.0.0rc4 (revision 27613) (12 May 2012) Compilation time: May 17 2012 19:29:41 17809:20120517:203437.569 Starting Zabbix Server. Zabbix 2.0.0rc4 (revision 27613). 17809:20120517:203437.569 ****** Enabled features ****** ... 17825:20120517:203437.610 server #9 started trapper #1 ... See attached "r27613_ltrace_17825_wrong_decoded_value.out" You can see wrong 3rd decoded value "DBforBIX Version 0.6" for the key "DBforBIX.MySQL.oratest" the save value as for 2nd value with key "DBforBIX.Version" Once I was able to get this wrong value "DBforBIX Version 0.6" after sending manually from script the 52K-size file after some pause, i.e. the case where three values comes in the row is not 100% relevant. So, there is stil some problem with memory managing. I used the same Debian 6.0.4 x86_64 REOPENED <Andris> RESOLVED in r27850
          Hide
          Andris Mednis added a comment -

          Thanks for noticing the problem! I focused only on repairing and testing str_base64_decode() to solve "hang on futex or crash", but missed wrong decoding.

          Show
          Andris Mednis added a comment - Thanks for noticing the problem! I focused only on repairing and testing str_base64_decode() to solve "hang on futex or crash", but missed wrong decoding.
          Hide
          Andris Mednis added a comment -

          Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980

          Show
          Andris Mednis added a comment - Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4980
          Hide
          Alexander Vladishev added a comment - - edited

          (1) call of strstr() function is very expensive. It should be rewritten.
          src/libs/zbxcomms/comms.c:1034
          src/libs/zbxcomms/comms.c:1079

          <Andris> fixed in r27983. strstr() is still used, but now on very short substrings.

          <Sasha> CLOSED

          Show
          Alexander Vladishev added a comment - - edited (1) call of strstr() function is very expensive. It should be rewritten. src/libs/zbxcomms/comms.c:1034 src/libs/zbxcomms/comms.c:1079 <Andris> fixed in r27983. strstr() is still used, but now on very short substrings. <Sasha> CLOSED
          Hide
          Alexander Vladishev added a comment - - edited

          (2) we should verify XML opening "<reg>" tag
          src/libs/zbxcomms/comms.c:1031
          src/libs/zbxcomms/comms.c:1076

          <Andris> fixed in r27983.

          <Sasha> CLOSED

          Show
          Alexander Vladishev added a comment - - edited (2) we should verify XML opening "<reg>" tag src/libs/zbxcomms/comms.c:1031 src/libs/zbxcomms/comms.c:1076 <Andris> fixed in r27983. <Sasha> CLOSED
          Hide
          Alexander Vladishev added a comment - - edited

          (3) please review my changes in r27967

          <Andris> reviewed and included in r27983.

          <Sasha> CLOSED

          Show
          Alexander Vladishev added a comment - - edited (3) please review my changes in r27967 <Andris> reviewed and included in r27983. <Sasha> CLOSED
          Hide
          Oleksiy Zagorskyi added a comment -

          We should not remove values from Fix Version/s field if they ready included in some release (or branch). I mean 2.0.0rc5
          1.8.14rc1 and 2.0.0rc5 has been re-added again.
          Before closing 1.8.14rc1 can be replaced with 1.8.14rcX, depends when it will be merged.
          Appropriate 2.0.x and 2.1.x also should be added before closing

          Show
          Oleksiy Zagorskyi added a comment - We should not remove values from Fix Version/s field if they ready included in some release (or branch). I mean 2.0.0rc5 1.8.14rc1 and 2.0.0rc5 has been re-added again. Before closing 1.8.14rc1 can be replaced with 1.8.14rcX, depends when it will be merged. Appropriate 2.0.x and 2.1.x also should be added before closing
          Hide
          Alexander Vladishev added a comment -

          Great! Successfully tested.

          Show
          Alexander Vladishev added a comment - Great! Successfully tested.
          Hide
          Andris Mednis added a comment -

          Fixed in versions pre-1.8.14 rev. 28037, pre-2.0.1 rev. 28031 and pre-2.1.0 rev. 28032.

          Show
          Andris Mednis added a comment - Fixed in versions pre-1.8.14 rev. 28037, pre-2.0.1 rev. 28031 and pre-2.1.0 rev. 28032.
          Hide
          Oleksiy Zagorskyi added a comment -

          Just small final note: one part of this fix (for crash/futex) has been included a bit early - in the 2.0.0 release.

          Show
          Oleksiy Zagorskyi added a comment - Just small final note: one part of this fix (for crash/futex) has been included a bit early - in the 2.0.0 release.
          Hide
          Oleksiy Zagorskyi added a comment -

          I've checked it once more with DebugLevel=4.
          A bit clarified version on my last comment: zabbix server inserts in to DB just 64KB of data (counted from a SQL statement in zabbix_server.log debug) disregarding that original sent value (later decoded to base64) was 104 KB.

          Show
          Oleksiy Zagorskyi added a comment - I've checked it once more with DebugLevel=4. A bit clarified version on my last comment: zabbix server inserts in to DB just 64KB of data (counted from a SQL statement in zabbix_server.log debug) disregarding that original sent value (later decoded to base64) was 104 KB.

            People

            • Assignee:
              Unassigned
              Reporter:
              Oleksiy Zagorskyi
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: