  ZABBIX BUGS AND ISSUES / ZBX-4980

Trappers can hang (futex) or crash on decoding big base64 encoded values


    Details

      Description

      This case was observed with Orabbix.

      The Orabbix keys "audit" (I'm 100% sure) and "locks" (almost 100% sure) are potential killers.

      As we know, Orabbix sends data using the old ZBX protocol, in which an XML format is used and all values (host name, key, value) are encoded in base64.
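A minimal Python sketch of that message layout, assuming the three base64-encoded fields are wrapped as `<req><host>...</host><key>...</key><data>...</data></req>`. The tag names are an assumption (this report does not show the raw XML), and the host name "oratest" is hypothetical; the key and value are the ones quoted in item 1 below. The parse side mirrors what the trapper has to do before storing a value: extract each field and base64-decode it.

```python
import base64
import xml.etree.ElementTree as ET


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string and return it as ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_request(host: str, key: str, value: str) -> str:
    """Build one old-style request; the <req>/<host>/<key>/<data> layout is assumed."""
    return (
        "<req>"
        f"<host>{b64(host)}</host>"
        f"<key>{b64(key)}</key>"
        f"<data>{b64(value)}</data>"
        "</req>"
    )


def parse_request(message: str) -> tuple[str, str, str]:
    """Decode host, key and value back out of a request built as above."""
    root = ET.fromstring(message)
    fields = []
    for tag in ("host", "key", "data"):
        raw = root.findtext(tag) or ""
        fields.append(base64.b64decode(raw).decode("utf-8"))
    return fields[0], fields[1], fields[2]


msg = build_request("oratest", "DBforBIX.Version", "DBforBIX Version 0.6")
print(msg)
print(parse_request(msg))
```

Note that every field grows by about a third after encoding, so a large clear-text value becomes an even larger `<data>` element that the server must decode.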

      When a value is very big (I always mean clear-text values, i.e. before base64 encoding), two different behaviors are observed, and both are critical.

      1) When trappers decode big base64-encoded values, they can return a wrong decoded value. In my test lab I can reproduce this when a real DBforBix (Orabbix) instance sends two small values and one "24K" value in a row.
      You can see it in the attached file "185_ltrace_1901_24K_wrong_decoded_value.out": the value "DBforBIX Version 0.6" reported for the key "DBforBIX.MySQL.oratest" actually comes from the previously decoded key "DBforBIX.Version".
      At the same time, a bigger "48K" value is decoded and returned correctly; see the other attachment "185_ltrace_1901_48K_sucessfull+correct_value.out".
      A very short excerpt of this comparison is in the attached "incorrect_encoded_value.txt".

      In another production environment (with a constant flow of different values from a real Orabbix instance) I was able to reproduce this case with even smaller values, 12-30K as I recall.
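The symptom in item 1 (a key receiving the value of an earlier key) can be checked mechanically once the sent values and the values the server actually decoded have been extracted, for example by hand from the ltrace output. A hypothetical Python helper; the second small key "DBforBIX.Alive" and the placement of the 24K value under "DBforBIX.MySQL.oratest" are made up for illustration:

```python
def find_stale_values(sent, decoded):
    """Return (key, earlier_key) pairs where the decoded value of `key`
    differs from what was sent but equals a value sent earlier under
    `earlier_key`, i.e. a stale value from a previous decode.

    sent:    list of (key, value) in sending order
    decoded: list of decoded values in the same order
    """
    stale = []
    for i, ((key, sent_value), decoded_value) in enumerate(zip(sent, decoded)):
        if decoded_value == sent_value:
            continue  # decoded correctly
        # Look for the most recent earlier value that matches.
        for earlier_key, earlier_value in reversed(sent[:i]):
            if decoded_value == earlier_value:
                stale.append((key, earlier_key))
                break
    return stale


sent = [
    ("DBforBIX.Version", "DBforBIX Version 0.6"),
    ("DBforBIX.Alive", "1"),                        # hypothetical small value
    ("DBforBIX.MySQL.oratest", "y" * 24 * 1024),    # hypothetical 24K value
]
decoded = ["DBforBIX Version 0.6", "1", "DBforBIX Version 0.6"]
print(find_stale_values(sent, decoded))  # [('DBforBIX.MySQL.oratest', 'DBforBIX.Version')]
```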

      2) The trapper hangs (on x86_64) or dies (on i686) in the function "str_base64_decode".

      For i686, see the attached "zabbix_server_trappers_crash_centos_i686.log".
      On this 32-bit CentOS 6.2 system, zabbix_server dies with a 48K value and works fine with 36K.
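This report does not show how str_base64_decode sizes its output buffer. A "stack smashing detected" abort means glibc's stack protector found a corrupted stack canary, which usually points to a write past the end of a buffer on the stack. The Python sketch below only illustrates the check that any decoder writing into a fixed-size buffer needs: compute the exact decoded length (plus a NUL terminator, as a C string would need) and compare it with the buffer capacity before decoding. The function names and the 48K capacity are hypothetical, not Zabbix code.

```python
import base64
import binascii


def decoded_length(encoded: str) -> int:
    """Exact number of bytes a padded base64 string decodes to."""
    if len(encoded) % 4:
        raise ValueError("padded base64 length must be a multiple of 4")
    padding = len(encoded) - len(encoded.rstrip("="))
    return len(encoded) // 4 * 3 - padding


def bounded_decode(encoded: str, capacity: int) -> bytes:
    """Decode only if the result plus a NUL terminator fits in `capacity` bytes."""
    needed = decoded_length(encoded) + 1  # +1 for the terminating NUL
    if needed > capacity:
        raise OverflowError(f"need {needed} bytes, buffer holds {capacity}")
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 input") from exc


value = b"x" * (48 * 1024)                  # a 48K clear-text value, as in the report
encoded = base64.b64encode(value).decode("ascii")
print(decoded_length(encoded))              # 49152
try:
    bounded_decode(encoded, 48 * 1024)      # hypothetical buffer with no room for the NUL
except OverflowError as exc:
    print(exc)                              # need 49153 bytes, buffer holds 49152
```

Refusing oversized input up front turns a silent overwrite into an explicit error that the caller can log.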

      To help find similar cases in Jira, here is a short excerpt from the log:

            *** stack smashing detected ***: /usr/sbin/zabbix_server terminated
            ======= Backtrace: =========
            /lib/libc.so.6(__fortify_fail+0x4d)[0xb8e59d]
            /lib/libc.so.6(+0xf754a)[0xb8e54a]
            /usr/sbin/zabbix_server[0x80dd0f4]
            /usr/sbin/zabbix_server(str_base64_decode+0x51a)[0x80acdba]
            /usr/sbin/zabbix_server(comms_parse_response+0x25b)[0x80a6b5b]
            /usr/sbin/zabbix_server[0x806e69e]
            /usr/sbin/zabbix_server(main_trapper_loop+0x12f)[0x806f41f]
            /usr/sbin/zabbix_server(MAIN_ZABBIX_ENTRY+0x816)[0x8059416]
            /usr/sbin/zabbix_server(daemon_start+0x2af)[0x809dcbf]
            /usr/sbin/zabbix_server(main+0x2d0)[0x8059c40]
            /lib/libc.so.6(__libc_start_main+0xe6)[0xaadce6]
            /usr/sbin/zabbix_server[0x8053de1]
            ...
            14314:20120503:115247.714 Zabbix Server stopped. Zabbix 2.0.0rc4 (revision 27142).
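For searching other logs, a small hypothetical Python helper that flags "stack smashing detected" lines followed, within a few lines, by a str_base64_decode frame. The 20-line window is an arbitrary assumption; the sample lines are copied from the excerpt above.

```python
import re

# Markers taken from the log excerpt above.
SMASH = re.compile(r"stack smashing detected")
DECODE_FRAME = re.compile(r"\(str_base64_decode\+0x[0-9a-f]+\)")


def find_base64_crashes(lines, window=20):
    """Return indexes of 'stack smashing' lines that are followed, within
    `window` lines, by a backtrace frame in str_base64_decode."""
    hits = []
    for i, line in enumerate(lines):
        if SMASH.search(line) and any(
            DECODE_FRAME.search(follow) for follow in lines[i + 1 : i + 1 + window]
        ):
            hits.append(i)
    return hits


sample = [
    "*** stack smashing detected ***: /usr/sbin/zabbix_server terminated",
    "======= Backtrace: =========",
    "/usr/sbin/zabbix_server(str_base64_decode+0x51a)[0x80acdba]",
]
print(find_base64_crashes(sample))  # [0]
```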

      The "x86_64" platform was investigated in more detail on Debian 6.0.4 with zabbix_server 2.0.0rc3 (latest revisions).
      If the sent value is 52K or more, the trapper hangs in "futex"; a 48K value was decoded successfully by the server.
      In the production environment I saw sent values of 90-120K.
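Because the sizes in this report are clear-text sizes, it can help to see what the trapper actually receives. Base64 turns every 3 input bytes into 4 output characters, so n bytes encode to 4·⌈n/3⌉ characters. The sketch assumes "K" means 1024 bytes and ignores the host, key and XML overhead:

```python
def base64_length(clear_bytes: int) -> int:
    """Length of the padded base64 encoding of `clear_bytes` bytes."""
    return 4 * ((clear_bytes + 2) // 3)  # integer ceil(n / 3), times 4


# Clear-text sizes quoted in this report, assuming "K" means 1024 bytes.
REPORTED = {
    "24K (wrong value)": 24,
    "36K (i686 ok)": 36,
    "48K (i686 crash, x86_64 ok)": 48,
    "52K (x86_64 futex)": 52,
}

for label, kib in REPORTED.items():
    n = kib * 1024
    print(f"{label:30} {n:6} -> {base64_length(n):6}")
```

Under these assumptions, 24K, 36K, 48K and 52K encode to 32768, 49152, 65536 and 71000 bytes, so the x86_64 boundary reported between 48K and 52K straddles 64 KiB of encoded data. That is only an arithmetic observation; it does not identify which buffer is at fault.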

      See the attached "200rc3_ltrace_681_futex.out" and, just in case, a small strace "200rc3_strace_342_futex.out".
      Here DBforBix (Orabbix) sends two small values and one "52K" value in a row.

      Nothing can be seen in the zabbix log because the trapper hangs.
      The same happens to all available trappers over some period of time, and the frontend then shows that the zabbix server is not running.

      The attached archive "script_to_reproduce" contains a ready script and dummy files of different sizes to easily reproduce this case; you don't even need to create any hosts/items to reproduce the crash/futex.
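The contents of "script_to_reproduce" are not shown here. The following is a hypothetical Python equivalent, assuming the same XML layout with base64-encoded fields, no ZBXD header, one connection per value (the sender half-closes the socket to mark the end of the request), and a trapper on the default port 10051. Host and key names are dummies, since no hosts/items are required.

```python
import base64
import socket

SERVER = ("127.0.0.1", 10051)  # hypothetical test server; 10051 is the default trapper port


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def old_protocol_request(host: str, key: str, value: bytes) -> bytes:
    """Assumed old-protocol XML layout, with every field base64-encoded."""
    return (
        f"<req><host>{b64(host.encode())}</host>"
        f"<key>{b64(key.encode())}</key>"
        f"<data>{b64(value)}</data></req>"
    ).encode("ascii")


def reproduction_sequence(big_kib: int) -> list[bytes]:
    """Two small values followed by one big dummy value, as in the report."""
    return [
        old_protocol_request("dummy", "small.1", b"1"),
        old_protocol_request("dummy", "small.2", b"2"),
        old_protocol_request("dummy", "big", b"x" * (big_kib * 1024)),
    ]


def send(message: bytes, server=SERVER, timeout=10.0) -> bytes:
    """Send one request on its own connection and return the raw reply.
    A hung trapper shows up here as a socket timeout."""
    with socket.create_connection(server, timeout=timeout) as conn:
        conn.sendall(message)
        conn.shutdown(socket.SHUT_WR)  # signal end of request (assumed framing)
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
        return b"".join(chunks)
```

For example, calling `send()` on each message of `reproduction_sequence(52)` against a test server would, if the report holds, be expected to time out on the third message.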

        Attachments

        1. 185_ltrace_1901_24K_wrong_decoded_value.out
          40 kB
          Oleksii Zagorskyi
        2. 185_ltrace_1901_48K_sucessfull+correct_value.out
          48 kB
          Oleksii Zagorskyi
        3. 200rc3_ltrace_681_futex.out
          40 kB
          Oleksii Zagorskyi
        4. 200rc3_strace_342_futex.out
          5 kB
          Oleksii Zagorskyi
        5. incorrect_encoded_value.txt
          4 kB
          Oleksii Zagorskyi
        6. r27613_ltrace_17825_wrong_decoded_value.out
          39 kB
          Oleksii Zagorskyi
        7. script_to_reproduce.tar.gz
          2 kB
          Oleksii Zagorskyi
        8. trappers_hang.png
          65 kB
          Oleksii Zagorskyi
        9. zabbix_server_trappers_crash_centos_i686.log
          22 kB
          Oleksii Zagorskyi


              People

              Assignee:
              Unassigned
              Reporter:
              zalex_ua Oleksii Zagorskyi
              Votes:
              0
              Watchers:
              1

                Dates

                Created:
                Updated:
                Resolved: