ZABBIX BUGS AND ISSUES
  1. ZABBIX BUGS AND ISSUES
  2. ZBX-10692

Zabbix server crash in escalator in DCexpression_expand_user_macros()

    Details

    • Type: Incident report Incident report
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.2
    • Fix Version/s: 3.0.3rc1, 3.2.0alpha1
    • Component/s: Server (S)
    • Environment:
      Updated to 3.0.2 today. The server crashed within 5 hours of running

      Description

      The server was running fine for about 5 hours after updating to v3.0.2.
      I was adding hosts/configuring the server, when all of a sudden the server crashed.

      1. objdump.dump.tgz
        4.35 MB
        Ilya Kruchinin
      2. zabbix segfault.txt
        37 kB
        Ilya Kruchinin
      3. ZBX-10692.patch
        1 kB
        Glebs Ivanovskis

        Activity

        Hide
        Glebs Ivanovskis added a comment - - edited

        Backtrace for easier search:

          6988:20160421:160051.920 === Backtrace: ===
          6988:20160421:160051.921 15: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](print_fatal_info+0xae) [0x47e06e]
          6988:20160421:160051.921 14: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x47e332]
          6988:20160421:160051.921 13: /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7f783d92c0e0]
          6988:20160421:160051.921 12: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x4634f4]
          6988:20160421:160051.921 11: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](DCexpression_expand_user_macros+0x21) [0x46dda1]
          6988:20160421:160051.921 10: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](substitute_simple_macros+0xde0) [0x456a40]
          6988:20160421:160051.921 9: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](substitute_simple_macros+0x385e) [0x4594be]
          6988:20160421:160051.921 8: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x4405d0]
          6988:20160421:160051.921 7: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x440def]
          6988:20160421:160051.921 6: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](escalator_thread+0x157) [0x4410f7]
          6988:20160421:160051.921 5: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](zbx_thread_start+0x45) [0x47ed65]
          6988:20160421:160051.921 4: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](MAIN_ZABBIX_ENTRY+0x53c) [0x41d04c]
          6988:20160421:160051.921 3: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](daemon_start+0x1c3) [0x47d903]
          6988:20160421:160051.921 2: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](main+0x2f8) [0x417d78]
          6988:20160421:160051.921 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f783d918b45]
          6988:20160421:160051.921 0: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x418043]
        

        Could you attach the result of the following command?

        objdump -DSswx zabbix_server
        
        Show
        Glebs Ivanovskis added a comment - - edited Backtrace for easier search: 6988:20160421:160051.920 === Backtrace: === 6988:20160421:160051.921 15: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](print_fatal_info+0xae) [0x47e06e] 6988:20160421:160051.921 14: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x47e332] 6988:20160421:160051.921 13: /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7f783d92c0e0] 6988:20160421:160051.921 12: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x4634f4] 6988:20160421:160051.921 11: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](DCexpression_expand_user_macros+0x21) [0x46dda1] 6988:20160421:160051.921 10: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](substitute_simple_macros+0xde0) [0x456a40] 6988:20160421:160051.921 9: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](substitute_simple_macros+0x385e) [0x4594be] 6988:20160421:160051.921 8: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x4405d0] 6988:20160421:160051.921 7: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x440def] 6988:20160421:160051.921 6: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](escalator_thread+0x157) [0x4410f7] 6988:20160421:160051.921 5: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](zbx_thread_start+0x45) [0x47ed65] 6988:20160421:160051.921 4: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](MAIN_ZABBIX_ENTRY+0x53c) [0x41d04c] 6988:20160421:160051.921 3: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](daemon_start+0x1c3) [0x47d903] 6988:20160421:160051.921 2: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations](main+0x2f8) [0x417d78] 6988:20160421:160051.921 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f783d918b45] 6988:20160421:160051.921 0: /usr/sbin/zabbix_server: escalator #4 [processed 0 escalations in 0.001460 sec, processing escalations]() [0x418043] Could you attach the result of the following command? objdump -DSswx zabbix_server
        Hide
        Glebs Ivanovskis added a comment -

        It's difficult to say for sure without a binary, but most likely call stack looks like so: main escalator loop -> process_escalation() -> execute_escalations() -> process_recovery_msg() -> substitute_simple_macros() -> substitute_simple_macros(..., MACRO_TYPE_TRIGGER_DESCRIPTION, ...) -> DCexpression_expand_user_macro() -> dc_express_expand_user_macros() -> dc_expand_user_macros() -> strchr().

        In this scenario event (with SOURCE_TRIGGERS) has somehow event.trigger.expression == NULL. Actually, get_event_info() guarantees only that event trigger fields are initialized if event object is EVENT_OBJECT_TRIGGER. If event source is SOURCE_TRIGGERS but event object is not EVENT_OBJECT_TRIGGER we may eventually pass NULL to strchr().

        To check this please try to

        select * from events where source=0 and object<>0;
        
        Show
        Glebs Ivanovskis added a comment - It's difficult to say for sure without a binary, but most likely call stack looks like so: main escalator loop -> process_escalation() -> execute_escalations() -> process_recovery_msg() -> substitute_simple_macros() -> substitute_simple_macros(..., MACRO_TYPE_TRIGGER_DESCRIPTION, ...) -> DCexpression_expand_user_macro() -> dc_express_expand_user_macros() -> dc_expand_user_macros() -> strchr() . In this scenario event (with SOURCE_TRIGGERS ) has somehow event.trigger.expression == NULL . Actually, get_event_info() guarantees only that event trigger fields are initialized if event object is EVENT_OBJECT_TRIGGER . If event source is SOURCE_TRIGGERS but event object is not EVENT_OBJECT_TRIGGER we may eventually pass NULL to strchr() . To check this please try to select * from events where source=0 and object<>0;
        Hide
        Ilya Kruchinin added a comment - - edited

        The query you mentioned, here's the output (or lack of one):
        MariaDB [zabbix]> select * from events where source=0 and object<>0;
        Empty set (0.00 sec)

        Show
        Ilya Kruchinin added a comment - - edited The query you mentioned, here's the output (or lack of one): MariaDB [zabbix] > select * from events where source=0 and object<>0; Empty set (0.00 sec)
        Hide
        Ilya Kruchinin added a comment -

        The "zabbix_server" I'm using is the binary from the official repo

        Show
        Ilya Kruchinin added a comment - The "zabbix_server" I'm using is the binary from the official repo
        Hide
        Ilya Kruchinin added a comment - - edited

        Added "objdump -DSswx /usr/sbin/zabbix_server > /root/objdump.dump" output (file attached as a tar.gz archive)
        The zabbix_server binary is from the official repo, running on Debian 8 with Maria DB as the DB backend.

        Please let me know if you need any additional information.

        Show
        Ilya Kruchinin added a comment - - edited Added "objdump -DSswx /usr/sbin/zabbix_server > /root/objdump.dump" output (file attached as a tar.gz archive) The zabbix_server binary is from the official repo, running on Debian 8 with Maria DB as the DB backend. Please let me know if you need any additional information.
        Hide
        Glebs Ivanovskis added a comment -

        My initial guess was a bit off, but now I think I've localised the place where crash happens. Thank you for the provided information!

        Can you check what these commands give you?

        describe globalmacro;
        describe hostmacro;
        
        Show
        Glebs Ivanovskis added a comment - My initial guess was a bit off, but now I think I've localised the place where crash happens. Thank you for the provided information! Can you check what these commands give you? describe globalmacro; describe hostmacro;
        Hide
        Ilya Kruchinin added a comment - - edited
        MariaDB [zabbix]> describe globalmacro;
        +---------------+---------------------+------+-----+---------+-------+
        | Field         | Type                | Null | Key | Default | Extra |
        +---------------+---------------------+------+-----+---------+-------+
        | globalmacroid | bigint(20) unsigned | NO   | PRI | NULL    |       |
        | macro         | varchar(255)        | NO   | UNI |         |       |
        | value         | varchar(255)        | NO   |     |         |       |
        +---------------+---------------------+------+-----+---------+-------+
        3 rows in set (0.00 sec)
        
        MariaDB [zabbix]> describe hostmacro;
        +-------------+---------------------+------+-----+---------+-------+
        | Field       | Type                | Null | Key | Default | Extra |
        +-------------+---------------------+------+-----+---------+-------+
        | hostmacroid | bigint(20) unsigned | NO   | PRI | NULL    |       |
        | hostid      | bigint(20) unsigned | NO   | MUL | NULL    |       |
        | macro       | varchar(255)        | NO   |     |         |       |
        | value       | varchar(255)        | NO   |     |         |       |
        +-------------+---------------------+------+-----+---------+-------+
        4 rows in set (0.00 sec)
        
        Show
        Ilya Kruchinin added a comment - - edited MariaDB [zabbix]> describe globalmacro; +---------------+---------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------+---------------------+------+-----+---------+-------+ | globalmacroid | bigint(20) unsigned | NO | PRI | NULL | | | macro | varchar(255) | NO | UNI | | | | value | varchar(255) | NO | | | | +---------------+---------------------+------+-----+---------+-------+ 3 rows in set (0.00 sec) MariaDB [zabbix]> describe hostmacro; +-------------+---------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------+---------------------+------+-----+---------+-------+ | hostmacroid | bigint(20) unsigned | NO | PRI | NULL | | | hostid | bigint(20) unsigned | NO | MUL | NULL | | | macro | varchar(255) | NO | | | | | value | varchar(255) | NO | | | | +-------------+---------------------+------+-----+---------+-------+ 4 rows in set (0.00 sec)
        Hide
        Glebs Ivanovskis added a comment -

        Thank you for response! So far everything looks fine.

        Can you see any suspicious entries in these tables? Like with empty macro field? Can you recall adding/deleting user/host macros and/or their context before the crash happened?

        Show
        Glebs Ivanovskis added a comment - Thank you for response! So far everything looks fine. Can you see any suspicious entries in these tables? Like with empty macro field? Can you recall adding/deleting user/host macros and/or their context before the crash happened?
        Hide
        Ilya Kruchinin added a comment - - edited

        Yes, I was changing the templates/hosts USERMACROs at about the same time when the crash happened.

        There are currently no MACROs with an empty MACRO field (but the server hasn't crashed since then, all looks good):

        MariaDB [zabbix]> select * from hostmacro where macro ='';
        Empty set (0.00 sec)
        

        There are 3 MACROS with empty VALUE fields (which, I guess, may not be a problem, just an empty value, and this is intended (undefined value in the template) from my part):

        MariaDB [zabbix]> select * from hostmacro where value ='';
        +-------------+--------+-----------------+-------+
        | hostmacroid | hostid | macro           | value |
        +-------------+--------+-----------------+-------+
        |         187 |  10196 | {$IPMIIP}       |       |
        |         472 |  10355 | {$PLESKDNSNAME} |       |
        |         474 |  10357 | {$PLESKDNSNAME} |       |
        +-------------+--------+-----------------+-------+
        

        There hasn't been an issue since the last crash - all working fine.

        Show
        Ilya Kruchinin added a comment - - edited Yes, I was changing the templates/hosts USERMACROs at about the same time when the crash happened. There are currently no MACROs with an empty MACRO field (but the server hasn't crashed since then, all looks good): MariaDB [zabbix]> select * from hostmacro where macro =''; Empty set (0.00 sec) There are 3 MACROS with empty VALUE fields (which, I guess, may not be a problem, just an empty value, and this is intended (undefined value in the template) from my part): MariaDB [zabbix]> select * from hostmacro where value =''; +-------------+--------+-----------------+-------+ | hostmacroid | hostid | macro | value | +-------------+--------+-----------------+-------+ | 187 | 10196 | {$IPMIIP} | | | 472 | 10355 | {$PLESKDNSNAME} | | | 474 | 10357 | {$PLESKDNSNAME} | | +-------------+--------+-----------------+-------+ There hasn't been an issue since the last crash - all working fine.
        Hide
        Glebs Ivanovskis added a comment -

        Empty value shouldn't be an issue. Although I don't have a clear idea where and how Zabbix crashed in your case, there are couple of places where Zabbix can be reinforced. Thank you for cooperation and let me know if the crash happens again and/or new details become available.

        Show
        Glebs Ivanovskis added a comment - Empty value shouldn't be an issue. Although I don't have a clear idea where and how Zabbix crashed in your case, there are couple of places where Zabbix can be reinforced. Thank you for cooperation and let me know if the crash happens again and/or new details become available.
        Hide
        Oleksiy Zagorskyi added a comment - - edited

        Got the same constant crash on escalator on 3.0.2, using postgres.

         20822:20160504:213801.265 temporarily disabling Zabbix agent checks on host "AWS-L": host unavailable
         20822:20160504:213801.359 temporarily disabling Zabbix agent checks on host "AWS-T": host unavailable
         20822:20160504:213801.600 temporarily disabling Zabbix agent checks on host "Test-host": host unavailable
         20853:20160504:213803.309 escalation cancelled: trigger id:14839 deleted.
         20853:20160504:213803.325 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
        ....
         20853:20160504:213803.325 === Backtrace: ===
         20853:20160504:213803.348 15: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](print_fatal_info+0x9e) [0x46b58e]
         20853:20160504:213803.348 14: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x46b847]
         20853:20160504:213803.348 13: /lib64/libc.so.6(+0x35640) [0x7f1c27af6640]
         20853:20160504:213803.348 12: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x450fa4]
         20853:20160504:213803.348 11: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](DCexpression_expand_user_macros+0x21) [0x45b751]
         20853:20160504:213803.348 10: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](substitute_simple_macros+0xd50) [0x4444e0]
         20853:20160504:213803.348 9: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](substitute_simple_macros+0x42a3) [0x447a33]
         20853:20160504:213803.348 8: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x433432]
         20853:20160504:213803.348 7: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x4341cf]
         20853:20160504:213803.348 6: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](escalator_thread+0x143) [0x434463]
         20853:20160504:213803.348 5: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](zbx_thread_start+0x45) [0x46c165]
         20853:20160504:213803.348 4: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](MAIN_ZABBIX_ENTRY+0x52b) [0x41919b]
         20853:20160504:213803.348 3: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](daemon_start+0x1ad) [0x46aecd]
         20853:20160504:213803.348 2: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](main+0x336) [0x414246]
         20853:20160504:213803.348 1: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1c27ae2af5]
         20853:20160504:213803.348 0: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x4144b1]
        

        I have those 3 tables dump.

        To be able run zabbix server again I've cleaned out escalations table.
        I feel that it could be related to context macro, which I'm starting to use.

        Show
        Oleksiy Zagorskyi added a comment - - edited Got the same constant crash on escalator on 3.0.2, using postgres. 20822:20160504:213801.265 temporarily disabling Zabbix agent checks on host "AWS-L": host unavailable 20822:20160504:213801.359 temporarily disabling Zabbix agent checks on host "AWS-T": host unavailable 20822:20160504:213801.600 temporarily disabling Zabbix agent checks on host "Test-host": host unavailable 20853:20160504:213803.309 escalation cancelled: trigger id:14839 deleted. 20853:20160504:213803.325 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ... .... 20853:20160504:213803.325 === Backtrace: === 20853:20160504:213803.348 15: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](print_fatal_info+0x9e) [0x46b58e] 20853:20160504:213803.348 14: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x46b847] 20853:20160504:213803.348 13: /lib64/libc.so.6(+0x35640) [0x7f1c27af6640] 20853:20160504:213803.348 12: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x450fa4] 20853:20160504:213803.348 11: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](DCexpression_expand_user_macros+0x21) [0x45b751] 20853:20160504:213803.348 10: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](substitute_simple_macros+0xd50) [0x4444e0] 20853:20160504:213803.348 9: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](substitute_simple_macros+0x42a3) [0x447a33] 20853:20160504:213803.348 8: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x433432] 20853:20160504:213803.348 7: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x4341cf] 20853:20160504:213803.348 6: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](escalator_thread+0x143) [0x434463] 20853:20160504:213803.348 5: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](zbx_thread_start+0x45) [0x46c165] 20853:20160504:213803.348 4: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](MAIN_ZABBIX_ENTRY+0x52b) [0x41919b] 20853:20160504:213803.348 3: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](daemon_start+0x1ad) [0x46aecd] 20853:20160504:213803.348 2: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations](main+0x336) [0x414246] 20853:20160504:213803.348 1: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1c27ae2af5] 20853:20160504:213803.348 0: /zabbix/sbin/zabbix_server: escalator #1 [processed 0 escalations in 0.705582 sec, processing escalations]() [0x4144b1] I have those 3 tables dump. To be able run zabbix server again I've cleaned out escalations table. I feel that it could be related to context macro, which I'm starting to use.
        Hide
        Oleksiy Zagorskyi added a comment - - edited

        Wow, I can reproduce it! CONFIRMED
        More details coming tomorrow morning.

        Show
        Oleksiy Zagorskyi added a comment - - edited Wow, I can reproduce it! CONFIRMED More details coming tomorrow morning.
        Hide
        Glebs Ivanovskis added a comment -

        Can't wait!

        Show
        Glebs Ivanovskis added a comment - Can't wait!
        Hide
        Oleksiy Zagorskyi added a comment - - edited

        Well, while I've found a few issues and first of all I've faced with complex conditions how to reproduce the crash, now I've simplified the scenario:

        • have an action for internal events for triggers with a macro {TRIGGER.NAME} (will be enough), for example you can just enable existing action "Report unknown triggers".
        • have a trigger with expression {host:test.item.last(0)}>{$FSPFREE} and the trigger is in OK/PROBLEM state, i.e. it's not unknown (state =! 1)
        • make sure that the user macro from expression does NOT exist in your configuration
        • send a valid value to the item

        When trigger will be recalculated - it will become unknown (state=1) with error message "Cannot evaluate expression: macro '{$FSPFREE}' is not found."
        In the same time internal event and escalation will be generated.
        After a few seconds escalator will process the record and will crash:

          8122:20160506:093338.346 In process_escalations()
          8122:20160506:093338.346 query [txnlev:0] [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid from escalations where triggerid is not null order by actionid,triggerid,itemid,escalationid]
          8122:20160506:093338.346 query [txnlev:1] [begin;]
          8122:20160506:093338.346 In check_escalation() escalationid:1 status:active
          8122:20160506:093338.346 query [txnlev:1] [select source,object from events where eventid=67]
          8122:20160506:093338.346 query [txnlev:1] [select actionid,name,status,eventsource,esc_period,def_shortdata,def_longdata,r_shortdata,r_longdata,recovery_msg from actions where actionid=8]
          8122:20160506:093338.347 End of check_escalation():SUCCEED error: '(null)'
          8122:20160506:093338.347 In execute_escalation() escalationid:1 status:active
          8122:20160506:093338.347 query [txnlev:1] [select eventid,source,object,objectid,clock,value,acknowledged,ns from events where eventid=67]
          8122:20160506:093338.347 query [txnlev:1] [select description,expression,priority,comments,url from triggers where triggerid=13561]
          8122:20160506:093338.347 In execute_operations()
          8122:20160506:093338.347 query [txnlev:1] [select o.operationid,o.operationtype,o.esc_period,o.evaltype,m.operationid,m.default_msg,subject,message,mediatypeid from operations o left join opmessage m on m.operationid=o.operationid where o.actionid=8 ando.operationtype in (0,1) and o.esc_step_from<=1 and (o.esc_step_to=0 or o.esc_step_to>=1)]
          8122:20160506:093338.347 In check_operation_conditions() operationid:8
          8122:20160506:093338.347 query [txnlev:1] [select conditiontype,operator,value from opconditions where operationid=8 order by conditiontype]
          8122:20160506:093338.347 End of check_operation_conditions():SUCCEED
          8122:20160506:093338.347 Conditions match our event. Execute operation.
          8122:20160506:093338.347 In add_object_msg()
          8122:20160506:093338.347 query [txnlev:1] [select userid from opmessage_usr where operationid=8 union select g.userid from opmessage_grp m,users_groups g where m.usrgrpid=g.usrgrpid and m.operationid=8]
          8122:20160506:093338.348 query [txnlev:1] [select count(*) from usrgrp g,users_groups ug where ug.userid=1 and g.usrgrpid=ug.usrgrpid and g.users_status=1]
          8122:20160506:093338.348 In get_trigger_permission()
          8122:20160506:093338.348 query [txnlev:1] [select distinct i.hostid from items i,functions f where i.itemid=f.itemid and f.triggerid=13561]
          8122:20160506:093338.348 In get_host_permission()
          8122:20160506:093338.348 query [txnlev:1] [select type from users where userid=1]
          8122:20160506:093338.349 End of get_host_permission():rw
          8122:20160506:093338.349 End of get_trigger_permission():rw
          8122:20160506:093338.349 In substitute_simple_macros() data:'{TRIGGER.NAME}'
          8122:20160506:093338.349 In substitute_simple_macros() data:'Some myname more than 500'
          8122:20160506:093338.349 In get_N_functionid() expression:'{13250}>{$FSPFREE}' N_functionid:1
          8122:20160506:093338.349 get_N_functionid() functionid:13250
          8122:20160506:093338.349 End of get_N_functionid():SUCCEED
          8122:20160506:093338.349 In get_N_functionid() expression:'>{$FSPFREE}' N_functionid:1
          8122:20160506:093338.349 End of get_N_functionid():FAIL
          8122:20160506:093338.349 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
        
          8122:20160506:093338.349 === Backtrace: ===
          8122:20160506:093338.350 15: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](print_fatal_info+0x9e) [0x47898e]
          8122:20160506:093338.350 14: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x478c32]
          8122:20160506:093338.350 13: /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7fe9ac6980e0]
          8122:20160506:093338.350 12: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x45edf4]
          8122:20160506:093338.350 11: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](DCexpression_expand_user_macros+0x21) [0x468bf1]
          8122:20160506:093338.350 10: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](substitute_simple_macros+0xd68) [0x452ae8]
          8122:20160506:093338.350 9: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](substitute_simple_macros+0x3f23) [0x455ca3]
          8122:20160506:093338.350 8: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x43d3f0]
          8122:20160506:093338.350 7: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x43dbdf]
          8122:20160506:093338.350 6: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](escalator_thread+0x13f) [0x43decf]
          8122:20160506:093338.350 5: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](zbx_thread_start+0x45) [0x479515]
          8122:20160506:093338.350 4: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](MAIN_ZABBIX_ENTRY+0x598) [0x41b198]
          8122:20160506:093338.350 3: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](daemon_start+0x1b0) [0x4782d0]
          8122:20160506:093338.350 2: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](main+0x388) [0x4160d8]
          8122:20160506:093338.350 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fe9ac684b45]
          8122:20160506:093338.350 0: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x416365]
        
          8094:20160506:093340.357 Zabbix Server stopped. Zabbix 3.0.3rc1 (revision 59873).
        
        Show
        Oleksiy Zagorskyi added a comment - - edited Well, while I've found a few issues and first of all I've faced with complex conditions how to reproduce the crash, now I've simplified the scenario: have an action for internal events for triggers with a macro {TRIGGER.NAME} (will be enough), for example you can just enable existing action "Report unknown triggers". have a trigger with expression {host:test.item.last(0)}>{$FSPFREE} and the trigger is in OK/PROBLEM state, i.e. it's not unknown (state =! 1) make sure that the user macro from expression does NOT exist in your configuration send a valid value to the item When trigger will be recalculated - it will become unknown (state=1) with error message "Cannot evaluate expression: macro '{$FSPFREE}' is not found." In the same time internal event and escalation will be generated. After a few seconds escalator will process the record and will crash: 8122:20160506:093338.346 In process_escalations() 8122:20160506:093338.346 query [txnlev:0] [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid from escalations where triggerid is not null order by actionid,triggerid,itemid,escalationid] 8122:20160506:093338.346 query [txnlev:1] [begin;] 8122:20160506:093338.346 In check_escalation() escalationid:1 status:active 8122:20160506:093338.346 query [txnlev:1] [select source,object from events where eventid=67] 8122:20160506:093338.346 query [txnlev:1] [select actionid,name,status,eventsource,esc_period,def_shortdata,def_longdata,r_shortdata,r_longdata,recovery_msg from actions where actionid=8] 8122:20160506:093338.347 End of check_escalation():SUCCEED error: '(null)' 8122:20160506:093338.347 In execute_escalation() escalationid:1 status:active 8122:20160506:093338.347 query [txnlev:1] [select eventid,source,object,objectid,clock,value,acknowledged,ns from events where eventid=67] 8122:20160506:093338.347 query [txnlev:1] [select description,expression,priority,comments,url from triggers where triggerid=13561] 8122:20160506:093338.347 In execute_operations() 8122:20160506:093338.347 query [txnlev:1] [select o.operationid,o.operationtype,o.esc_period,o.evaltype,m.operationid,m.default_msg,subject,message,mediatypeid from operations o left join opmessage m on m.operationid=o.operationid where o.actionid=8 ando.operationtype in (0,1) and o.esc_step_from<=1 and (o.esc_step_to=0 or o.esc_step_to>=1)] 8122:20160506:093338.347 In check_operation_conditions() operationid:8 8122:20160506:093338.347 query [txnlev:1] [select conditiontype,operator,value from opconditions where operationid=8 order by conditiontype] 8122:20160506:093338.347 End of check_operation_conditions():SUCCEED 8122:20160506:093338.347 Conditions match our event. Execute operation. 8122:20160506:093338.347 In add_object_msg() 8122:20160506:093338.347 query [txnlev:1] [select userid from opmessage_usr where operationid=8 union select g.userid from opmessage_grp m,users_groups g where m.usrgrpid=g.usrgrpid and m.operationid=8] 8122:20160506:093338.348 query [txnlev:1] [select count(*) from usrgrp g,users_groups ug where ug.userid=1 and g.usrgrpid=ug.usrgrpid and g.users_status=1] 8122:20160506:093338.348 In get_trigger_permission() 8122:20160506:093338.348 query [txnlev:1] [select distinct i.hostid from items i,functions f where i.itemid=f.itemid and f.triggerid=13561] 8122:20160506:093338.348 In get_host_permission() 8122:20160506:093338.348 query [txnlev:1] [select type from users where userid=1] 8122:20160506:093338.349 End of get_host_permission():rw 8122:20160506:093338.349 End of get_trigger_permission():rw 8122:20160506:093338.349 In substitute_simple_macros() data:'{TRIGGER.NAME}' 8122:20160506:093338.349 In substitute_simple_macros() data:'Some myname more than 500' 8122:20160506:093338.349 In get_N_functionid() expression:'{13250}>{$FSPFREE}' N_functionid:1 8122:20160506:093338.349 get_N_functionid() functionid:13250 8122:20160506:093338.349 End of get_N_functionid():SUCCEED 8122:20160506:093338.349 In get_N_functionid() expression:'>{$FSPFREE}' N_functionid:1 8122:20160506:093338.349 End of get_N_functionid():FAIL 8122:20160506:093338.349 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ... 8122:20160506:093338.349 === Backtrace: === 8122:20160506:093338.350 15: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](print_fatal_info+0x9e) [0x47898e] 8122:20160506:093338.350 14: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x478c32] 8122:20160506:093338.350 13: /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7fe9ac6980e0] 8122:20160506:093338.350 12: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x45edf4] 8122:20160506:093338.350 11: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](DCexpression_expand_user_macros+0x21) [0x468bf1] 8122:20160506:093338.350 10: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](substitute_simple_macros+0xd68) [0x452ae8] 8122:20160506:093338.350 9: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](substitute_simple_macros+0x3f23) [0x455ca3] 8122:20160506:093338.350 8: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x43d3f0] 8122:20160506:093338.350 7: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x43dbdf] 8122:20160506:093338.350 6: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](escalator_thread+0x13f) [0x43decf] 8122:20160506:093338.350 5: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](zbx_thread_start+0x45) [0x479515] 8122:20160506:093338.350 4: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](MAIN_ZABBIX_ENTRY+0x598) [0x41b198] 8122:20160506:093338.350 3: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](daemon_start+0x1b0) [0x4782d0] 8122:20160506:093338.350 2: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations](main+0x388) [0x4160d8] 8122:20160506:093338.350 1: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fe9ac684b45] 8122:20160506:093338.350 0: ./zabbix_server3.0: escalator #1 [processed 0 escalations in 0.000768 sec, processing escalations]() [0x416365] 8094:20160506:093340.357 Zabbix Server stopped. Zabbix 3.0.3rc1 (revision 59873).
        Hide
        Glebs Ivanovskis added a comment -

        Oh, finally! After so many hours...

        Long story short. To expand {TRIGGER.NAME} macro properly we need to take trigger name and resolve all macros there. Trigger name may contain positional macro ($1, $2, ...) and therefore we need to parse trigger expression in search for constants. To do this we must first resolve user macros in trigger expression. If there is an unknown macro dc_expand_user_macros() wants to provide an error message with

        *error = zbx_dsprintf(*error, "macro '%.*s' is not found", len, ptr);

        But substitute_simple_macros() is not bothered and simply passes NULL as error:

        DCexpression_expand_user_macros(event->trigger.expression, NULL)

        Caused by ZBXNEXT-3210.

        Huge thanks to Oleksiy Zagorskyi for so much effort put into investigation!

        Show
        Glebs Ivanovskis added a comment - Oh, finally! After so many hours... Long story short. To expand {TRIGGER.NAME} macro properly we need to take trigger name and resolve all macros there. Trigger name may contain positional macro ($1, $2, ...) and therefore we need to parse trigger expression in search for constants. To do this we must first resolve user macros in trigger expression. If there is an unknown macro dc_expand_user_macros() wants to provide an error message with *error = zbx_dsprintf(*error, "macro '%.*s' is not found" , len, ptr); But substitute_simple_macros() is not bothered and simply passes NULL as error : DCexpression_expand_user_macros(event->trigger.expression, NULL) Caused by ZBXNEXT-3210 . Huge thanks to Oleksiy Zagorskyi for so much effort put into investigation!
        Hide
        Glebs Ivanovskis added a comment -

        Fix for version 3.0 is available in development branch svn://svn.zabbix.com/branches/dev/ZBX-10692 revision 59917. Patch attached for those who need it urgently.

        Show
        Glebs Ivanovskis added a comment - Fix for version 3.0 is available in development branch svn://svn.zabbix.com/branches/dev/ZBX-10692 revision 59917. Patch attached for those who need it urgently.
        Hide
        Oleksiy Zagorskyi added a comment -

        More complex scenario how it's possible to trigger such a crash (only once for each trigger prototype) described in ZBX-10759.
        Initially current crash was discovered by me with that conditions.

        Show
        Oleksiy Zagorskyi added a comment - More complex scenario how it's possible to trigger such a crash (only once for each trigger prototype) described in ZBX-10759 . Initially current crash was discovered by me with that conditions.
        Hide
        Alexander Vladishev added a comment - - edited

        Successfully tested! Have a look at my changes in r59921.

        Glebs Ivanovskis Changes look fine. Accepted!

        Show
        Alexander Vladishev added a comment - - edited Successfully tested! Have a look at my changes in r59921. Glebs Ivanovskis Changes look fine. Accepted!
        Hide
        Glebs Ivanovskis added a comment -

        Fixed in pre-3.0.3rc1 r59924, pre-3.1.0 (trunk) r59926.

        Show
        Glebs Ivanovskis added a comment - Fixed in pre-3.0.3rc1 r59924, pre-3.1.0 (trunk) r59926.

          People

          • Assignee:
            Unassigned
            Reporter:
            Ilya Kruchinin
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: