  1. ZABBIX BUGS AND ISSUES
  2. ZBX-15097

zabbix escalator thread crash

Details

    • Incident report
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 4.0 (plan)
    • None
    • Server (S)
    • None

    Description

      Steps to reproduce:

      Unknown. It possibly happens after manually deleting some LLD items linked to triggers.

      Result:

       14219:20181030:151309.599 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
      This happens inside the escalator thread and is followed by a "graceful" server shutdown. It repeated every time I tried to restart the server.

      I've traced the problem to substitute_simple_macros() in libs/zbxserver/expression.c. It accepts a large number of pointer parameters and never seems to check them all for NULL.
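
To illustrate the kind of check that appears to be missing, here is a minimal sketch of guarding every pointer in a chain before dereferencing it. The types `demo_trigger_t` and `demo_event_t`, the function `demo_trigger_name()`, and the fallback string are hypothetical stand-ins invented for this example; they are not Zabbix's real structures or the actual signature of substitute_simple_macros().

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not Zabbix's real types. */
typedef struct
{
	const char	*description;
}
demo_trigger_t;

typedef struct
{
	const demo_trigger_t	*trigger;	/* may be NULL if the trigger was deleted */
}
demo_event_t;

/* Return the trigger name, or a fallback when any pointer in the chain is NULL. */
static const char	*demo_trigger_name(const demo_event_t *event)
{
	if (NULL == event || NULL == event->trigger || NULL == event->trigger->description)
		return "*UNKNOWN*";

	return event->trigger->description;
}
```

Calling `demo_trigger_name()` with an event whose trigger pointer is NULL returns the fallback string instead of dereferencing a null pointer, which is the failure mode the SIGSEGV with `refaddr:(nil)` suggests.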

      I have no time to dig further into ugly C code, but this dirty hack at least solved my problem. It lets the escalator skip the problematic item and gives the garbage collector time to clear the escalations table:

      @@ -2779,8 +2780,20 @@
                                      res = FAIL;
                                      continue;
                      }
      +       zabbix_log(LOG_LEVEL_DEBUG, "after newer happen %s() data:'%s'", __function_name, *data);
      
                      ret = SUCCEED;
      +               if ( strncmp(*data,"Unknown: {TRIGGER.NAME}",strlen("Unknown: {TRIGGER.NAME}"))  )
      +                               {
      +       zabbix_log(LOG_LEVEL_DEBUG, "unknow happen %s() data:'%s'", __function_name, *data);
      +
      +       zbx_free(expression);
      +       zbx_vector_uint64_destroy(&hostids);
      +
      +       zabbix_log(LOG_LEVEL_DEBUG, "End %s() data:'%s'", __function_name, *data);
      +
      +       return res;
      +                               }
      
                      if (0 != (macro_type & (MACRO_TYPE_MESSAGE_NORMAL | MACRO_TYPE_MESSAGE_RECOVERY |
                                      MACRO_TYPE_MESSAGE_ACK)))
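
A note on the condition added in the patch: strncmp() returns 0 when the compared bytes are equal, so `if ( strncmp(...) )` is taken when `*data` does not start with "Unknown: {TRIGGER.NAME}". The report does not say whether that was intended. The sketch below shows an explicit, NULL-safe prefix test that makes the direction of the comparison visible; `demo_has_prefix()` is a hypothetical helper written for this example, not an existing Zabbix function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 when s begins with prefix, 0 otherwise. */
/* A NULL argument is treated as "no match" instead of being dereferenced. */
static int	demo_has_prefix(const char *s, const char *prefix)
{
	if (NULL == s || NULL == prefix)
		return 0;

	/* strncmp() yields 0 on equality, so compare against 0 explicitly */
	return 0 == strncmp(s, prefix, strlen(prefix));
}
```

With such a helper, a guard written as `if (0 != demo_has_prefix(*data, "Unknown: {TRIGGER.NAME}"))` would fire only for messages that begin with that text, which is the opposite of the condition in the patch as posted.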
      
      

      I never bothered to run objdump, because the source of the segfault is clearly in the nested `if` that follows, which runs to about 20 screens. I have kept both the binary and a 1.7G database dump in case someone wants more detail.
      To me it seems clear enough: it fails while expanding macro values from items that no longer exist, in a trigger that no longer exists.

      I hope this attracts the attention of someone who knows the internals of this function.

            People

              Unassigned Unassigned
              ralx not needed