zabbix escalator thread crash

    • Type: Incident report
    • Resolution: Duplicate
    • Priority: Major
    • Affects Version/s: 4.0 (plan)
    • Component/s: Server (S)

      Steps to reproduce:

      Unknown. It possibly happens after manually deleting some LLD items linked to triggers.

      Result:

       14219:20181030:151309.599 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
      This happens inside the escalator thread and is followed by a "graceful" server shutdown. It repeated every time I tried to restart the server.

      I've traced the problem to substitute_simple_macros() in libs/zbxserver/expression.c. It accepts a large number of pointer parameters and never seems to check them all for NULL.
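
      To illustrate the kind of check that appears to be missing, here is a minimal sketch. It is not Zabbix code: the types, the function name, the return codes and the placeholder string are all hypothetical stand-ins, and it only assumes that an escalation can end up pointing at a trigger that has been deleted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the real return codes (not the Zabbix macros). */
#define SKETCH_SUCCEED	0
#define SKETCH_FAIL	-1

/* Hypothetical, heavily reduced versions of the structures involved. */
typedef struct
{
	const char	*name;		/* may be NULL if the trigger was deleted */
}
sketch_trigger_t;

typedef struct
{
	const sketch_trigger_t	*trigger;	/* may be NULL for an orphaned escalation */
}
sketch_event_t;

/* Copy the trigger name into buf. Every pointer is checked before it is */
/* dereferenced, so missing data yields SKETCH_FAIL instead of a SIGSEGV. */
int	sketch_expand_trigger_name(const sketch_event_t *event, char *buf, size_t buf_size)
{
	if (NULL == buf || 0 == buf_size)
		return SKETCH_FAIL;

	if (NULL == event || NULL == event->trigger || NULL == event->trigger->name)
	{
		/* arbitrary placeholder chosen for this sketch */
		snprintf(buf, buf_size, "*UNKNOWN*");
		return SKETCH_FAIL;
	}

	snprintf(buf, buf_size, "%s", event->trigger->name);

	return SKETCH_SUCCEED;
}
```

      With a guard like this the caller can skip the broken escalation and continue, which is roughly what the hack below achieves in a cruder way.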

      I have no time to dig further into ugly C code, but this dirty hack at least solved my problem. It lets the escalator skip the problematic item and gives the garbage collector time to clear the escalations table:

      @@ -2779,8 +2780,20 @@
                                      res = FAIL;
                                      continue;
                      }
      +       zabbix_log(LOG_LEVEL_DEBUG, "after newer happen %s() data:'%s'", __function_name, *data);
      
                      ret = SUCCEED;
      +               if ( strncmp(*data,"Unknown: {TRIGGER.NAME}",strlen("Unknown: {TRIGGER.NAME}"))  )
      +                               {
      +       zabbix_log(LOG_LEVEL_DEBUG, "unknow happen %s() data:'%s'", __function_name, *data);
      +
      +       zbx_free(expression);
      +       zbx_vector_uint64_destroy(&hostids);
      +
      +       zabbix_log(LOG_LEVEL_DEBUG, "End %s() data:'%s'", __function_name, *data);
      +
      +       return res;
      +                               }
      
                      if (0 != (macro_type & (MACRO_TYPE_MESSAGE_NORMAL | MACRO_TYPE_MESSAGE_RECOVERY |
                                      MACRO_TYPE_MESSAGE_ACK)))
      
      

      I did not bother with "objdump", because the source of the segfault is clearly in the nested `if` that follows, which is about 20 screens long. I have kept both the binary and the 1.7G database dump in case someone wants more details.
      To me it seems clear enough: it fails while expanding macro values from items that no longer exist, in a trigger that no longer exists.

      I hope this attracts the attention of someone who knows the internals of this function.

            Assignee:
            Unassigned
            Reporter:
            not needed
            Votes:
            0
            Watchers:
            2
