ZABBIX BUGS AND ISSUES / ZBX-15097

zabbix escalator thread crash


    • Type: Incident report
    • Resolution: Duplicate
    • Priority: Major
    • None
    • 4.0 (plan)
    • Server (S)
    • None

      Steps to reproduce:

      Unknown. It possibly happens after manually deleting some LLD items linked to triggers.

      Result:

       14219:20181030:151309.599 Got signal [signal:11(SIGSEGV),reason:1,refaddr:(nil)]. Crashing ...
      inside the escalator thread (followed by a "graceful" server shutdown). This repeated every time I tried to restart the server.

      I've traced the problem to substitute_simple_macros() in libs/zbxserver/expression.c. It accepts a large number of pointer parameters and never seems to bother checking them all for NULL.
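
      To illustrate the kind of check that appears to be missing, here is a minimal sketch. The type sketch_trigger_t, the function sketch_trigger_name_len() and the SKETCH_* return codes are hypothetical names made up for this example and are not taken from the Zabbix sources. It only shows the pattern of validating every pointer argument before dereferencing it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes for this sketch only. */
#define SKETCH_SUCCEED	0
#define SKETCH_FAIL	-1

/* Hypothetical stand-in for one of the optional pointer parameters. */
typedef struct
{
	const char	*name;	/* may be NULL if the trigger was deleted */
}
sketch_trigger_t;

/* Stores the length of the trigger name in *len.                     */
/* Returns SKETCH_FAIL instead of dereferencing any NULL pointer.     */
static int	sketch_trigger_name_len(const sketch_trigger_t *trigger, size_t *len)
{
	if (NULL == trigger || NULL == trigger->name || NULL == len)
		return SKETCH_FAIL;

	*len = strlen(trigger->name);

	return SKETCH_SUCCEED;
}
```

      A caller that gets SKETCH_FAIL back can skip the entry and continue, instead of crashing the whole thread.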

      I have no time to dig further into the ugly C code, but this dirty hack at least solved my problem (it lets the escalator skip the problematic item and gives the garbage collector time to clear the escalations table):

      @@ -2779,8 +2780,20 @@
                                      res = FAIL;
                                      continue;
                      }
      +       zabbix_log(LOG_LEVEL_DEBUG, "after newer happen %s() data:'%s'", __function_name, *data);
      
                      ret = SUCCEED;
      +               if ( strncmp(*data,"Unknown: {TRIGGER.NAME}",strlen("Unknown: {TRIGGER.NAME}"))  )
      +                               {
      +       zabbix_log(LOG_LEVEL_DEBUG, "unknow happen %s() data:'%s'", __function_name, *data);
      +
      +       zbx_free(expression);
      +       zbx_vector_uint64_destroy(&hostids);
      +
      +       zabbix_log(LOG_LEVEL_DEBUG, "End %s() data:'%s'", __function_name, *data);
      +
      +       return res;
      +                               }
      
                      if (0 != (macro_type & (MACRO_TYPE_MESSAGE_NORMAL | MACRO_TYPE_MESSAGE_RECOVERY |
                                      MACRO_TYPE_MESSAGE_ACK)))
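
      A note for anyone adapting the hack above: strncmp() returns 0 when the compared bytes are equal, so a bare `if (strncmp(...))` is true when *data does not start with the given prefix. Which of the two cases the early return should cover is not clear from the patch alone. The sketch below makes the comparison explicit; sketch_has_prefix() is a hypothetical helper written for this example and is not part of Zabbix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when data starts with prefix,  */
/* 0 otherwise (including when either pointer is NULL).          */
static int	sketch_has_prefix(const char *data, const char *prefix)
{
	if (NULL == data || NULL == prefix)
		return 0;

	/* strncmp() returns 0 on a match, so compare the result with 0 */
	return 0 == strncmp(data, prefix, strlen(prefix));
}
```

      With such a helper the condition reads as either `if (sketch_has_prefix(*data, "Unknown: {TRIGGER.NAME}"))` or its negation, which makes the intended case visible.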
      
      

      I never bothered to run "objdump", because the source of the segfault is clearly in the nested `if` that follows, which is about 20 screens long. I've kept both the binary and the 1.7G database dump in case someone wants more detail.
      To me it seems clear enough: it fails while expanding macro values from items that no longer exist in a trigger that no longer exists.
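
      The failure mode described above can be sketched as a lookup that may legitimately find nothing. Everything in this example is hypothetical (sketch_item_t, sketch_find_item(), sketch_expand_item_value() and the placeholder text); it is not the Zabbix implementation. It only shows expansion falling back to a placeholder when the referenced item is gone.

```c
#include <stddef.h>

/* Placeholder text chosen for this sketch only. */
#define SKETCH_UNKNOWN	"*UNKNOWN*"

/* Hypothetical, simplified item record. */
typedef struct
{
	unsigned long	itemid;
	const char	*value;	/* may be NULL when no value is stored */
}
sketch_item_t;

/* Linear search; returns NULL when the item no longer exists. */
static const sketch_item_t	*sketch_find_item(const sketch_item_t *items, size_t num,
		unsigned long itemid)
{
	size_t	i;

	for (i = 0; i < num; i++)
	{
		if (items[i].itemid == itemid)
			return &items[i];
	}

	return NULL;
}

/* Expands an item value macro, using the placeholder for missing data. */
static const char	*sketch_expand_item_value(const sketch_item_t *items, size_t num,
		unsigned long itemid)
{
	const sketch_item_t	*item = sketch_find_item(items, num, itemid);

	if (NULL == item || NULL == item->value)
		return SKETCH_UNKNOWN;

	return item->value;
}
```

      The point of the design is that a deleted item yields a visible placeholder in the message rather than a NULL pointer that is dereferenced later.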

      I hope this attracts the attention of someone who knows the internals of this function.

            Unassigned
            ralx not needed
            Votes: 0
            Watchers: 2
