ZABBIX BUGS AND ISSUES / ZBX-10410

Zabbix server crash when doing history cache sync during shutdown


      Zabbix server crashed when exiting because of low history index cache size:

        31283:20160218:160105.927 __mem_malloc: skipped 3 asked 66304 skip_min 1728 skip_max 22608
        31283:20160218:160105.927 [file:dbcache.c,line:2561] zbx_mem_realloc(): out of memory (requested 66304 bytes)
        31283:20160218:160105.927 [file:dbcache.c,line:2561] zbx_mem_realloc(): please increase HistoryIndexCacheSize configuration parameter
      

      with the following stack trace:

        31277:20160218:160107.937 === Backtrace: ===
        31277:20160218:160107.952 17: /home/monitoring/zabbix-bin/sbin/zabbix_server(print_fatal_info+0x19d) [0x46bcfd]
        31277:20160218:160107.952 16: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x46c058]
        31277:20160218:160107.952 15: /lib/x86_64-linux-gnu/libc.so.6(+0x36d40) [0x7f6b8315ad40]
        
        31277:20160218:160107.952 14: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x44cb23]
      	return zbx_timespec_compare(&item1->tail->ts, &item2->tail->ts); 
        
        31277:20160218:160107.952 13: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x4648fb]
      	if (heap->compare_func(&heap->elems[(index - 1) / 2], &heap->elems[index]) <= 0)
        
        31277:20160218:160107.952 12: /home/monitoring/zabbix-bin/sbin/zabbix_server(zbx_binary_heap_insert+0xbe) [0x464c1c]
      	index = __binary_heap_bubble_up(heap, index);
        
        31277:20160218:160107.953 11: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x44cb5c]
      	static void	hc_queue_item(zbx_hc_item_t *item)
      	{
      		...
      		zbx_binary_heap_insert(&cache->history_queue, &elem);
      	}
      	
        31277:20160218:160107.953 10: /home/monitoring/zabbix-bin/sbin/zabbix_server(DCsync_history+0xf0) [0x44df64]
      	if (ZBX_HC_ITEM_STATUS_QUEUED != item->status)
      		hc_queue_item(item);
      	
        31277:20160218:160107.953 9: /home/monitoring/zabbix-bin/sbin/zabbix_server(free_database_cache+0x40) [0x4511d0]
        31277:20160218:160107.953 8: /home/monitoring/zabbix-bin/sbin/zabbix_server(zbx_on_exit+0xe6) [0x417d01]
        31277:20160218:160107.953 7: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x46c310] 
        31277:20160218:160107.953 6: /lib/x86_64-linux-gnu/libc.so.6(+0x36d40) 
      

      Apparently, during hc_update_history_queue() the history item hashset contained items without data (NULL tail/head values), so the queue comparison function dereferenced a NULL tail pointer. It is not yet clear how this could have happened. The initial error (insufficient history index cache size) cannot be the direct cause.
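
The failure mode can be illustrated with a minimal sketch. The types below are simplified stand-ins, not the actual Zabbix definitions: only the tail, head, ts and next names come from the excerpts in this report, and all sketch_* names are hypothetical. The unguarded comparison in frame 14 reads item->tail->ts directly; the guarded variant here checks for a NULL tail first and sorts empty items last.

```c
#include <stddef.h>

/* Simplified stand-ins for the Zabbix structures (assumed layout;
 * only tail, head, ts and next appear in the report). */
typedef struct
{
	int	sec;
	int	ns;
}
sketch_timespec_t;

typedef struct sketch_hc_data
{
	sketch_timespec_t	ts;
	struct sketch_hc_data	*next;
}
sketch_hc_data_t;

typedef struct
{
	sketch_hc_data_t	*tail;	/* oldest cached value */
	sketch_hc_data_t	*head;	/* newest cached value */
}
sketch_hc_item_t;

/* Orders two timestamps: negative, zero or positive result. */
static int	sketch_timespec_compare(const sketch_timespec_t *t1, const sketch_timespec_t *t2)
{
	if (t1->sec != t2->sec)
		return t1->sec < t2->sec ? -1 : 1;

	if (t1->ns != t2->ns)
		return t1->ns < t2->ns ? -1 : 1;

	return 0;
}

/* Guarded comparison: never dereferences a NULL tail.
 * Items without data sort after items with data; two empty items compare equal. */
static int	sketch_item_compare(const sketch_hc_item_t *item1, const sketch_hc_item_t *item2)
{
	if (NULL == item1->tail || NULL == item2->tail)
		return (NULL == item1->tail) - (NULL == item2->tail);

	return sketch_timespec_compare(&item1->tail->ts, &item2->tail->ts);
}
```

Whether empty items should be sorted last, skipped before queueing, or treated as a fatal inconsistency is a design decision this sketch does not settle; it only shows where the NULL dereference would occur.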

      The current theory is that after the initial process exited, the server sent a termination signal to the remaining processes, and one of them was aborted in the middle of adding a new item to the history cache, however unlikely that may seem:

      		item = hc_get_item(item_value->itemid);
      
      		/* new items must be inserted in queue */
      		update_queue = (NULL == item->tail);
      
      		if (NULL == item->head)
      			item->tail = data;
      		else
      			item->head->next = data;
      
      		item->head = data;
      

      If a process was killed right after the item = hc_get_item(item_value->itemid); call, before the data was linked in, we would be left with a history item with NULL tail/head values.
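
To make that window concrete, here is a small self-contained model of the linking step. It is a sketch under assumptions: the demo_* types and functions are hypothetical, and only the order of the pointer assignments mirrors the excerpt above. An item that has been created but not yet linked is indistinguishable from the broken state seen in the crash.

```c
#include <stddef.h>

/* Hypothetical, simplified value and item types. */
typedef struct demo_data
{
	struct demo_data	*next;
	int			value;
}
demo_data_t;

typedef struct
{
	demo_data_t	*tail;	/* oldest value, NULL while the item is empty */
	demo_data_t	*head;	/* newest value, NULL while the item is empty */
}
demo_item_t;

/* Links data into the item in the same order as the excerpt.
 * Returns 1 if the item was empty and therefore must be queued. */
static int	demo_item_append(demo_item_t *item, demo_data_t *data)
{
	int	update_queue = (NULL == item->tail);

	data->next = NULL;

	if (NULL == item->head)
		item->tail = data;
	else
		item->head->next = data;

	item->head = data;

	return update_queue;
}

/* An item is safe to queue only when both ends of its list are set. */
static int	demo_item_has_data(const demo_item_t *item)
{
	return NULL != item->tail && NULL != item->head;
}
```

A process stopped between creating the item and calling the equivalent of demo_item_append() leaves demo_item_has_data() returning 0, which matches the state described in the theory.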

            Unassigned Unassigned
            wiper Andris Zeila