Incident report
Resolution: Fixed
Critical
3.0.0
Zabbix server crashed when exiting because of low history index cache size:
31283:20160218:160105.927 __mem_malloc: skipped 3 asked 66304 skip_min 1728 skip_max 22608
31283:20160218:160105.927 [file:dbcache.c,line:2561] zbx_mem_realloc(): out of memory (requested 66304 bytes)
31283:20160218:160105.927 [file:dbcache.c,line:2561] zbx_mem_realloc(): please increase HistoryIndexCacheSize configuration parameter
with the following stack trace:
31277:20160218:160107.937 === Backtrace: ===
31277:20160218:160107.952 17: /home/monitoring/zabbix-bin/sbin/zabbix_server(print_fatal_info+0x19d) [0x46bcfd]
31277:20160218:160107.952 16: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x46c058]
31277:20160218:160107.952 15: /lib/x86_64-linux-gnu/libc.so.6(+0x36d40) [0x7f6b8315ad40]
31277:20160218:160107.952 14: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x44cb23]
    return zbx_timespec_compare(&item1->tail->ts, &item2->tail->ts);
31277:20160218:160107.952 13: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x4648fb]
    if (heap->compare_func(&heap->elems[(index - 1) / 2], &heap->elems[index]) <= 0)
31277:20160218:160107.952 12: /home/monitoring/zabbix-bin/sbin/zabbix_server(zbx_binary_heap_insert+0xbe) [0x464c1c]
    index = __binary_heap_bubble_up(heap, index);
31277:20160218:160107.953 11: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x44cb5c]
    static void hc_queue_item(zbx_hc_item_t *item) { ... zbx_binary_heap_insert(&cache->history_queue, &elem); }
31277:20160218:160107.953 10: /home/monitoring/zabbix-bin/sbin/zabbix_server(DCsync_history+0xf0) [0x44df64]
    if (ZBX_HC_ITEM_STATUS_QUEUED != item->status) hc_queue_item(item);
31277:20160218:160107.953 9: /home/monitoring/zabbix-bin/sbin/zabbix_server(free_database_cache+0x40) [0x4511d0]
31277:20160218:160107.953 8: /home/monitoring/zabbix-bin/sbin/zabbix_server(zbx_on_exit+0xe6) [0x417d01]
31277:20160218:160107.953 7: /home/monitoring/zabbix-bin/sbin/zabbix_server() [0x46c310]
31277:20160218:160107.953 6: /lib/x86_64-linux-gnu/libc.so.6(+0x36d40)
Apparently, during hc_update_history_queue() the history item hashset contained items without data (NULL tail/head values). It is not yet clear how this could have happened. The initial error (insufficient history index cache size) cannot be the direct cause.
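To illustrate why an item without data is fatal here, the following is a minimal sketch, not the actual Zabbix code. The types are simplified stand-ins whose tail/head/ts field names follow the snippets quoted in the backtrace; hc_item_has_data() and count_queueable() are hypothetical helpers introduced only for this example. The comparator in the style of frame 14 dereferences tail unconditionally, so a single empty item reaching the queue is enough to crash.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real Zabbix types (assumption: only the
 * tail/head/ts field names are taken from the backtrace snippets). */
typedef struct { int sec; int ns; } ts_t;

typedef struct hc_data
{
    ts_t            ts;
    struct hc_data  *next;
} hc_data_t;

typedef struct
{
    unsigned long long  itemid;
    hc_data_t           *tail;  /* oldest value */
    hc_data_t           *head;  /* newest value */
} hc_item_t;

static int ts_compare(const ts_t *a, const ts_t *b)
{
    if (a->sec != b->sec)
        return a->sec < b->sec ? -1 : 1;
    if (a->ns != b->ns)
        return a->ns < b->ns ? -1 : 1;
    return 0;
}

/* Hypothetical guard: 1 if the item holds at least one value, else 0. */
static int hc_item_has_data(const hc_item_t *item)
{
    return NULL != item->tail && NULL != item->head;
}

/* Comparator in the style of frame 14. Without the assert, a NULL tail
 * would be dereferenced here, which matches the crash location. */
static int hc_item_compare(const hc_item_t *i1, const hc_item_t *i2)
{
    assert(hc_item_has_data(i1) && hc_item_has_data(i2));
    return ts_compare(&i1->tail->ts, &i2->tail->ts);
}

/* Hypothetical helper: count the items that could be queued safely. */
static int count_queueable(const hc_item_t *items, int num)
{
    int i, n = 0;

    for (i = 0; i < num; i++)
    {
        if (hc_item_has_data(&items[i]))
            n++;
    }
    return n;
}
```

A guard of this kind would only mask the symptom during shutdown; it does not explain how the empty item got into the hashset in the first place.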
The current theory, however unlikely it seems, is that after the initial process exited, the server sent a termination signal to the remaining processes, and one of them was aborted in the middle of adding a new item to the history cache:
item = hc_get_item(item_value->itemid);

/* new items must be inserted in queue */
update_queue = (NULL == item->tail);

if (NULL == item->head)
    item->tail = data;
else
    item->head->next = data;

item->head = data;
If a process was killed right after item = hc_get_item(item_value->itemid); the history cache would be left with an item whose tail/head values are both NULL.
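The window described above can be sketched in isolation. This is a simplified model under stated assumptions, not the Zabbix implementation: hc_item_init() is a hypothetical stand-in for the state a freshly created item has when hc_get_item() returns it, and hc_item_append() mirrors the linking logic quoted above. Between the two calls the item exists with NULL tail and head, and that is the state a killed process would leave behind in shared memory.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; only the tail/head/next names come from the report. */
typedef struct hc_data
{
    int             value;
    struct hc_data  *next;
} hc_data_t;

typedef struct
{
    hc_data_t   *tail;  /* oldest value */
    hc_data_t   *head;  /* newest value */
} hc_item_t;

/* Step 1 (hypothetical): a newly created item has no data attached yet. */
static void hc_item_init(hc_item_t *item)
{
    item->tail = NULL;
    item->head = NULL;
}

/* Step 2: link the value, following the logic quoted in the report.
 * Returns 1 if the item was new and must be inserted in the queue. */
static int hc_item_append(hc_item_t *item, hc_data_t *data)
{
    int update_queue;

    assert(NULL != data);
    data->next = NULL;

    /* new items must be inserted in queue */
    update_queue = (NULL == item->tail);

    if (NULL == item->head)
        item->tail = data;
    else
        item->head->next = data;

    item->head = data;

    return update_queue;
}
```

If execution stops after hc_item_init() and before hc_item_append() completes, any later pass over the item set sees an item with no values. Whether the real server can actually be interrupted at that point is the open question of this report.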