Memory fragmentation on zbx_hashset_copy during um_cache_sync

    • Sprint candidates

      Copy-on-write in um_cache_sync (through zbx_hashset_copy) while syncing new macros can badly fragment the configuration cache: an allocation fails with "out of memory" although more than half of the cache is still free (see the memory statistics in the log below).
      When a hashset is released, it is better not to free its memory but to keep it for later reuse.
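
A minimal sketch of the reuse idea above, in plain C with malloc() standing in for the shared memory allocator. All names (reuse_pool_t, pool_acquire, pool_release, POOL_SLOTS) are hypothetical and not part of Zabbix; an actual change would have to live in the hashset or shared memory code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 8    /* hypothetical limit on the number of parked blocks */

typedef struct
{
    void    *ptr;       /* parked block, NULL if the slot is empty */
    size_t  capacity;   /* usable size of the parked block in bytes */
}
pool_entry_t;

typedef struct
{
    pool_entry_t    entries[POOL_SLOTS];
    int             hits;       /* requests served from a parked block */
    int             misses;     /* requests that fell through to malloc() */
}
reuse_pool_t;

/* Return the first parked block that can hold 'size' bytes, else allocate. */
/* The real capacity of the returned block is stored in '*capacity'.        */
void *pool_acquire(reuse_pool_t *pool, size_t size, size_t *capacity)
{
    assert(NULL != pool && NULL != capacity);

    for (int i = 0; i < POOL_SLOTS; i++)
    {
        pool_entry_t *entry = &pool->entries[i];

        if (NULL != entry->ptr && entry->capacity >= size)
        {
            void *ptr = entry->ptr;

            *capacity = entry->capacity;
            entry->ptr = NULL;
            entry->capacity = 0;
            pool->hits++;
            return ptr;
        }
    }

    pool->misses++;
    *capacity = size;
    return malloc(size);
}

/* Park a released block for later reuse; free it only if the pool is full. */
void pool_release(reuse_pool_t *pool, void *ptr, size_t capacity)
{
    assert(NULL != pool);

    if (NULL == ptr)
        return;

    for (int i = 0; i < POOL_SLOTS; i++)
    {
        if (NULL == pool->entries[i].ptr)
        {
            pool->entries[i].ptr = ptr;
            pool->entries[i].capacity = capacity;
            return;
        }
    }

    free(ptr);
}
```

With this scheme a released 1000-byte block can serve a later 800-byte request without going back to the allocator, so the large block is never split or returned to the free list. The trade-off is that parked memory stays reserved even when nothing is being copied.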

      Later, consider one of the following to reduce fragmentation: increase the maximum bucket size as in the diff below, copy less frequently, or introduce a new bucket for allocations larger than 32 KB.

      diff --git a/include/zbxshmem.h b/include/zbxshmem.h
      index a0d54ff3454..7473001a7c3 100644
      --- a/include/zbxshmem.h
      +++ b/include/zbxshmem.h
      @@ -20,7 +20,7 @@
       #define SHMEM_MIN_ALLOC        24              /* should be a multiple of 8 and at least (2 * ZBX_PTR_SIZE) */
       
       #define ZBX_SHMEM_MIN_BUCKET_SIZE      SHMEM_MIN_ALLOC
      -#define SHMEM_MAX_BUCKET_SIZE          256 /* starting from this size all free chunks are put into the same bucket */
      +#define SHMEM_MAX_BUCKET_SIZE          4096 /* starting from this size all free chunks are put into the same bucket */
       #define ZBX_SHMEM_BUCKET_COUNT         ((SHMEM_MAX_BUCKET_SIZE - ZBX_SHMEM_MIN_BUCKET_SIZE) / 8 + 1)
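
To illustrate what the diff changes, the following sketch reproduces the bucket arithmetic. bucket_count() uses the same formula as ZBX_SHMEM_BUCKET_COUNT. bucket_index() is an assumed mapping (one bucket per 8-byte step plus a final catch-all bucket), inferred from the macro comment and from the per-size lines in the memory statistics below, not copied from the Zabbix source. The function names are hypothetical.

```c
#include <stddef.h>

#define MIN_BUCKET_SIZE 24  /* same value as SHMEM_MIN_ALLOC in the diff */

/* Number of buckets for a given maximum bucket size, */
/* same formula as ZBX_SHMEM_BUCKET_COUNT.            */
int bucket_count(size_t max_bucket_size)
{
    return (int)((max_bucket_size - MIN_BUCKET_SIZE) / 8 + 1);
}

/* Assumed mapping from free chunk size to bucket index: one bucket per   */
/* 8-byte step, and a last bucket for every chunk of max_bucket_size or   */
/* more bytes.                                                            */
int bucket_index(size_t chunk_size, size_t max_bucket_size)
{
    if (chunk_size < MIN_BUCKET_SIZE)
        return 0;

    if (chunk_size < max_bucket_size)
        return (int)((chunk_size - MIN_BUCKET_SIZE) / 8);

    return bucket_count(max_bucket_size) - 1;
}
```

Under these assumptions a limit of 256 gives 30 buckets, and every free chunk of 256 bytes or more shares bucket 29. A limit of 4096 gives 510 buckets, so a 1024-byte chunk gets its own bucket (index 125), but a 366104-byte request still lands in the catch-all bucket (index 509). That is why a separate bucket for allocations larger than 32 KB is listed as a further option.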
      
      3595095:20251216:092116.688 query [txnlev:0] [select globalmacroid,macro,value,type from globalmacro]
      3595095:20251216:092116.689 query [txnlev:0] [select hostmacroid,hostid,macro,value,type from hostmacro]
      3595095:20251216:092117.093 __mem_malloc: skipped 1342958 asked 366104 skip_min 256 skip_max 365928
      3595095:20251216:092117.287 === memory statistics for configuration cache ===
      3595095:20251216:092117.288 free chunks of size     24 bytes:    32703
      3595095:20251216:092117.289 free chunks of size     32 bytes:        8
      3595095:20251216:092117.290 free chunks of size     40 bytes:      110
      3595095:20251216:092117.290 free chunks of size     48 bytes:        4
      3595095:20251216:092117.290 free chunks of size     56 bytes:        3
      3595095:20251216:092117.290 free chunks of size     64 bytes:        5
      3595095:20251216:092117.290 free chunks of size     72 bytes:        1
      3595095:20251216:092117.290 free chunks of size     80 bytes:     9048
      3595095:20251216:092117.290 free chunks of size     88 bytes:       47
      3595095:20251216:092117.290 free chunks of size     96 bytes:        8
      3595095:20251216:092117.290 free chunks of size    104 bytes:        8
      3595095:20251216:092117.290 free chunks of size    112 bytes:       15
      3595095:20251216:092117.290 free chunks of size    120 bytes:       38
      3595095:20251216:092117.290 free chunks of size    128 bytes:        2
      3595095:20251216:092117.290 free chunks of size    136 bytes:       30
      3595095:20251216:092117.290 free chunks of size    144 bytes:        1
      3595095:20251216:092117.290 free chunks of size    160 bytes:        1
      3595095:20251216:092117.290 free chunks of size    168 bytes:        4
      3595095:20251216:092117.290 free chunks of size    176 bytes:     1766
      3595095:20251216:092117.290 free chunks of size    184 bytes:        1
      3595095:20251216:092117.290 free chunks of size    208 bytes:        2
      3595095:20251216:092117.290 free chunks of size    216 bytes:       20
      3595095:20251216:092117.291 free chunks of size    224 bytes:        1
      3595095:20251216:092117.291 free chunks of size    232 bytes:       12
      3595095:20251216:092117.291 free chunks of size    248 bytes:        1
      3595095:20251216:092117.291 free chunks of size >= 256 bytes:  1342958
      3595095:20251216:092117.291 min chunk size:         24 bytes
      3595095:20251216:092117.291 max chunk size:     365928 bytes
      3595095:20251216:092117.291 memory of total size 9947792640 bytes fragmented into 49351577 chunks
      3595095:20251216:092117.291 of those, 5630902536 bytes are in  1386797 free chunks
      3595095:20251216:092117.291 of those, 4316890104 bytes are in 47964780 used chunks
      3595095:20251216:092117.291 of those,  789625216 bytes are used by allocation overhead
      3595095:20251216:092117.291 ================================
      3595095:20251216:092117.291 === Backtrace: ===
      3595095:20251216:092117.292 14: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](zbx_backtrace+0x41) [0x55acce253641]
      3595095:20251216:092117.292 13: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](__zbx_shmem_malloc+0x68) [0x55acce166df8]
      3595095:20251216:092117.292 12: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](zbx_hashset_copy+0x4b) [0x55acce1987ab]
      3595095:20251216:092117.292 11: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](+0x2d1a88) [0x55acce103a88]
      3595095:20251216:092117.292 10: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](um_cache_sync+0x4b) [0x55acce10517b]
      3595095:20251216:092117.292 9: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](zbx_dc_sync_configuration+0x2766) [0x55acce0d4a76]
      3595095:20251216:092117.292 8: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](dbconfig_thread+0x418) [0x55accdf0fdd8]
      3595095:20251216:092117.292 7: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](zbx_thread_start+0x20) [0x55acce1736c0]
      3595095:20251216:092117.292 6: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](+0x46a77a) [0x55acce29c77a]
      3595095:20251216:092117.292 5: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](MAIN_ZABBIX_ENTRY+0x116b) [0x55accdf09dab]
      3595095:20251216:092117.292 4: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](zbx_daemon_start+0x145) [0x55acce256275]
      3595095:20251216:092117.292 3: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](main+0x3f5) [0x55accdefdc55]
      3595095:20251216:092117.292 2: /lib64/libc.so.6(+0x29590) [0x7fb307829590]
      3595095:20251216:092117.292 1: /lib64/libc.so.6(__libc_start_main+0x80) [0x7fb307829640]
      3595095:20251216:092117.292 0: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 6.674081 sec, syncing configuration](_start+0x25) [0x55accdf04ff5]
      3595095:20251216:092117.292 [file:dbconfig.c,line:231] __zbx_shmem_malloc(): out of memory (requested 366104 bytes)
      3595095:20251216:092117.292 [file:dbconfig.c,line:231] __zbx_shmem_malloc(): please increase CacheSize configuration parameter
      3595080:20251216:092117.736 One child process died (PID:3595095,exitcode/signal:1). Exiting ...
      zabbix_server [3595080]: Error waiting for process with PID 3595095: [10] No child processes 
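
The numbers in the memory statistics above can be checked with a few lines of integer arithmetic. The struct and function names below are hypothetical; the values are copied from the log.

```c
#include <stdint.h>

typedef struct
{
    uint64_t    total_bytes;    /* "memory of total size"          */
    uint64_t    free_bytes;     /* bytes in free chunks            */
    uint64_t    free_chunks;    /* number of free chunks           */
    uint64_t    max_chunk;      /* "max chunk size"                */
    uint64_t    requested;      /* size of the failed allocation   */
}
cache_stats_t;

/* Values taken from the log of PID 3595095. */
const cache_stats_t LOG_STATS = {
    9947792640ULL, 5630902536ULL, 1386797ULL, 365928ULL, 366104ULL
};

/* Share of the cache that is free, in whole percent (rounded down). */
int free_percent(const cache_stats_t *s)
{
    return (int)(s->free_bytes * 100 / s->total_bytes);
}

/* Average size of a free chunk in bytes (rounded down). */
uint64_t avg_free_chunk(const cache_stats_t *s)
{
    return s->free_bytes / s->free_chunks;
}

/* Bytes by which the largest free chunk falls short of the request. */
uint64_t shortfall(const cache_stats_t *s)
{
    return s->requested > s->max_chunk ? s->requested - s->max_chunk : 0;
}
```

For the logged values this gives 56 percent of the cache free, an average free chunk of 4060 bytes, and a largest free chunk that is 176 bytes too small for the 366104-byte request.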

            Assignee:
            Zabbix Support Team
            Reporter:
            Vladislavs Sokurenko
            Team A