[ZBX-15774] Server housekeeper memory leakage Created: 2019 Mar 06 Updated: 2024 Apr 10 Resolved: 2019 Mar 17 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 4.0.4 |
Fix Version/s: | 4.0.6rc1, 4.2.0rc1, 4.2 (plan) |
Type: | Problem report | Priority: | Critical |
Reporter: | Oleg Morozov | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 0 |
Labels: | elasticsearch, housekeeper, memoryleak | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | graph.png hk-settings.png pmap.1 pmap.2 pmap.3 zabbix_server.objdump.gz |
Team: | Team A |
Sprint: | Sprint 50 (Mar 2019) |
Story Points: | 0.25 |
Description |
After upgrading to version 4 with Elasticsearch, I see constantly rising memory usage in the housekeeper process. After some investigation (a few housekeeper runs, with a pmap dump after each run) I found that the housekeeper process grows by 15872 KB after each run. With the housekeeper running every hour, that is about 372 MB of leaked memory per day. For now we have to restart the server every month. Attached are 3 pmap dumps and a server memory usage graph.
Server configuration:
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=127.0.0.1
DBName=zabbix
DBUser=zabbix
DBPassword=***
DBPort=7001
HistoryStorageURL=http://localhost:9200
HistoryStorageDateIndex=1
StartPollers=4
StartTrappers=2
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
MaxHousekeeperDelete=100000
CacheSize=2G
StartDBSyncers=16
HistoryCacheSize=2G
HistoryIndexCacheSize=256M
TrendCacheSize=128M
ValueCacheSize=6G
Timeout=5
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=5000
ProxyConfigFrequency=60
|
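The daily figure follows directly from the per-run growth. A minimal C sketch of that arithmetic, assuming one housekeeper run per hour as described in the report and a constant leak per run (the helper names `leak_kb` and `kb_to_mb` are hypothetical and not part of Zabbix):

```c
#include <assert.h>

/* Hypothetical helper: total leaked memory in KB after a number of runs,
 * assuming every housekeeper run leaks the same amount. */
static long leak_kb(long kb_per_run, long runs)
{
	assert(kb_per_run >= 0 && runs >= 0);
	return kb_per_run * runs;
}

/* Convert KB to whole MB (1 MB = 1024 KB), truncating any remainder. */
static long kb_to_mb(long kb)
{
	return kb / 1024;
}
```

With the reported numbers, `kb_to_mb(leak_kb(15872, 24))` evaluates to 372 for one day, and `kb_to_mb(leak_kb(15872, 24 * 30))` to 11160 for a 30-day month, which is consistent with needing a monthly restart.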
Comments |
Comment by Edgar Akhmetshin [ 2019 Mar 06 ] |
Hello Oleg, Thank you for reporting the issue. Please provide the following information:
Regards, |
Comment by Oleg Morozov [ 2019 Mar 06 ] |
Hi Edgar, thanks for the reply. Attached zabbix_server.objdump.gz
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
# uname -a
Linux *** 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# ldd $(which zabbix_server)
linux-vdso.so.1 => (0x00007ffc6a3f5000)
libmysqlclient.so.20 => /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 (0x00007f821e9b4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f821e797000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f821e57d000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f821e274000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f821e06c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f821de68000)
libiksemel.so.3 => /usr/lib/x86_64-linux-gnu/libiksemel.so.3 (0x00007f821dc5a000)
libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f821d89f000)
libodbc.so.2 => /usr/lib/x86_64-linux-gnu/libodbc.so.2 (0x00007f821d636000)
libnetsnmp.so.30 => /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30 (0x00007f821d359000)
libssh2.so.1 => /usr/lib/x86_64-linux-gnu/libssh2.so.1 (0x00007f821d130000)
libOpenIPMI.so.0 => /usr/lib/libOpenIPMI.so.0 (0x00007f821ce22000)
libOpenIPMIposix.so.0 => /usr/lib/libOpenIPMIposix.so.0 (0x00007f821cc1c000)
libevent-2.0.so.5 => /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5 (0x00007f821c9d6000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f821c76d000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f821c328000)
libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f821c0d7000)
liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f821bec8000)
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f821bc59000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f821ba3e000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f821b7ce000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f821b404000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f821b082000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f821ae6c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f821f498000)
libgnutls.so.30 => /usr/lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f821ab3c000)
libicuuc.so.55 => /usr/lib/x86_64-linux-gnu/libicuuc.so.55 (0x00007f821a7a8000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f821a586000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f821a37c000)
libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f821a09b000)
libOpenIPMIutils.so.0 => /usr/lib/libOpenIPMIutils.so.0 (0x00007f8219e92000)
libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f8219c77000)
libgssapi.so.3 => /usr/lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f8219a36000)
libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007f8219803000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f82195e7000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f821939d000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f8219139000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f8218f26000)
libnettle.so.6 => /usr/lib/x86_64-linux-gnu/libnettle.so.6 (0x00007f8218cf0000)
libhogweed.so.4 => /usr/lib/x86_64-linux-gnu/libhogweed.so.4 (0x00007f8218abd000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f821883d000)
libicudata.so.55 => /usr/lib/x86_64-linux-gnu/libicudata.so.55 (0x00007f8216d86000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f8216b72000)
libheimntlm.so.0 => /usr/lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f8216969000)
libkrb5.so.26 => /usr/lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f82166df000)
libasn1.so.8 => /usr/lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f821643d000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f8216239000)
libhcrypto.so.4 => /usr/lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f8216006000)
libroken.so.18 => /usr/lib/x86_64-linux-gnu/libroken.so.18 (0x00007f8215df0000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f8215b1e000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f82158ef000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f82156e4000)
libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f82154dc000)
libwind.so.0 => /usr/lib/x86_64-linux-gnu/libwind.so.0 (0x00007f82152b3000)
libheimbase.so.1 => /usr/lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f82150a4000)
libhx509.so.5 => /usr/lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f8214e59000)
libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f8214b84000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f821494c000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f8214748000)
|
Comment by Edgar Akhmetshin [ 2019 Mar 06 ] |
Oleg, One more thing, please: which version of Elasticsearch is used? Regards, |
Comment by Oleg Morozov [ 2019 Mar 06 ] |
curl ***:9200
{
  "name" : "***",
  "cluster_name" : "zabbix",
  "cluster_uuid" : "LihB0jtyTWiFsbwZXnyJ3w",
  "version" : {
    "number" : "6.1.4",
    "build_hash" : "d838f2d",
    "build_date" : "2018-03-14T08:28:22.470Z",
    "build_snapshot" : false,
    "lucene_version" : "7.1.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
|
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ] |
Does disabling history housekeeping help?
select count(*) from history;
select count(*) from history_text;
select count(*) from history_uint;
select count(*) from history_str;
select count(*) from history_log;
|
Comment by Oleg Morozov [ 2019 Mar 06 ] |
Vladislav, the history tables have been empty since we switched to Elasticsearch. I've checked: there are no records in the history* tables. History housekeeping is currently enabled; I'll disable it now and do a few housekeeper runs. |
Comment by Oleg Morozov [ 2019 Mar 06 ] |
Disabled history housekeeping via the web interface and made 20 runs. The memory leak is still present, but now it grows by 6500 KB per run instead of 15872 KB. |
Comment by Oleg Morozov [ 2019 Mar 06 ] |
10 runs with 5 sec delay (1 sec is enough according to log)
# for i in {1..10}; do zabbix_server -R housekeeper_execute; sleep 5; pmap -x 1286 > $i; done
zabbix_server [30896]: command sent successfully
zabbix_server [31050]: command sent successfully
zabbix_server [31163]: command sent successfully
zabbix_server [31277]: command sent successfully
zabbix_server [31420]: command sent successfully
zabbix_server [31626]: command sent successfully
zabbix_server [31847]: command sent successfully
zabbix_server [32013]: command sent successfully
zabbix_server [32210]: command sent successfully
zabbix_server [32325]: command sent successfully
# for i in {1..10}; do grep -m1 00005637a6b58000 $i; done
00005637a6b58000 3765408 3765224 3765224 rw--- [ anon ]
00005637a6b58000 3771908 3771724 3771724 rw--- [ anon ]
00005637a6b58000 3778408 3778224 3778224 rw--- [ anon ]
00005637a6b58000 3784908 3784724 3784724 rw--- [ anon ]
00005637a6b58000 3791408 3791224 3791224 rw--- [ anon ]
00005637a6b58000 3797908 3797724 3797724 rw--- [ anon ]
00005637a6b58000 3804408 3804224 3804224 rw--- [ anon ]
00005637a6b58000 3810908 3810724 3810724 rw--- [ anon ]
00005637a6b58000 3817408 3817224 3817224 rw--- [ anon ]
00005637a6b58000 3823908 3823724 3823724 rw--- [ anon ]
|
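The growth per run can be read off by subtracting consecutive values of the Kbytes column (the second field of `pmap -x` output). A small C sketch of that check, with the ten reported values hard-coded; the helper name `constant_growth_kb` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Kbytes column of the [ anon ] mapping at 00005637a6b58000,
 * copied from the ten pmap samples reported in this comment. */
static const long pmap_kbytes[] = {
	3765408, 3771908, 3778408, 3784908, 3791408,
	3797908, 3804408, 3810908, 3817408, 3823908
};

/* Hypothetical helper: returns the common difference between consecutive
 * samples, or -1 if there are fewer than two samples or the growth is not
 * the same for every run. */
static long constant_growth_kb(const long *samples, size_t n)
{
	long step;
	size_t i;

	assert(NULL != samples);

	if (n < 2)
		return -1;

	step = samples[1] - samples[0];

	for (i = 2; i < n; i++)
	{
		if (samples[i] - samples[i - 1] != step)
			return -1;
	}

	return step;
}
```

For these samples, `constant_growth_kb(pmap_kbytes, 10)` returns 6500, matching the 6500 KB per run reported with history housekeeping disabled.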
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ] |
What if trends housekeeping is disabled? |
Comment by Oleg Morozov [ 2019 Mar 06 ] |
With trends housekeeping disabled, there is no leak after 10 runs. |
Comment by Oleg Morozov [ 2019 Mar 06 ] |
Enabled trends housekeeping and the leak is back again, so the leak is definitely somewhere in that code. |
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ] |
Reproduced. Please also provide a screenshot of the housekeeper configuration from the frontend.
Steps:
Observe memory leak:
==11409== For counts of detected and suppressed errors, rerun with: -v
==11409== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 48 from 48)
==11413== 32,768 bytes in 1 blocks are possibly lost in loss record 70 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49B9D8: hk_history_delete_queue_prepare_global (housekeeper.c:491)
==11413==    by 0x49BCF8: hk_history_delete_queue_prepare_all (housekeeper.c:546)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
==11413==
==11413== 196,608 bytes in 6 blocks are definitely lost in loss record 73 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49B9D8: hk_history_delete_queue_prepare_global (housekeeper.c:491)
==11413==    by 0x49BCF8: hk_history_delete_queue_prepare_all (housekeeper.c:546)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
==11413==
==11413== 229,376 bytes in 7 blocks are definitely lost in loss record 75 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49ADE9: hk_history_prepare (housekeeper.c:294)
==11413==    by 0x49BD71: hk_history_delete_queue_prepare_all (housekeeper.c:553)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
Suspicious place (note that the rule is skipped, but the allocated vectors are not cleared):
if (ZBX_HK_MODE_DISABLED == *rule->poption_mode || FAIL == zbx_history_requires_trends(rule->type))
	continue;
Workaround: disable history and trends housekeeping, since they are not stored in the MySQL database anyway.
|
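To illustrate the general pattern pointed at here (storage reserved for every rule in a prepare step, then a `continue` that skips the rule without releasing it), below is a toy C sketch. It is not the Zabbix code or the actual fix: the types and functions (`toy_rule_t`, `toy_prepare`, `toy_release`, `toy_process`) are hypothetical. The slot count is an assumption chosen so that, with 8-byte pointers, one queue is 32,768 bytes, the per-block size seen in the Valgrind report.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for a housekeeper rule and its delete queue.
 * These are NOT the Zabbix types; all names here are hypothetical. */
typedef struct
{
	int	enabled;	/* 0 means the rule is skipped during processing */
	void	**queue;	/* delete queue storage, NULL when not allocated */
}
toy_rule_t;

#define TOY_QUEUE_SLOTS	4096	/* assumed size, for illustration only */

/* Prepare step: reserve queue storage for every rule, enabled or not. */
static void	toy_prepare(toy_rule_t *rules, size_t n)
{
	size_t	i;

	for (i = 0; i < n; i++)
	{
		rules[i].queue = malloc(TOY_QUEUE_SLOTS * sizeof(void *));
		assert(NULL != rules[i].queue);
	}
}

/* Release the queue storage of one rule. */
static void	toy_release(toy_rule_t *rule)
{
	free(rule->queue);
	rule->queue = NULL;
}

/* Processing step. With release_skipped == 0 a skipped rule keeps its
 * queue, so the next toy_prepare() would overwrite the pointer and the
 * old block would be lost. With release_skipped != 0 the queue is
 * released before skipping. Returns the number of queues still allocated. */
static size_t	toy_process(toy_rule_t *rules, size_t n, int release_skipped)
{
	size_t	i, still_allocated = 0;

	for (i = 0; i < n; i++)
	{
		if (0 == rules[i].enabled)
		{
			if (0 != release_skipped)
				toy_release(&rules[i]);
			continue;	/* the skip that can bypass cleanup */
		}

		/* ... deletion work would happen here ... */
		toy_release(&rules[i]);
	}

	for (i = 0; i < n; i++)
	{
		if (NULL != rules[i].queue)
			still_allocated++;
	}

	return still_allocated;
}
```

In this toy model, one enabled and two skipped rules leave two queues allocated after `toy_process(rules, 3, 0)` and none after `toy_process(rules, 3, 1)`. If the prepare step ran again without the release, those blocks would become unreachable, which is the kind of loss Valgrind reports as "definitely lost".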
Comment by Oleg Morozov [ 2019 Mar 06 ] |
Attached screenshot. We cannot disable trends housekeeping, since Zabbix cannot use Elasticsearch for trends, only for history. |
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ] |
It looks like the housekeeper also does not delete trends when Elasticsearch is enabled. |
Comment by Oleg Morozov [ 2019 Mar 06 ] |
Nice. |
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ] |
Fixed in the development branch: fixed old trends not being deleted by the housekeeper, and a memory leak in the housekeeper when Elasticsearch is used. Trend and item deletions were queued but never executed, so the vector kept growing. |
Comment by Vladislavs Sokurenko [ 2019 Mar 11 ] |
Fixed in:
|
Comment by Oleg Morozov [ 2019 Mar 11 ] |
Thanks. |