[ZBX-15774] Server housekeeper memory leakage Created: 2019 Mar 06  Updated: 2024 Apr 10  Resolved: 2019 Mar 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.0.4
Fix Version/s: 4.0.6rc1, 4.2.0rc1, 4.2 (plan)

Type: Problem report Priority: Critical
Reporter: Oleg Morozov Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: elasticsearch, housekeeper, memoryleak
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File graph.png     PNG File hk-settings.png     File pmap.1     File pmap.2     File pmap.3     File zabbix_server.objdump.gz    
Team: Team A
Sprint: Sprint 50 (Mar 2019)
Story Points: 0.25

 Description   

After upgrading to version 4 with Elasticsearch, I see constantly rising memory usage by the housekeeper process. After some investigation (a few housekeeper runs, with a pmap dump after each run) I found that the housekeeper process grows by 15872 KB after each run. With the housekeeper running every hour, that is about 372 MB of leaked memory every day. For now we have to restart the server every month.
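
As a quick check of the daily figure above, the per-run growth can be converted to a daily rate. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes one housekeeper run per hour and 1 MB = 1024 KB.

```python
def daily_leak_mb(per_run_kb: int, runs_per_day: int = 24) -> float:
    """Convert per-run memory growth in KB to growth per day in MB (1 MB = 1024 KB)."""
    return per_run_kb * runs_per_day / 1024


# 15872 KB per run, one housekeeper run per hour
print(daily_leak_mb(15872))  # 372.0
```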

Attached 3 pmap dumps and server memory usage graph.

Server configuration:

LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=127.0.0.1
DBName=zabbix
DBUser=zabbix
DBPassword=***
DBPort=7001
HistoryStorageURL=http://localhost:9200
HistoryStorageDateIndex=1
StartPollers=4
StartTrappers=2
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
MaxHousekeeperDelete=100000
CacheSize=2G
StartDBSyncers=16
HistoryCacheSize=2G
HistoryIndexCacheSize=256M
TrendCacheSize=128M
ValueCacheSize=6G
Timeout=5
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=5000
ProxyConfigFrequency=60


 Comments   
Comment by Edgar Akhmetshin [ 2019 Mar 06 ]

Hello Oleg,

Thank you for reporting the issue. Please provide the following information:

  1. operating system used and its version
  2. objdump -Dswx $(which zabbix_server) | gzip -c > zabbix_server.objdump.gz
  3. ldd $(which zabbix_server)

Regards,
Edgar

Comment by Oleg Morozov [ 2019 Mar 06 ]

Hi Edgar, thanks for the reply. Attached zabbix_server.objdump.gz

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial

# uname -a
Linux *** 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# ldd $(which zabbix_server)
linux-vdso.so.1 => (0x00007ffc6a3f5000)
libmysqlclient.so.20 => /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 (0x00007f821e9b4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f821e797000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f821e57d000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f821e274000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f821e06c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f821de68000)
libiksemel.so.3 => /usr/lib/x86_64-linux-gnu/libiksemel.so.3 (0x00007f821dc5a000)
libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f821d89f000)
libodbc.so.2 => /usr/lib/x86_64-linux-gnu/libodbc.so.2 (0x00007f821d636000)
libnetsnmp.so.30 => /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30 (0x00007f821d359000)
libssh2.so.1 => /usr/lib/x86_64-linux-gnu/libssh2.so.1 (0x00007f821d130000)
libOpenIPMI.so.0 => /usr/lib/libOpenIPMI.so.0 (0x00007f821ce22000)
libOpenIPMIposix.so.0 => /usr/lib/libOpenIPMIposix.so.0 (0x00007f821cc1c000)
libevent-2.0.so.5 => /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5 (0x00007f821c9d6000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f821c76d000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f821c328000)
libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f821c0d7000)
liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f821bec8000)
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f821bc59000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f821ba3e000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f821b7ce000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f821b404000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f821b082000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f821ae6c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f821f498000)
libgnutls.so.30 => /usr/lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f821ab3c000)
libicuuc.so.55 => /usr/lib/x86_64-linux-gnu/libicuuc.so.55 (0x00007f821a7a8000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f821a586000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f821a37c000)
libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f821a09b000)
libOpenIPMIutils.so.0 => /usr/lib/libOpenIPMIutils.so.0 (0x00007f8219e92000)
libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f8219c77000)
libgssapi.so.3 => /usr/lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f8219a36000)
libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007f8219803000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f82195e7000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f821939d000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f8219139000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f8218f26000)
libnettle.so.6 => /usr/lib/x86_64-linux-gnu/libnettle.so.6 (0x00007f8218cf0000)
libhogweed.so.4 => /usr/lib/x86_64-linux-gnu/libhogweed.so.4 (0x00007f8218abd000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f821883d000)
libicudata.so.55 => /usr/lib/x86_64-linux-gnu/libicudata.so.55 (0x00007f8216d86000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f8216b72000)
libheimntlm.so.0 => /usr/lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f8216969000)
libkrb5.so.26 => /usr/lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f82166df000)
libasn1.so.8 => /usr/lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f821643d000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f8216239000)
libhcrypto.so.4 => /usr/lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f8216006000)
libroken.so.18 => /usr/lib/x86_64-linux-gnu/libroken.so.18 (0x00007f8215df0000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f8215b1e000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f82158ef000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f82156e4000)
libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f82154dc000)
libwind.so.0 => /usr/lib/x86_64-linux-gnu/libwind.so.0 (0x00007f82152b3000)
libheimbase.so.1 => /usr/lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f82150a4000)
libhx509.so.5 => /usr/lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f8214e59000)
libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f8214b84000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f821494c000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f8214748000)
Comment by Edgar Akhmetshin [ 2019 Mar 06 ]

Oleg,

One more thing, please: which version of Elasticsearch is used?

Regards,
Edgar

Comment by Oleg Morozov [ 2019 Mar 06 ]
curl ***:9200
{
"name" : "***",
"cluster_name" : "zabbix",
"cluster_uuid" : "LihB0jtyTWiFsbwZXnyJ3w",
"version" : {
"number" : "6.1.4",
"build_hash" : "d838f2d",
"build_date" : "2018-03-14T08:28:22.470Z",
"build_snapshot" : false,
"lucene_version" : "7.1.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

Does disabling housekeeping of history help?
Is there any history in the MySQL database?

select count(*) from history;
select count(*) from history_text;
select count(*) from history_uint;
select count(*) from history_str;
select count(*) from history_log;
Comment by Oleg Morozov [ 2019 Mar 06 ]

Vladislav, history tables are empty since we switched to elasticsearch. I've checked, no records in history* tables.

Housekeeping for history is currently enabled. I'll disable it now and make a few housekeeper runs.

Comment by Oleg Morozov [ 2019 Mar 06 ]

Disabled history housekeeping via web-interface and made 20 runs.

Memory leak is still present, but now it eats 6500 kbytes instead of 15872 kbytes for one run.

Comment by Oleg Morozov [ 2019 Mar 06 ]

10 runs with 5 sec delay (1 sec is enough according to log)

# for i in {1..10}; do zabbix_server -R housekeeper_execute; sleep 5; pmap -x 1286 > $i; done
zabbix_server [30896]: command sent successfully
zabbix_server [31050]: command sent successfully
zabbix_server [31163]: command sent successfully
zabbix_server [31277]: command sent successfully
zabbix_server [31420]: command sent successfully
zabbix_server [31626]: command sent successfully
zabbix_server [31847]: command sent successfully
zabbix_server [32013]: command sent successfully
zabbix_server [32210]: command sent successfully
zabbix_server [32325]: command sent successfully

# for i in {1..10}; do grep -m1 00005637a6b58000 $i; done
00005637a6b58000 3765408 3765224 3765224 rw--- [ anon ]
00005637a6b58000 3771908 3771724 3771724 rw--- [ anon ]
00005637a6b58000 3778408 3778224 3778224 rw--- [ anon ]
00005637a6b58000 3784908 3784724 3784724 rw--- [ anon ]
00005637a6b58000 3791408 3791224 3791224 rw--- [ anon ]
00005637a6b58000 3797908 3797724 3797724 rw--- [ anon ]
00005637a6b58000 3804408 3804224 3804224 rw--- [ anon ]
00005637a6b58000 3810908 3810724 3810724 rw--- [ anon ]
00005637a6b58000 3817408 3817224 3817224 rw--- [ anon ]
00005637a6b58000 3823908 3823724 3823724 rw--- [ anon ]
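
The per-run growth can be read off these lines by differencing the RSS column. The following is a minimal sketch, assuming the usual `pmap -x` column order (address, Kbytes, RSS, Dirty, mode, mapping); the helper name `rss_deltas` is hypothetical and only the first three samples are repeated here.

```python
def rss_deltas(pmap_lines):
    """Return RSS growth in KB between consecutive `pmap -x` lines of one mapping.

    Each line is expected to look like:
    address  Kbytes  RSS  Dirty  mode  mapping
    """
    rss = [int(line.split()[2]) for line in pmap_lines]
    return [after - before for before, after in zip(rss, rss[1:])]


# First three samples from the output above
samples = [
    "00005637a6b58000 3765408 3765224 3765224 rw--- [ anon ]",
    "00005637a6b58000 3771908 3771724 3771724 rw--- [ anon ]",
    "00005637a6b58000 3778408 3778224 3778224 rw--- [ anon ]",
]
print(rss_deltas(samples))  # [6500, 6500]
```

The constant 6500 KB step matches the per-run figure reported with history housekeeping disabled.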
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

What if trends housekeeping is disabled ?

Comment by Oleg Morozov [ 2019 Mar 06 ]

With trends housekeeping disabled no leakage after 10 runs.

Comment by Oleg Morozov [ 2019 Mar 06 ]

Enabled trends housekeeping and the leak is back again, so the leak is definitely somewhere in the trends housekeeping.

Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

Reproduced. Please also provide a screenshot of the housekeeper configuration from the frontend.

Steps to reproduce:

  • Enable Elasticsearch
  • Override item history period and item trend period

Observe the memory leak:

==11409== For counts of detected and suppressed errors, rerun with: -v
==11409== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 48 from 48)
==11413== 32,768 bytes in 1 blocks are possibly lost in loss record 70 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49B9D8: hk_history_delete_queue_prepare_global (housekeeper.c:491)
==11413==    by 0x49BCF8: hk_history_delete_queue_prepare_all (housekeeper.c:546)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
==11413== 
==11413== 196,608 bytes in 6 blocks are definitely lost in loss record 73 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49B9D8: hk_history_delete_queue_prepare_global (housekeeper.c:491)
==11413==    by 0x49BCF8: hk_history_delete_queue_prepare_all (housekeeper.c:546)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
==11413== 
==11413== 229,376 bytes in 7 blocks are definitely lost in loss record 75 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49ADE9: hk_history_prepare (housekeeper.c:294)
==11413==    by 0x49BD71: hk_history_delete_queue_prepare_all (housekeeper.c:553)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
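
One detail worth pulling out of the trace is the size of each lost block. The sketch below parses only the three loss-record header lines copied from the output above; the regular expression and the helper name `loss_records` are assumptions made for illustration and are not part of Valgrind or Zabbix.

```python
import re

LOSS_RE = re.compile(r"([\d,]+) bytes in (\d+) blocks are (definitely|possibly) lost")


def loss_records(valgrind_text):
    """Return (total_bytes, blocks, kind) for each loss-record header line."""
    records = []
    for match in LOSS_RE.finditer(valgrind_text):
        total = int(match.group(1).replace(",", ""))
        records.append((total, int(match.group(2)), match.group(3)))
    return records


report = """\
==11413== 32,768 bytes in 1 blocks are possibly lost in loss record 70 of 75
==11413== 196,608 bytes in 6 blocks are definitely lost in loss record 73 of 75
==11413== 229,376 bytes in 7 blocks are definitely lost in loss record 75 of 75
"""

for total, blocks, kind in loss_records(report):
    print(kind, total // blocks)  # every record works out to 32768 bytes per block
```

Every lost block is 32,768 bytes, which would be consistent with one fixed-size vector reservation being lost each time, although the thread does not state the reserve size.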

Suspicious place to blame (note that the rule is skipped, but the allocated vectors are not cleared):

		if (ZBX_HK_MODE_DISABLED == *rule->poption_mode || FAIL == zbx_history_requires_trends(rule->type))
			continue;

Workaround: disable history and trends housekeeping, since they are not stored in the MySQL database anyway.

Comment by Oleg Morozov [ 2019 Mar 06 ]

Attached screenshot.

We cannot disable trends housekeeping since Zabbix cannot use elasticsearch for trends, only for history.

Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

It looks like the housekeeper also does not delete trends when Elasticsearch is enabled.

Comment by Oleg Morozov [ 2019 Mar 06 ]

"It looks like housekeeper also does not delete trends when elastic search is enabled."

Nice.

Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

Fixed in development branch:
svn://svn.zabbix.com/branches/dev/ZBX-15774

fixed old trends not being deleted by housekeeper and a memory leak in housekeeper when elasticsearch is used.

Trend and item deletions were queued for deletion but never actually deleted, so the vector kept growing.
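
The growth pattern described here can be illustrated with a toy model. This is not Zabbix code: `DeleteQueue`, `pending_after` and the `clear_when_skipped` flag are hypothetical names, and clearing on skip is only one possible remedy, since the thread does not show the actual patch.

```python
class DeleteQueue:
    """Toy stand-in for a housekeeper delete queue (not Zabbix code)."""

    def __init__(self):
        self.pending = []

    def prepare(self, item_ids):
        # Every run queues candidates for deletion.
        self.pending.extend(item_ids)

    def process(self, enabled, clear_when_skipped):
        if enabled:
            self.pending.clear()  # stands in for deleting the queued rows
        elif clear_when_skipped:
            self.pending.clear()  # hypothetical remedy: release entries on skip
        # otherwise the rule is skipped and the queued entries stay allocated


def pending_after(runs, clear_when_skipped):
    """Run the toy housekeeper `runs` times with the rule skipped every time."""
    queue = DeleteQueue()
    for _ in range(runs):
        queue.prepare(range(1000))
        queue.process(enabled=False, clear_when_skipped=clear_when_skipped)
    return len(queue.pending)


print(pending_after(10, clear_when_skipped=False))  # 10000
print(pending_after(10, clear_when_skipped=True))   # 0
```

In the first call the queue grows by the same amount on every run, which mirrors the constant per-run increase seen in the pmap samples.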

Comment by Vladislavs Sokurenko [ 2019 Mar 11 ]

Fixed in:

  • pre-4.0.6rc1 r90872
  • pre-4.2.0rc1 (trunk) r90873
Comment by Oleg Morozov [ 2019 Mar 11 ]

Thanks.
