[ZBX-9084] Zabbix Proxy allocates more memory than necessary when it starts Created: 2014 Nov 27  Updated: 2024 Apr 10  Resolved: 2019 Dec 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P)
Affects Version/s: 2.4.1, 2.4.2, 4.0.12
Fix Version/s: 4.0.16rc1, 4.4.4rc1, 5.0.0alpha1, 5.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Alisson Ramos de Oliveira Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 11
Labels: PGSQL, sqlite
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix Proxy using SQLite3 installed by rpm


Attachments: File ZBX-9084-test.patch     JPEG File Zabbix Proxy Cache Utilisation James Cook.JPG     File last_ps_with_database.log     File last_ps_without_database.log     Text File smap.txt     Text File smaps_vmware.txt     File valgrind_proxy_new.log     File valgrind_proxy_with.log     File zabbix_proxy.log     HTML File zabbix_proxy_configure_options_james_cook     Text File zabbix_proxy_debug_log_james_cook.txt     HTML File zabbix_proxy_process_listing_james_cook     File zabbix_proxy_smaps_james_cook.tar.gz    
Issue Links:
Causes
Duplicate
is duplicated by ZBXNEXT-5334 Huge memory use on proxy docker conta... Closed
is duplicated by ZBX-16582 Zabbix Proxy fails to start : zabbix_... Closed
Team: Team A
Sprint: Sprint 58 (Nov 2019), Sprint 59 (Dec 2019)
Story Points: 0.5

 Description   

When zabbix_proxy starts without a database and creates it, everything is fine: it only allocates the memory needed to collect data.

On the other hand, if the database already exists and is, for example, 100M, zabbix_proxy allocates about 100M for each process. On a host with 2GB of RAM and, for example, 40 pollers, zabbix_proxy is killed by the "out of memory" killer because it has allocated far too much memory.

If the database is removed and zabbix_proxy is restarted, the daemon starts normally and allocates much less memory. Another restart of the process causes out of memory again.



 Comments   
Comment by richlv [ 2014 Nov 28 ]

just to confirm, that's not shared memory between all of those processes ?

Comment by Aleksandrs Saveljevs [ 2014 Nov 28 ]

With an existing database, how long does the proxy work after start, before it gets terminated by OOM killer? Could it be a memory leak? Could you please show cache settings from your proxy configuration file? How many items are monitored by proxy?

Comment by Alisson Ramos de Oliveira [ 2014 Nov 28 ]

richlv, if you mean /dev/shm, then no. The database is in /etc/zabbix.

The shared memory (IPC) show me:

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x68021a2d 3801088    zabbix     600        8388575    74                      
0x78021a2d 3833857    zabbix     600        8388608    74                      
0x67021a2d 3866626    zabbix     600        43673191   74                      
0x73021a2d 3899395    zabbix     600        7707033    74                      
0x53021a2d 3932164    zabbix     600        18688      74                      

The result is the same without the database, but the processes do not die.

The data received from the server (when the database does not exist):

 27776:20141128:081805.798 received configuration data from server, datalen 20327247

The items by status (at this point, I think the proxy should not receive the disabled items, but that is another problem):

sqlite> SELECT status,COUNT(*) from items GROUP BY 1;
0|93865
1|3

Attached the log.

The CacheSize parameter is set to 49M. If I change it to a lower value, the proxy dies saying "Increase the CacheSize parameter".

The poller-related settings are as follows:

StartPollers=30
StartPingers=20
StartDiscoverers=3
StartSNMPTrapper=1

I guess the problem is the fork(): if the parent process has a large memory usage (49M, in this case), the child, at the moment of the fork, will have the same large memory footprint as the parent. But this is just a suspicion.

One last piece of information: the database (SQLite) is 168M now, but it started at around 32M.

Comment by Alisson Ramos de Oliveira [ 2014 Nov 28 ]

I made a new test. The command "ps aux | grep zabbix_proxy" shows that the RSS memory usage is very different with and without the database.

The file "last_ps_with_database.log" show the last "ps" before process die. The "last_ps_without_database.log" is at the moment before all process starts.

Comment by Aleksandrs Saveljevs [ 2014 Nov 28 ]

This looks like it might be a memory leak. Would it be possible for you to run Zabbix proxy under Valgrind? It can be done in the following manner:

  1. stop zabbix_proxy
  2. $ valgrind --tool=memcheck --leak-check=full --log-file=/tmp/valgrind_proxy.log /usr/local/sbin/zabbix_proxy ...
  3. allow the proxy to work for some time
  4. stop zabbix_proxy
  5. attach the received file here (/tmp/valgrind_proxy.log)
Comment by Alisson Ramos de Oliveira [ 2014 Nov 28 ]

Attached is the requested file.

The problem with valgrind is that when one process dies, zabbix_proxy does not stop... I had to kill all processes manually.

Comment by Aleksandrs Saveljevs [ 2014 Nov 28 ]

Attached Valgrind log seems to show quite a number of "possibly lost" bytes:

$ grep lost valgrind_proxy_with.log | grep possibly.*,.*bytes
==11345==      possibly lost: 388,704 bytes in 4,312 blocks
==11349==      possibly lost: 393,608 bytes in 4,318 blocks
==11346==      possibly lost: 388,704 bytes in 4,312 blocks
==11347==      possibly lost: 388,704 bytes in 4,312 blocks
==11348==      possibly lost: 388,704 bytes in 4,312 blocks
==11341==      possibly lost: 286,336 bytes in 4,234 blocks
==11283==      possibly lost: 287,600 bytes in 4,234 blocks
==11316==      possibly lost: 394,304 bytes in 4,321 blocks
==11317==      possibly lost: 287,600 bytes in 4,234 blocks
==11315==      possibly lost: 287,600 bytes in 4,234 blocks
==11319==      possibly lost: 389,984 bytes in 4,313 blocks

However, according to http://stackoverflow.com/questions/6771610/what-does-possible-lost-means-in-valgrind , this seems to be OK.

There are two instances of the following leak:

==11317== 16 bytes in 1 blocks are definitely lost in loss record 33 of 228
==11317==    at 0x4A06BE0: realloc (vg_replace_malloc.c:662)
==11317==    by 0x359C81DD25: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==11317==    by 0x359C809670: ??? (in /usr/lib64/libsqlite3.so.0.8.6)
==11317==    by 0x359C866D73: sqlite3_get_table (in /usr/lib64/libsqlite3.so.0.8.6)
==11317==    by 0x49CE5C: zbx_db_vselect (in /usr/sbin/zabbix_proxy_sqlite3)
==11317==    by 0x478A8F: __zbx_DBselect (in /usr/sbin/zabbix_proxy_sqlite3)
==11317==    by 0x42E375: ??? (in /usr/sbin/zabbix_proxy_sqlite3)
==11317==    by 0x42E82F: send_list_of_active_checks_json (in /usr/sbin/zabbix_proxy_sqlite3)
==11317==    by 0x42CDCE: ??? (in /usr/sbin/zabbix_proxy_sqlite3)
==11317==    by 0x42D417: trapper_thread (in /usr/sbin/zabbix_proxy_sqlite3)
==11317==    by 0x459381: zbx_thread_start (in /usr/sbin/zabbix_proxy_sqlite3)
==11317==    by 0x412B1F: MAIN_ZABBIX_ENTRY (in /usr/sbin/zabbix_proxy_sqlite3)

They should be investigated. However, they are very small and cannot serve as an explanation for going out of memory.

Comment by Alisson Ramos de Oliveira [ 2014 Nov 28 ]

I made a new test; this time I did not remove proxy.db. Previously, the test was started with no SQLite database created.

To map which process is which, here is the start of the test:

 16349:20141128:131309.174 proxy #0 started [main process]
 16366:20141128:131309.221 proxy #1 started [configuration syncer #1]
 16367:20141128:131309.287 proxy #2 started [heartbeat sender #1]
 16368:20141128:131309.394 proxy #3 started [data sender #1]
 16369:20141128:131309.514 proxy #4 started [poller #1]
 16370:20141128:131309.674 proxy #5 started [poller #2]
 16371:20141128:131309.792 proxy #6 started [poller #3]
 16372:20141128:131310.060 proxy #7 started [poller #4]
 16373:20141128:131310.118 proxy #8 started [poller #5]
 16374:20141128:131310.267 proxy #9 started [poller #6]
 16375:20141128:131310.366 proxy #10 started [poller #7]
 16376:20141128:131310.657 proxy #11 started [poller #8]
 16380:20141128:131311.306 proxy #15 started [poller #12]
 16381:20141128:131311.847 proxy #16 started [poller #13]
 16377:20141128:131311.986 proxy #12 started [poller #9]
 16378:20141128:131312.496 proxy #13 started [poller #10]
 16386:20141128:131312.636 proxy #21 started [poller #18]
 16384:20141128:131312.603 proxy #19 started [poller #16]
 16382:20141128:131312.698 proxy #17 started [poller #14]
 16383:20141128:131312.657 proxy #18 started [poller #15]
 16391:20141128:131313.000 proxy #26 started [poller #23]
 16393:20141128:131312.994 proxy #28 started [poller #25]
 16385:20141128:131312.864 proxy #20 started [poller #17]
 16387:20141128:131313.403 proxy #22 started [poller #19]
 16379:20141128:131313.400 proxy #14 started [poller #11]
 16394:20141128:131313.499 proxy #29 started [poller #26]
 16389:20141128:131313.509 proxy #24 started [poller #21]
 16396:20141128:131313.537 proxy #31 started [poller #28]
 16397:20141128:131313.544 proxy #32 started [poller #29]
 16390:20141128:131313.645 proxy #25 started [poller #22]
 16388:20141128:131313.731 proxy #23 started [poller #20]
 16392:20141128:131313.679 proxy #27 started [poller #24]
 16398:20141128:131313.733 proxy #33 started [poller #30]
 16395:20141128:131313.807 proxy #30 started [poller #27]
 16399:20141128:131314.084 proxy #34 started [unreachable poller #1]
 16400:20141128:131315.964 proxy #35 started [trapper #1]
 16407:20141128:131319.182 proxy #42 started [icmp pinger #3]
 16405:20141128:131319.178 proxy #40 started [icmp pinger #1]
 16406:20141128:131319.518 proxy #41 started [icmp pinger #2]
 16403:20141128:131319.705 proxy #38 started [trapper #4]
 16402:20141128:131319.743 proxy #37 started [trapper #3]
 16401:20141128:131319.701 proxy #36 started [trapper #2]
 16404:20141128:131320.073 proxy #39 started [trapper #5]
 16408:20141128:131320.050 proxy #43 started [icmp pinger #4]
 16409:20141128:131320.146 proxy #44 started [icmp pinger #5]
 16410:20141128:131320.203 proxy #45 started [icmp pinger #6]
 16411:20141128:131320.956 proxy #46 started [icmp pinger #7]
 16412:20141128:131323.163 proxy #47 started [icmp pinger #8]
 16416:20141128:131323.125 proxy #50 started [icmp pinger #11]
 16419:20141128:131323.129 proxy #53 started [icmp pinger #14]
 16414:20141128:131323.119 proxy #48 started [icmp pinger #9]
 16417:20141128:131323.133 proxy #51 started [icmp pinger #12]
 16418:20141128:131323.145 proxy #52 started [icmp pinger #13]
 16415:20141128:131323.169 proxy #49 started [icmp pinger #10]
 16420:20141128:131323.323 proxy #54 started [icmp pinger #15]
 16422:20141128:131323.423 proxy #56 started [icmp pinger #17]
 16421:20141128:131323.644 proxy #55 started [icmp pinger #16]
 16423:20141128:131323.877 proxy #57 started [icmp pinger #18]
 16424:20141128:131324.196 proxy #58 started [icmp pinger #19]
 16425:20141128:131324.707 proxy #59 started [icmp pinger #20]
 16426:20141128:131325.180 proxy #60 started [housekeeper #1]
 16427:20141128:131326.082 proxy #61 started [http poller #1]
 16433:20141128:131328.470 proxy #66 started [history syncer #2]
 16432:20141128:131328.470 proxy #65 started [history syncer #1]
 16434:20141128:131328.674 proxy #67 started [history syncer #3]
 16436:20141128:131328.947 proxy #69 started [snmp trapper #1]
 16437:20141128:131328.803 proxy #70 started [self-monitoring #1]
 16435:20141128:131329.495 proxy #68 started [history syncer #4]
 16349:20141128:131355.239 One child process died (PID:16367,exitcode/signal:9). Exiting ...
Comment by Aleksandrs Saveljevs [ 2014 Dec 01 ]

The new log shows no sign of a memory leak either.

Comment by Alisson Ramos de Oliveira [ 2014 Dec 10 ]

I discovered the problem: the OOM occurs only on VMware's hypervisor.

I made a test with the same zabbix_proxy.conf and SQLite3 database on vSphere 5.1 and Xen Server 6.2. On Xen Server the proxy uses ~800MB when it starts (71 pollers). On VMware it uses all the memory and is killed by the OOM killer.

The test used 2 GB of RAM and a proxy database of 84MB.

Nevertheless, I think that ~800MB is a lot of memory for the proxy to use. But OK. Now I will investigate the memory allocation on VMware.

Comment by Sergey Usov [ 2015 Apr 14 ]

Same error on hyper-v

Zabbix proxy v2.4.4 (revision 52341) (23 February 2015)

3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64

Comment by Igor Golubkov [ 2017 Apr 10 ]

Same problem with 3.2.4. I see about 4x memory usage if the SQLite database is present at proxy start. This happens only with SQLite; PostgreSQL works normally.

Comment by Kaue Pereira [ 2019 Mar 27 ]

Still occurs on 4.0.5 proxies with SQLite and virtualized environments such as Hyper-V.

Comment by Daniel [ 2019 Jul 20 ]

We're seeing similar issues on 4.2.4 running on CentOS (Hyper-V). 

Comment by James Cook [ 2019 Aug 30 ]

I'm seeing issues like this, which I have tried absolutely everything to resolve (https://support.zabbix.com/browse/ZBX-16582)

 

Centos 7.6

PostgreSQL 11

Zabbix 4.0.7

Comment by Vladislavs Sokurenko [ 2019 Aug 30 ]

Is the problem only present when using some kind of virtualization? Could you please be so kind as to describe your environment further (configuration file and VMware version) so we can try to reproduce the issue? Have you tried the appliance supplied by Zabbix?

Have you tried MySQL ?

Comment by Daniel [ 2019 Aug 30 ]

We didn't try the appliance for proxies - especially not when using SQLite, as we use MySQL when proxies process more values. Our current solution is to change the installation from SQLite to MySQL, because the issue no longer occurs there.

We've seen this issue running on Hyper-V based on Windows Server 2016.

Comment by Vladislavs Sokurenko [ 2019 Aug 30 ]

It could help if you provide memory usage for Zabbix proxy process ids
For example:

cat /proc/$pid/smaps
Comment by Daniel [ 2019 Aug 30 ]

I can check this when we have the issue again - currently all systems have been ported to MySQL to bypass the issue.

Comment by James Cook [ 2019 Aug 31 ]

If this helps:

I have discovered that the only way the proxy can be started is by using the following method:

  1. Disable all hosts monitored by the proxy
  2. Reload the Zabbix server cache
  3. Restart the Zabbix proxy
  4. Enable all hosts monitored by the proxy
  5. Reload the Zabbix server cache
  6. Reload the Zabbix proxy cache
  7. Monitoring will commence

I have had the VMWare infrastructure team check the environment and they said everything is operating normally and no changes have been made on their side.

Environment:

Centos 7.6

PostgreSQL 11

Zabbix 4.0.7

Comment by James Cook [ 2019 Aug 31 ]

I will check out Vladislavs's comment on Monday when I am back at work, as I can reproduce this with Zabbix 4.0.7 and Postgres 11... I have also updated the issue with debug level 5 logs in case this helps: ZBX-16582.

Comment by James Cook [ 2019 Sep 02 ]

Hi Vladislavs,

 

I have uploaded 3 files:

* zabbix_proxy_process_listing_james_cook.txt - Process listing of all Zabbix Proxy processes

* zabbix_proxy_smaps_james_cook.tar.gz - Compressed tar file containing all the smap details

* zabbix_proxy_configure_options_james_cook - Configure options used to compile our Zabbix proxy binary

 

We use a compiled binary (see the configure options) and load a loadable module that utilises Python, which is why you will see zabbix_proxy_custom and references to Python in the smap details.

 

I have tested and confirmed that this issue still occurs using the out of the box Zabbix proxy binary with no loaded modules present.

 

One other piece of information is the size of the configuration sent from the server which is 'received configuration data from server at "XXXXXXXX", datalen 85835245'

 

Cheers

James

Comment by Vladislavs Sokurenko [ 2019 Sep 02 ]

You could try experimenting with a minimal set of flags, then adding flags back and seeing whether memory increases with the addition of some particular flag:

./bootstrap.sh ; ./configure --prefix=$(pwd) --enable-proxy --with-postgresql

In my case the memory consumption was very low:

vso        38833  0.0  0.0  23148  7368 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy -c /etc/zabbix/zabbix_proxy.conf --foreground
vso        38837  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: configuration syncer [loading configuration]
vso        38838  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: heartbeat sender [sending heartbeat message failed in 0.000169 sec, idle 60 sec]
vso        38839  0.0  0.0  23148  4084 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: data sender [sent 0 values in 0.000000 sec, sending data]
vso        38840  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: housekeeper [startup idle for 30 minutes]
vso        38841  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: http poller #1 [got 0 values in 0.000437 sec, idle 5 sec]
vso        38842  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: discoverer #1 [processed 0 rules in 0.001415 sec, idle 60 sec]
vso        38844  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: history syncer #1 [processed 0 values in 0.000036 sec, idle 1 sec]
vso        38846  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: self-monitoring [processed data in 0.000033 sec, idle 1 sec]
vso        38848  0.0  0.0  23148  2672 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: task manager [processed 0 task(s) in 0.000778 sec, idle 5 sec]
vso        38849  0.0  0.0  25312  5956 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: poller #1 [got 0 values in 0.000013 sec, idle 5 sec]
vso        38851  0.0  0.0  25312  5956 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: poller #2 [got 0 values in 0.000016 sec, idle 5 sec]
vso        38852  0.0  0.0  25312  5956 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: poller #3 [got 0 values in 0.000011 sec, idle 5 sec]
vso        38854  0.0  0.0  25312  5956 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: poller #4 [got 0 values in 0.000015 sec, idle 5 sec]
vso        38856  0.0  0.0  25312  5956 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: poller #5 [got 0 values in 0.000015 sec, idle 5 sec]
vso        38858  0.0  0.0  25312  5952 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: unreachable poller #1 [got 0 values in 0.000014 sec, idle 5 sec]
vso        38860  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #1 [processed data in 0.000000 sec, waiting for connection]
vso        38861  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #2 [processed data in 0.000000 sec, waiting for connection]
vso        38867  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #3 [processed data in 0.000000 sec, waiting for connection]
vso        38869  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #4 [processed data in 0.000000 sec, waiting for connection]
vso        38870  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #5 [processed data in 0.000000 sec, waiting for connection]
vso        38871  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #6 [processed data in 0.000000 sec, waiting for connection]
vso        38872  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #7 [processed data in 0.000000 sec, waiting for connection]
vso        38873  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #8 [processed data in 0.000000 sec, waiting for connection]
vso        38874  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #9 [processed data in 0.000000 sec, waiting for connection]
vso        38875  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #10 [processed data in 0.000000 sec, waiting for connection]
vso        38876  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #11 [processed data in 0.000000 sec, waiting for connection]
vso        38877  0.0  0.0  23148  2612 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: trapper #12 [processed data in 0.000000 sec, waiting for connection]
vso        38878  0.0  0.0  25304  5040 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: icmp pinger #1 [got 0 values in 0.000012 sec, idle 5 sec]
vso        38879  0.0  0.0  23148  2708 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: preprocessing manager #1 [queued 0, processed 0 values, idle 5.005142 sec during 5.005226 sec]
vso        38880  0.0  0.0  23148  2840 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: preprocessing worker #1 started
vso        38881  0.0  0.0  23148  2840 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: preprocessing worker #2 started
vso        38883  0.0  0.0  23148  2840 pts/0    S+   10:49   0:00 ./sbin/zabbix_proxy: preprocessing worker #3 started

Also, I am not sure whether the issue is specific to CentOS 7.6 - have you tried other operating systems? We should try to reproduce the issue with the same set of flags that you use.

I am using Fedora release 30 (Thirty)

Linux version 5.2.9-200.fc30.x86_64 ([email protected]) (gcc version 9.1.1 20190503 (Red Hat 9.1.1-1) (GCC)) #1 SMP Fri Aug 16 21:37:45 UTC 2019
Comment by DaneT [ 2019 Sep 02 ]

Hello James

On VirtualBox, with the proxy compiled using the same options as yours, memory usage stays low:

zabbix   12302  0.0  0.2 158232  5792 ?        S    13:32   0:00 sbin/zabbix_proxy --config /etc/zabbix/zabbix_proxy.conf
zabbix   12306  0.0  0.3 158464  6572 ?        S    13:32   0:00 sbin/zabbix_proxy: configuration syncer [synced config 20671 bytes in 0.213440 sec, idle 3600 sec]
zabbix   12307  0.0  0.2 158232  5372 ?        S    13:32   0:00 sbin/zabbix_proxy: heartbeat sender [sending heartbeat message success in 0.011470 sec, idle 60 sec]
zabbix   12308  0.0  0.2 158284  6000 ?        S    13:32   0:00 sbin/zabbix_proxy: data sender [sent 1 values in 0.002565 sec, idle 1 sec]
zabbix   12309  0.0  0.1 158232  3120 ?        S    13:32   0:00 sbin/zabbix_proxy: housekeeper [startup idle for 30 minutes]
zabbix   12310  0.0  0.2 158232  5300 ?        S    13:32   0:00 sbin/zabbix_proxy: http poller #1 [got 0 values in 0.000708 sec, idle 5 sec]
zabbix   12311  0.2  0.4 262736  9680 ?        S    13:32   0:00 sbin/zabbix_proxy: discoverer #1 [processed 0 rules in 0.001292 sec, idle 60 sec]
zabbix   12312  0.0  0.2 158232  5300 ?        S    13:32   0:00 sbin/zabbix_proxy: history syncer #1 [processed 0 values in 0.000018 sec, idle 1 sec]
zabbix   12313  0.0  0.2 158232  5300 ?        S    13:32   0:00 sbin/zabbix_proxy: history syncer #2 [processed 1 values in 0.001205 sec, idle 1 sec]
zabbix   12314  0.0  0.2 158232  5300 ?        S    13:32   0:00 sbin/zabbix_proxy: history syncer #3 [processed 0 values in 0.000014 sec, idle 1 sec]
zabbix   12315  0.0  0.2 158232  5300 ?        S    13:32   0:00 sbin/zabbix_proxy: history syncer #4 [processed 0 values in 0.000018 sec, idle 1 sec]
zabbix   12316  0.0  0.2 158232  4604 ?        S    13:32   0:00 sbin/zabbix_proxy: self-monitoring [processed data in 0.000037 sec, idle 1 sec]
zabbix   12317  0.0  0.2 158232  5796 ?        S    13:32   0:00 sbin/zabbix_proxy: task manager [processed 0 task(s) in 0.000414 sec, idle 5 sec]
zabbix   12318  0.2  0.4 266080  9744 ?        S    13:32   0:00 sbin/zabbix_proxy: poller #1 [got 1 values in 0.000047 sec, idle 2 sec]
zabbix   12319  0.2  0.4 266060  9744 ?        S    13:32   0:00 sbin/zabbix_proxy: poller #2 [got 0 values in 0.000007 sec, idle 2 sec]
zabbix   12320  0.2  0.4 266064  9744 ?        S    13:32   0:00 sbin/zabbix_proxy: poller #3 [got 0 values in 0.000006 sec, idle 2 sec]
zabbix   12321  0.2  0.4 266060  9744 ?        S    13:32   0:00 sbin/zabbix_proxy: poller #4 [got 0 values in 0.000006 sec, idle 2 sec]
zabbix   12322  0.2  0.4 266068  9744 ?        S    13:32   0:00 sbin/zabbix_proxy: poller #5 [got 0 values in 0.000008 sec, idle 2 sec]
zabbix   12323  0.2  0.4 266060  9744 ?        S    13:32   0:00 sbin/zabbix_proxy: unreachable poller #1 [got 0 values in 0.000031 sec, idle 5 sec]
zabbix   12324  0.0  0.2 158232  5380 ?        S    13:32   0:00 sbin/zabbix_proxy: trapper #1 [processed data in 0.000000 sec, waiting for connection]
zabbix   12325  0.0  0.2 158232  5380 ?        S    13:32   0:00 sbin/zabbix_proxy: trapper #2 [processed data in 0.000000 sec, waiting for connection]
zabbix   12326  0.0  0.2 158232  5380 ?        S    13:32   0:00 sbin/zabbix_proxy: trapper #3 [processed data in 0.000000 sec, waiting for connection]
zabbix   12327  0.0  0.2 158232  5380 ?        S    13:32   0:00 sbin/zabbix_proxy: trapper #4 [processed data in 0.000000 sec, waiting for connection]
zabbix   12328  0.0  0.2 158232  5380 ?        S    13:32   0:00 sbin/zabbix_proxy: trapper #5 [processed data in 0.000000 sec, waiting for connection]
zabbix   12329  0.0  0.2 161544  4604 ?        S    13:32   0:00 sbin/zabbix_proxy: icmp pinger #1 [got 0 values in 0.000010 sec, idle 5 sec]
postgres 12330  0.0  0.4 236292  9908 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49106) idle
postgres 12331  0.0  0.5 236292 10280 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49108) idle
postgres 12332  0.0  0.4 236292  9904 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49110) idle
postgres 12333  0.0  0.4 236296  9900 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49112) idle
postgres 12334  0.0  0.5 236380 10480 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49114) idle
postgres 12335  0.1  0.6 238596 13264 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49116) idle
postgres 12336  0.1  0.5 236948 11028 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49120) idle
postgres 12337  0.0  0.4 236296  9512 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49122) idle
postgres 12338  0.0  0.5 236992 10900 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49124) idle
postgres 12339  0.0  0.4 236296  9516 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49126) idle
postgres 12340  0.0  0.4 236296  9508 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49132) idle
postgres 12341  0.0  0.4 236296  9512 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49134) idle
postgres 12342  0.0  0.4 236296  9512 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49136) idle
postgres 12343  0.0  0.4 236296  9516 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49138) idle
postgres 12344  0.0  0.4 236296  9512 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49140) idle
postgres 12345  0.0  0.4 236296  9512 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49142) idle
postgres 12346  0.0  0.4 236296  9516 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49144) idle
postgres 12347  0.0  0.4 236296  9508 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49146) idle
postgres 12348  0.0  0.4 236296  9516 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49148) idle
postgres 12349  0.0  0.5 236516 10696 ?        Ss   13:32   0:00 postgres: zabbix_proxy zabbix_proxy 127.0.0.1(49150) idle

I also attached smaps of one of the poller processes

Comment by Vladislavs Sokurenko [ 2019 Sep 02 ]

This looks like the most suspicious place:

7fdd864b5000-7fddc64b5000 rw-s 00000000 00:04 393222                     /SYSV00000000 (deleted)
Size:            1048576 kB
Rss:               91908 kB
Pss:                 605 kB
Shared_Clean:          0 kB
Shared_Dirty:      91908 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
Referenced:        91908 kB
Anonymous:             0 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr sh mr mp me ms sd 

Could you please share the values of the following parameters?

HistoryIndexCacheSize
CacheSize
HistoryCacheSize

Is HistoryCacheSize set to 1G ?

Comment by DaneT [ 2019 Sep 02 ]

A compiled proxy running on VMware, using all the provided configure flags except for jabber:

zabbix   101654  0.0  0.1 898680  3028 ?        S    15:58   0:00 zabbix_proxy --config /etc/zabbix/zabbix_proxy.conf
zabbix   101659  0.0  0.2 899028  3780 ?        S    15:58   0:00 zabbix_proxy: configuration syncer [synced config 18485 bytes in 0.049150 sec, idle 3600 sec]
zabbix   101660  0.0  0.1 898796  3080 ?        S    15:58   0:00 zabbix_proxy: heartbeat sender [sending heartbeat message success in 0.025432 sec, idle 60 sec]
zabbix   101661  0.0  0.1 898848  3540 ?        S    15:58   0:00 zabbix_proxy: data sender [sent 1 values in 0.003230 sec, idle 1 sec]
zabbix   101662  0.0  0.1 898680  1872 ?        S    15:58   0:00 zabbix_proxy: housekeeper [startup idle for 30 minutes]
zabbix   101663  0.0  0.1 898680  2268 ?        S    15:58   0:00 zabbix_proxy: http poller #1 [got 0 values in 0.000572 sec, idle 5 sec]
zabbix   101664  0.0  0.2 1003200 4628 ?        S    15:58   0:00 zabbix_proxy: discoverer #1 [processed 0 rules in 0.001561 sec, idle 60 sec]
zabbix   101665  0.0  0.1 898680  2280 ?        S    15:58   0:00 zabbix_proxy: history syncer #1 [processed 0 values in 0.000008 sec, idle 1 sec]
zabbix   101666  0.0  0.1 898680  2280 ?        S    15:58   0:00 zabbix_proxy: history syncer #2 [processed 1 values in 0.000656 sec, idle 1 sec]
zabbix   101667  0.0  0.1 898680  2280 ?        S    15:58   0:00 zabbix_proxy: history syncer #3 [processed 0 values in 0.000009 sec, idle 1 sec]
zabbix   101668  0.0  0.1 898680  2280 ?        S    15:58   0:00 zabbix_proxy: history syncer #4 [processed 0 values in 0.000010 sec, idle 1 sec]
zabbix   101669  0.0  0.1 898680  1996 ?        S    15:58   0:00 zabbix_proxy: self-monitoring [processed data in 0.000023 sec, idle 1 sec]
zabbix   101670  0.0  0.1 898812  3224 ?        S    15:58   0:00 zabbix_proxy: task manager [processed 0 task(s) in 0.000311 sec, idle 5 sec]
zabbix   101671  0.0  0.2 1006540 4888 ?        S    15:58   0:00 zabbix_proxy: poller #1 [got 0 values in 0.000010 sec, idle 1 sec]
zabbix   101672  0.0  0.2 1006520 4628 ?        S    15:58   0:00 zabbix_proxy: poller #2 [got 0 values in 0.000012 sec, idle 1 sec]
zabbix   101673  0.0  0.2 1006540 4888 ?        S    15:58   0:00 zabbix_proxy: poller #3 [got 0 values in 0.000029 sec, idle 1 sec]
zabbix   101674  0.0  0.2 1006540 4888 ?        S    15:58   0:00 zabbix_proxy: poller #4 [got 0 values in 0.000009 sec, idle 1 sec]
zabbix   101675  0.0  0.2 1006524 4888 ?        S    15:58   0:00 zabbix_proxy: poller #5 [got 1 values in 0.000069 sec, idle 1 sec]
zabbix   101676  0.0  0.2 1006520 4628 ?        S    15:58   0:00 zabbix_proxy: unreachable poller #1 [got 0 values in 0.000051 sec, idle 5 sec]
zabbix   101677  0.0  0.1 898812  3012 ?        S    15:58   0:00 zabbix_proxy: trapper #1 [processed data in 0.000000 sec, waiting for connection]
zabbix   101678  0.0  0.1 898812  3012 ?        S    15:58   0:00 zabbix_proxy: trapper #2 [processed data in 0.000000 sec, waiting for connection]
zabbix   101679  0.0  0.1 898812  3012 ?        S    15:58   0:00 zabbix_proxy: trapper #3 [processed data in 0.000000 sec, waiting for connection]
zabbix   101680  0.0  0.1 898812  3012 ?        S    15:58   0:00 zabbix_proxy: trapper #4 [processed data in 0.000000 sec, waiting for connection]
zabbix   101681  0.0  0.1 898812  3012 ?        S    15:58   0:00 zabbix_proxy: trapper #5 [processed data in 0.000000 sec, waiting for connection]
zabbix   101682  0.0  0.1 901992  1988 ?        S    15:58   0:00 zabbix_proxy: icmp pinger #1 [got 0 values in 0.000060 sec, idle 5 sec]
postgres 101683  0.0  0.2 236284  5180 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49154) idle
postgres 101684  0.0  0.3 236288  5980 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49156) idle
postgres 101685  0.0  0.2 236288  5184 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49158) idle
postgres 101686  0.0  0.3 236984  6488 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49160) idle
postgres 101687  0.0  0.4 238580  9172 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49162) idle
postgres 101688  0.2  0.3 236940  6764 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49166) idle
postgres 101689  0.0  0.3 236288  5976 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49168) idle
postgres 101690  0.0  0.3 236288  5980 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49170) idle
postgres 101691  0.0  0.3 236288  5980 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49172) idle
postgres 101692  0.0  0.3 236372  6016 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49174) idle
postgres 101693  0.0  0.2 236288  5188 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49176) idle
postgres 101694  0.0  0.2 236288  5184 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49178) idle
postgres 101695  0.0  0.2 236288  5184 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49180) idle
postgres 101696  0.0  0.2 236288  5184 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49182) idle
postgres 101697  0.0  0.3 236508  6312 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49184) idle
postgres 101698  0.0  0.2 236288  5188 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49186) idle
postgres 101699  0.0  0.2 236288  5184 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49188) idle
postgres 101700  0.0  0.2 236288  5188 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49190) idle
postgres 101701  0.0  0.2 236288  5188 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49192) idle
postgres 101702  0.0  0.2 236288  5184 ?        Ss   15:58   0:00 postgres: zabbix zabbix_proxy 127.0.0.1(49194) idle
Comment by James Cook [ 2019 Sep 02 ]

Hi Vladislavs,

Yes, the cache size is set to 1 GB. Is this an issue? It has been configured like this for months and has been restarted many times.

It's super weird that this only started the other day.

Cheers
James

Comment by Vladislavs Sokurenko [ 2019 Sep 02 ]

Wouldn't hurt to monitor write cache on Zabbix proxy

zabbix[wcache,<cache>,<mode>] 

My main suspicion is that it's related to the write cache being filled up due to a slow database, thus increasing memory usage. Is it possible that the majority of memory is not consumed by Zabbix proxy but by something else? Could you please show the output of top.

Also can you confirm that decreasing HistoryCacheSize helps with the issue ?

Comment by Leonardo Vendrusculo [ 2019 Sep 02 ]

I don't know very well how it works... but for me the problem is:

sqlite database size * StartPollers = RAM needed to start the proxy

So all this memory is not allocated during the execution of the proxy... it is demanded just to start the service, maybe something related to the fork process.

In my case, if I start the proxy service with an existing SQLite database, the memory needed is about 12GB of RAM and Zabbix never frees this memory... if I start the service without a previous database, the usage is only about 1.5GB.
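As a purely hypothetical worked example of that rule of thumb (the numbers are illustrative, not measured):

120 MB database * 100 StartPollers ≈ 12 GB of RAM needed at startup

which is the same order of magnitude as the 12 GB observed above.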

Comment by Daniel [ 2019 Sep 02 ]

We have exactly the same behaviour as Leonardo - the allocation happens only on startup with an existing database. So the error mostly triggers after the proxy has been running for a while and is restarted during some maintenance/update.

Comment by James Cook [ 2019 Sep 03 ]

Hi Vladislavs,

I have attached a screenshot of the last 7 days for the configuration, history and history index caches (Zabbix Proxy Cache Utilisation James Cook). You can see there is no utilisation issue here; the gaps are caused by the outages we have had due to this issue.

The issue does look to be as described by Daniel and Leonardo, in that it only occurs on startup when the proxy processes are forked and they have hosts enabled for monitoring.

This is why a clean database or my workaround (disabling hosts) works; I suspect you could also reassign the monitored hosts to a different proxy, restart the proxy and then reassign the hosts.

Looking at the debug logs in my issue (https://support.zabbix.com/browse/ZBX-16582) I saw the following behaviour:

  • Loads existing configuration from database (See the attached debug log)
  • Forks processes
  • Loads updated configuration from Zabbix server

In terms of forking memory usage, it uses the CoW (copy-on-write) method, which means that the forked process will use the parent's memory as shared memory and only duplicate it if a write is performed.

My suspicion is that for some reason the Zabbix pollers, when they are initially forked, are using a large amount of the parent's shared memory and then perform some write operation which suddenly consumes all the system memory, as the pages need to be duplicated.
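To make the copy-on-write effect above concrete, here is a minimal standalone C sketch (not Zabbix code; the 100 MB buffer and the number of children are arbitrary illustrative values). The parent touches a large allocation before forking, and each child's pages stay shared until the child writes to them, at which point the kernel has to duplicate them:

/* Minimal standalone sketch (not Zabbix code) of fork() copy-on-write behaviour:
 * the parent fills a large buffer, then forks children. Each child's resident
 * memory stays small until it writes into the inherited pages, which forces
 * the kernel to copy them privately - the effect suspected above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUF_SIZE (100 * 1024 * 1024)    /* pretend "configuration", 100 MB */
#define CHILDREN 5

int main(void)
{
    char *buf = malloc(BUF_SIZE);

    if (NULL == buf)
        return EXIT_FAILURE;

    memset(buf, 'x', BUF_SIZE);         /* parent touches every page */

    for (int i = 0; i < CHILDREN; i++)
    {
        pid_t pid = fork();

        if (0 == pid)
        {
            /* pages are still shared with the parent here (CoW);
             * writing to them makes the kernel duplicate them */
            memset(buf, 'y', BUF_SIZE);
            printf("child %d (pid %d) wrote %d MB, pages are now private\n",
                    i, (int)getpid(), BUF_SIZE / (1024 * 1024));
            _exit(EXIT_SUCCESS);
        }
    }

    while (wait(NULL) > 0)
        ;                               /* reap children */

    free(buf);
    return EXIT_SUCCESS;
}

Running a sketch like this under strict overcommit (vm.overcommit_memory = 2, as discussed further down in this thread) makes the forks or the children's writes fail much earlier, which matches the "cannot allocate memory" symptom.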

An interesting article regarding the pitfalls of forking is : https://devarea.com/linux-fork-system-call-and-its-pitfalls

The behaviour I observe when the fork error message does occur is that at least some Zabbix pollers did succeed in starting, and the rest failed due to the memory allocation.

Possible solutions which may resolve this:

  • Zabbix pollers are forked prior to the initial configuration load
  • Determine why Zabbix pollers are initially writing to the parents shared memory (causing the memory duplication)

The other thing to keep in mind that may contribute to this issue is enabling/disabling memory overcommit at the OS or hypervisor layer; it is enabled by default at the OS layer. In our case we disabled overcommit at the OS layer, as we deem it the safer option, and we reserved the memory within the VMware hypervisor. With that in mind, any process needs to be as slim as possible in terms of memory consumption.

Hope this helps.

Cheers

James

Comment by Vladislavs Sokurenko [ 2019 Sep 03 ]

Can you please decrease these variables one by one and see which one (if any) helps to decrease the overall memory?

HistoryIndexCacheSize
CacheSize
HistoryCacheSize
Comment by James Cook [ 2019 Sep 04 ]

Hi Vladislavs,

While we were having the original issue, prior to the workaround, I did in fact lower these to 256M each, which allowed me to start more Zabbix pollers at the time, however only 10 or so.

Yesterday we enabled memory overcommit on our CentOS boxes, which is allowing them to start up with 100 pollers. However, the question I still have to answer is why this suddenly started occurring on our proxies and was reproducible.

To enable memory overcommit on CentOS, the following was added to sysctl:

  • vm.overcommit_memory = 0
  • vm.overcommit_ratio = 50 (i.e. allow 150% of configured memory)

To disable memory overcommit on CentOS, the following was added to sysctl:

  • vm.overcommit_memory = 2
  • vm.overcommit_ratio = 50

It may be worth anyone who is experiencing this issue enabling memory overcommit and confirming that it allows the Zabbix proxy to start; or, conversely, disabling memory overcommit and seeing whether that then affects your Zabbix proxy.

NOTE: You can also change the ratio when enabling memory overcommit, e.g. vm.overcommit_ratio = 1 (allow 101% of configured memory) - so you may be able to force the issue by lowering this while memory overcommit is enabled.

So the question remains: why does the Zabbix proxy require so much memory on start, yet after the first configuration sync it has no issues? I suspect it is the fork plus a write somewhere in memory causing the memory to be duplicated.

Cheers

James

 

Comment by Vladislavs Sokurenko [ 2019 Sep 04 ]

It would be nice to understand what this memory is used for and whether it is memory for the write cache ("HistoryCacheSize"). Can you confirm that decreasing HistoryCacheSize makes the pollers consume a lot less memory, and attach smaps please.

A quick update [email protected]: from your smaps I see that Pss is equal to 2191492 kB, which is roughly 2 GB.

Comment by Vladislavs Sokurenko [ 2019 Sep 04 ]

If anyone experiences the issue, please attach smaps for the process in question and the output of htop.

Comment by James Cook [ 2019 Sep 09 ]

Hi Vladislavs,

I checked this and yes, it is roughly 2GB, and our system memory is 8GB (only when the workaround is in place).

This is why I think that upon startup the Zabbix proxy has memory allocation issues if the host configuration is in place and enabled.

When Broken:

  1. Loads host configuration into memory (all hosts enabled)
  2. Starts main process
  3. Forks pollers - child processes (contain host configuration in memory)
  4. Pollers write to their read-only copy of memory - duplicating memory for writing
  5. Quickly runs out of memory and can't fork any more pollers (hence the 'can't allocate memory' error message)

When Working:

  1. Loads NO host configuration into memory (all hosts disabled)
  2. Starts main process
  3. Forks pollers - child processes (contain NO host configuration in memory)
  4. Pollers write to their read-only copy of memory - duplicating memory for writing
  5. Finishes starting up
  6. Hosts are enabled for monitoring
  7. Reloads configuration and starts monitoring (already forked - so no duplication of memory, etc.)

I have a spare proxy at work now, so I will attempt to force a failure and collect smaps in that condition this week or next (as I am on training).

 

Cheers

James

Comment by Vladislavs Sokurenko [ 2019 Sep 10 ]

The main process performs the forks, so it should be its memory that gets duplicated; can you please be so kind and provide smaps for it? And maybe even a dump of the biggest regions?
For example:

gdb -p PID
dump memory /tmp/memory.dump 0x0266b000 0x02753000

The main suspicion is that it performs the configuration sync once in the main process and some memory is left over for the other processes.
Do you have many unsupported items with invalid update intervals?

Comment by James Cook [ 2019 Sep 12 ]

Hi Vladislavs,

I'm currently on training and will get some additional data for you next week.

There are no invalid update intervals that are reported in the logs or anything.

Regards

James

 

 

Comment by Christian Anton [ 2019 Sep 17 ]

We are apparently having the same issue here. One of our proxies handles roughly 1000 hosts and 600 NVPS with a CacheSize of 256M, HistoryCacheSize of 128M and HistoryIndexCacheSize of 64M. The system works nicely when started without a SQLite database file, and the system has enough available memory:

zmonitor-prx-fr1 root ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7803        1386        5699         150         717        5975
Swap:          2047         165        1882

When restarting the proxy, it recognizes that the database is there, begins starting its processes and immediately gets killed by the OOM killer.

Comment by Vladislavs Sokurenko [ 2019 Sep 17 ]

It might be related to the main process using the database and the forks then getting the same memory.

Comment by Aigars Kadikis [ 2019 Nov 19 ]

Hello [email protected]. You have done some good investigation so far.

Would you mind continuing by providing smaps for it? And maybe even a dump of the biggest regions? For example:

gdb -p PID
dump memory /tmp/memory.dump 0x0266b000 0x02753000

The main suspicion is that it performs the configuration sync once in the main process and some memory is left over for the other processes.
Do you have many unsupported items with invalid update intervals?

 

Comment by Aigars Kadikis [ 2019 Nov 20 ]

For those who are relatively new to this thread but would like to investigate and understand what kind of content is lying in memory, here are sequential steps for looking at the memory heap of the "poller #1" process:

# take a note of the process id
ps aux|grep "[z]abbix_proxy: poller #1 "

# output
zabbix   13349  0.8  2.0 156260 20968 ?        S    Nov06 157:33 /usr/sbin/zabbix_proxy: poller #1 [got 0 values in 0.000049 sec, idle 1 sec]

# take a look at the smaps file
cat /proc/13349/smaps

# look for the [heap] section with a decent size, for example "1572 kB":
026c8000-02851000 rw-p 00000000 00:00 0          [heap]
Size:               1572 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                1316 kB
Pss:                1316 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      1316 kB
Referenced:         1316 kB
Anonymous:          1316 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac

# enter gdb mode for the same process we are debugging right now:
gdb -p 13349  

# dump the memory block, using the address range from the previous output and prefixing each address with '0x':
dump memory /tmp/13349.dump 0x026c8000 0x02851000
quit

# make the memory block more readable to the human eye:
hexdump -C /tmp/13349.dump > /tmp/13349.dump.content

# see what is inside:
cat /tmp/13349.dump.content | less
Comment by Vladislavs Sokurenko [ 2019 Nov 24 ]

Please try the patch ZBX-9084-test.patch if you have a test instance.

Comment by Vladislavs Sokurenko [ 2019 Dec 04 ]

The issue is reproduced on CentOS Linux release 7.7.1908, VirtualBox 6.0.14 r133895 and SQLite 3.7.17.
The patch helps, and putting a sleep between forks also helps; it seems the issue only happens when the forks are done at the same time.
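For readers wondering what "putting sleep between forks" means in practice, here is a rough standalone C sketch of that experiment (not the actual ZBX-9084 patch; the worker count and delay are arbitrary): the children are started one by one with a short pause instead of in a tight loop.

/* Rough sketch of the "sleep between forks" experiment (not the actual patch):
 * stagger worker start-up with a short pause instead of forking all workers
 * back-to-back, so the forks do not hit peak memory at the same moment. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKERS 30

static void worker_loop(int num)
{
    printf("worker #%d started (pid %d)\n", num, (int)getpid());
    sleep(60);              /* placeholder for the real poller loop */
    _exit(0);
}

int main(void)
{
    for (int i = 1; i <= WORKERS; i++)
    {
        if (0 == fork())
            worker_loop(i);

        usleep(100000);     /* 100 ms pause between forks - the key change */
    }

    while (wait(NULL) > 0)
        ;
    return 0;
}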

To reproduce the issue, simply discover 200k trapper items on one host, restart the Zabbix proxy after it has received the configuration from the Zabbix server, and observe the out-of-memory error.

Upgrading SQLite on CentOS Linux release 7.7.1908 to SQLite 3.30.1 (2019-10-10) also solves the issue.

The issue is also reproducible on Fedora release 30 (Thirty) with SQLite 3.7.17 (2013-05-20).

Comment by Vladislavs Sokurenko [ 2019 Dec 10 ]

Fixed in:

  • pre-4.0.16rc1 321360349f1
  • pre-4.4.4rc1 46b8ecec5f6
  • pre-5.0.0alpha1 (master) fa3b3fbbdc7
Comment by Vladislavs Sokurenko [ 2019 Dec 10 ]

(1) [D] High memory usage during process startup with SQLite 3.7.17 on CentOS/RHEL 7. The startup process has been improved in 4.0.16 and 4.4.4 to avoid similar issues with other databases as well.

martins-v Added to known issues for 4.0, 4.4. RESOLVED

vso CLOSED
