[ZBX-25050] System uses virtual memory instead of available memory for preprocessing Created: 2024 Aug 15 Updated: 2025 Apr 30 |
Status: | Sign off by Support |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 7.0.2, 7.0.3 |
Fix Version/s: | 7.0.6rc1, 7.2.0beta1 |
Type: | Problem report | Priority: | Major |
Reporter: | Shane Arnold | Assignee: | Maksym Buz |
Resolution: | Unresolved | Votes: | 4 |
Labels: | memoryleak, preprocessing, server |
Remaining Estimate: | Not Specified |
Time Spent: | 1h |
Original Estimate: | Not Specified |
Attachments: |
Issue Links: |
Team: |
Sprint: | Prev.Sprint, S24-W44/45, S24-W50/51/52/1 |
Story Points: | 0.125 |
Description |
Steps to reproduce:
Expected result: the preprocessing manager should not use (and continually increase its usage of) virtual memory while physical memory is available.
Comments |
Comment by Shane Arnold [ 2024 Aug 15 ] | ||||||||||||||||||||||||||||||
Note: as a troubleshooting step, I have increased number of preprocessors to 10. I noticed when looking at the running processes that each of the existing preprocessing threads had about 500 values queued at any given time. It is also noted that when monitoring slab usage with slabtop -s c, a high amount of radix_tree_node allocation is present, which is also released when restarting or terminating zabbix_server (and as a result the preprocessing manager). | ||||||||||||||||||||||||||||||
Comment by Alexander Vladishev [ 2024 Aug 15 ] | ||||||||||||||||||||||||||||||
Zabbix doesn't directly control the use of virtual memory; that is managed by the operating system. How much available memory do you have on your system? Many Zabbix processes have their own caches, and it's normal for memory consumption to increase for some time after the server starts.
Comment by Shane Arnold [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
Thanks sasha, see below. RAM: 16 GB. The attached graph shows a constant memory increase on the order of gigabytes, which will eventually exhaust all available memory.
Comment by Alexander Vladishev [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
Could you please run the following commands on a long-running server and share the output:
ps -eo size,pid,user,command --sort -size | grep zabbix_server
cat /proc/meminfo
free -m
What types of items are you using for monitoring?
Comment by Vladislavs Sokurenko [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
Could you please be so kind and provide the output of zabbix_server -R diaginfo? What kind of preprocessing is mostly used?
Comment by Shane Arnold [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
Thanks sasha and vso, please see the attached zabbix diaginfo.txt. The zabbix_server processes were restarted yesterday, so I cannot fulfil the 'long running server' criterion; however, I can already see that virtual memory is increasing despite physical memory being available. See the attached 'zabbix_server potential memory leak - memory and processes.txt'. For the types of preprocessing items, see below. I have chosen itemid>100000 as I believe this closely represents any items added that weren't simple out-of-the-box templates. The largest number of preprocessed items would be from Veeam Backup and Replication by HTTP, AWS by HTTP and Azure by HTTP discovery;
SELECT COUNT(*), type FROM item_preproc WHERE itemid > '100000' GROUP BY TYPE ORDER BY COUNT ASC
Comment by Vladislavs Sokurenko [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
Could you please be so kind to also provide output of: In the diaginfo output, the only interesting information is the count of cached items; otherwise it looks good:
== preprocessing diagnostic information ==
Cached items:31185 pending tasks:0 finished tasks:0 task sequences:0 time:0.000309
Could it be that the cached items are large?
Comment by Shane Arnold [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
See attached zabbix_agent2-proc.get.txt | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
PSS is 387 megabytes, however peak memory usage is high 3246 megabytes, it appears there was lots of data to process by preprocessing manager at some moment, could you please show Zabbix server health graphs ? { "pid": 894951, "ppid": 894924, "name": "zabbix_server", "cmdline": "/usr/sbin/zabbix_server: preprocessing manager #1 [queued 424, processed 1038 values, idle 4.986044 sec during 5.016775 sec]", "user": "zabbix", "group": "zabbix", "uid": 113, "gid": 119, "vsize": 3246260224, "pmem": 1.785468047381302, "rss": 449212416, "data": 1516560384, "exe": 3588096, "hwm": 455725056, "lck": 0, "lib": 17494016, "peak": 3246260224, "pin": 0, "pte": 3297280, "size": 1520283648, "stk": 135168, "swap": 1039720448, "cputime_user": 2138.98, "cputime_system": 786.27, "state": "sleeping", "ctx_switches": 42628321, "threads": 11, "page_faults": 3188, "pss": 387652096 } For example I see that proxy poller peak memory usage was also high "pid": 894970, "ppid": 894924, "name": "zabbix_server", "cmdline": "/usr/sbin/zabbix_server: proxy poller #1 [exchanged data with 0 proxies in 0.000025 sec, idle 5 sec]", "user": "zabbix", "group": "zabbix", "uid": 113, "gid": 119, "vsize": 1314439168, "pmem": 0.025494825087754455, "rss": 6414336, "data": 3227648, "exe": 3588096, "hwm": 10686464, "lck": 0, "lib": 17494016, "peak": 1314439168, "pin": 0, "pte": 126976, "size": 6950912, "stk": 135168, "swap": 2166784, "cputime_user": 0.64, "cputime_system": 0.16, "state": "sleeping", "ctx_switches": 17990, "threads": 1, "page_faults": 1, "pss": 179712 }, | ||||||||||||||||||||||||||||||
Comment by Shane Arnold [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
See below past 24 hours from health dashboard. Peaks are housekeeper. Also note the increased value cache effectiveness and queue size changes are directly correlated to starting the local zabbix agent after having been stopped the night before. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Aug 16 ] | ||||||||||||||||||||||||||||||
It's most likely shared memory. What are the values for it in the configuration file, and does making them, for example, 2x smaller reduce memory usage? However, the actual memory usage by the preprocessing manager appears to be reasonable at 387 megabytes, so currently there is no indication of an issue.
Comment by Shane Arnold [ 2024 Aug 20 ] | ||||||||||||||||||||||||||||||
Thanks vso, to confirm, you see the 11.6G of virtual memory allocated to preprocessing manager to be normal? I am struggling to understand this change in behaviour, as it seemed to coincide with our upgrade to Zabbix 7.x from 6.4. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Aug 22 ] | ||||||||||||||||||||||||||||||
It's possible that there is an issue, but having a large virtual size is not an issue in itself; it can be observed that it grows immediately once CacheSize is set to a larger value. Please also provide cat /proc/894951/smaps_rollup for analysis.
Comment by Shane Arnold [ 2024 Aug 27 ] | ||||||||||||||||||||||||||||||
Thanks, see the attached smaps_rollup for the preprocessing PID, as well as a pmap sorted by Kbytes in 'zabbix pre-processor pmap.txt'. For what it's worth, I killed the pre-processing manager parent process with kill -9, and it immediately released all virtual memory that was allocated to it. In regard to cache sizing, it is very small (128M); I don't believe any zabbix_server cache configurations would be contributing to this, and indeed none have changed. This looks exactly the same as
Comment by Vladislavs Sokurenko [ 2024 Aug 27 ] | ||||||||||||||||||||||||||||||
It could be due to the fact that there is a lot of JavaScript preprocessing; the scripts are all cached and can consume memory, and the same goes for throttling. Is the same JavaScript used for many items?
Comment by Vladislavs Sokurenko [ 2024 Aug 27 ] | ||||||||||||||||||||||||||||||
Please apply history_size_7_0.diff | ||||||||||||||||||||||||||||||
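For reference, a minimal sketch of applying such a diff to an unpacked source tree before rebuilding (the directory name is illustrative and the -p level is an assumption that depends on how the diff was generated):
cd zabbix-7.0.x
patch -p1 < history_size_7_0.diff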
Comment by Shane Arnold [ 2024 Aug 28 ] | ||||||||||||||||||||||||||||||
Scratch that; it immediately crashed with an out-of-memory error.
Comment by Edgar Akhmetshin [ 2024 Aug 28 ] | ||||||||||||||||||||||||||||||
Hello [email protected], if you can install a patched package, please share which operating system version you have and which database backend is used; we will share a patched package to find the root cause. Regards,
Comment by Shane Arnold [ 2024 Aug 29 ] | ||||||||||||||||||||||||||||||
Thanks edgar.akhmetshin;
OS: PRETTY_NAME="Ubuntu 22.04.4 LTS"
Kernel: Linux 5.15.0-119-generic #129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
DB: PostgreSQL 14.13 (Ubuntu 14.13-0ubuntu0.22.04.1)
Comment by Edgar Akhmetshin [ 2024 Sep 02 ] | ||||||||||||||||||||||||||||||
Hello [email protected], here is the server package. Please provide the diaginfo information since the memory leak is visible and noticeable, as mentioned by vso: Regards,
Comment by Shane Arnold [ 2024 Sep 03 ] | ||||||||||||||||||||||||||||||
Thanks edgar.akhmetshin, this package is now installed, and I will report back in about a week with the results. | ||||||||||||||||||||||||||||||
Comment by Shane Arnold [ 2024 Sep 12 ] | ||||||||||||||||||||||||||||||
Hi edgar.akhmetshin, since running Zabbix server with the package provided, the behaviour seems to have been resolved, as shown below. I don't know if this was the expected result or a coincidence, but it looks like the symptom isn't being reproduced at this time. See also zabbix_server_patched_diaginfo.txt
Comment by Edgar Akhmetshin [ 2024 Sep 12 ] | ||||||||||||||||||||||||||||||
Hello [email protected], we have one environment with the same issue, where memory usage by the preprocessing manager started to grow after one week of use. Please keep monitoring; currently we are investigating the information already collected from another environment. Regards,
Comment by Shane Arnold [ 2024 Sep 13 ] | ||||||||||||||||||||||||||||||
Hi edgar.akhmetshin, it looks like it has begun again. See below. I have also noted any activities that were being performed in Zabbix around that time; - Created maintenance period | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Sep 13 ] | ||||||||||||||||||||||||||||||
It's normal for memory usage to grow a little if new items are added, did configuration cache usage also grow ? | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Oct 01 ] | ||||||||||||||||||||||||||||||
How many preprocessing workers are configured? Please try adding as many as possible to avoid a large queue; please set at least StartPreprocessors=100 and see if the issue persists.
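For illustration, the suggested change boils down to a single line in the server configuration followed by a restart (file path and service name as shipped in the official packages; adjust to your installation):
# /etc/zabbix/zabbix_server.conf
StartPreprocessors=100
systemctl restart zabbix-server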
Comment by Vladislavs Sokurenko [ 2024 Oct 01 ] | ||||||||||||||||||||||||||||||
Please also provide graphs for history cache usage and zabbix[vps,written] | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Oct 03 ] | ||||||||||||||||||||||||||||||
Increasing preprocessing workers from 3 to 10 decreased the speed of memory consumption.
Comment by Vladislavs Sokurenko [ 2024 Oct 03 ] | ||||||||||||||||||||||||||||||
If lots of values are processed, then please increase to 100 or more and see if memory usage stops increasing. Unfortunately, it is possible for lots of values to be queued if there are not enough workers, and in that case memory usage can increase; it should stabilise at some point, though, once the peak is reached. If there are more workers, the queue should not grow so much. If peaks are rare, then a solution could be to force memory to be released back to the system.
Comment by Vladislavs Sokurenko [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
Confirmed, the default count of preprocessing workers should be increased to match the core count, see the system_cpu_num() function.
### Option: StartPreprocessors
# Number of pre-started instances of preprocessing worker threads, should be set to no less than the available CPU core count. More workers should be set if preprocessing is not CPU bound and has lots of network requests.
#
# Mandatory: no
# Range: 1-1000
# Default (automatically set to core count):
# StartPreprocessors=
Also, malloc_trim should be called on a daily basis to trim memory above 128 MB in case there were spikes in the queue.
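As a rough one-off check of whether such a trim would actually return the manager's memory to the OS (a diagnostic sketch only, not part of the fix; it assumes glibc malloc and briefly pauses the process while gdb is attached):
gdb -p <preprocessing manager pid> -batch -ex 'call (int) malloc_trim(0)'
# if RSS drops noticeably afterwards, the growth is freed-but-unreturned heap rather than a true leak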
Comment by Vladislavs Sokurenko [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
Does setting StartPreprocessors to the number of cores in the system help, Dimasmir?
Comment by Smirnov Dmitriy [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
I have a VM with 8 vCPUs. Increasing preprocessors to 10 or to 20 did not help. I have attached the memory usage graph, the output of "/proc/cpuinfo" and the output of "ps -aux --sort -rss". I can try to reduce preprocessors to 8.
Comment by Vladislavs Sokurenko [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
Please check graphs for zabbix[preprocessing_queue] item what is the peak value ? | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
Peak 191, avg 0.5. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
Thank you, could you please check over the longer period, preferably since server was started ? | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
It is for last 5 days, when preprocessors qty was increased. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Oct 16 ] | ||||||||||||||||||||||||||||||
Thank you, could you please also share pss instead of rss ? RSS also includes shared memory so it can be misleading. ps -eo pss,pid,user,command --sort -pss | \ awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }' |\ cut -d "" -f2 | cut -d "-" -f1 | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Oct 17 ] | ||||||||||||||||||||||||||||||
ps does not accept pss, returns "error: unknown user-defined format specifier "pss". Maybe output of /proc/'pid of preprocessing manager #1'/smaps can help? I've recently restarted the service, let's wait until the memory usage increases. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Oct 17 ] | ||||||||||||||||||||||||||||||
Could you check cat /proc/<pid>/smaps_rollup, and whether the Zabbix server config file has too much shared memory allocated.
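If ps has no pss column (as reported above), the same figure can be read directly from the kernel; a small sketch, with <pid> standing for the preprocessing manager PID:
awk '/^Pss:/ {printf "%.1f MB\n", $2/1024}' /proc/<pid>/smaps_rollup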
Comment by Smirnov Dmitriy [ 2024 Oct 17 ] | ||||||||||||||||||||||||||||||
VMwareCacheSize=128M - Utilization 14%
CacheSize=2G - Utilization 50%
HistoryIndexCacheSize=32M - Utilization 13%
TrendCacheSize=64M - Utilization 47%
ValueCacheSize=256M - Utilization 34%
cat smaps_rollup below, but so far little memory has been used up
cat /proc/4052990/smaps_rollup 55a9af6ec000-7ffc893c7000 ---p 00000000 00:00 0 [rollup] Rss: 680720 kB Pss: 339401 kB Shared_Clean: 9060 kB Shared_Dirty: 387184 kB Private_Clean: 60 kB Private_Dirty: 284416 kB Referenced: 678844 kB Anonymous: 286256 kB LazyFree: 0 kB AnonHugePages: 151552 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB
Comment by Smirnov Dmitriy [ 2024 Oct 19 ] | ||||||||||||||||||||||||||||||
Actual statistics
ps -aux --sort -rss USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND zabbix 882954 7.4 39.8 12529160 9820940 ? Sl Oct17 244:03 /usr/sbin/zabbix_server: preprocessing manager #1 [queued 9989, processed 10989 values, idle 4.878511 sec during 5.000763 sec] cat /proc/882954/smaps_rollup 55fa44c95000-7ffdb15fc000 ---p 00000000 00:00 0 [rollup] Rss: 9825624 kB Pss: 9486237 kB Shared_Clean: 9252 kB Shared_Dirty: 382996 kB Private_Clean: 60 kB Private_Dirty: 9433316 kB Referenced: 9823744 kB Anonymous: 9434768 kB LazyFree: 0 kB AnonHugePages: 9318400 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB
One more 12 hours later
cat /proc/882954/smaps_rollup 55fa44c95000-7ffdb15fc000 ---p 00000000 00:00 0 [rollup] Rss: 12281460 kB Pss: 11937922 kB Shared_Clean: 9236 kB Shared_Dirty: 387212 kB Private_Clean: 60 kB Private_Dirty: 11884952 kB Referenced: 12272084 kB Anonymous: 11887052 kB LazyFree: 0 kB AnonHugePages: 11235328 kB ShmemPmdMapped: 0 kB FilePmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 0 kB SwapPss: 0 kB Locked: 0 kB | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Nov 11 ] | ||||||||||||||||||||||||||||||
Fixed in:
Comment by Arturs Dancis [ 2024 Nov 14 ] | ||||||||||||||||||||||||||||||
Documentation updated:
Comment by Smirnov Dmitriy [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
Looks like 7.0.6 does not resolve the memory leak.
Comment by Vladislavs Sokurenko [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
Could you please be so kind and provide smaps_rollup for preprocessing manager ? | ||||||||||||||||||||||||||||||
Comment by Christian Anton [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
I am having the same, or at least very similar, behavior with one Zabbix installation. As you can see, the Zabbix server's available memory shrinks until it is eaten up, including swap space, and then it crashes. This has happened every ~6 days since the upgrade to Zabbix 7.0. The process list shows that the preprocessing manager process is the one eating up the memory (right now 50%):
zabbix 1102 6.7 50.1 15896064 12332900 ? Sl Nov21 381:11 /usr/sbin/zabbix_server: preprocessing manager #1 [queued 5467, processed 6314 values, idle 4.812578 sec during 5.000574 sec]
In this installation, we have roughly 5k NVPS, running Zabbix server + frontend on an Ubuntu 22 VM with 24 GiB of RAM and 4 cores. I have started 50 preprocessors in zabbix_server.conf. I was hoping 7.0.6 would fix this issue, but I can see that memory consumption still goes in the same direction as before.
Comment by Smirnov Dmitriy [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
vso , I just increased the number of preprocessors from 20 to 40 and restarted the server. Let's wait a little bit.
Comment by Vladislavs Sokurenko [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
Please provide smaps_rollup for the preprocessing manager, christiananton, to see if there is really an issue.
Comment by Christian Anton [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
cat /proc/1102/smaps_rollup | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
Indeed, memory usage is high, christiananton. Could you please check whether it drops daily after the upgrade to 7.0.6? This could give some hints on why it was happening.
Comment by Christian Anton [ 2024 Nov 25 ] | ||||||||||||||||||||||||||||||
vso, according to the graph posted above, since the upgrade to 7.0.6 (from formerly 7.0.5) there are no changes. Available memory is continuously going down at exactly the same speed as before, over several days now. We will see whether it behaves the same as before, eating up all memory and all swap space and killing itself afterwards. I suppose it will.
Comment by Smirnov Dmitriy [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
ps -aux --sort -rss
cat /proc/759250/smaps_rollup | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
Please provide both memory utilisation and preprocessing manager queue together to see if there is correlation | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
I see Preprocessing queue metric in Zabbix server health template and information in ps output. Which one will be useful? | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
In template please | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
Please check cat /proc/759250/smaps_rollup daily if possible, Dimasmir, to see whether it stabilises or not. It's possible that there was some kind of spike of values that filled the preprocessing queue quickly and was then released; for example, there are queued 5442, processed 6186 values. Currently it is unknown how big each value was, but if each value is, for example, 1 MB, then it could take 5442 MB over a short period; memory should be released back to the system after 24 hours, though.
Comment by Smirnov Dmitriy [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
OK. Zabbix server started 2024-11-27 11:08:34. Stats for now: ps -aux | grep "preprocessing manager"
cat /proc/2661309/smaps_rollup | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
Could you please be so kind and provide a graph for zabbix[vps,written] and a screenshot of the Zabbix server health dashboard.
Comment by Smirnov Dmitriy [ 2024 Nov 27 ] | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Nov 28 ] | ||||||||||||||||||||||||||||||
Stats for now: cat /proc/2661309/smaps_rollup | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Nov 28 ] | ||||||||||||||||||||||||||||||
If possible please try launching zabbix with jemalloc https://github.com/jemalloc/jemalloc/wiki/getting-started, here is example:
LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libjemalloc.so.2" ./sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf --foreground
Comment by Smirnov Dmitriy [ 2024 Nov 29 ] | ||||||||||||||||||||||||||||||
Stats for now: cat /proc/2661309/smaps_rollup | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Nov 29 ] | ||||||||||||||||||||||||||||||
Started, but it locks the console and runs under the root user. Can I set up a shell for the zabbix user and use nohup to free the console?
LD_PRELOAD="/usr/lib64/libjemalloc.so.2" /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf --foreground
Comment by Vladislavs Sokurenko [ 2024 Nov 29 ] | ||||||||||||||||||||||||||||||
I think the simplest way is to use tmux, but you can remove the foreground option and probably update the systemd file; just prepend the previous command with LD_PRELOAD="/usr/lib64/libjemalloc.so.2"
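A sketch of the systemd route, assuming the unit is named zabbix-server.service and jemalloc lives at the path used above (the drop-in file name is illustrative):
# /etc/systemd/system/zabbix-server.service.d/jemalloc.conf
[Service]
Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"
# then: systemctl daemon-reload && systemctl restart zabbix-server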
Comment by Vladislavs Sokurenko [ 2024 Nov 29 ] | ||||||||||||||||||||||||||||||
Also created | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2024 Dec 02 ] | ||||||||||||||||||||||||||||||
I have connected the library in the test environment. Attached is the maps output from the test; can you confirm that it contains the necessary data? Then I will be able to apply it to production.
Comment by Vladislavs Sokurenko [ 2024 Dec 02 ] | ||||||||||||||||||||||||||||||
Is it possible to attach smaps instead of maps, please? Did memory stop increasing?
Comment by Smirnov Dmitriy [ 2024 Dec 03 ] | ||||||||||||||||||||||||||||||
It looks like the memory consumption hasn't stopped. smaps attached. Unfortunately, we can't leave jemalloc attached, because all ICMP checks stopped working with the error ERROR: ld.so: object 'libjemalloc.so.2' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. app-qv-04.domainname : xmt/rcv/%loss = 1/1/0%, min/avg/max = 0.313/0.313/0.313
Comment by Vladislavs Sokurenko [ 2024 Dec 03 ] | ||||||||||||||||||||||||||||||
I have just tried, and fping works for me with LD_PRELOAD; maybe there are some permission issues. If it's not possible to test, then let's see after
Comment by Christian Anton [ 2024 Dec 04 ] | ||||||||||||||||||||||||||||||
Little update from my side: Current status of used memory on system: So, we are very near to the next crash here. ps axu | grep preprocessing zabbix 1203 7.1 55.5 22521348 18249396 ? Sl Nov28 595:39 /usr/sbin/zabbix_server: preprocessing manager #1 [queued 6707, processed 9355 values, idle 4.859893 sec during 5.006683 sec] Preprocessing manager is eating 21.4 GB of VSZ and 17.4 GB of RSS. StartPreprocessors is set to 50 Utilization is almost nothing: Preprocessing queue is also rather low. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Dec 04 ] | ||||||||||||||||||||||||||||||
Thank you christiananton for you report, from graph it looks like memory is freed every 24 hours and then climbs up again ? It's better to check pss instead of rss: ps -eo pss,pid,user,command --sort -pss | \ awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }' |\ cut -d "" -f2 | cut -d "-" -f1 Current suspicion is that some values come in faster than they can be preprocessed, maybe discovery values, you can check it following command:
zabbix_server -R diaginfo="preprocessing";
But might need to wait for | ||||||||||||||||||||||||||||||
Comment by Christian Anton [ 2024 Dec 04 ] | ||||||||||||||||||||||||||||||
Memory doesn't free every 24h, instead it continues running full until eaten entirely, then it eats up swap, then system crashes. That happens approx. every 5-6 days. Your ps command gives me "error: unknown user-defined format specifier "pss"" on Ubuntu 22. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Dec 04 ] | ||||||||||||||||||||||||||||||
Could you please also provide output of zabbix_server -R diaginfo="preprocessing" now christiananton ? | ||||||||||||||||||||||||||||||
Comment by Christian Anton [ 2024 Dec 04 ] | ||||||||||||||||||||||||||||||
Right now, still on 7.0.6, this is the output: zabbix_server -R diaginfo="preprocessing" I am going to install a pre-release build to check whether
Comment by Vladislavs Sokurenko [ 2024 Dec 04 ] | ||||||||||||||||||||||||||||||
Actually, cached items:186958 could itself be a problem if discard unchanged is used and there are big values stored for the discard check. This could require fixing it in another way, by storing a hash instead of the last values, but we have tried history_size_7_0.diff
Comment by Christian Anton [ 2024 Dec 04 ] | ||||||||||||||||||||||||||||||
I have installed 7.0.7-rc1 from I also see a changed output of the diagnostics info for preprocessing, so it seems it has zabbix_server -R diaginfo="preprocessing"
Of course, up to now there is no big memory consumption of preprocessing manager (and workers) because I have started 7.0.7rc1 not more than 20 minutes ago. I will keep it running and check if I can see anything interesting. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2024 Dec 05 ] | ||||||||||||||||||||||||||||||
That looks correct. Please note that the statistics are reset every 24 hours, so it's best to collect them before the time expires.
Comment by Smirnov Dmitriy [ 2024 Dec 06 ] | ||||||||||||||||||||||||||||||
vso, we were able to attach jemalloc without errors and waited for high memory consumption. smaps attached.
smaps 9 hours later, before reboot
Comment by Vladislavs Sokurenko [ 2024 Dec 12 ] | ||||||||||||||||||||||||||||||
Currently it is not clear how to reproduce the issue. If there is a test installation that experiences this issue, you could try debugging it with the following commands. When compiling, do not strip debug symbols; it is also best to compile with the following CFLAGS:
export CFLAGS="-g -O0"
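For reference, a minimal rebuild sketch under these flags; the configure options below are illustrative and should be replaced with the options of the existing build:
export CFLAGS="-g -O0"
./configure --enable-server --with-postgresql --with-net-snmp --with-libcurl --with-libxml2
make -j"$(nproc)" && make install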
Run with tcmalloc
LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libtcmalloc.so" HEAPPROFILE=./heap_profile HEAP_PROFILE_ALLOCATION_INTERVAL=0 HEAP_PROFILE_INUSE_INTERVAL=4294967296 HEAPPROFILESIGNAL=5 MALLOCSTATS=1 ./sbin/zabbix_server -f -c /etc/zabbix/zabbix_server.conf
Identify the pid that is consuming lots of memory and make it dump its profile; replace 2724852 with the pid of the preprocessing manager, for example:
kill -5 2724852
Then print the profile:
google-pprof -text ./sbin/zabbix_server ./heap_profile.0001.heap
Using local file ./sbin/zabbix_server.
Using local file ./heap_profile.0001.heap.
Total: 1078.1 MB
  1076.8  99.9%  99.9%   1076.8  99.9% zbx_malloc2
     1.0   0.1% 100.0%      1.0   0.1% __GI___strdup
     0.2   0.0% 100.0%      0.2   0.0% CRYPTO_zalloc@@OPENSSL_3.0.0
     0.1   0.0% 100.0%      0.1   0.0% OPENSSL_LH_insert@@OPENSSL_3.0.0
     0.0   0.0% 100.0%      0.0   0.0% zbx_realloc2
     0.0   0.0% 100.0%      0.1   0.0% PKCS7_decrypt@@OPENSSL_3.0.0
     0.0   0.0% 100.0%      0.0   0.0% find_best_tree_node
     0.0   0.0% 100.0%      0.0   0.0% CRYPTO_strndup@@OPENSSL_3.0.0
Comment by Vladislavs Sokurenko [ 2024 Dec 13 ] | ||||||||||||||||||||||||||||||
I am sorry, Dimasmir, did you have smaps_rollup with jemalloc for the preprocessing manager? It seemed that memory usage was much lower; maybe something else consumed the rest of the memory?
Comment by Smirnov Dmitriy [ 2024 Dec 16 ] | ||||||||||||||||||||||||||||||
vso, I can restart Zabbix with jemalloc. let's agree on a methodology for collecting metrics?
We can try to make a copy of the production in the hope that the problem will repeat there. But it will still be different because of the ACL and all that. | ||||||||||||||||||||||||||||||
Comment by Christian Anton [ 2024 Dec 19 ] | ||||||||||||||||||||||||||||||
One thing that I have noticed today is that many of the items in the preprocessing manager's diaginfo are actually unsupported items. Some of them come from LLD rules that no longer discover those entities, and some are just unsupported. == preprocessing diagnostic information == ...and the output of the preprocessing diaginfo doesn't change at all. I don't know how it works internally (I read about the reset every 24h above), but it makes me suspect that if unsupported items are actually using memory in the preprocessing manager and workers, that might be the reason for the huge memory consumption in this specific installation: it has quite a huge number of unsupported items (>60k)... --------
Comment by Smirnov Dmitriy [ 2025 Jan 29 ] | ||||||||||||||||||||||||||||||
We moved VMware monitoring and some part of SNMP monitoring to proxy and the memory leak problem moved to proxy. The move was related to worker load balancing. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Jan 30 ] | ||||||||||||||||||||||||||||||
If possible please provide output of diaginfo as in | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2025 Jan 30 ] | ||||||||||||||||||||||||||||||
vso, we are now deploying an additional VM as a Zabbix proxy for VMware monitoring, to determine for certain that it is the VMware monitoring. Next, the idea is to disable some items to understand which one exactly leads to the problems.
Comment by Smirnov Dmitriy [ 2025 Jan 31 ] | ||||||||||||||||||||||||||||||
Looks like the problem is somewhere in the VMware Guest template. The memory leak moved to the new proxy for VMware monitoring. Interestingly, temporarily disabling all items in VMware Guest stops the memory consumption increase.
Comment by Vladislavs Sokurenko [ 2025 Jan 31 ] | ||||||||||||||||||||||||||||||
Maybe it's related to | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2025 Jan 31 ] | ||||||||||||||||||||||||||||||
I use 7.0.8. I will analyze diaginfo and try to find the item which, when disabled, stabilizes the memory. Can anyone else here who uses Zabbix confirm that you also use VMware monitoring?
Comment by Smirnov Dmitriy [ 2025 Feb 03 ] | ||||||||||||||||||||||||||||||
I see lots of "allocated memory" messages in the preprocessing worker log, hundreds of times per second. Much, much more than on the other proxy where there is no memory leak.
cat /var/log/zabbix/zabbix_proxy.log | grep "allocated" 783020:20250203:154605.055 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.055 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.056 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.056 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.056 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.056 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.057 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.057 [9] End of zbx_es_execute():SUCCEED allocated memory: 116018 max allocated or requested memory: 117512 max allowed memory: 536870912 783020:20250203:154605.057 [9] End of zbx_es_execute():SUCCEED allocated memory: 116018 max allocated or requested memory: 117512 max allowed memory: 536870912 783020:20250203:154605.058 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.058 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.059 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.059 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.060 [11] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.060 [11] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.060 [9] End of zbx_es_execute():SUCCEED allocated memory: 116018 max allocated or requested memory: 117512 max allowed memory: 536870912 783020:20250203:154605.061 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 205082 max allowed memory: 536870912 783020:20250203:154605.062 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.062 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.062 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.063 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.063 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 
max allowed memory: 536870912 783020:20250203:154605.063 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.063 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.064 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.064 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.064 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116173 max allowed memory: 536870912 783020:20250203:154605.064 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.065 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.065 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.065 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.066 [2] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.066 [12] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.066 [7] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.066 [15] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.067 [15] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.067 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.068 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.068 [8] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.068 [2] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.068 [2] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.069 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.069 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.070 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 
783020:20250203:154605.070 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.070 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.071 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.071 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.071 [14] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.072 [12] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.072 [5] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.072 [3] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.072 [10] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.073 [6] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.074 [9] End of zbx_es_execute():SUCCEED allocated memory: 116018 max allocated or requested memory: 117512 max allowed memory: 536870912 783020:20250203:154605.075 [2] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.075 [7] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.076 [13] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.076 [4] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.077 [16] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.077 [15] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912 783020:20250203:154605.078 [1] End of zbx_es_execute():SUCCEED allocated memory: 114854 max allocated or requested memory: 116348 max allowed memory: 536870912
zabbix_proxy -R diaginfo == history cache diagnostic information == Items:0 values:0 time:0.000022 Memory.data: size: free:16776832 used:0 chunks: free:1 used:0 min:16776832 max:16776832 buckets: 256+:1 Memory.index: size: free:16704104 used:72632 chunks: free:2 used:4 min:164776 max:16539328 buckets: 256+:2 Top.values: == == preprocessing diagnostic information == Cached items:20454 pending tasks:0 finished tasks:0 task sequences:0 time:0.001537 Top.sequences: Top.peak: itemid:9009500 tasks:2 itemid:8977370 tasks:2 itemid:8963168 tasks:2 itemid:8973320 tasks:2 itemid:8962466 tasks:2 itemid:8428893 tasks:2 itemid:8970350 tasks:2 itemid:9110179 tasks:2 itemid:8980556 tasks:2 itemid:8966624 tasks:2 itemid:8966084 tasks:2 itemid:8995190 tasks:2 itemid:8966840 tasks:2 itemid:8428902 tasks:2 itemid:8977694 tasks:2 itemid:8976830 tasks:2 itemid:8988062 tasks:2 itemid:8981096 tasks:2 itemid:8995406 tasks:2 itemid:9002264 tasks:2 itemid:8984390 tasks:2 itemid:8999024 tasks:2 itemid:8959226 tasks:2 itemid:8984606 tasks:2 itemid:8428930 tasks:2 == == locks diagnostic information == Locks: ZBX_MUTEX_LOG:0x7fe8abe18000 ZBX_MUTEX_CACHE:0x7fe8abe18028 ZBX_MUTEX_TRENDS:0x7fe8abe18050 ZBX_MUTEX_CACHE_IDS:0x7fe8abe18078 ZBX_MUTEX_SELFMON:0x7fe8abe180a0 ZBX_MUTEX_CPUSTATS:0x7fe8abe180c8 ZBX_MUTEX_DISKSTATS:0x7fe8abe180f0 ZBX_MUTEX_VALUECACHE:0x7fe8abe18118 ZBX_MUTEX_VMWARE:0x7fe8abe18140 ZBX_MUTEX_SQLITE3:0x7fe8abe18168 ZBX_MUTEX_PROCSTAT:0x7fe8abe18190 ZBX_MUTEX_PROXY_HISTORY:0x7fe8abe181b8 ZBX_MUTEX_MODBUS:0x7fe8abe181e0 ZBX_MUTEX_TREND_FUNC:0x7fe8abe18208 ZBX_MUTEX_REMOTE_COMMANDS:0x7fe8abe18230 ZBX_MUTEX_PROXY_BUFFER:0x7fe8abe18258 ZBX_MUTEX_VPS_MONITOR:0x7fe8abe18280 ZBX_RWLOCK_CONFIG:0x7fe8abe182a8 ZBX_RWLOCK_CONFIG_HISTORY:0x7fe8abe182e0 ZBX_RWLOCK_VALUECACHE:0x7fe8abe18318 == == proxy buffer diagnostic information == Memory: size: free:16626424 used:106248 chunks: free:2 used:2758 min:58056 max:16568368 buckets: 256+:2 ==
Comment by Vladislavs Sokurenko [ 2025 Feb 03 ] | ||||||||||||||||||||||||||||||
It's JavaScript preprocessing. Can you please provide more details about 9009500 and the other items mentioned in diaginfo? Note that diaginfo only shows the 25 most used values, so there could be many more similar items. The current suspicion is that for a short amount of time a huge amount of data was received, but it seems that it was discarded afterwards, since it did not affect the history write cache. Is the event log filtered by JavaScript preprocessing?
Comment by Smirnov Dmitriy [ 2025 Feb 03 ] | ||||||||||||||||||||||||||||||
There are 3 Get sensors and 22 Get snapshots. Get sensors have no preprocessing, Get snapshots have only discard unchanged with heartbeat. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Feb 03 ] | ||||||||||||||||||||||||||||||
How frequent are those checks? It seems that this snapshot is large; we probably need to think about introducing statistics to display the biggest values received by the preprocessing manager. top.peak only shows items with preprocessing. Most probably what is happening is that the 22 Get snapshots values are very big, maybe 100 MB each or more, and perhaps grow over time. Also, please collect how busy the preprocessing workers are, and please provide as many internal monitoring graphs as possible.
Comment by Smirnov Dmitriy [ 2025 Feb 03 ] | ||||||||||||||||||||||||||||||
The Get snapshots value is taken every hour on each VM; 1000 VMs, so 1000 values per hour for the whole Zabbix proxy. The Get sensors value is taken every minute on each hypervisor; 56 HVs, so 56 values per minute. Get snapshot returns JSON about a VM snapshot, which is created during VM backup. Most of the time it returns an empty result because no snapshot exists. It returns results like {"snapshot":[{"name":"Acronis_Tue Feb 04 03:42:10 2025","description":"","createtime":"2025-02-04T00:42:12.918158Z","size":8468172927177,"uniquesize":8468172927177}],"count":1,"latestdate":"2025-02-04T00:42:12.918158Z","latestage":20893,"oldestdate":"2025-02-04T00:42:12.918158Z","oldestage":20893,"size":8468172927177,"uniquesize":8468172927177}
Get sensors is bigger, 32 kbytes of JSON; I won't quote it here. I will try disabling both of these items.
Comment by Smirnov Dmitriy [ 2025 Feb 04 ] | ||||||||||||||||||||||||||||||
After a server/proxy restart the memory is stable for some time, but then something happens and the leak begins near 20:00. Would it be interesting for you to do a conference call to collect some diagnostics data?
cat /var/log/zabbix/zabbix_proxy.log | grep "20250203:19" 783022:20250203:190311.165 executing housekeeper 783022:20250203:190311.171 housekeeper [deleted 0 records in 0.000181 sec, idle for 1 hour(s)] 783011:20250203:190416.688 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 73941 783011:20250203:190518.089 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 66020 783011:20250203:190741.104 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145033 783011:20250203:192935.552 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 51125 783011:20250203:194108.154 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 28826 783022:20250203:200311.192 executing housekeeper 783022:20250203:200311.197 housekeeper [deleted 0 records in 0.000186 sec, idle for 1 hour(s)] 783011:20250203:200423.813 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 73941 783011:20250203:200545.834 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145339 783011:20250203:200606.235 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 64865 783011:20250203:200707.426 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 67135 783011:20250203:201344.915 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145643 783011:20250203:201941.611 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145337 783011:20250203:202548.629 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145643 783011:20250203:202639.504 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145337 783011:20250203:203458.895 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 53423 783011:20250203:203640.839 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145643 783011:20250203:203843.082 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 145031 783011:20250203:204045.306 received configuration data from server at "srv-zabbix-01.parfum3.local", datalen 144727 | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Feb 04 ] | ||||||||||||||||||||||||||||||
As mentioned here, it's possible to try finding the root cause with tcmalloc, but it might be tricky. When compiling, do not strip debug symbols; it is also best to compile with the following CFLAGS:
export CFLAGS="-g -O0"
Run with tcmalloc
LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libtcmalloc.so" HEAPPROFILE=./heap_profile HEAP_PROFILE_ALLOCATION_INTERVAL=0 HEAP_PROFILE_INUSE_INTERVAL=4294967296 HEAPPROFILESIGNAL=5 MALLOCSTATS=1 ./sbin/zabbix_server -f -c /etc/zabbix/zabbix_server.conf
Identify the pid that is consuming lots of memory and make it dump its profile; replace 2724852 with the pid of the preprocessing manager, for example:
kill -5 2724852
Then print the profile:
google-pprof -text ./sbin/zabbix_server ./heap_profile.0001.heap
Using local file ./sbin/zabbix_server.
Using local file ./heap_profile.0001.heap.
Total: 1078.1 MB
  1076.8  99.9%  99.9%   1076.8  99.9% zbx_malloc2
     1.0   0.1% 100.0%      1.0   0.1% __GI___strdup
     0.2   0.0% 100.0%      0.2   0.0% CRYPTO_zalloc@@OPENSSL_3.0.0
     0.1   0.0% 100.0%      0.1   0.0% OPENSSL_LH_insert@@OPENSSL_3.0.0
     0.0   0.0% 100.0%      0.0   0.0% zbx_realloc2
     0.0   0.0% 100.0%      0.1   0.0% PKCS7_decrypt@@OPENSSL_3.0.0
     0.0   0.0% 100.0%      0.0   0.0% find_best_tree_node
     0.0   0.0% 100.0%      0.0   0.0% CRYPTO_strndup@@OPENSSL_3.0.0
We could also try enhancing diaginfo with memory usage statistics; are you open to testing the patch?
Comment by Smirnov Dmitriy [ 2025 Feb 04 ] | ||||||||||||||||||||||||||||||
Yes. I think I am open to test the patch. Please note that we use a Proxy to reduce the impact of service restarts. We will deploy an additional VM and build a debugging version of the Proxy there. | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2025 Feb 07 ] | ||||||||||||||||||||||||||||||
I built the proxy with the CFLAGS and it starts and collects VMware metrics successfully. But when I start it with Maybe it is necessary to change some of the above heap parameters?
Comment by Vladislavs Sokurenko [ 2025 Feb 07 ] | ||||||||||||||||||||||||||||||
I am sorry to hear that; then the only option is to compile with the following patch, link_tc_malloc_7_0_v2.diff
./sbin/zabbix_proxy -R diaginfo="preprocessing"
google-pprof -text ./sbin/zabbix_proxy ./pp_manager.0001.heap
Check the latest ./pp_manager.0001.heap; the number can be higher. The Zabbix proxy should be launched as usual, just "/usr/local/sbin/zabbix_proxy". If performance becomes a bottleneck, then you can try 7_0_trim_hourly.diff. Have you checked whether there are eventlog items that send lots of data?
Comment by Smirnov Dmitriy [ 2025 Feb 10 ] | ||||||||||||||||||||||||||||||
Compiled successfully and works well. Let's wait for high memory consumption.
Comment by Smirnov Dmitriy [ 2025 Feb 13 ] | ||||||||||||||||||||||||||||||
Still no memory leak. I'll try to restart Proxy. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Feb 13 ] | ||||||||||||||||||||||||||||||
It can be due to this change in patch that will trim memory every hour:
if (SEC_PER_HOUR <= sec - time_trim)
It can be changed back to SEC_PER_DAY to see if trimming memory every hour helps with the issue or just try 7_0_trim_hourly.diff | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2025 Feb 13 ] | ||||||||||||||||||||||||||||||
Ok. I will wait 24 hrs and then turn 'release memory in case of peak periods' back.
Comment by Smirnov Dmitriy [ 2025 Feb 24 ] | ||||||||||||||||||||||||||||||
I turned 'release memory in case of peak periods' back from an hour to a day and waited for a week, but memory does not leak on the version compiled from sources. During compilation I installed some required packages. Could it be that I built with some other versions of libraries?
sqlite-devel 3.26.0-19.el8_9 sqlite-libs 3.26.0-19.el8_9 net-snmp-devel 1:5.8-30.el8 libssh2-devel 1.10.0-1.el8 libevent-devel 2.1.8-5.el8 pcre-devel 8.42-6.el8 libxml2-devel 2.9.7-18.el8_10.1 libcurl-devel 7.61.1-34.el8_10.2 gperftools-libs 1:2.7-9.el8 gperftools 1:2.7-9.el8 automake 1.16.1-8.el8 | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Feb 24 ] | ||||||||||||||||||||||||||||||
It's highly unlikely, maybe due to optimization options, what kind of CFLAGS were used during compilation ? Please also share ldd ./zabbix_proxy | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2025 Feb 24 ] | ||||||||||||||||||||||||||||||
Looks like CFLAGS was lost between compilations and was not set. Should I recompile with the flag CFLAGS=-g -O0?
ldd /usr/local/sbin/zabbix_proxy linux-vdso.so.1 (0x00007fff60fb9000) libsqlite3.so.0 => /lib64/libsqlite3.so.0 (0x00007fe456cd3000) libxml2.so.2 => /lib64/libxml2.so.2 (0x00007fe45696b000) libnetsnmp.so.35 => /lib64/libnetsnmp.so.35 (0x00007fe4565bf000) libssh2.so.1 => /lib64/libssh2.so.1 (0x00007fe45637f000) libz.so.1 => /lib64/libz.so.1 (0x00007fe456167000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fe455f47000) libevent_core-2.1.so.6 => /lib64/libevent_core-2.1.so.6 (0x00007fe455d0e000) libevent_extra-2.1.so.6 => /lib64/libevent_extra-2.1.so.6 (0x00007fe455aea000) libevent_pthreads-2.1.so.6 => /lib64/libevent_pthreads-2.1.so.6 (0x00007fe4558e7000) libcurl.so.4 => /lib64/libcurl.so.4 (0x00007fe455658000) libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007fe45525c000) libm.so.6 => /lib64/libm.so.6 (0x00007fe454eda000) libdl.so.2 => /lib64/libdl.so.2 (0x00007fe454cd6000) libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fe454abe000) libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fe45484d000) libc.so.6 => /lib64/libc.so.6 (0x00007fe454477000) liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fe454250000) libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007fe453fbb000) libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007fe453ad0000) /lib64/ld-linux-x86-64.so.2 (0x00007fe456fe7000) libnghttp2.so.14 => /lib64/libnghttp2.so.14 (0x00007fe4538a9000) libidn2.so.0 => /lib64/libidn2.so.0 (0x00007fe45368b000) libssh.so.4 => /lib64/libssh.so.4 (0x00007fe45341b000) libpsl.so.5 => /lib64/libpsl.so.5 (0x00007fe45320a000) libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007fe452fb5000) libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007fe452cca000) libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fe452ab3000) libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fe4528af000) libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007fe452660000) liblber-2.4.so.2 => /lib64/liblber-2.4.so.2 (0x00007fe452450000) libbrotlidec.so.1 => /lib64/libbrotlidec.so.1 (0x00007fe452243000) libunwind.so.8 => /lib64/libunwind.so.8 (0x00007fe45202b000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fe451c96000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fe451a7e000) libunistring.so.2 => /lib64/libunistring.so.2 (0x00007fe4516fd000) librt.so.1 => /lib64/librt.so.1 (0x00007fe4514f5000) libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007fe4512e4000) libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fe4510e0000) libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007fe450ec2000) libbrotlicommon.so.1 => /lib64/libbrotlicommon.so.1 (0x00007fe450ca1000) libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fe450a76000) libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fe45084d000) libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fe4505c9000)
Comment by Vladislavs Sokurenko [ 2025 Feb 24 ] | ||||||||||||||||||||||||||||||
You have libtcmalloc still linked, and it might be the reason why there is no "leak", as it probably handles fragmentation in a better way. libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007fe45525c000) There is another ticket that could address fragmentation:
Comment by Smirnov Dmitriy [ 2025 Feb 24 ] | ||||||||||||||||||||||||||||||
Using a macro is easier for me. And I should also test it on the Zabbix packages from the repo. What macro should I use?
Comment by Vladislavs Sokurenko [ 2025 Feb 24 ] | ||||||||||||||||||||||||||||||
Any macro, just {$M}; it will not cache the JavaScript if there are macros. However, it might not help if there is a discard step, so that is something to keep in mind. I believe the safest option for now is to test whether trimming hourly helps when built without tcmalloc:
Comment by Smirnov Dmitriy [ 2025 Feb 28 ] | ||||||||||||||||||||||||||||||
patch 7_0_trim_hourly.diff did not help. | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Feb 28 ] | ||||||||||||||||||||||||||||||
It appears the workaround for now is to use LD_PRELOAD="/usr/lib64/libtcmalloc.so"; this should work without needing to recompile.
Comment by bunkzilla [ 2025 Mar 04 ] | ||||||||||||||||||||||||||||||
I'm having this issue as well on 7.2.4 amazonlinux 2023 | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2025 Mar 05 ] | ||||||||||||||||||||||||||||||
vso, do I understand correctly that if | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Mar 05 ] | ||||||||||||||||||||||||||||||
If it was launched like this:
cat /proc/3502575/maps | grep libtcmalloc
ff1e88800000-ff1e88845000 r-xp 00000000 fd:02 11825622 /usr/lib/aarch64-linux-gnu/libtcmalloc.so.4.5.16
ff1e88845000-ff1e8885f000 ---p 00045000 fd:02 11825622 /usr/lib/aarch64-linux-gnu/libtcmalloc.so.4.5.16
ff1e8885f000-ff1e88860000 r--p 0004f000 fd:02 11825622 /usr/lib/aarch64-linux-gnu/libtcmalloc.so.4.5.16
ff1e88860000-ff1e88861000 rw-p 00050000 fd:02 11825622 /usr/lib/aarch64-linux-gnu/libtcmalloc.so.4.5.16
Comment by Smirnov Dmitriy [ 2025 Mar 05 ] | ||||||||||||||||||||||||||||||
Hm... So then it looks like it doesn't help. Starting with LD_PRELOAD="/usr/lib64/libtcmalloc.so" /usr/local/sbin/zabbix_proxy
Comment by Vladislavs Sokurenko [ 2025 Mar 05 ] | ||||||||||||||||||||||||||||||
Thank you for the detailed analysis. It looks like when heap tracking is enabled it keeps track of memory better, so it is freed; let's hope that ZBX-25752 will solve the issue.
Comment by Vladislavs Sokurenko [ 2025 Mar 10 ] | ||||||||||||||||||||||||||||||
Please provide profiling when tcmalloc with heap profiling was used, maybe there is something interesting | ||||||||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2025 Mar 11 ] | ||||||||||||||||||||||||||||||
This also solves the issue in my tests with glibc Dimasmir : | ||||||||||||||||||||||||||||||
Comment by Smirnov Dmitriy [ 2025 Mar 13 ] | ||||||||||||||||||||||||||||||
"Please provide profiling when tcmalloc with heap profiling was used, maybe there is something interesting". Do I understand correctly that I need to execute?
./sbin/zabbix_proxy -R diaginfo="preprocessing"
google-pprof -text ./sbin/zabbix_proxy ./pp_manager.0001.heap
Now I am using a version built from source with the link_tc_malloc_7_0_v2.diff patch. Rocky Linux doesn't have google-pprof, only pprof, but I'm having some errors when using it. Am I doing what needs to be done?
"export MALLOC_TRIM_THRESHOLD_=134217728" This must work with the version downloaded from the repo?
Comment by Vladislavs Sokurenko [ 2025 Mar 13 ] | ||||||||||||||||||||||||||||||
Yes, it should work with the version from the repo, but you need to set it in the same environment from which the Zabbix proxy is launched so that it is visible; it is not for tcmalloc, only glibc. You could also try pprof, or attach the generated file here. Actually, it should periodically write memory usage to the log, so you can check that and print it with pprof. You could also try applying the patch from ZBX-26154 to see whether lots of data is being pushed to the preprocessing manager; a new key zabbix[preprocessing_size] is added there.
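For a packaged daemon, one way to make such a variable visible to the proxy (a sketch; the drop-in file name is illustrative and the unit name is the one shipped with the official proxy packages) is a systemd drop-in rather than a shell export:
# /etc/systemd/system/zabbix-proxy.service.d/malloc-trim.conf
[Service]
Environment="MALLOC_TRIM_THRESHOLD_=134217728"
# then: systemctl daemon-reload && systemctl restart zabbix-proxy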
Comment by Vladislavs Sokurenko [ 2025 Apr 09 ] | ||||||||||||||||||||||||||||||
Was it possible to determine the items that caused memory to grow, or was there no exact item, Dimasmir?
select key_,status from items where key_ like 'vmware.eventlog%' and status=0;
Comment by Smirnov Dmitriy [ 2025 Apr 11 ] | ||||||||||||||||||||||||||||||
Disabling all items and discovery rules one by one did not reduce the leakage.