[ZBX-9722] postgresql memory leak Created: 2015 Jul 21  Updated: 2017 May 30  Resolved: 2015 Jul 21

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Peter Slavov Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File zabbix_server.conf    

 Description   

Hi,
My company has been using Zabbix for some time. We monitor a few hundred servers with Zabbix backed by a PostgreSQL database. The database is ~50 GB.
The problem occurs when the Zabbix server is restarted for some reason and some of Zabbix's cache has to be flushed to the database. I then see active queries in the database similar to these:

update trends set num=60,value_min=100.000000,value_avg=100.000000,value_max=100.000000 where itemid=37068 and clock=1437400800;
update trends set num=60,value_min=99.910962,value_avg=99.910995,value_max=99.911000 where itemid=27705 and clock=1437400800;
update trends set num=60,value_min=0.000000,value_avg=0.000000,value_max=0.000000 where itemid=31660 and clock=1437400800;
update trends set num=60,value_min=96.644629,value_avg=96.644700,value_max=96.644709 where itemid=37192 and clock=1437400800;
... more like this
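
For reference, statements like the above can be watched in pg_stat_activity while the flush is running; a minimal way to list them (assuming PostgreSQL 9.2+, where the columns are named pid/state/query):

-- list statements that are currently executing, longest-running first
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY query_start;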

When this happens, on the PostgreSQL server (which is a separate machine) RAM and then swap fill up until all memory is exhausted and the queries crash with an "out of memory" error.
After a while (~10-20 min) all queries like this are finished and the server load goes back to normal.
The problem is that the updates themselves are not expensive at all, and I have no idea why they consume so much memory. The rest of the time the server is not loaded at all, for weeks.
I am attaching my server configuration.



 Comments   
Comment by Aleksandrs Saveljevs [ 2015 Jul 21 ]

It should be noted, just in case, that the way you have obfuscated private information in "zabbix_server.conf" seems suspicious. You might wish to remove the current file and reattach a properly obfuscated one.

Comment by richlv [ 2015 Jul 21 ]

That does not seem to be a Zabbix issue at all - maybe PostgreSQL is configured to use more memory for buffers than is available, maybe something else. Overall DB activity should not push the server out of available memory.
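
Just as an example, the memory-related settings can be checked on the DB server with something like this (standard pg_settings names; not a recommendation of any particular values):

-- show the main memory-related settings and their units
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'max_connections');
-- note: work_mem is allocated per sort/hash operation per backend, so many
-- concurrent queries can together use far more memory than shared_buffers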

Comment by Peter Slavov [ 2015 Jul 21 ]

Yes, I did that, thanks. Those were some old configuration entries that were commented out a while back...

Comment by Peter Slavov [ 2015 Jul 21 ]

@richlv

Shared buffers in PostgreSQL are configured to 1 GB, while the memory consumed here is 4 GB of RAM + 8 GB of swap.
There is no other DB activity when this happens - I am monitoring the active processes in the databases.
Yes, you are right, this does not seem to be a Zabbix issue, although the SQL for the mass update could be optimized for PostgreSQL into a single query instead of multiple ones, which is probably faster as well (a sketch of what I mean is below). I just wondered whether you have seen this before and have any ideas..?
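
Roughly what I mean by a single query (using the sample values from the description; this is just a sketch, not the SQL that Zabbix actually generates):

-- update many trends rows in one statement via a VALUES list joined to the table
UPDATE trends AS t
SET num = v.num,
    value_min = v.value_min,
    value_avg = v.value_avg,
    value_max = v.value_max
FROM (VALUES
    (37068, 1437400800, 60, 100.000000, 100.000000, 100.000000),
    (27705, 1437400800, 60, 99.910962, 99.910995, 99.911000)
) AS v(itemid, clock, num, value_min, value_avg, value_max)
WHERE t.itemid = v.itemid AND t.clock = v.clock;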

Comment by richlv [ 2015 Jul 21 ]

The SQL queries to update multiple rows in one go are more complicated, and we don't have tests showing performance being better one way or the other.

If you do run tests, please attach the results here - then we would have something to work from.

Comment by Peter Slavov [ 2015 Jul 21 ]

The difference here would be one execution plan and one lock versus multiple execution plans and multiple locks on the table,
but you are probably correct - I don't have any test results yet - maybe in the future.
And anyway, in this case even if there is a difference it should be small and should not crash the whole database... I think.
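
If anyone wants to compare the two approaches, the plans can be inspected with EXPLAIN; for example, for a single per-row update (ANALYZE actually executes the statement, so only on test data):

-- show the actual plan, timing and buffer usage for one per-row update
EXPLAIN (ANALYZE, BUFFERS)
UPDATE trends
SET num=60, value_min=100.0, value_avg=100.0, value_max=100.0
WHERE itemid=37068 AND clock=1437400800;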

Comment by Peter Slavov [ 2015 Jul 22 ]

Hi,
I realize now that I didn't mention that a long time ago I had to partition the historical tables, including trends and trends_uint, because Zabbix was very hard to use back then and this was the solution I found on the net.
It turned out this was the problem in this case - I have removed the partitions now and cannot reproduce the issue any more, but I can tell you what my tests showed (a rough sketch of the kind of partitioning I mean follows the results):
1. WHEN the tables are partitioned, with 50 updates:

  • multiple update statements in one string crashed the database and could not finish most of the time;
  • a single update statement finished in ~40 sec and can be optimized to under a second;

2. WHEN the tables are not partitioned, with 50 updates:

  • both update variants finish in less than 200 ms;
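
For context, this kind of partitioning is usually done with table inheritance on PostgreSQL versions before declarative partitioning existed (a simplified sketch, not necessarily my exact setup; the child table name and range here are made up):

-- one child table per month, attached to the parent trends table
CREATE TABLE trends_2015_07 (
    CHECK (clock >= 1435708800 AND clock < 1438387200)  -- 2015-07-01 .. 2015-08-01 UTC
) INHERITS (trends);

CREATE INDEX trends_2015_07_itemid_clock_idx ON trends_2015_07 (itemid, clock);

-- a BEFORE INSERT trigger on trends then routes new rows into the matching child,
-- and constraint_exclusion = partition lets the planner skip non-matching children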

Anyway - I realize that this will never be changed, because there is no value in it for you, since you don't recommend partitioning.
Thanks for the help
