[ZBX-10783] Zabbix memory usage after upgrade from 2.4 to 3.0 Created: 2016 May 12  Updated: 2017 May 30  Resolved: 2016 Jul 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.0.2
Fix Version/s: 3.0.4rc1, 3.2.0alpha1

Type: Incident report Priority: Critical
Reporter: Dennis Nijhuis Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: memoryleak
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File topmap.sh     File topslab.sh     Text File valgrind output.txt     PNG File zabbix-server.png     PNG File zabbix-server.png     PNG File zabbix.png    

 Description   

After upgrading our Zabbix server from v2.4 to v3.0.1 and later v3.0.2, Zabbix is using much more memory and the server is running out of free swap space.



 Comments   
Comment by Aleksandrs Saveljevs [ 2016 May 12 ]

How much more memory does Zabbix 3.0 use compared to Zabbix 2.4? Is there any sign of a memory leak? Are there any particular types of Zabbix processes (e.g., history syncers) which started using more memory after the upgrade?

Comment by Dennis Nijhuis [ 2016 May 12 ]

Our old server had 3 GB of memory and not all of it was used; our new server with version 3.0.2 has 4 GB. There is a sign of a memory leak: the pollers and history syncers use more and more memory after a restart of the zabbix-server service. On the memory graph you can see the difference in memory usage after the upgrade. I have decreased the number of pollers, but after a while the processes use up the memory and swap again.

Comment by Aleksandrs Saveljevs [ 2016 May 12 ]

The memory usage seems to grow quite fast - about 1 GB per day. What item types (Zabbix agent, Zabbix agent (active), SNMP, IPMI, simple checks) do you use (heavily) in your installation? Do you have a lot of unsupported items or triggers in an unknown state? Do you use proxies?

Comment by Dennis Nijhuis [ 2016 May 12 ]

We use Zabbix agent, Zabbix agent (active) and SNMP heavily in our installation. I have checked the unsupported items and triggers in an unknown state. We have a lot of unsupported items, and in the log I see some items that keep becoming supported and unsupported. We also have a lot of triggers in an unknown state, but nearly all of these triggers are linked to disabled hosts.
For a few weeks now we have also been using a proxy server for part of our hosts with Zabbix agent and Zabbix agent (active).

Comment by Aleksandrs Saveljevs [ 2016 May 12 ]

Was there a memory leak before you started using proxies?

Comment by Dennis Nijhuis [ 2016 May 12 ]

Yes, we had the memory leak before we started using a proxy.

Comment by Aleksandrs Saveljevs [ 2016 May 12 ]

I have tried reproducing the issue based on the conjecture that it might have to do with a lot of unsupported items that change their state from supported to not supported and back (thus potentially leaking error messages), but the memory leak was not there.

Do you think it would be possible for you to install a proxy and clone some of the hosts to be monitored by that proxy to try to identify the cause (e.g., are SNMP items at fault)?

Comment by Dennis Nijhuis [ 2016 May 13 ]

I will install a proxy and will look at the memory usage on the proxy. I will also add some SNMP devices to another proxy.
Next week I will give you the results.

Comment by Dennis Nijhuis [ 2016 May 24 ]

I have connected two proxies to our Zabbix server: one proxy for Windows servers and one proxy for Linux servers and SNMP devices like switches and access points. Since I connected the SNMP devices to the proxy, the swap space grows slowly, as you can see on the new memory graph. Some of these SNMP devices kept becoming supported and unsupported while they were connected to the server; since they are connected to the proxy, I don't see these messages in the logs.

Comment by Aleksandrs Saveljevs [ 2016 May 25 ]

So it seems like the memory leak is related to SNMP? What version of SNMP do you use for monitoring?

Comment by Dennis Nijhuis [ 2016 Jun 01 ]

I thought the memory leak was related to SNMP, but one thing was not configured on the proxy: I had not installed unixODBC and FreeTDS. After installing these, I had the memory leak problem again, but now on the proxy.

We monitor some ODBC items with the MySQL driver, but I had not installed this driver (also not on our new Zabbix 3.0 server). It seems like the memory leak is related to this.

Comment by Aleksandrs Saveljevs [ 2016 Jun 01 ]

Thank you for the update, Dennis!

Do you only monitor MySQL databases using ODBC, or do you also monitor using the FreeTDS driver (http://www.unixodbc.org/doc/FreeTDS.html)? Do you use ODBC low-level discovery (https://www.zabbix.com/documentation/3.0/manual/discovery/low_level_discovery#discovery_using_odbc_sql_queries)? Could you please post the library versions? What operating system is the Zabbix server running on?

Comment by Dennis Nijhuis [ 2016 Jun 01 ]

We only monitor MySQL databases with mysql-connector-odbc. For MS SQL databases we are using the FreeTDS driver.
We do not use ODBC low-level discovery.

The Zabbix server and proxies are running on CentOS 7.2.

Library versions:

unixODBC.x86_64 2.3.1-11.el7
unixODBC-devel.x86_64 2.3.1-11.el7
freetds.x86_64 0.95.81-1.el7

Comment by Aleksandrs Saveljevs [ 2016 Jun 01 ]

Does the memory leak occur with MySQL, FreeTDS, or both?

Comment by Dennis Nijhuis [ 2016 Jun 01 ]

I think only with MySQL, but maybe also with FreeTDS.
I have disabled the items for MySQL monitoring and now the memory is not growing fast.
The FreeTDS driver is installed, but I forgot to install the mysql-connector-odbc driver.
With the items enabled but no driver installed, the problem occurred.

Comment by Dennis Nijhuis [ 2016 Jun 03 ]

The problem occurs when the mysql-connector-odbc driver is not installed but MySQL items are monitored anyway.
On the new graph you can see the memory usage of a proxy server with only one VM monitored.
I tried to install mysql-connector-odbc, but then the proxy crashes every few minutes and restarts.

Comment by Dennis Nijhuis [ 2016 Jun 17 ]

Any update on the crash problem with the mysql-connector-odbc driver installed, and on the memory leak when items would use mysql-connector-odbc but the driver is not installed?

Comment by Aleksandrs Saveljevs [ 2016 Jun 20 ]

Regarding the crash, if you are using MySQL for both Zabbix backend and ODBC monitoring, then it is likely to be a duplicate of ZBX-7665.

Regarding the leak, it seems that I have managed to reproduce it: memory usage grows quite quickly if we refer to a valid DSN that is present in odbc.ini, but whose *.so file specified in odbcinst.ini cannot be loaded.
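
For reference, the reproduction scenario looks roughly like the following sketch (the DSN and driver names are hypothetical, not taken from this installation): odbc.ini defines a DSN whose driver entry in odbcinst.ini points to a shared object that is missing or cannot be loaded.

# /etc/odbc.ini -- hypothetical DSN referenced by an ODBC item
[mysql_test]
Description = Example MySQL DSN
Driver      = mysql_hypothetical
Server      = 127.0.0.1
Port        = 3306

# /etc/odbcinst.ini -- driver entry whose library is missing on this host
[mysql_hypothetical]
Description = MySQL ODBC driver
Driver      = /usr/lib64/libmyodbc5.so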

Comment by Aleksandrs Saveljevs [ 2016 Jun 20 ]

This is not related to the leak, but the following code in odbc_Diag() looks suspicious:

while (0 != SQL_SUCCEEDED(SQLGetDiagRec(h_type, h, (SQLSMALLINT)rec_nr, sql_state, &native_err_code,
		err_msg, sizeof(err_msg), NULL)))
{
	zabbix_log(LOG_LEVEL_DEBUG, "%s(): rc_msg:'%s' rec_nr:%d sql_state:'%s' native_err_code:%ld "
			"err_msg:'%s'", __function_name, rc_msg, rec_nr, sql_state,
			(long)native_err_code, err_msg);
	if (sizeof(diag_msg) > offset)
	{
		offset += zbx_snprintf(diag_msg + offset, sizeof(diag_msg) - offset, "[%s][%ld][%s]|",
				sql_state, (long)native_err_code, err_msg);
	}
	rec_nr++;
}
*(diag_msg + offset) = '\0';

The last statement looks like it may access memory outside of "diag_msg" if "sizeof(diag_msg) == offset".
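
For illustration only, one defensive way to terminate the buffer would be to clamp the offset before the final write (a minimal sketch of the idea, not the change actually applied to Zabbix):

/* assumption: diag_msg is a fixed-size char array and offset counts bytes already written */
if (offset >= sizeof(diag_msg))
	offset = sizeof(diag_msg) - 1;	/* keep the terminator inside the buffer */
*(diag_msg + offset) = '\0';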

Comment by Aleksandrs Saveljevs [ 2016 Jun 20 ]

Setting to "Confirmed", although I am not sure yet that the problem is on Zabbix side.

Comment by Sandis Neilands (Inactive) [ 2016 Jun 21 ]

Tested on Linux Mint with unixODBC and the MySQL driver from the distribution's package repository:

  • unixodbc: 2.2.14p2-5ubuntu5 (2.2.1 was released in (!!!) 2002);
  • libmyodbc: 5.1.10-3.

During normal usage, Valgrind complains about the use of uninitialised memory in the unixODBC library.

==11431== Conditional jump or move depends on uninitialised value(s)
==11431==    at 0x10868677: sqlchar_as_sqlwchar (in /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so)
==11431==    by 0x1084C05E: SQLConnect (in /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so)
==11431==    by 0x58F35B3: SQLConnect (in /usr/lib/x86_64-linux-gnu/libodbc.so.1.0.0)
==11431==    by 0x50843E: odbc_DBconnect (odbc.c:207)
==11431==    by 0x4426A2: db_odbc_select (checks_db.c:172)
==11431==    by 0x4424B2: get_value_db (checks_db.c:239)
==11431==    by 0x434C72: get_value (poller.c:448)
==11431==    by 0x433D07: get_values (poller.c:661)
==11431==    by 0x4328E7: poller_thread (poller.c:852)
==11431==    by 0x4C33B0: zbx_thread_start (threads.c:128)
==11431==    by 0x41FDA9: MAIN_ZABBIX_ENTRY (server.c:948)
==11431==    by 0x4C0B4F: daemon_start (daemon.c:392)
==11431==  Uninitialised value was created by a heap allocation
==11431==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11431==    by 0x4E898B3: my_malloc (in /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18.0.0)
==11431==    by 0x10868727: sqlchar_as_sqlwchar (in /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so)
==11431==    by 0x1084C05E: SQLConnect (in /usr/lib/x86_64-linux-gnu/odbc/libmyodbc.so)
==11431==    by 0x58F35B3: SQLConnect (in /usr/lib/x86_64-linux-gnu/libodbc.so.1.0.0)
==11431==    by 0x50843E: odbc_DBconnect (odbc.c:207)
==11431==    by 0x4426A2: db_odbc_select (checks_db.c:172)
==11431==    by 0x4424B2: get_value_db (checks_db.c:239)
==11431==    by 0x434C72: get_value (poller.c:448)
==11431==    by 0x433D07: get_values (poller.c:661)
==11431==    by 0x4328E7: poller_thread (poller.c:852)
==11431==    by 0x4C33B0: zbx_thread_start (threads.c:128)

Also there is a memory leak when a nonexistent driver library is specified in the odbcinst.ini file.

==12197== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
==12201== 32,976 (128 direct, 32,848 indirect) bytes in 1 blocks are definitely lost in loss record 1,179 of 1,186
==12201==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12201==    by 0x799FE4B: __gconv_open (gconv_open.c:195)
==12201==    by 0x799F961: iconv_open (iconv_open.c:71)
==12201==    by 0x59271D1: ??? (in /usr/lib/x86_64-linux-gnu/libodbc.so.1.0.0)
==12201==    by 0x58F08C4: ??? (in /usr/lib/x86_64-linux-gnu/libodbc.so.1.0.0)
==12201==    by 0x58F356A: SQLConnect (in /usr/lib/x86_64-linux-gnu/libodbc.so.1.0.0)
==12201==    by 0x50843E: odbc_DBconnect (odbc.c:207)
==12201==    by 0x4426A2: db_odbc_select (checks_db.c:172)
==12201==    by 0x4424B2: get_value_db (checks_db.c:239)
==12201==    by 0x434C72: get_value (poller.c:448)
==12201==    by 0x433D07: get_values (poller.c:661)
==12201==    by 0x4328E7: poller_thread (poller.c:852)

Next I'll check if the same problems exist with the latest version of unixodbc.

Comment by Sandis Neilands (Inactive) [ 2016 Jun 21 ]

Compiled Zabbix with unixODBC 2.3.4. The first problem still persists (the problem is in the MySQL driver); however, the second problem (the memory leak) doesn't occur any more.

Comment by Sandis Neilands (Inactive) [ 2016 Jun 21 ]

Dennis, most likely this is a problem in either unixODBC or the MySQL driver. However, it is strange that the problem manifested only after the upgrade.

Just to confirm - memory usage when ODBC checks are disabled is normal (about the same as in 2.4)?

Would it be possible for you to perform the following?
1. Turn off the affected proxy.
2. Start the proxy under Valgrind (it will have a significant performance hit). This will produce quite a lot of output in the console; don't worry about that.

sudo valgrind --tool=memcheck --read-var-info=yes --track-origins=yes --leak-check=full ./zabbix_proxy

3. Wait for a few minutes.
4. Turn off the proxy again.
5. Attach everything that Valgrind prints after that to this thread.

Comment by Dennis Nijhuis [ 2016 Jun 21 ]

The memory usage is about the same as before the upgrade. The only difference is that on the new server the slab memory usage is higher than on the old server with Zabbix 2.4.
Attached is the output of a proxy with only 3 servers with MySQL ODBC checks.

Comment by Sandis Neilands (Inactive) [ 2016 Jun 22 ]

Thank you for the output from Valgrind. Unfortunately it didn't provide any further leads.

Have you looked at /proc/slabinfo? You can use the attached topslab.sh script to monitor slab usage by type.

sudo watch -d -n 5 ./topslab.sh

You could also monitor the growth of the memory segments of the suspected Zabbix processes with the attached topmap.sh script (replace <PID> with the PID of the monitored process).

sudo watch -d -n 5 ./topmap.sh <PID>

Let us know (and attach output from the commands above) if you find a slab or memory segment of a Zabbix process growing inappropriately.

FYI, 23.06. and 24.06. are bank holidays in Latvia; I'll get back to you on Monday at the earliest.

Comment by Sandis Neilands (Inactive) [ 2016 Jul 04 ]

dnijhuis, any update regarding the information request above?

Summary of the investigation:

  • in older releases of UnixODBC there is a memory leak when the driver library cannot be found. This is fixed at least in UnixODBC 2.3.4 (but possibly in earlier releases as well).

Do you agree to close this ZBX?

Comment by Dennis Nijhuis [ 2016 Jul 04 ]

A few weeks ago I installed UnixODBC v2.3.4, and now it looks like the problem is solved.
After the upgrade I ran the above scripts, but I saw no growth of the Zabbix processes' memory.
I agree to close this issue.

Comment by Sandis Neilands (Inactive) [ 2016 Jul 04 ]

Closing the issue since the memory leak is fixed in the latest version of UnixODBC. Nothing to fix in Zabbix.
