[ZBX-4532] Change the type of memory to calculate "proc.mem" key from VmSize to VmRSS Created: 2012 Jan 10  Updated: 2017 May 30  Resolved: 2014 Jan 02

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 1.8.10
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Won't fix Votes: 14
Labels: memory
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File oleksiy's_top.png    
Issue Links:
Duplicate
duplicates ZBXNEXT-1078 add fifth parameter to the key "proc.... Closed

 Description   

It's suggested to change the memory type used by the "proc.mem" key from VmSize to VmRSS.
VmRSS would be better because VmSize (virtual size) is almost meaningless for judging how much memory a process really uses.

In the sources (./src/libs/zbxsysinfo/linux/proc.c, line 266) we can see that it is VIRT that gets calculated:
if (0 != strncmp(tmp, "VmSize:\t", 8))

Note that in the "top" command, VmRSS corresponds to the RES column and VmSize to the VIRT column.
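Switching the key from VIRT to RSS amounts to reading a different line of /proc/<pid>/status. Below is a minimal C sketch of that lookup, assuming the file contents have already been read into a string; the helper name status_field_bytes is hypothetical and this is not the actual proc.c code. It converts the kB figure to bytes, the unit in which proc.mem returns its value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one field (e.g. "VmSize" or "VmRSS") in the text of
 * /proc/<pid>/status and return its value in bytes.
 * Returns -1 if the field is missing or malformed.
 * Hypothetical helper for illustration only.
 */
long long status_field_bytes(const char *status_text, const char *field)
{
    assert(status_text != NULL && field != NULL);

    size_t flen = strlen(field);
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        /* Match "<field>:" at the start of a line. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            long long kb;
            /* Values are printed in kB (1024-byte units), e.g. "VmRSS:\t 2904 kB". */
            if (sscanf(line + flen + 1, "%lld", &kb) == 1)
                return kb * 1024;
            return -1;
        }
        /* Advance to the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

For the nscd example below, asking for "VmRSS" instead of "VmSize" would yield 2973696 bytes instead of 230735872.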

See examples:
nscd:

# egrep 'VmSize|VmRSS' /proc/`pidof nscd`/status
VmSize: 225328 kB
VmRSS: 2904 kB

synergy:

# egrep 'VmSize|VmRSS' /proc/`pidof synergys`/status
VmSize: 286688 kB
VmRSS: 38888 kB

automount:

# egrep 'VmSize|VmRSS' /proc/`pidof automount`/status
VmSize: 314872 kB
VmRSS: 4908 kB

See the second example in the attached "oleksiy's_top.png".
We are trying to monitor the memory consumed by the zabbix_server daemon.

1st stage:
"zabbix_server18node1" is a daemon started with large caches in its configuration:
CacheSize=1G
TrendCacheSize=256M
HistoryCacheSize=1G
HistoryTextCacheSize=256M
"zabbix_server18" is a daemon started with the default configuration.
2nd stage:
Only the "zabbix_server18" daemon is running, with the default configuration.

As you can see, the amount of memory used by the whole OS barely changed between the first and second stages.

According to the # man top:

VIRT – Virtual Image (kb)
The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used.

RES – Resident size (kb)
The non-swapped physical memory a task has used.

The same values, according to # man ps:
VSZ virtual memory size of the process in KiB (1024-byte units). Device mappings are currently excluded; this is subject to change. (alias vsize).

RSS resident set size, the non-swapped physical memory that a task has used (in kiloBytes). (alias rssize, rsz).

Values for my configuration:

# zabbix_get -s localhost -k 'proc.mem[,,,zabbix18]'
4259160064
# zabbix_get -s localhost -k 'proc.mem[,,,18node1]'
78514339840

Calculation for the "zabbix_server18node1" daemon: 28 processes (default configuration) * 2.6G (VIRT per process) = ~73GB, which roughly equals the returned 78514339840 bytes.
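The arithmetic can be spelled out as follows. The first line adds up the cache sizes from the 1st-stage configuration; assuming those caches are shared memory mapped into every child process, they account for most of the ~2.6G VIRT seen in each process, which is why summing VmSize over 28 processes counts them 28 times.

```latex
\begin{aligned}
\text{configured caches} &= 1\,\text{G} + 256\,\text{M} + 1\,\text{G} + 256\,\text{M} = 2.5\,\text{G}\\
\texttt{proc.mem} &= \sum_{i=1}^{28} \mathrm{VmSize}_i \approx 28 \times 2.6\,\text{GiB} \approx 72.8\,\text{GiB}\\
\text{returned value} &= \frac{78514339840\,\text{B}}{2^{30}\,\text{B/GiB}} \approx 73.1\,\text{GiB} \approx 28 \times 2.61\,\text{GiB}
\end{aligned}
```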

So the key "proc.mem[]" calculates virtual memory (VIRT), which does not appear to be the same as the "memory used" described in the documentation.
That is why it should be reconsidered.



 Comments   
Comment by dimir [ 2012 Jan 11 ]

After talking to sasha, we decided not to include this in ZBXNEXT-1024, as it looks like a different issue.

Comment by Oleksii Zagorskyi [ 2012 Jan 11 ]

The mentioned screenshot is attached (sorry, I forgot about it).

Comment by Oleksii Zagorskyi [ 2012 Jan 15 ]

OK, after googling, reading a couple of articles, and some thinking and analysis, I have a couple of additional thoughts.

Recently, in ZBXNEXT-1024, we reconsidered and significantly improved/fixed the calculation of used/free memory for the whole system. That's great!
The memory used by a particular process has a similar problem: each process uses several different types of memory.

So maybe, instead of using the suggested VmRSS by default, we should perform more intelligent parsing/calculation for the key "proc.mem"?

Articles:
http://virtualthreads.blogspot.com/2006/02/understanding-memory-usage-on-linux.html
http://www.opennet.ru/base/sys/pmap_memory.txt.html (translated to Russian)

Here are the results of some tests, using the default zabbix_server.conf and a server compiled with all possible external libraries enabled.

Part of the server log (sorted):

 31229:20120115:133618.422 server #0 started [main process]
 31230:20120115:133618.416 server #1 started [configuration syncer #1]
 31231:20120115:133618.416 server #2 started [db watchdog #1]
 31232:20120115:133619.346 server #3 started [poller #1]
 31233:20120115:133619.426 server #4 started [poller #2]
 31234:20120115:133619.300 server #5 started [poller #3]
 31235:20120115:133619.332 server #6 started [poller #4]
 31236:20120115:133619.140 server #7 started [poller #5]
 31237:20120115:133619.335 server #8 started [unreachable poller #1]
 31238:20120115:133618.430 server #9 started [trapper #1]
...

Part of `top` (customized, added extra columns):

  PID S USER      VIRT  RES  SHR SWAP CODE DATA %CPU %MEM    TIME+   PPID COMMAND
31229 S zabbix    143m 2912 1724 141m  572 1068    0  0.0   0:00.00     1 zabbix_server18
31230 S zabbix    143m 2188  988 141m  572 1068    0  0.0   0:00.02 31229 zabbix_server18
31231 S zabbix    143m 1936  740 141m  572 1068    0  0.0   0:00.02 31229 zabbix_server18
31232 S zabbix    166m  13m 1584 153m  572  11m    0  0.2   0:00.42 31229 zabbix_server18
31233 S zabbix    166m  13m 1584 153m  572  11m    0  0.2   0:00.42 31229 zabbix_server18
31234 S zabbix    166m  13m 1584 153m  572  11m    0  0.2   0:00.43 31229 zabbix_server18
31235 S zabbix    166m  13m 1584 153m  572  11m    0  0.2   0:00.42 31229 zabbix_server18
31236 S zabbix    166m  13m 1948 153m  572  11m    0  0.2   0:00.46 31229 zabbix_server18
31237 S zabbix    166m  13m 1776 153m  572  11m    0  0.2   0:00.44 31229 zabbix_server18
31238 S zabbix    143m 1868  680 142m  572 1068    0  0.0   0:00.00 31229 zabbix_server18
...

Comparison of a trapper (31238) and a poller (31232):

# pmap -d 31238 | grep mapped
mapped: 147332K    writeable/private: 2164K    shared: 36920K

# pmap -d 31232 | grep mapped
mapped: 170928K    writeable/private: 13124K    shared: 36920K
# cat /proc/31238/status | grep Vm
VmPeak:	  147336 kB
VmSize:	  147332 kB
VmLck:	       0 kB
VmHWM:	    1868 kB
VmRSS:	    1868 kB
VmData:	     980 kB
VmStk:	      88 kB
VmExe:	     572 kB
VmLib:	   13480 kB
VmPTE:	     212 kB
# cat /proc/31232/status | grep Vm
VmPeak:	  170928 kB
VmSize:	  170928 kB
VmLck:	       0 kB
VmHWM:	   13660 kB
VmRSS:	   13660 kB
VmData:	   11692 kB
VmStk:	     336 kB
VmExe:	     572 kB
VmLib:	   13480 kB
VmPTE:	     248 kB

Conclusion so far: the pollers have a much bigger VmData (DATA in top) than the trappers, but it's possible that even this memory could be shared between processes. Or not?

Only the discoverer process has a similarly increased VmData; all other processes have the same small value as the trapper.
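Whether the pollers' larger data segment is really private could be checked per mapping in /proc/<pid>/smaps (on kernels that provide it). Its Shared_* counters cover resident pages that are also mapped by other processes (for example copy-on-write pages inherited across fork() and not yet written), while Private_* counters cover pages mapped only by this process. Below is a minimal C sketch, assuming the smaps text is already in memory; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Resident memory of one process, split by sharing, in kB. */
struct smaps_totals {
    long long shared_kb;   /* Shared_Clean + Shared_Dirty */
    long long private_kb;  /* Private_Clean + Private_Dirty */
};

/* Sum the sharing counters over all mappings in the text of
 * /proc/<pid>/smaps. Hypothetical helper for illustration only. */
struct smaps_totals smaps_sum(const char *smaps_text)
{
    static const char *const names[] = {
        "Shared_Clean:", "Shared_Dirty:", "Private_Clean:", "Private_Dirty:"
    };
    struct smaps_totals t = {0, 0};
    assert(smaps_text != NULL);

    for (const char *line = smaps_text; line != NULL && *line != '\0'; ) {
        for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
            size_t len = strlen(names[i]);
            long long kb;
            /* Counter lines look like "Private_Dirty:     12 kB"; mapping
             * header lines and all other counters are skipped. */
            if (strncmp(line, names[i], len) == 0 &&
                sscanf(line + len, "%lld", &kb) == 1) {
                if (i < 2)
                    t.shared_kb += kb;   /* Shared_* */
                else
                    t.private_kb += kb;  /* Private_* */
                break;
            }
        }
        /* Advance to the next line, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return t;
}
```

Comparing private_kb for 31232 and 31238 should show how much of the poller's extra memory is really its own.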

Comment by Frédéric DROUET [ 2012 Feb 08 ]

Can we expect to have this in the near future for Linux?

Comment by Oleksii Zagorskyi [ 2012 Feb 08 ]

Frédéric, I think there is a good chance that it will be implemented in some form in the future, but when is an open question.
I agree it would be nice to have it in agent v2.0.0, precisely because it's a significant change and it would be better to introduce it in a major version.
2.0.0 will be the next major version.

Comment by Oleksii Zagorskyi [ 2012 Mar 15 ]

This issue will be reconsidered only after ZBX-4602 is closed.
At the moment I'm not sure that changing VmSize to VmRSS makes sense.

Comment by Alexander Vladishev [ 2012 May 09 ]

Related issue: ZBX-3897

Comment by Jim Riggs [ 2012 May 09 ]

I can provide FreeBSD code if needed when this proceeds.

Comment by Steve mushero [ 2012 Sep 01 ]

I don't understand why we can't get this changed or added, as VSZ is totally useless on Linux, while RSS is invaluable for tracking real RAM use by things like Apache, MySQL, etc. (even better would be RSS plus swapped segments, but that is hard). This is so important that we are working on modifying the source to get it. Help me understand why making this actually useful is so difficult.

Then I'd like to have some options for the key, like sum, min, max, avg, etc., over all processes with that name. We need this mostly to sum all the RSS used by Apache, which is critical for monitoring RAM usage on very busy web servers, or on any multi-process system.
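Once a memory value has been read for every matching process, the sum/min/max/avg options requested above reduce to a simple fold over those per-process values. Below is a minimal C sketch; the enum and function names are hypothetical, and collecting the matching PIDs and their values is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Aggregation modes in the spirit of the request above (hypothetical names). */
enum agg_mode { AGG_SUM, AGG_MIN, AGG_MAX, AGG_AVG };

/* Combine per-process values (e.g. RSS of every matching process).
 * Returns 0 when no process matched; AGG_AVG uses integer division. */
long long aggregate(const long long *values, size_t n, enum agg_mode mode)
{
    assert(values != NULL || n == 0);
    if (n == 0)
        return 0;

    long long sum = 0, min = values[0], max = values[0];
    for (size_t i = 0; i < n; i++) {
        sum += values[i];
        if (values[i] < min)
            min = values[i];
        if (values[i] > max)
            max = values[i];
    }

    switch (mode) {
    case AGG_MIN: return min;
    case AGG_MAX: return max;
    case AGG_AVG: return sum / (long long)n;
    case AGG_SUM:
    default:      return sum;
    }
}
```

Summing rather than taking one process is what matters for multi-process daemons such as Apache or the zabbix_server children listed in this issue.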

Comment by Yoav Steinberg [ 2012 Sep 02 ]

As a workaround, we add the following to the agent's configuration:
UserParameter=proc.mem.rss[*],ps ho rss `pgrep -of "$1"`

Comment by Oleksii Zagorskyi [ 2012 Sep 02 ]

Yoav, your approach calculates memory for only ONE process, so it's not very usable in practice.

Comment by Tom Llewellyn-Smith [ 2012 Dec 10 ]

In the past I have needed to calculate total RSS memory for a group of processes (apache2).

Below is a link to the script I use (only tested on Linux). You will need to add a UserParameter to your agent config and install the script on the servers you wish to monitor.

http://onixconsulting.co.uk/scripts/process-rss.pl.txt

Hopefully it will help somebody,

Tom

Comment by Levente Farkas [ 2013 Jul 22 ]

+1 to replace virt with rss.

Comment by richlv [ 2014 Jan 02 ]

Just switching to rss would not be good enough, as, according to the docs, it would ignore any swap usage.

I suspect that nothing we could come up with would work for everybody: somebody might want to include vmlib, somebody else might want to exclude it... Maybe allowing users to specify what should be included is a solution?
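One way to let users choose what is included is to accept a list of /proc/<pid>/status fields and add them up. Below is a minimal C sketch; the function name and the list-of-fields interface are hypothetical and are not the interface adopted in ZBXNEXT-1078. VmSwap is not reported by every kernel (it is absent from the status output earlier in this issue), so missing fields are counted instead of silently contributing zero. Note also that VmLib is a virtual size, so VmRSS + VmLib mixes resident and virtual figures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the selected fields of a /proc/<pid>/status text, in bytes.
 * fields: NULL-terminated list such as {"VmRSS", "VmSwap", NULL}.
 * *missing receives the number of requested fields not present.
 * Hypothetical helper for illustration only. */
long long status_fields_sum(const char *status_text,
                            const char *const *fields, int *missing)
{
    assert(status_text != NULL && fields != NULL && missing != NULL);
    long long total = 0;
    *missing = 0;

    for (const char *const *f = fields; *f != NULL; f++) {
        size_t flen = strlen(*f);
        const char *line = status_text;
        int found = 0;

        while (line != NULL && *line != '\0') {
            long long kb;
            if (strncmp(line, *f, flen) == 0 && line[flen] == ':' &&
                sscanf(line + flen + 1, "%lld", &kb) == 1) {
                total += kb * 1024;  /* status values are in kB (1024 bytes) */
                found = 1;
                break;
            }
            /* Advance to the next line, if any. */
            line = strchr(line, '\n');
            if (line != NULL)
                line++;
        }
        if (!found)
            (*missing)++;
    }
    return total;
}
```

A caller could then report an error, or fall back to plain VmRSS, when *missing is non-zero.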

Comment by Yoav Steinberg [ 2014 Jan 02 ]

Specifying what you want (rss/virt/..) in the key sounds ideal to me.

Comment by Oleksii Zagorskyi [ 2014 Jan 02 ]

Surprisingly, I had not previously mentioned my own ZBXNEXT-1078 here.

It looks like there is almost no chance that this issue will be implemented as originally requested, so we have no choice but to close it and hope for ZBXNEXT-1078.

CLOSED as won't fix.

Comment by Andris Mednis [ 2014 Oct 22 ]

This issue is being solved as part of ZBXNEXT-1078.

Generated at Tue Apr 23 19:19:12 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.