[ZBX-9142] Incorrect available memory calculation Created: 2014 Dec 14  Updated: 2017 May 30  Resolved: 2015 Jun 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.2.7, 2.4.2
Fix Version/s: 2.5.0

Type: Incident report Priority: Blocker
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Fixed Votes: 3
Labels: linux, memory
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux with high SReclaimable


Issue Links:
Duplicate

 Description   

Zabbix agent does not take the SReclaimable value from /proc/meminfo into account when calculating free memory. Like cached memory, this memory can be reclaimed and should be counted as free:

MemAvailable: An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, SReclaimable, the size of the file LRU lists, and the low watermarks in each zone. The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use. The impact of those factors will vary from system to system.

https://www.kernel.org/doc/Documentation/filesystems/proc.txt



 Comments   
Comment by Aleksandrs Saveljevs [ 2015 May 07 ]

Currently, we calculate "vm.memory.size[available]" as follows (written in terms of /proc/meminfo):

MemFree + Buffers + Cached

The proposal in the issue description does not define the solution precisely, but one attempt at taking SReclaimable into account would be to calculate "vm.memory.size[available]" as follows:

MemFree + Buffers + Cached + SReclaimable

This seems to be how Monit does it: see https://bitbucket.org/tildeslash/monit/src/bbf3e4a22918774b58ff0366b9ca6b1ab57572d0/src/process/sysdep_LINUX.c?at=master#cl-333 for source code and https://bitbucket.org/tildeslash/monit/issue/71/ for a similar ticket.

The problem seems to be that this calculation is different from MemAvailable, as quoted in the issue. Consider the following example:

$ cat /proc/meminfo
MemTotal:        8066548 kB
MemFree:         2967140 kB
MemAvailable:    4909836 kB
Buffers:          268648 kB
Cached:          2098364 kB
...
SReclaimable:     221116 kB

Calculation:

2967140 (MemFree) + 268648 (Buffers) + 2098364 (Cached) + 221116 (SReclaimable) = 5555268

This is different from MemAvailable, which is 4909836.

So, since we already read /proc/meminfo for "vm.memory.size[available]", I wonder whether we can simply use MemAvailable, because it is the kernel's own estimate.

Note that, from the quote in the issue description, "The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use". Therefore, it seems that just adding SReclaimable to our current "vm.memory.size[available]" will not be correct.
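
The arithmetic in the example above can be checked with a short script. This is only a sketch: the sample text repeats the values quoted in this comment, and parse_meminfo() is a hypothetical helper, not part of Zabbix.

```python
# Values copied from the /proc/meminfo excerpt in this comment (all in kB).
SAMPLE = """\
MemTotal:        8066548 kB
MemFree:         2967140 kB
MemAvailable:    4909836 kB
Buffers:          268648 kB
Cached:          2098364 kB
SReclaimable:     221116 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its numeric value (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


info = parse_meminfo(SAMPLE)

# Current calculation: MemFree + Buffers + Cached
current = info["MemFree"] + info["Buffers"] + info["Cached"]

# Proposed calculation: the same sum plus SReclaimable
proposed = current + info["SReclaimable"]

# How far the proposed sum overshoots the kernel's own estimate
overshoot = proposed - info["MemAvailable"]

print(current, proposed, overshoot)
```

On this sample the proposed sum exceeds MemAvailable by 645432 kB, which illustrates why adding SReclaimable alone does not reproduce the kernel's estimate.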

Comment by Aleksandrs Saveljevs [ 2015 May 14 ]

So it was decided to fix the issue in trunk by taking "MemAvailable" from /proc/meminfo, if possible. Otherwise, we do the calculation as before, because simply adding "SReclaimable" to our current algorithm is not exactly correct.
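
The decision above can be sketched as follows. This is an illustration in Python, not the agent's actual C code; available_kb() and the two sample dictionaries are hypothetical, with values taken from the earlier /proc/meminfo example.

```python
def available_kb(fields):
    """Pick the value for vm.memory.size[available] from parsed meminfo fields.

    `fields` is assumed to be a dict of /proc/meminfo values in kB.
    """
    if "MemAvailable" in fields:
        # Newer kernels (3.14+) expose their own estimate: use it as is.
        return fields["MemAvailable"]
    # Older kernels: fall back to the previous calculation.
    return fields["MemFree"] + fields["Buffers"] + fields["Cached"]


# Sample values from the earlier example; the second dict simulates a
# kernel that does not report MemAvailable.
new_kernel = {
    "MemFree": 2967140,
    "MemAvailable": 4909836,
    "Buffers": 268648,
    "Cached": 2098364,
}
old_kernel = {k: v for k, v in new_kernel.items() if k != "MemAvailable"}

print(available_kb(new_kernel))
print(available_kb(old_kernel))
```

Note that the fallback branch deliberately leaves SReclaimable out, matching the reasoning that simply adding it is not exactly correct.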

In addition to the link in the issue description, useful reading regarding "MemAvailable" can be found on http://blog.famzah.net/2014/09/24/memavailable-metric-for-linux-kernels-before-3-14-in-procmeminfo/ (a nice, summarizing blog post), https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773 (commit that introduces "MemAvailable" in Linux 3.14), and https://github.com/famzah/linux-memavailable-procfs (a script by the author of the blog post to emulate "MemAvailable" on Linux kernels prior to 3.14, which we might theoretically use in the future).

Quoting the Linux commit message here, because it is a nice explanation of why we should use "MemAvailable" for vm.memory.size[available]:

Many load balancing and workload placing programs check /proc/meminfo to estimate how much free memory is available. They generally do this by adding up "free" and "cached", which was fine ten years ago, but is pretty much guaranteed to be wrong today.

It is wrong because Cached includes memory that is not freeable as page cache, for example shared memory segments, tmpfs, and ramfs, and it does not include reclaimable slab memory, which can take up a large fraction of system memory on mostly idle systems with lots of files.

Currently, the amount of memory that is available for a new workload, without pushing the system into swap, can be estimated from MemFree, Active(file), Inactive(file), and SReclaimable, as well as the "low" watermarks from /proc/zoneinfo.

However, this may change in the future, and user space really should not be expected to know kernel internals to come up with an estimate for the amount of free memory.

It is more convenient to provide such an estimate in /proc/meminfo. If things change in the future, we only have to change it in one place.

Another useful link is https://bugzilla.kernel.org/show_bug.cgi?id=77141 , which explains why "MemFree" can be higher than "MemAvailable".
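
For illustration, the estimate described in the quoted commit message can be sketched roughly as below. This follows my reading of how the Linux 3.14 commit combines the inputs and should be treated as an assumption, not as a specification; later kernels may differ. The function name and the input numbers are hypothetical, and all values must be in the same unit (the kernel works in pages).

```python
def estimate_available(free, active_file, inactive_file, sreclaimable, wmark_low):
    """Rough user-space approximation of MemAvailable (assumed formula).

    wmark_low is the sum of the "low" watermarks over all zones,
    as reported in /proc/zoneinfo.
    """
    # Free memory below the low watermark is not really usable.
    available = free - wmark_low

    # Part of the page cache can be dropped, but the system needs some
    # of it to function well: keep at least half, capped at wmark_low.
    pagecache = active_file + inactive_file
    pagecache -= min(pagecache // 2, wmark_low)
    available += pagecache

    # Not all reclaimable slab is actually reclaimable: same treatment.
    available += sreclaimable - min(sreclaimable // 2, wmark_low)

    return max(available, 0)


# Hypothetical numbers, chosen only to exercise the function.
print(estimate_available(1000, 400, 600, 200, 50))
```

The sketch also shows why the result is lower than a plain sum of the inputs: the low watermarks are subtracted from each component.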

Comment by Aleksandrs Saveljevs [ 2015 May 15 ]

Using "MemAvailable" (when possible) is implemented in development branch svn://svn.zabbix.com/branches/dev/ZBX-9142-trunk .

The fix improves the function byte_value_from_proc_file() so that we can read either "MemAvailable" or "Cached" (exactly one of which is required for "vm.memory.size[available]") in a single pass, without opening "/proc/meminfo" twice.
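
The single-pass idea can be illustrated with a small Python sketch. It does not mirror the real byte_value_from_proc_file() signature or implementation; read_available_kb() is a hypothetical function that scans a meminfo-style stream once and uses whichever of the two alternatives it finds.

```python
import io


def read_available_kb(stream):
    """Scan meminfo-style lines once; return available memory in kB or None."""
    wanted = {"MemFree": None, "Buffers": None, "Cached": None}
    for line in stream:
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if not parts or not parts[0].isdigit():
            continue
        value = int(parts[0])
        if name == "MemAvailable":
            # Kernel estimate found: no other field is needed.
            return value
        if name in wanted:
            wanted[name] = value
    if None in wanted.values():
        return None  # neither alternative could be computed
    return sum(wanted.values())


# Simulated file contents; on a real system one would pass
# open("/proc/meminfo") instead of a StringIO object.
with_estimate = io.StringIO(
    "MemFree: 2967140 kB\nMemAvailable: 4909836 kB\n"
    "Buffers: 268648 kB\nCached: 2098364 kB\n"
)
without_estimate = io.StringIO(
    "MemFree: 2967140 kB\nBuffers: 268648 kB\nCached: 2098364 kB\n"
)

print(read_available_kb(with_estimate))
print(read_available_kb(without_estimate))
```

Because /proc/meminfo is read only once, the value comes from a single snapshot of the file rather than from two separate reads.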

Comment by Andris Zeila [ 2015 May 22 ]

Successfully tested

Comment by Aleksandrs Saveljevs [ 2015 May 27 ]

Available in pre-2.5.0 (trunk) r53807.

Comment by Aleksandrs Saveljevs [ 2015 May 27 ]

Documented at the following locations:

sasha CLOSED

Generated at Fri Apr 26 11:50:06 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.