[ZBX-2966] negative value vfs.fs.size amount of free space on partition Created: 2010 Aug 31  Updated: 2017 May 30  Resolved: 2015 Nov 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 1.8.3
Fix Version/s: 3.0.0alpha4

Type: Incident report Priority: Major
Reporter: rootd Assignee: Unassigned
Resolution: Fixed Votes: 9
Labels: agent, freebsd
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

FreeBSD 8.1 RELEASE


Attachments: JPEG File 2010-08-31_170544.jpg     JPEG File 2010-08-31_170614.jpg     File zabbix-2.4.6-negative-vfs-fs-size.patch    
Issue Links:
Duplicate
is duplicated by ZBX-3388 Partition size lower than 0 show as 1... Closed
is duplicated by ZBX-9482 %Used overflow on RAID ~80TB Closed

 Description   

When there is no space left on a partition, FreeBSD reports negative available space.
As a result, the value of vfs.fs.size goes out of range and becomes a very large number, so the trigger does not fire. The problem is there, but no alerts are sent.

See screenshots....



 Comments   
Comment by Frank Wall [ 2011 Feb 22 ]

I have the same problem. It does NOT appear with FreeBSD 7.1, but after upgrading to FreeBSD 7.3 I ran into the same issue. The Zabbix Frontend shows 16 EB free disk space, while it actually is 0 Bytes (or in FreeBSD notation: -40 MB).

Comment by Ilyas [ 2011 Apr 19 ]

The problem also affects vfs.fs.size[/somepath,pfree] and vfs.fs.size[/somepath,pused].
This looks like a bug that renders monitoring unusable.

[[email protected]]# zabbix_get -s zbxagent.local -k 'vfs.fs.size["/mnt/disk",pused]'
-2090496434960.400879
[[email protected]]# zabbix_get -s zbxagent.local -k 'vfs.fs.size["/mnt/disk",pfree]'
2090496435060.400879

[[email protected] ~]# df -h | grep /mnt/disk
/dev/ufs/yaj13pgx 1.8T 1.7T -7.0G 100% /mnt/disk

Our cluster currently has more than 500 hard drives, and the count is continuously growing.

There appears to be one more problem (excerpt from zabbix_server.log):
92667:20110419:084334.369 Item [zbxagent.local:vfs.fs.size["/mnt/ybj0beyd",used]] error: Received value [1313742694400.000000] is not suitable for value type [Numeric (float)]
92662:20110419:084339.394 Item [zbxagent.local:vfs.fs.size["/mnt/yaj0gkjx",used]] error: Received value [1803575007232.000000] is not suitable for value type [Numeric (float)]
92664:20110419:084344.422 Item [zbxagent.local:vfs.fs.size["/mnt/yaj13pgx",used]] error: Received value [1814660216832.000000] is not suitable for value type [Numeric (float)]

Agent and server both:
Zabbix Agent v1.9.3 (revision 18740) (28 March 2011)
Zabbix Agent v1.9.3 (revision 18740) (28 March 2011)

(yes, we're using auto discovery for filesystems)

OS is FreeBSD 8.2-STABLE.

Comment by Ilyas [ 2011 Apr 19 ]

Oh, I forgot to mention that we are using 2 TB hard drives.

Yes, if you could provide patches, I will run Zabbix with them.

Comment by Alexander Vladishev [ 2013 Feb 17 ]

Related issues: ZBX-1274, ZBX-5804

Comment by Sap [ 2013 Mar 15 ]

Same problem

FreeBSD 8.2-RELEASE #0: Fri Feb 18 02:24:46 UTC 2011

# zabbix_agent -V
Zabbix Agent v1.8.3 (revision 13928) (16 August 2010)
Compilation time: Dec 13 2010 22:12:16

# df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ad0s1a    4.5G    4.1G    -27M   101%    /
devfs          1.0K    1.0K      0B   100%    /dev

# zabbix_agent -t 'vfs.fs.size[/,pfree]'
vfs.fs.size[]                                 [d|854069308670196.500000]

After removing some files to free up space:

# df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ad0s1a    4.5G    3.7G    418M    90%    /
devfs          1.0K    1.0K      0B   100%    /dev

# zabbix_agent -t 'vfs.fs.size[/,pfree]'
vfs.fs.size[]                                 [d|9.916499]

-------
And most important: the Zabbix item received a huge number and switched to the "Not available" state, so I would never be informed about this issue.

Comment by Riaan Olivier [ 2013 Jun 03 ]

Got the same issue on FreeBSD 7.3 and 8.1

# zabbix_agent -V
Zabbix agent v2.0.5 (revision 33558) (12 February 2013)
Compilation time: Apr 12 2013 11:05:51
# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1f    1.9G    1.9G   -158M   109%    /var
# zabbix_agentd -t "vfs.fs.size[/var,free]"
vfs.fs.size[/var,free][/var,free]             [u|18446744073544177664]

# zabbix_agentd -t "vfs.fs.size[/var,pfree]"
vfs.fs.size[/var,pfree][/var,pfree]           [d|1979319602661605.750000]

Can we request that this be fixed as soon as possible? It causes triggers to be missed and downtime in production environments.
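
The "free" value above is what a negative byte count looks like once it is read as an unsigned 64-bit integer. A minimal C sketch of that reading, assuming plain two's-complement wraparound is what happened here; as_signed_bytes is a hypothetical helper, not Zabbix code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret an unsigned 64-bit value as a signed one. */
static int64_t as_signed_bytes(uint64_t reported)
{
	int64_t	value;

	/* bit-for-bit copy, so no implementation-defined conversion is involved */
	memcpy(&value, &reported, sizeof(value));

	return value;
}
```

Passing 18446744073544177664 (the vfs.fs.size[/var,free] result above) yields -165373952 bytes, about -157.7 MiB, which is consistent with the -158M shown by df.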

Comment by Aleksandrs Saveljevs [ 2015 Mar 09 ]

Taking FreeBSD code in src/libs/zbxsysinfo/freebsd/diskspace.c as an example, below is the status of the current implementation, which seems to have remained unchanged since Zabbix 1.7. There are no comments regarding the implementation, but it seems that it was written with the intention to mimic df output.

Our code is as follows, supplemented with comments in each conditional describing the corresponding output in df, according to df source code at https://www.gitorious.org/freebsd/freebsd-head/source/214589d0d7e189b66514f6098f7c2a2c9b61dd87:bin/df/df.c#L239 :

#ifdef HAVE_SYS_STATVFS_H
#	define ZBX_STATFS	statvfs
#	define ZBX_BSIZE	f_frsize
#else
#	define ZBX_STATFS	statfs
#	define ZBX_BSIZE	f_bsize
#endif

struct ZBX_STATFS	s;

if (0 != ZBX_STATFS(fs, &s))
	return SYSINFO_RET_FAIL;

// uint64_t   f_blocks;	     /*	total data blocks in filesystem	*/
// uint64_t   f_bfree;	     /*	free blocks in filesystem */
// (u)int64_t f_bavail;	     /*	free blocks avail to non-superuser */

if (NULL != total)
{
	// Size:
	// f_blocks

	*total = (zbx_uint64_t)s.f_blocks * s.ZBX_BSIZE;
}

if (NULL != free)
{
	// Avail:
	// f_bavail

	*free = (zbx_uint64_t)s.f_bavail * s.ZBX_BSIZE;
}

if (NULL != used)
{
	// Used:
	// f_blocks - f_bfree

	*used = (zbx_uint64_t)(s.f_blocks - s.f_bfree) * s.ZBX_BSIZE;
}

if (NULL != pfree)
{
	if (0 != s.f_blocks - s.f_bfree + s.f_bavail)
		*pfree = (double)(100.0 * s.f_bavail) / (s.f_blocks - s.f_bfree + s.f_bavail);
	else
		*pfree = 0;
}

if (NULL != pused)
{
	// Capacity:
	// (f_blocks - f_bfree) / (f_blocks - f_bfree + f_bavail)

	if (0 != s.f_blocks - s.f_bfree + s.f_bavail)
		*pused = 100.0 - (double)(100.0 * s.f_bavail) / (s.f_blocks - s.f_bfree + s.f_bavail);
	else
		*pused = 0;
}

There are several things to note:

  • In our code, "pfree" is not the same as "free" / "total" and "pused" is not the same as "used" / "total".
  • "total", "free", "used" coincide with df output fields "Size", "Avail", "Used".
  • "pfree" does not have a corresponding field in df.
  • "pused" is different from "Capacity" field in df.
  • If statvfs() is used, "f_bavail" is unsigned. If statfs() is used, "f_bavail" is signed.
  • If statvfs() is used, even though "f_bavail" is unsigned, the value stored there may be a negative integer:
    # df -h
    Filesystem    Size    Used   Avail Capacity  Mounted on
    /dev/da0p2    3.7G    3.4G    -28M   101%    /
    devfs         1.0k    1.0k      0B   100%    /dev
    

    This output corresponds to "f_bavail" of -7208, even though the field itself is unsigned.
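
To see how unsigned arithmetic can produce the huge "pfree" values reported earlier in this issue, here is a minimal sketch. The function names and the block counts used with it are hypothetical, not Zabbix code. The first function follows the "pfree" expression from the excerpt with all fields unsigned; the second applies the same formula to signed block counts.

```c
#include <stdint.h>

/* pfree as in the excerpt above, with every field unsigned (the statvfs case). */
static double pfree_unsigned(uint64_t f_blocks, uint64_t f_bfree, uint64_t f_bavail)
{
	uint64_t	denom = f_blocks - f_bfree + f_bavail;	/* wraps modulo 2^64 */

	if (0 == denom)
		return 0;

	return (double)(100.0 * f_bavail) / denom;
}

/* Same formula on signed block counts, so a negative f_bavail stays negative. */
static double pfree_signed(int64_t f_blocks, int64_t f_bfree, int64_t f_bavail)
{
	int64_t	denom = f_blocks - f_bfree + f_bavail;

	if (0 == denom)
		return 0;

	return 100.0 * (double)f_bavail / (double)denom;
}
```

With assumed counts f_blocks = 1000000, f_bfree = 50000 and f_bavail = -7208, the signed version gives roughly -0.76, while the unsigned version gives a value on the order of 10^15, the same magnitude as the "pfree" results quoted in the earlier comments.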

Comment by Aleksandrs Saveljevs [ 2015 Mar 09 ]

Some of the complications with mimicking df are described above. Here is another one: if we wish "free" to return a negative value as in df, then we have to make it return a float if "f_bavail" is negative. However, we still have to return an unsigned integer in case "f_bavail" is non-negative, because our floats are limited to 10^12 and this value is too low for modern drives.

This would be good enough for "zabbix_get", but it is of no use for Zabbix server: it is not possible for an item to accept both large unsigned integers and negative values. Therefore, an item will become unsupported in case "free" is negative.

One solution is to let it be and advise users to trigger based on "pfree" (which we shall also fix to return a negative percentage) instead of "free": it is a floating-point item and should always work. Item "free" will then remain unreliable for triggering.

Another solution would be to abandon the idea of mimicking df behavior, but the changes in 1.7 were made specifically to look like df. Going back might not be an option.
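
The 10^12 limit mentioned above would also account for the "not suitable for value type [Numeric (float)]" errors quoted earlier in this issue, since the "used" values of 2 TB drives exceed it. A minimal sketch of that check; the constant and function name are hypothetical, and the limit is taken from this comment rather than from the server source:

```c
#include <stdbool.h>

/* Assumed limit, taken from the comment above ("limited to 10^12"). */
#define ASSUMED_FLOAT_LIMIT	1e12

/* Hypothetical check: would the value fit into a float item under that limit? */
static bool fits_float_item(double value)
{
	return value > -ASSUMED_FLOAT_LIMIT && value < ASSUMED_FLOAT_LIMIT;
}
```

Under this assumption, 1313742694400 (one of the rejected "used" values) does not fit, whereas a small negative byte count would.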

Comment by Aleksandrs Saveljevs [ 2015 Jun 04 ]

Specification is necessary before proceeding with development.

Comment by dimir [ 2015 Oct 28 ]

Man tunefs

-m minfree
             Specify the percentage of space held back from normal users; the
             minimum free space threshold.  The default value used is 8%.
             Note that lowering the threshold can adversely affect performance:

             o   Settings of 5% and less force space optimization to always be
                 used which will greatly increase the overhead for file
                 writes.

             o   The file system's ability to avoid fragmentation will be
                 reduced when the total free space, including the reserve,
                 drops below 15%.  As free space approaches zero, throughput
                 can degrade by up to a factor of three over the performance
                 obtained at a 10% threshold.

             If the value is raised above the current usage level, users will
             be unable to allocate files until enough files have been deleted
             to get under the higher threshold

So, there is a reserved disk space that is available to root but not others. And when this reserved space is hit the value of available space becomes negative:

[build@freebsd73 ~]$ df -h /
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/da0s1a    1.7G    1.6G    -12K   100%    /

[build@freebsd73 ~]$ whoami
build

[build@freebsd73 ~]$ pwd
/home/build

[build@freebsd73 ~]$ echo foo > foo
/: write failed, filesystem is full
-bash: echo: write error: No space left on device

[build@freebsd73 ~]$ su  
Password:

[root@freebsd73 /home/build]# whoami
root

[root@freebsd73 /home/build]# pwd
/home/build

[root@freebsd73 /home/build]# echo foo > foo

[root@freebsd73 /home/build]# cat foo 
foo
[root@freebsd73 /home/build]# uname -a
FreeBSD freebsd73 7.3-RELEASE FreeBSD 7.3-RELEASE #0: Sun Mar 21 06:15:01 UTC 2010     [email protected]:/usr/obj/usr/src/sys/GENERIC  i386

Comment by dimir [ 2015 Oct 30 ]

For FreeBSD we have two options for getting filesystem data:

  • statfs
  • statvfs

The decision is made at compile time, and statvfs() is preferred. Here's a snippet from the statvfs man page:

IMPLEMENTATION NOTES
     The statvfs() and fstatvfs() functions are implemented as wrappers around
     the statfs() and fstatfs() functions, respectively.  Not all the
     information provided by those functions is made available through this
     interface.

There is a difference between the statfs and statvfs structures that makes it impossible to detect a negative value when using statvfs:

struct statfs {
     [...]
     int64_t  f_bavail;		     /*	free blocks avail to non-superuser */
     [...]
}
---------------------------------------------------------------------------------------------------------------
typedef __uint64_t __fsblkcnt_t;
typedef __fsblkcnt_t fsblkcnt_t;
struct statvfs {
     [...]
     fsblkcnt_t f_bavail;
     [...]
}

So, it seems that when using statvfs we lose the signedness of filesystem sizes, which makes it impossible for us to detect a negative value. Using statfs we can get the negative value.
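
As an illustration of the sign loss, here is a minimal sketch that assumes the wrapper simply assigns the signed field to the unsigned one. The struct and function names are hypothetical stand-ins for the two f_bavail fields quoted above, not the FreeBSD implementation:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two fields quoted above. */
struct demo_statfs
{
	int64_t		f_bavail;	/* signed, as in struct statfs */
};

struct demo_statvfs
{
	uint64_t	f_bavail;	/* unsigned, as in struct statvfs */
};

/* Copy the field as a wrapper might: the bits survive, the sign does not. */
static struct demo_statvfs demo_wrap(struct demo_statfs in)
{
	struct demo_statvfs	out;

	out.f_bavail = (uint64_t)in.f_bavail;

	return out;
}
```

With f_bavail = -7208 on the statfs side, the statvfs side holds 18446744073709544408 (2^64 - 7208), which by type alone cannot be told apart from a genuinely huge block count.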

Comment by dimir [ 2015 Oct 30 ]

Too bad I missed that the same things had already been mentioned by asaveljevs.

Comment by dimir [ 2015 Oct 30 ]

We just had an internal discussion and it was decided that in this case we should detect negative size and change it to 0 (zero). We thought from a user perspective this would be the most convenient decision.
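
A minimal sketch of what such a clamp could look like, assuming a wrapped negative value is recognised by its most significant bit (which is what the name of the ZBX_IS_TOP_BIT_SET macro mentioned later in this issue suggests). The committed code is not shown in this issue, so both helpers below are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: non-zero if the most significant bit of a 64-bit value is set. */
static int top_bit_set(uint64_t x)
{
	return 0 != (x >> 63);
}

/* Treat a value with the top bit set as a wrapped negative and report 0 instead. */
static uint64_t clamp_bavail(uint64_t f_bavail)
{
	return top_bit_set(f_bavail) ? 0 : f_bavail;
}
```

The trade-off of this approach is that a legitimate block count of 2^63 or more would also be reported as 0, which is not a practical concern for a 64-bit field.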

Comment by dimir [ 2015 Nov 04 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-2966 .

Comment by Andris Zeila [ 2015 Nov 06 ]

Successfully tested, please review minor changes in r56581

Comment by dimir [ 2015 Nov 06 ]

(1) [G] Moved the ZBX_IS_TOP_BIT_SET macro from sysinfo.h to zbxtypes.h, please check.

wiper CLOSED, please review another improvement in ZBX_IS_TOP_BIT_SET macro r56583

Comment by dimir [ 2015 Nov 06 ]

The fix will only be available for trunk (3.0).

Comment by dimir [ 2015 Nov 06 ]

Fixed in pre-3.0.0alpha4 (r56585).

The fix is only available for 3.0 because the change might affect the returned value, which might cause a regression. E.g., AIX with a 32-bit stat[v]fs interface and big disks might report 0 if available disk space is 16 TB (assuming a 4096-byte block size). This behavior has not been observed but is possible in theory.
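
The arithmetic behind this concern can be sketched as follows, assuming a 32-bit block counter and a top-bit test applied at that width. Both helpers are hypothetical and are not taken from the Zabbix source:

```c
#include <stdint.h>

/* Bytes represented by a 32-bit block count at a given block size. */
static uint64_t blocks_to_bytes(uint32_t blocks, uint32_t block_size)
{
	return (uint64_t)blocks * block_size;
}

/* Hypothetical top-bit test on a 32-bit block counter. */
static int top_bit_set32(uint32_t blocks)
{
	return 0 != (blocks >> 31);
}
```

Under these assumptions, with 4096-byte blocks a 32-bit counter can describe just under 16 TiB, and its top bit is set from 2^31 blocks (8 TiB) upward, so perfectly valid values in that range would be treated as negative and reported as 0.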

The patch for 2.4.6 is attached.

Comment by dimir [ 2015 Nov 09 ]

(2) [D] Documented here.

wiper CLOSED

Comment by Alexander Vladishev [ 2015 Nov 12 ]

(3) The code of vfs.fs.inode must also be reviewed and fixed.

<dimir> Actually I have checked and tested vfs.fs.inode and there is no issue. The issue is with stat[v]fs structure field f_bavail, which is not used in case of vfs.fs.inode.

RESOLVED

sasha Thanks! CLOSED

Generated at Sat Apr 20 10:55:41 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.