[ZBX-22567] latest data shows 99% used space, trigger not firing up Created: 2023 Mar 22  Updated: 2024 May 08  Resolved: 2024 May 08

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S), Templates (T)
Affects Version/s: 6.0.14
Fix Version/s: 6.0.30rc1, 6.4.15rc1, 7.0.0rc1

Type: Problem report Priority: Minor
Reporter: Patrick Assignee: Andrey Tocko (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: screen 2.png, screen 3.png, screen1.png, screen4.png, screen5.png (PNG files)
Issue Links:
Sub-task
part of ZBXNEXT-6698 Improve and fix filesystem free space... Closed
Team: Team INT

 Description   

Steps to reproduce:

  1. add a host, install the agent, and apply the template Linux by Zabbix agent active (stock, not modified at all)
  2. wait until the mount point is discovered, then fill it with data
  3. observe that the trigger does not fire

df -h:
/dev/mapper/stor11tblvm   17T   17T   14M 100% /stor11tb2lv

full.

# sudo -u zabbix df -h | grep stor
/dev/mapper/storage-stor11tblvm   17T   17T   14M 100% /stor11tb2lv

It works as the zabbix user as well; no unusual security policy is applied.

GUI screenshots are attached. No problems are indicated, but Latest data shows 99% pused. Macros are inherited correctly.

I've run into this issue several times, on versions 6.2 and 6.0, and only on relatively large filesystems (> 12 TB so far). I've also observed a situation where the trigger fired and then disappeared after some unknown time (I was unable to reproduce it, though). It seems like some condition is closing the trigger.

The monitored host is a virtual machine running on Proxmox.

 Comments   
Comment by Patrick [ 2023 Mar 22 ]

It looks like an error in how the trigger's and/or expression is evaluated, though I'm too new to this to debug it.

If you leave only one expression, e.g.

last(/Linux by Zabbix agent active/vfs.fs.size[{#FSNAME},pused])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}

and remove the second part - the estimated-time-to-disk-fill-up condition and the "disk free space is less than {$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"}" condition - it kicks in.
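
For reference, the full stock trigger expression has roughly this shape (reconstructed from the 6.0 template, so the exact item keys and function parameters may differ slightly):

last(/Linux by Zabbix agent active/vfs.fs.size[{#FSNAME},pused])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}
and ((last(/Linux by Zabbix agent active/vfs.fs.size[{#FSNAME},total])-last(/Linux by Zabbix agent active/vfs.fs.size[{#FSNAME},used]))<{$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"}
or timeleft(/Linux by Zabbix agent active/vfs.fs.size[{#FSNAME},pused],1h,100)<1d)

So the pused threshold alone never fires; it is ANDed with either the absolute free-space condition or the timeleft condition.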

It's the default template, people rely on it, and drives just keep getting bigger and bigger.

EDIT: I had pasted an expression from another incorrect attempt at cloning the template (DF_Linux..); it should be the original one, as I stated previously (and as it was).

Comment by Alex Kalimulin [ 2023 Mar 23 ]

Was your database created from scratch for 6.0 or was it upgraded from a previous version? If the latter, did you convert to the extended double range? What do you see for vfs.fs.dependent.size[{#FSNAME},used] and vfs.fs.dependent.size[{#FSNAME},total] in Latest data?
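
For context: before the extended-range conversion, history tables stored numeric values as DECIMAL(16,4), whose ceiling is roughly 10^12, while a multi-terabyte filesystem reports byte counts above that. A rough magnitude check in Python (my sketch, assuming the old column type):

# 17 TB in bytes exceeds the ~1e12 ceiling of the old DECIMAL(16,4) columns
print(17 * 2**40)  # 18691697672192, i.e. ~1.87e13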

Comment by Patrick [ 2023 Mar 24 ]

When I first hit this issue, it was an upgrade from 6.0 to 6.2 with a failed TimescaleDB attempt, so I thought (as you suggest) it was a problem with the upgrade or some other misconfiguration on my side. To rule that out, I set up a brand new installation of 6.0 (later 6.2, with the same results), switched the hosts to the new instance, and got the same result.

Current values from the affected host are shown in the attached screenshots.

Comment by Alex Kalimulin [ 2023 Mar 24 ]

The disk free space is less than {$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"} 

Are you sure? Did you change {$VFS.FS.FREE.MIN.CRIT}? The default value in the standard template is 5 GB, whereas your remaining free space is 2 TB, which is well above the trigger threshold.

Comment by Patrick [ 2023 Mar 24 ]

I did not touch this macro at all; it's still at the default. I was adjusting {$VFS.FS.PUSED.MAX.WARN} and .CRIT, which felt a bit more natural when migrating from Icinga.

So I guess I hit some kind of boundary condition where, on large filesystems, {$VFS.FS.FREE.MIN.CRIT}/WARN are never met. The evaluation goes:

  1. the first condition checks whether {$VFS.FS.PUSED.MAX.CRIT} is exceeded (in my case: yes)
  2. next, it is ANDed with one of two other conditions:
    1. the difference fs.size (total - used) must be less than {$VFS.FS.FREE.MIN.CRIT}/WARN - false in my case, since there is still plenty of free space in absolute terms (more than 5/10 GB), regardless of the usage percentage
    2. timeleft - let's assume there was not enough data to calculate it, or whatever - false

So I got: true AND (false OR false), resulting in false.
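
A minimal sketch of this evaluation in Python (my illustration only; trigger_fires() is a made-up helper, not the server's actual code):

GB = 2**30
TB = 2**40

def trigger_fires(pused, free_bytes, pused_max=90, free_min=5 * GB,
                  full_within_24h=False):
    # stock logic: percentage threshold AND (absolute free space OR timeleft)
    return pused > pused_max and (free_bytes < free_min or full_within_24h)

# my case: 99% used, but still ~2 TB free in absolute terms
print(trigger_fires(pused=99, free_bytes=2 * TB))  # -> False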

If I'm getting this right after your hints, that would explain my whole mess. Still, the default values work fine in many other cases (even on my other hosts) where the filesystems are smaller. One has to be aware of how the trigger is calculated and set one's own values for {$VFS.FS.FREE.MIN.CRIT}/WARN in GB to have it work everywhere. Since drives are getting bigger rather than smaller, this problem is likely to surface again.

Comment by Patrick [ 2023 Mar 24 ]

Just got another branch of this issue:

Windows host, drive C: 255 GB, 93% full, 16 GB of free space.

{$VFS.FS.PUSED.MAX.CRIT} = 90
{$VFS.FS.PUSED.MAX.WARN} = 80
{$VFS.FS.FREE.MIN.CRIT}/WARN = 5/10 GB (defaults)

No trigger, no warning, nothing, no problem at all. So it does not only appear with larger filesystems. By the time I see that trigger fire, it will be too late.
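
Plugging these numbers into the hypothetical trigger_fires() sketch from my earlier comment gives the same outcome:

# CRIT: 93 > 90 is true, but 16 GB < 5 GB is false -> no trigger
print(trigger_fires(pused=93, free_bytes=16 * GB))
# WARN: 93 > 80 is true, but 16 GB < 10 GB is false -> no trigger either
print(trigger_fires(pused=93, free_bytes=16 * GB, pused_max=80, free_min=10 * GB))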

Comment by Thomas Espe [ 2023 Dec 29 ]

This really should let you alert on PUSED.WARN/CRIT only and disregard the hardcoded GB values (or at least let you choose to rely only on the percentages); those do not make any sense for larger filesystems. The time limit "the disk will be full in less than 24 hours" is also somewhat nonsensical (and it seems it can't be tweaked?): sometimes you need alerting well before a filesystem is 24 hours from full, in order to plan expansions.
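
If you want percentage-only alerting without editing the template itself, one possible workaround (my sketch; 100T is an arbitrary value, just larger than any filesystem you monitor) is to override the free-space macros so the absolute-space condition is always true and the expression collapses to the pused check alone:

{$VFS.FS.FREE.MIN.CRIT} = 100T
{$VFS.FS.FREE.MIN.WARN} = 100T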
