[ZBX-22567] latest data shows 99% used space, trigger not firing up Created: 2023 Mar 22 Updated: 2024 May 08 Resolved: 2024 May 08 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S), Templates (T) |
Affects Version/s: | 6.0.14 |
Fix Version/s: | 6.0.30rc1, 6.4.15rc1, 7.0.0rc1 |
Type: | Problem report | Priority: | Minor |
Reporter: | Patrick | Assignee: | Andrey Tocko (Inactive) |
Resolution: | Fixed | Votes: | 1 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: |
Issue Links: |
|
Team: |
Description |
Steps to reproduce:
df -h shows the filesystem full:
/dev/mapper/stor11tblvm 17T 17T 14M 100% /stor11tb2lv
Running df as the zabbix user works as well, so there is no unusual security policy applied:
# sudo -u zabbix df -h | grep stor
/dev/mapper/storage-stor11tblvm 17T 17T 14M 100% /stor11tb2lv
GUI screenshots attached. No problems are indicated, but Latest data shows 99% pused. Macros are inherited correctly. I've hit this issue several times, on versions 6.2 and 6.0, but only on relatively large filesystems (> 12 TB at least). I've also observed a situation where the trigger fired and then disappeared after an unknown time (I was unable to reproduce it, though). It seems some condition is shutting the trigger off. The monitored host is a virtual machine running on Proxmox. |
Comments |
Comment by Patrick [ 2023 Mar 22 ] | ||
It looks like an error in calculating the trigger's and/or logic, though I'm too fresh to debug it. If you leave only one expression, e.g.
last(/Linux by Zabbix agent active/vfs.fs.size[{#FSNAME},pused])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"}
and remove the second part (the joined estimated-time-to-disk-fillup and "The disk free space is less than ..." conditions), the trigger fires. This is the default template that people rely on, and drives just keep getting bigger and bigger.
EDIT: I had pasted another incorrect attempt at cloning the template (DF_Linux..); it should be the original, as I stated previously and as it was. | ||
Comment by Alex Kalimulin [ 2023 Mar 23 ] | ||
Was your database created from scratch for 6.0, or was it upgraded from a previous version? If the latter, did you convert to the extended double range? What do you see for vfs.fs.dependent.size[{#FSNAME},used] and vfs.fs.dependent.size[{#FSNAME},total] in Latest data? | ||
Comment by Patrick [ 2023 Mar 24 ] | ||
When I first hit this issue, it was an upgrade from 6.0 to 6.2 with a failed TimescaleDB attempt, so I thought (as you suggest) it was a problem with upgrades of some sort, or some other misconfiguration on my side. To eliminate this, I set up a brand new installation of 6.0 (later 6.2, same results), switched the hosts to the new instance, and got the same result. As for now, from the affected host:
| ||
Comment by Alex Kalimulin [ 2023 Mar 24 ] | ||
Are you sure? Did you change {$VFS.FS.FREE.MIN.CRIT}? The default value in the standard template is 5GB, whereas your remaining free space is 2TB, which is well above the trigger threshold. | ||
Comment by Patrick [ 2023 Mar 24 ] | ||
I did not touch this macro at all; it's still at the default. I was playing with {$VFS.FS.PUSED.MAX.WARN} and .CRIT, which was a bit more natural when migrating from Icinga. So I guess I hit some kind of border condition, where on large filesystems {$VFS.FS.FREE.MIN.CRIT}/WARN are never met, as it goes:
so I got: true AND (false OR false), resulting in false.
If I'm getting this right after your hints, that would explain all my confusion. Still, the default values work fine in many other cases (even with my other hosts) on smaller filesystems. One has to be aware of how the trigger is calculated and set their own values for {$VFS.FS.FREE.MIN.CRIT}/WARN in GB to have it working everywhere. Since drives are getting bigger rather than smaller, this problem will likely surface again.
| ||
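For illustration, the combined condition described above can be sketched as a small script. This is a hypothetical model of the default trigger's boolean shape with illustrative names, not actual Zabbix code:

```python
GiB = 1024 ** 3
TiB = 1024 ** 4

def trigger_fires(pused, free_bytes,
                  pused_max_crit=90,       # {$VFS.FS.PUSED.MAX.CRIT}
                  free_min_crit=5 * GiB,   # {$VFS.FS.FREE.MIN.CRIT}, default 5G
                  full_within_24h=False):  # models the timeleft(...) < 1d condition
    # Assumed shape: pused > CRIT and (free < FREE.MIN.CRIT or disk fills in 24h)
    return pused > pused_max_crit and (free_bytes < free_min_crit or full_within_24h)

# 17 TB filesystem at 99% pused with ~2 TB still free: the percentage condition
# is true, but both AND-ed conditions are false,
# so: true AND (false OR false) -> False, and the trigger never fires.
print(trigger_fires(pused=99, free_bytes=2 * TiB))  # False
```

Under this model, only shrinking {$VFS.FS.FREE.MIN.CRIT} relative to the filesystem size (or dropping the AND-ed group) makes the percentage threshold effective on large volumes.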
Comment by Patrick [ 2023 Mar 24 ] | ||
Just got another branch of this issue:
Windows host, drive C: 255 GB, 93% full, 16 GB free space,
{$VFS.FS.PUSED.MAX.CRIT} = 90, {$VFS.FS.PUSED.MAX.WARN} = 80, {$VFS.FS.FREE.MIN.CRIT}/WARN = 5/10 GB.
No trigger, no warning, nothing, no problems. So it does not only appear on larger filesystems. By the time that trigger finally fires, it will be too late.
| ||
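The Windows case follows the same pattern; a quick check with the same assumed model of the trigger logic (a hypothetical helper, not Zabbix code):

```python
GiB = 1024 ** 3

def trigger_fires(pused, free_bytes, pused_max_crit=90, free_min_crit=5 * GiB,
                  full_within_24h=False):
    # Assumed shape: pused > CRIT and (free < FREE.MIN.CRIT or disk fills in 24h)
    return pused > pused_max_crit and (free_bytes < free_min_crit or full_within_24h)

# C: drive, 255 GB, 93% full, 16 GB free, CRIT macros at 90% / 5 GB:
# 93 > 90 is true, but "16 GB < 5 GB" is false, so no alert even on a small disk.
print(trigger_fires(pused=93, free_bytes=16 * GiB))  # False
```

So even a modest 255 GB volume can sit above the percentage threshold indefinitely without alerting, as long as its absolute free space stays above the 5 GB default.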
Comment by Thomas Espe [ 2023 Dec 29 ] | ||
This really should let you alert on PUSED.WARN/CRIT only and disregard the hardcoded GB values (or at least let you choose to rely only on the percentages). Those do not make any sense for larger file systems. And the time limit "the disk will be full in less than 24 hours" is also somewhat nonsensical (and it seems it can't be tweaked?); sometimes you need alerting well before a filesystem is 24 hours from full, in order to plan expansions. |