-
Problem report
-
Resolution: Fixed
-
Trivial
-
5.4.0rc1
-
None
-
FreeBSD, and probably NetBSD, OpenBSD, and OSX too.
-
Sprint 76 (May 2021)
-
0.5
Steps to reproduce:
- Use VFS_FS_DISCOVERY on a host with many ZFS file systems and heavy control path activity (`zfs snapshot`, `zfs destroy`, `zfs recv`, etc.)
Result:
VFS discovery will be very slow and can easily exceed the maximum timeout period. This leads to many false alarms about "Zabbix agent down on host ..."
Analysis:
The VFS_FS_DISCOVERY function tries to discover every mounted file system. On the BSDs, it uses `getmntinfo`, but it sets the mode argument to `MNT_WAIT`. That means the kernel effectively calls `statfs` on every single file system to ensure that fields like `f_bfree` are up to date. Not only is that expensive in general, but on ZFS such calls can block for a long time if an operation like `zfs destroy` is in progress.
In fact, Zabbix doesn't even use `f_bfree` or any of the other fields that require frequent updates. The only fields that VFS_FS_DISCOVERY uses are `f_mntonname` and `f_fstypename`. Those two will always be up-to-date even without `MNT_WAIT`, except temporarily while a file system is being unmounted. So there's no reason to use `MNT_WAIT`.
On Solaris, Zabbix simply reads `/etc/mnttab`, and on Linux it reads `/proc/mounts`. Neither of those does anything like what `getmntinfo` does with `MNT_WAIT`. In fact, they don't even report the used space of each file system.
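For comparison, here is a minimal sketch of the Linux-style approach: parse a mount table file and take only the mount point and type from each entry. The helper name `list_mounts_from` is hypothetical, and the use of glibc's `setmntent`/`getmntent` is an assumption made for illustration; it is not necessarily how Zabbix itself parses `/proc/mounts`.

```c
#include <stdio.h>

#ifdef __linux__
#include <mntent.h>
#endif

/*
 * Hypothetical helper, not Zabbix code: print "mount point<TAB>type" for
 * every entry in the mount table at `path` (e.g. "/proc/mounts").
 * Returns the number of entries, or -1 if the table cannot be opened or
 * the platform has no <mntent.h>.
 */
static int list_mounts_from(const char *path, FILE *out)
{
#ifdef __linux__
    FILE *fp = setmntent(path, "r");
    struct mntent *ent;
    int count = 0;

    if (fp == NULL)
        return -1;

    /* Each entry is parsed from text; no per-file-system statfs call is made. */
    while ((ent = getmntent(fp)) != NULL) {
        fprintf(out, "%s\t%s\n", ent->mnt_dir, ent->mnt_type);
        count++;
    }

    endmntent(fp);
    return count;
#else
    (void)path;
    (void)out;
    return -1;
#endif
}
```

Calling `list_mounts_from("/proc/mounts", stdout)` only reads text that the kernel already has on hand, so it cannot stall on a busy file system the way a forced `statfs` refresh can.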
In conclusion, Zabbix should replace all uses of `MNT_WAIT` with `MNT_NOWAIT`.
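The proposed change could look roughly like the sketch below. It is not the actual Zabbix source: the helper name `list_mounts_nowait` and the tab-separated output are assumptions made for illustration. The platform guard covers only FreeBSD, OpenBSD, and macOS, because NetBSD's `getmntinfo` uses `struct statvfs` instead of `struct statfs`.

```c
#include <stdio.h>

#if defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__APPLE__)
#include <sys/param.h>
#include <sys/mount.h>
#define HAVE_BSD_GETMNTINFO 1
#endif

/*
 * Hypothetical helper, not Zabbix code: print "mount point<TAB>type" for
 * every mounted file system.
 * Returns the number of file systems, or -1 on error or on a platform
 * without a statfs-based getmntinfo().
 */
static int list_mounts_nowait(FILE *out)
{
#ifdef HAVE_BSD_GETMNTINFO
    struct statfs *mnt = NULL;
    int i, count;

    /*
     * MNT_NOWAIT returns the statistics the kernel already has cached
     * instead of refreshing them for each file system. The two fields
     * read below do not depend on that refresh.
     */
    count = getmntinfo(&mnt, MNT_NOWAIT);
    if (count == 0)
        return -1;  /* getmntinfo() returns 0 on failure and sets errno */

    for (i = 0; i < count; i++)
        fprintf(out, "%s\t%s\n", mnt[i].f_mntonname, mnt[i].f_fstypename);

    /* The array is owned by the C library and must not be freed here. */
    return count;
#else
    (void)out;
    return -1;
#endif
}
```

The only difference from the `MNT_WAIT` variant is the flag passed to `getmntinfo`; the loop that reads `f_mntonname` and `f_fstypename` stays the same.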