  1. ZABBIX BUGS AND ISSUES
  2. ZBX-21466

Unavailable mount error in zabbix-agent2


    • Sprint 92 (Sep 2022)
    • 0.5

      With zabbix-agent2 running on virtual machines, we sometimes get unsupported items for file system monitoring.

      In such cases, random file systems on random hosts become unavailable with the following error message:

      ZBX_NOTSUPPORTED: 'mount '***' is unavailable' to '***'
      

      The debug log shows something like this:

      Aug 11 11:59:13 zabbix_agent2[14758]: received passive check request: 'vfs.fs.size[/opt,total]' from '10.9.49.35'
      Aug 11 11:59:13 zabbix_agent2[14758]: [1] processing update request (1 requests)
      Aug 11 11:59:13 zabbix_agent2[14758]: [1] adding new request for key: 'vfs.fs.size[/opt,total]'
      Aug 11 11:59:13 zabbix_agent2[14758]: [1] created direct exporter task for plugin 'VfsFs' itemid:0 key 'vfs.fs.size[/opt,total]'
      Aug 11 11:59:13 zabbix_agent2[14758]: executing direct exporter task for key 'vfs.fs.size[/opt,total]'
      Aug 11 11:59:13 zabbix_agent2[14758]: failed to execute direct exporter task for key 'vfs.fs.size[/opt,total]' error: 'mount '/opt' is unavailable'
      Aug 11 11:59:13 zabbix_agent2[14758]: sending passive check response: ZBX_NOTSUPPORTED: 'mount '/opt' is unavailable' to '10.9.49.35'
      

      This behavior starts after the following log message:

      $ sudo journalctl -u zabbix-agent2 --since '3 days ago' | grep 'timed out'
      Aug 08 15:33:26 nl-build17.local.profee.com zabbix_agent2[14758]: check 'vfs.fs.size[/opt,free]' is not supported: operation on mount '/opt' timed out
      

      After that, the file system never becomes available again. Only a restart of the zabbix-agent2 service helps.

      We looked through the source code of the `VfsFs` plugin. It seems that the branch of code that makes the file system available again will never be executed.

      https://github.com/zabbix/zabbix/blob/master/src/go/plugins/vfs/fs/fscaller.go#L64-L70

      func (f *fsCaller) execute(path string) {
      	stats, err := f.fsFunc(path)
      
      	if isStuck(path) {
      		f.p.Debugf("mount '%s' has become available", path)
      		stuckMux.Lock()
      		stuckMounts[path] = false
      		stuckMux.Unlock()
      		return
      	}
      	// ...
      

      The only call to `execute` is inside the `run` function.

      https://github.com/zabbix/zabbix/blob/master/src/go/plugins/vfs/fs/fscaller.go#L41-L46

      func (f *fsCaller) run(path string) (stat *FsStats, err error) {
      	if isStuck(path) {
      		return nil, fmt.Errorf("mount '%s' is unavailable", path)
      	}
      
      	go f.execute(path)
      	// ...
      

      These pieces of code look very strange. Both check the same condition, but in one case the file system becomes available, and in the other it remains unavailable.

        1. 2023-05-02-08.51.16-screenshot.png
          33 kB
          Ruedi Schwegler
        2. FIX_2_1_sec_timeout, 1%_no_timeout.png
          949 kB
          Artjoms Rimdjonoks
        3. fix_2_10_sec_timeout, 99% chance of timeout.png
          2.15 MB
          Artjoms Rimdjonoks
        4. NO_FIX_timeout_1_second_every_100th_second.jpg
          516 kB
          Artjoms Rimdjonoks
        5. Screenshot 2022-08-23 at 10.34.09.png
          836 kB
          Artjoms Rimdjonoks
        6. with_fix_1 _sec_timeout_every_100th_second.png
          956 kB
          Artjoms Rimdjonoks
        7. WITH_FIX_10_sec timeout every 100th second.png
          2.23 MB
          Artjoms Rimdjonoks
        8. ZBX-21466.png
          221 kB
          Mathieu M.
        9. ZBX-21466-1.png
          221 kB
          Mathieu M.

            arimdjonoks Artjoms Rimdjonoks
            skokhanovskiy Stepan Kokhanovskiy
            Team C