
    • Problem report
    • Resolution: Duplicate
    • Trivial
    • None
    • 5.4.2
    • Agent2 plugin (N)

      proc.num returns the wrong value.

       

      This happens for multiple proc.num checks in our environment.

       

      Service uptime: 4 months.

       

       

      I believe this issue is related to this line of code:

      for entries, err = f.Readdir(1); err != io.EOF; entries, err = f.Readdir(1) {
      

       

      Working strace: the /proc entry for the process is found, opened and read.

       

      newfstatat(AT_FDCWD, "/proc/31745", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
      8271  <... nanosleep resumed>NULL)      = 0
      8274  openat(AT_FDCWD, "/proc/31745/stat", O_RDONLY) = 5
      8271  futex(0xc000478148, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
      8274  read(5,  <unfinished ...>
      8286  <... futex resumed>)              = 0
      8274  <... read resumed>"31745 (sssd) S 1 31745 31745 0 -1 1077944576 1908 0 29 0 468 1189 0 0 20 0 1 0 1782782995 275206144 947 18446744073709551615 94783524737024 94783524813844 140724449739856 140724449738568 140195446729904 0 4224 1052672 84483 18446744072478930117 0 0 17 2 0 0 3 0 0 94783526912744 94783526916512 94783536095232 140724449742658 140724449742691 140724449742691 140724449742825 0\n", 2048) = 375
      8271  <... futex resumed>)              = 1
      8286  futex(0xc000478148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
      8274  close(5 <unfinished ...>
      8286  <... futex resumed>)              = -1 EAGAIN (Resource temporarily unavailable)
      8274  <... close resumed>)              = 0
      8271  futex(0xc000478148, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
      8286  futex(0xc000478148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
      8274  openat(AT_FDCWD, "/proc/31745/cmdline", O_RDONLY) = 5
      

       

      Non-working strace: this is the last newfstatat system call that looks at the /proc files.

      8441  newfstatat(AT_FDCWD, "/proc/8454", 0xc00038b628, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
      

       

       
      This server has a lot of processes running (>700).

      I think that between the time /proc is opened here

      f, err := os.Open("/proc")
      

      and the time the for loop reaches its entry, a process exits (8454 in the trace above; 8441 appears to be the agent thread making the call), causing the read error that stops the loop?

       

        1. image-2021-07-16-15-33-40-021.png (47 kB)
        2. image-2021-07-16-15-33-56-662.png (55 kB)
        3. image-2021-07-16-15-44-03-155.png (141 kB)
        4. image-2021-07-16-15-44-42-037.png (16 kB)
        5. image-2021-08-02-13-58-00-439.png (107 kB)
        6. image-2021-08-02-13-58-59-747.png (62 kB)
        7. proc_counts.py (1 kB)
        8. image-2021-08-06-10-03-27-398.png (163 kB)
        9. image-2021-08-06-10-04-06-525.png (92 kB)
        10. test_agent.sh (0.7 kB)

            zabbix.dev Zabbix Development Team
            shaned Shane Davidson
            Votes: 7
            Watchers: 8
