ZABBIX BUGS AND ISSUES / ZBX-14792

Zabbix-proxy leaking memory


    • Type: Incident report
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • None
    • 4.0.0beta1
    • Proxy (P)
    • None
    • Ubuntu 18.04
      Tried both SQLite3 and MariaDB 10.1; the issue is the same with each.

      There are possibly two unrelated memory leak issues here!

      Steps to reproduce:

      1. Update config for the proxy
        1. VMwareCacheSize=2G
          CacheSize=256M
          HistoryCacheSize=256M
          HistoryIndexCacheSize=256M
        2. StartVMwareCollectors=32
          VMwareFrequency=30
          VMwarePerfFrequency=30
      2. Enable VMware monitoring via vCenter v5.5.0 (37 hosts)
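For scale, the cache settings in step 1 can be added up. The following is only a rough sketch: it assumes the K/M/G suffixes are powers of 1024, and the helper names (`parse_size`, `total_cache_bytes`) are made up for illustration. The total is the amount of shared memory the proxy is configured to reserve, not a measurement of what it actually uses.

```python
# Sketch: sum the shared-memory cache sizes from the proxy settings quoted above.
# Assumption: the K/M/G suffixes mean powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}

CONFIG = """
VMwareCacheSize=2G
CacheSize=256M
HistoryCacheSize=256M
HistoryIndexCacheSize=256M
"""


def parse_size(value: str) -> int:
    """Convert a value such as '256M' or '2G' to bytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def total_cache_bytes(config_text: str) -> int:
    """Add up every '*CacheSize' parameter found in the config text."""
    total = 0
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key.strip().endswith("CacheSize"):
            total += parse_size(value)
    return total


if __name__ == "__main__":
    print(total_cache_bytes(CONFIG) // 1024**2, "MiB")  # 2816 MiB
```

With the values from this report, the configured caches add up to 2816 MiB, most of it the VMware cache.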

      Result:
      Memory starts leaking slowly (the growth comes from a single VMware collector process). After about 5 hours of this slow leak, memory is exhausted almost instantly (free memory = 0 and all swap is used), possibly for a different reason (see below).
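One way to confirm which subprocess is growing is to sample the resident set size of every zabbix_proxy process at intervals and compare the samples. The sketch below is a hypothetical illustration (it is not the attached zbx_reg_proc_mem4.sh); it assumes a Linux /proc filesystem and the function names are invented. Note that VmRSS also counts shared-memory pages the process has touched, so growth in the shared caches shows up here too.

```python
import os


def vm_rss_kb(status_text):
    """Return the VmRSS value (kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def snapshot(name="zabbix_proxy"):
    """Map pid -> VmRSS (kB) for every running process whose name matches."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/status") as f:
                text = f.read()
        except OSError:
            continue  # process exited or is not readable
        first_line = text.splitlines()[0] if text else ""
        if first_line.split(":", 1)[-1].strip() != name:
            continue
        rss = vm_rss_kb(text)
        if rss is not None:
            result[int(pid)] = rss
    return result


def growth(before, after):
    """Return (pid, delta_kb) pairs for pids in both snapshots, largest growth first."""
    deltas = [(pid, after[pid] - before[pid]) for pid in before if pid in after]
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

Taking `snapshot()` twice, some minutes apart, and passing both results to `growth()` puts the fastest-growing PID first; that PID can then be matched against the process titles in the `ps` output.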

      As a result, about 40 minutes later one zabbix-proxy subprocess is killed by the OOM killer; see below.

      Aug 30 12:48:41 aumelzbxproxy01 kernel: [20245.087653] [ 2020] 111 2020 771956 164 339968 1665 0 zabbix_proxy
      Aug 30 12:48:41 aumelzbxproxy01 kernel: [20245.087655] Out of memory: Kill process 1933 (zabbix_proxy) score 418 or sacrifice child
      Aug 30 12:48:41 aumelzbxproxy01 kernel: [20245.090069] Killed process 1933 (zabbix_proxy) total-vm:4844788kB, anon-rss:1754648kB, file-rss:1728kB, shmem-rss:1015156kB
      Aug 30 12:48:41 aumelzbxproxy01 kernel: [20245.210083] oom_reaper: reaped process 1933 (zabbix_proxy), now anon-rss:0kB, file-rss:0kB, shmem-rss:1015156kB
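The "Killed process" line above carries the memory breakdown of the victim. As a small sketch (the function name `parse_oom_kill` is invented here, and the regular expressions only target the line format shown in this report), the fields can be pulled out like this:

```python
import re

# The kernel line quoted in this report.
LINE = (
    "Aug 30 12:48:41 aumelzbxproxy01 kernel: [20245.090069] Killed process 1933 "
    "(zabbix_proxy) total-vm:4844788kB, anon-rss:1754648kB, file-rss:1728kB, "
    "shmem-rss:1015156kB"
)


def parse_oom_kill(line):
    """Extract pid, process name and the kB counters from a 'Killed process' line."""
    match = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if match is None:
        return None
    counters = {key: int(val) for key, val in re.findall(r"([a-z-]+):(\d+)kB", line)}
    return {"pid": int(match.group(1)), "name": match.group(2), **counters}


if __name__ == "__main__":
    info = parse_oom_kill(LINE)
    for key in ("total-vm", "anon-rss", "file-rss", "shmem-rss"):
        print(f"{key}: {info[key] / 1024:.0f} MiB")
```

For PID 1933 the anonymous (private) memory is about 1.7 GiB, on top of roughly 1 GiB of touched shared memory. That split suggests the poller's own heap grew, rather than only the shared caches filling up.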
      

      The process killed by the OOM killer is the following:

      zabbix 1933 1423 0 07:11 ? 00:00:00 /usr/sbin/zabbix_proxy: poller #24 [got 24 values in 1.331793 sec, getting values]
      

      After it dies, "syncing history data" starts and takes more than 4 hours, during which zabbix-proxy does NOT send any data to the server. zabbix-proxy is then restarted by systemd (see the log below).

      1908:20180830:124831.130 slow query: 4.665959 sec, "select taskid,type,clock,ttl from task where status=1 and type in (2, 6) order by taskid"
       1951:20180830:124831.813 slow query: 9.917240 sec, "commit;"
       1946:20180830:124831.868 slow query: 10.056336 sec, "commit;"
       1973:20180830:124831.870 slow query: 10.056378 sec, "commit;"
       1857:20180830:124831.870 slow query: 10.054214 sec, "commit;"
       1942:20180830:124832.230 slow query: 10.579126 sec, "commit;"
       1942:20180830:124832.381 resuming Zabbix agent checks on host "AUSYDADMIN01": connection restored
       1857:20180830:124837.434 slow query: 5.529120 sec, "begin;"
       1856:20180830:124838.693 slow query: 6.580776 sec, "begin;"
       1952:20180830:124838.750 slow query: 6.372370 sec, "begin;"
       1978:20180830:124840.334 slow query: 19.614870 sec, "insert into proxy_autoreg_host (clock,host,listen_ip,listen_dns,listen_port,host_metadata) values (1535633300,'AUTSTMYDOTPDV01','10.60.16.32','autstmydotpdv01.myobtest.net',10050,'Windows AUTSTMYDOTPDV01 6.1.7601 Microsoft Windows Server 2008 R2 Standard Service Pack 1 x64')"
       1970:20180830:124842.059 slow query: 25.347796 sec, "update hosts set errors_from=1535633296,disable_until=1535633299 where hostid=10445"
       1423:20180830:124842.065 One child process died (PID:1933,exitcode/signal:9). Exiting ...
      zabbix_proxy [1423]: Error waiting for process with PID 1933: [10] No child processes
       1423:20180830:124844.659 syncing history data...
       1423:20180830:124854.005 syncing history data... 0.046460%
       1423:20180830:124904.005 syncing history data... 0.093412%
      .....
      1423:20180830:170724.004 syncing history data... 99.933678%
       1423:20180830:170734.003 syncing history data... 99.999017%
       1423:20180830:170734.521 syncing history data done
       1423:20180830:170734.557 Zabbix Proxy stopped. Zabbix 4.0.0beta1 (revision 84219).
       6877:20180830:170745.113 Starting Zabbix Proxy (active) [aumelzbxproxy01]. Zabbix 4.0.0beta1 (revision 84219).
       6877:20180830:170745.123 **** Enabled features ****
       6877:20180830:170745.123 SNMP monitoring: YES
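The length of the history sync can be read straight off the timestamps in the log above. This is a minimal sketch assuming the `PID:YYYYMMDD:HHMMSS.mmm` prefix seen in these lines; `parse_ts` is a hypothetical helper name.

```python
from datetime import datetime

# First and last "syncing history data" lines from the log excerpt.
START = "1423:20180830:124844.659 syncing history data..."
END = "1423:20180830:170734.521 syncing history data done"


def parse_ts(line):
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a proxy log line."""
    prefix = line.strip().split(" ", 1)[0]
    _pid, date, time = prefix.split(":")
    return datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")


if __name__ == "__main__":
    print(parse_ts(END) - parse_ts(START))  # 4:18:49.862000
```

So the proxy spent about 4 hours 19 minutes syncing history before it stopped and systemd started it again.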
      

      See the attached screenshots; their times are GMT+10.
      See the attached log files; their times are UTC.

      Expected:
      1) No slow memory leak and no sudden memory spike that drives free memory to 0.

      2) When a subprocess dies, zabbix-proxy should not spend 4 hours on "syncing history data", during which nothing is sent to zabbix-server.

        1. aumelzbxproxy01-proclist-zabbix-proxy.txt
          43 kB
        2. aumelzbxvmware01-proclist-zabbix-proxy.txt
          21 kB
        3. OOM-in-kern.log
          4 kB
        4. OOM-in-syslog.log
          37 kB
        5. zabbix_proxy.conf
          18 kB
        6. zabbix_server.conf
          17 kB
        7. zabbix cache free.png
          41 kB
        8. zabbix memory.png
          60 kB
        9. zabbix process busy.png
          146 kB
        10. zabbix-proxy-process-full-names.txt
          30 kB
        11. zabbix since yesterday.png
          197 kB
        12. zbx_reg_proc_mem4.sh
          0.7 kB

            Assignee: Unassigned
            Reporter: Ilya Kruchinin (ilyakruchinin)
            Votes: 0
            Watchers: 5
