- Type: Problem report
-
Resolution: Unresolved
- Priority: Major
-
Affects Version/s: None
-
Component/s: Agent (G), Agent2 (G)
-
None
-
Sprint candidates
I am reporting persistent high CPU usage (100%) on a host with a very large number of processes (54,000+), many of them in a <defunct> (zombie) state. The issue has been observed with both Zabbix Agent 7.0.8 (C version) and Zabbix Agent 2, even though the Docker monitoring plugin is NOT being used.
Environment:
- Zabbix Version: 7.0.8
- Host Context: High-density container environment (Docker) creating a large process table, though Zabbix is only configured to monitor specific system processes via Discovery (LLD).
- Status: Over 54,000 PIDs active in the OS.
Technical Findings: The agent is configured to discover and monitor only a few specific processes. However, the logs indicate that the internal collector still spends excessive resources scanning the entire /proc directory to update CPU statistics and process counts.
In both agents, we see the collector getting "stuck" or taking a significant amount of time to complete zbx_proc_get_processes():
716967:20260128:113824.772 End of zbx_proc_get_processes(): SUCCEED, processes:54717
716967:20260128:113827.204 End of zbx_proc_get_processes(): SUCCEED, processes:54717
716967:20260128:113829.545 End of zbx_proc_get_processes(): SUCCEED, processes:54720
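For reference, the spacing between these completions can be worked out with a small script. This is only a sketch: the helper names parse_line and scan_intervals are ours, not part of Zabbix, and it assumes the log prefix has the form PID:YYYYMMDD:HHMMSS.mmm as in the lines above. Note that it measures the gap between consecutive "End of" messages, not the duration of a single scan.

```python
from datetime import datetime

# The three collector log lines quoted above, copied verbatim.
LOG_LINES = [
    "716967:20260128:113824.772 End of zbx_proc_get_processes(): SUCCEED, processes:54717",
    "716967:20260128:113827.204 End of zbx_proc_get_processes(): SUCCEED, processes:54717",
    "716967:20260128:113829.545 End of zbx_proc_get_processes(): SUCCEED, processes:54720",
]


def parse_line(line):
    """Return (timestamp, process_count) for one log line.

    Assumes the prefix format PID:YYYYMMDD:HHMMSS.mmm seen in the report.
    """
    prefix, _, rest = line.partition(" ")
    _pid, date, clock = prefix.split(":")
    ts = datetime.strptime(date + clock, "%Y%m%d%H%M%S.%f")
    count = int(rest.rsplit("processes:", 1)[1])
    return ts, count


def scan_intervals(lines):
    """Seconds elapsed between consecutive scan completions."""
    parsed = [parse_line(line) for line in lines]
    return [
        round((later[0] - earlier[0]).total_seconds(), 3)
        for earlier, later in zip(parsed, parsed[1:])
    ]


if __name__ == "__main__":
    for gap in scan_intervals(LOG_LINES):
        print(f"{gap:.3f} s between scan completions")
```

For the lines above this gives gaps of 2.432 s and 2.341 s, i.e. a full pass over the process table completes roughly every 2.4 seconds.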
Key Points for Support:
- Parity of Issue: The problem is independent of the agent version (happens in both C and Go implementations).
- Collection Overhead: Even when monitoring only a few specific processes, the agent's core seems to iterate through the entire process table (54k+ entries), causing the CPU to hit 100% due to the overhead of reading /proc at scale.
- Impact of Zombies: There is a high count of <defunct> processes. We suspect the agent might be struggling to handle or skip these entries during the scan, leading to format_metric_results():FAIL errors.
- No Docker Plugin: To clarify, we are not using the Docker template/plugin. We are using standard process monitoring (proc.num, proc.get, or LLD for processes).
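To help quantify the process table independently of the agent, the sketch below tallies process states by reading /proc/<pid>/stat directly and times the pass. It is not how the agent itself scans /proc; the function names state_from_stat and count_states are hypothetical, and the script assumes a Linux host with a standard procfs.

```python
import os
import time
from collections import Counter


def state_from_stat(stat_text):
    """Extract the one-letter state from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of whitespace.
    """
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(proc_root="/proc"):
    """Tally process states for every numeric entry under proc_root."""
    states = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as fh:
                states[state_from_stat(fh.read())] += 1
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            # The process exited after listdir(), or is not readable; skip it.
            continue
    return states


if __name__ == "__main__":
    started = time.perf_counter()
    states = count_states()
    elapsed = time.perf_counter() - started
    print(f"total processes: {sum(states.values())}")
    print(f"zombies (state Z): {states.get('Z', 0)}")
    print(f"full /proc pass took {elapsed:.3f} s")
```

Running this on the affected host next to the agent would show how many of the 54k+ entries are in state Z and give a rough baseline for how long one full pass over /proc takes outside the agent.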