[ZBX-6045] proc.num intermittently slow on solaris Created: 2013 Jan 02  Updated: 2024 Apr 10  Resolved: 2020 Sep 09

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Incident report
Priority: Trivial
Reporter: Andrew Howell
Assignee: Andris Mednis
Resolution: Won't fix
Votes: 1
Labels: agent, items, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Solaris 10 on Sun Fire X4270


Issue Links:
Duplicate
is duplicated by ZBX-8838 Zabbix agent is too slow on Solaris h... Closed
Team: Team A
Sprint: Sprint 66 (Jul 2020), Sprint 67 (Aug 2020), Sprint 68 (Sep 2020)

 Description   

On several of our Solaris 10 systems, the time it takes to get a proc.num value spikes to several seconds, often long enough that the server reports an error:

Zabbix agent item [proc.num[]] on host [xxxx] failed: first network error, wait for 15 seconds

In testing with zabbix_get, a request usually takes under 5 ms but regularly spikes to several seconds. These systems are roughly 90% CPU idle.
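
The kind of repeated timing described above could be scripted roughly as follows. This is only a sketch: the helper names time_command and summarize, the number of runs, and the 1-second "slow" threshold are assumptions for illustration and are not part of the original report.

```python
import subprocess
import time


def time_command(cmd, runs=20):
    """Run cmd `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, slow_threshold=1.0):
    """Report min/median/max and how many runs reached slow_threshold seconds."""
    ordered = sorted(durations)
    return {
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
        "slow": sum(1 for d in ordered if d >= slow_threshold),
    }
```

For example, summarize(time_command(["zabbix_get", "-s", "solaris-host.example", "-k", "proc.num[]"])) would time 20 requests, where solaris-host.example is a placeholder host name. A low median with a large max and a nonzero "slow" count would match the intermittent spikes reported here.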



 Comments   
Comment by Andris Mednis [ 2020 Sep 09 ]

I tested on a Solaris 10 SPARC server (Sun-Fire-V215) with Zabbix agent 5.0.4rc1, using the item proc.num[sleep,andris,all,sleep] so that the agent does more work than for a plain proc.num[].

Number of processes    proc.num[sleep,andris,all,sleep], s
57                     0.0040
3057                   0.141
9057                   0.413
24057                  1.103

Duration grows linearly with the number of processes. With 24057 processes, the duration exceeds 1 second. If the Zabbix server is configured with Timeout=1, timeouts show up:

 32415:20200909:185112.350 Zabbix agent item "proc.num[sleep,andris,all,sleep]" on host "Solaris 10 SPARC" failed: first network error, wait for 15 seconds

On a loaded system, the duration could probably reach the default Timeout=3.
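
As a rough check of the linear growth, the four measurements above can be fitted with an ordinary least-squares line. This is an illustrative sketch only: the helper names are made up for this example, and extrapolating beyond 24057 processes assumes the trend stays linear on an otherwise idle system.

```python
# Measurements copied from the test above: (number of processes, seconds).
samples = [(57, 0.0040), (3057, 0.141), (9057, 0.413), (24057, 1.103)]


def least_squares(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def processes_for(duration, slope, intercept):
    """Estimate the process count at which the item takes `duration` seconds."""
    return (duration - intercept) / slope


slope, intercept = least_squares(samples)
print(f"cost per process: {slope * 1e6:.1f} microseconds")
print(f"processes needed to reach 3 s: {processes_for(3.0, slope, intercept):.0f}")
```

The fit gives roughly 46 microseconds per process, which puts the 3-second default on the order of 65,000 processes on this hardware when idle. Under load the per-process cost would presumably be higher, so the limit could be reached with fewer processes.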

The algorithm used by proc.num[] on Solaris is the same in Zabbix 2.0 and 5.0: it traverses the /proc file system and collects data process by process.
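
The shape of that traversal can be illustrated with a short sketch. This is not the agent's code (the agent is written in C and reads per-process data to apply the name, user and state filters); it only counts the numeric entries under /proc to show why the cost scales with the number of processes. The function name count_processes and its proc_root parameter are hypothetical.

```python
import os


def count_processes(proc_root="/proc"):
    """Count process directories by scanning proc_root once.

    Each process appears as a directory named after its PID, so the work
    done is proportional to the number of processes on the system.
    """
    count = 0
    with os.scandir(proc_root) as entries:
        for entry in entries:
            # Skip non-PID entries; a real implementation would open
            # per-process data here to filter by name, user, or state.
            if entry.name.isdigit() and entry.is_dir():
                count += 1
    return count
```

Any per-process filtering adds at least one extra file open and read per entry, which is consistent with the linear timings measured above.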

I considered using kvm_getproc()/kvm_nextproc() as an alternative. It may be faster, but the kvm_* functions are positioned as the "old way" and are less portable between Solaris releases than the /proc file system interface.

So I do not see major problems with the current approach. Increasing the Timeout parameter in the Zabbix server/proxy configuration file could help.

Closing as Won't Fix.
