[ZBX-6045] proc.num intermittently slow on Solaris
Created: 2013 Jan 02 | Updated: 2024 Apr 10 | Resolved: 2020 Sep 09 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.0.4 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | Andrew Howell | Assignee: | Andris Mednis |
Resolution: | Won't fix | Votes: | 1 |
Labels: | agent, items, performance | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Solaris 10 on Sun Fire X4270 |
Sprint: | Sprint 66 (Jul 2020), Sprint 67 (Aug 2020), Sprint 68 (Sep 2020) |
Description |
On several of our Solaris 10 systems, the time it takes to get a proc.num value spikes to several seconds, often taking so long that the server reports an error:

Zabbix agent item [proc.num[]] on host [xxxx] failed: first network error, wait for 15 seconds

In testing with zabbix_get, a request usually takes under 5 ms but regularly spikes to several seconds. These systems are roughly 90% CPU idle.
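The zabbix_get measurement described above could be reproduced with a small timing loop. This is only a sketch: `time_command` and `find_spikes` are hypothetical helper names, the host name in the usage comment is a placeholder, and each sample includes the start-up cost of the zabbix_get process itself, not just the agent's response time.

```python
import subprocess
import sys
import time


def time_command(argv, runs=20, pause=0.0):
    """Run argv `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        durations.append(time.perf_counter() - start)
        if pause:
            time.sleep(pause)
    return durations


def find_spikes(durations, threshold):
    """Return (run index, duration) pairs for runs slower than `threshold` seconds."""
    return [(i, d) for i, d in enumerate(durations) if d > threshold]


# Example usage (assumes zabbix_get is installed; the host name is a placeholder):
#   samples = time_command(
#       ["zabbix_get", "-s", "solaris-host.example", "-k", "proc.num[]"],
#       runs=100, pause=0.5)
#   for i, d in find_spikes(samples, threshold=1.0):
#       print(f"run {i}: {d:.2f} s")
```

Collecting the samples over a longer period, with a pause between runs, makes it easier to see whether the spikes line up with other activity on the host.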
Comments |
Comment by Andris Mednis [ 2020 Sep 09 ]
I tested on a Solaris 10 SPARC server (Sun-Fire-V215) with Zabbix agent 5.0.4rc1, using the item proc.num[sleep,andris,all,sleep] (to make it do more work than a plain proc.num[]).
Duration grows linearly with the number of processes. With 24057 processes, the duration exceeds 1 second. If Zabbix server is configured with Timeout=1, timeouts show up:

32415:20200909:185112.350 Zabbix agent item "proc.num[sleep,andris,all,sleep]" on host "Solaris 10 SPARC" failed: first network error, wait for 15 seconds

On a loaded system the duration can probably reach the default Timeout=3.

The algorithm used by proc.num[] on Solaris is the same in Zabbix 2.0 and 5.0: it traverses the /proc file system and collects data process by process. I considered using kvm_getproc()/kvm_nextproc() as an alternative. It may be faster, but the kvm_* functions are positioned as the "old way" and are less portable between Solaris releases than the /proc filesystem interface.

So I do not see major problems with the current approach. Increasing the Timeout parameter in the Zabbix server/proxy configuration file could help. Closing with WON'T FIX.
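To illustrate why the cost scales with the process count, here is a minimal sketch of a /proc traversal. It is not the agent's actual code: `count_processes` and `timed_count` are hypothetical names, and the sketch only counts numeric directory entries, whereas a real proc.num would also have to open and parse per-process data (on Solaris, presumably /proc/<pid>/psinfo) to filter by name, user, state and command line.

```python
import os
import time


def count_processes(proc_root="/proc"):
    """Count entries under proc_root whose names are numeric process IDs."""
    count = 0
    with os.scandir(proc_root) as entries:
        for entry in entries:
            # Each running process appears as an entry named after its PID.
            if entry.name.isdigit():
                # A real implementation would read per-process data here,
                # which is the part that makes the work grow per process.
                count += 1
    return count


def timed_count(proc_root="/proc"):
    """Return (process count, elapsed seconds) for one full traversal."""
    start = time.perf_counter()
    n = count_processes(proc_root)
    return n, time.perf_counter() - start
```

Because every entry is visited once, the elapsed time of one traversal grows with the number of processes, which matches the linear behaviour reported in the comment.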