- New Feature Request
- Resolution: Unresolved
- Minor
We have many very high-performance servers that run out of interrupt cycles on the single CPU that handles interrupts for a NIC. This does not show up in normal CPU stats, which average all CPUs together: on an 8-core machine we might see 12.5% CPU overall while CPU1 is at 100% SI (soft interrupts) and the network dies.
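To make the arithmetic concrete, here is a small sketch with made-up per-CPU softirq figures (the list below is illustrative, not measured data). It shows how one saturated core disappears into the average:

```python
# Hypothetical per-CPU softirq percentages on an 8-core machine:
# CPU1 is saturated by NIC interrupts, the other cores are idle.
per_cpu_softirq = [0.0, 100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# What an all-CPU average reports.
average = sum(per_cpu_softirq) / len(per_cpu_softirq)

# What actually matters for the NIC: the busiest single CPU.
worst = max(per_cpu_softirq)
worst_cpu = per_cpu_softirq.index(worst)

print(f"average across CPUs: {average:.1f}%")          # 12.5%
print(f"busiest CPU: CPU{worst_cpu} at {worst:.1f}%")  # CPU1 at 100.0%
```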
I see we can now monitor SI in addition to user, idle, nice, and system, but what matters more is monitoring the maximum across all CPUs, so that if any single CPU has SI above 80%, we can trigger on that.
Thus, extend the current system.cpu.util[<cpu>,<type>,<mode>] item to accept a <cpu> value of 'max', which returns the maximum value across all CPUs. From that I can get what I need.
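As a rough illustration of what a 'max' value could compute, the sketch below takes two snapshots of Linux /proc/stat text and returns the CPU with the highest softirq share over the interval. This is only an assumption about how such a value might be derived, not how the agent implements anything; the function names and the sample snapshots are hypothetical.

```python
SOFTIRQ = 6  # per-CPU field order: user nice system idle iowait irq softirq steal ...

def parse_cpu_lines(stat_text):
    """Return {"cpu0": [counters...], ...}, skipping the aggregate "cpu" line."""
    cpus = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(v) for v in fields[1:]]
    return cpus

def max_softirq_percent(before_text, after_text):
    """Return (cpu_name, softirq_percent) for the busiest CPU between snapshots."""
    before = parse_cpu_lines(before_text)
    after = parse_cpu_lines(after_text)
    worst_name, worst_pct = None, 0.0
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # CPU appeared between snapshots; no delta available
        # Sum user..steal only: guest time is already counted inside user/nice.
        total = sum(new[:8]) - sum(old[:8])
        if total <= 0:
            continue
        pct = 100.0 * (new[SOFTIRQ] - old[SOFTIRQ]) / total
        if worst_name is None or pct > worst_pct:
            worst_name, worst_pct = name, pct
    return worst_name, worst_pct

# Made-up snapshots for a 2-CPU machine (values are illustrative).
BEFORE = """cpu  200 0 200 1600 0 0 0 0 0 0
cpu0 100 0 100 800 0 0 0 0 0 0
cpu1 100 0 100 800 0 0 0 0 0 0
"""
AFTER = """cpu  215 0 215 1690 0 0 80 0 0 0
cpu0 110 0 110 880 0 0 0 0 0 0
cpu1 105 0 105 810 0 0 80 0 0 0
"""

print(max_softirq_percent(BEFORE, AFTER))  # ('cpu1', 80.0)
```

On a real Linux host the two snapshots would come from reading /proc/stat twice, a second or so apart. A trigger could then fire when the returned percentage exceeds 80.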
This is a critical feature for high-performance servers, including those on VMs and clouds, which have much less efficient NIC interrupt handling and often die at 10,000 packets/second or less.
For now we may use system.cpu.util[,softirq,avg1] * system.cpu.num[] as a workaround. We would also like to track total interrupts, but there is no way to do that (the equivalent of mpstat -I SUM 1).
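The workaround can be sketched as follows. Scaling the all-CPU average by the CPU count gives an upper bound on one CPU's softirq share. It is exact only when a single CPU handles all softirq work, and it overestimates when the load is spread out. For total interrupts, one possible source on Linux is the first number on the "intr" line of /proc/stat, sampled twice to get a rate comparable to what mpstat -I SUM shows. The function names and sample values are hypothetical.

```python
def scaled_softirq_estimate(avg_softirq_pct, num_cpus):
    """Upper bound on a single CPU's softirq share, capped at 100%."""
    return min(100.0, avg_softirq_pct * num_cpus)

def total_interrupts(stat_text):
    """Total interrupts since boot: the first number on the 'intr' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "intr":
            return int(fields[1])
    raise ValueError("no 'intr' line found")

def interrupts_per_second(before_text, after_text, interval_s):
    """Interrupt rate between two /proc/stat snapshots taken interval_s apart."""
    delta = total_interrupts(after_text) - total_interrupts(before_text)
    return delta / interval_s

# 12.5% average on 8 cores is consistent with one core at 100%.
print(scaled_softirq_estimate(12.5, 8))  # 100.0

# Made-up snapshots taken 2 seconds apart.
print(interrupts_per_second("intr 1000 5 6\n", "intr 21000 9 9\n", 2.0))  # 10000.0
```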