[ZBX-10372] network interface statistics return 0 on Solaris 11 Created: 2016 Feb 10 Updated: 2017 May 30 Resolved: 2016 Mar 13 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.4.7 |
Fix Version/s: | 2.2.12rc1, 3.0.2rc1, 3.2.0alpha1 |
Type: | Incident report | Priority: | Major |
Reporter: | Dehaen Pierre | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | agent, items, solaris | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Solaris 11 |
Issue Links: |
|
Description |
In src/libs/zbxsysinfo/solaris/net.c, "kstat_lookup(kc, NULL, -1, (char *)name)" is called without specifying either the module (NULL) or the instance (-1). On Solaris 10 this worked because the module was unique (it was the interface name without the number, for instance "bge" for interface "bge0"):

    $ kstat -n bge0 -s class -p
    bge:0:bge0:class net
    $ kstat -n bge0 -s obytes64 -p
    bge:0:bge0:obytes64 1766201164

But on Solaris 11, with its completely new network stack, this no longer works: a single interface has statistics in both the "unix" module and the "link" module, and the useful statistics come from the "link" module only:

    $ kstat -n net0 -s class -p
    link:0:net0:class net
    unix:0:net0:class flow
    $ kstat -n net0 -s obytes64 -p
    link:0:net0:obytes64 319278589464

Unfortunately, the loose kstat_lookup() returns the wrong one, the "unix" one, which does not have the statistics we need. So, for Solaris 11 and later only, line 39 of net.c should look like:

    if (NULL == (kp = kstat_lookup(kc, "link", 0, (char *)name)))

I set the instance number to 0 because, in the global zone, you may get several net0 interfaces in the link kstats: one for the global zone (the first instance to be created, so 0) and others for some other zones. |
Comments |
Comment by Andris Zeila [ 2016 Feb 15 ] |
I'm wondering if using class name 'net' would not be better (at least compatible with older versions). However there doesn't seem to be an easy way of doing it. |
Comment by Dehaen Pierre [ 2016 Feb 15 ] |
Using the following ktest.c program you can get all instances:

    #include <stdio.h>
    #include <kstat.h>

    int main(void)
    {
        kstat_ctl_t *kc;
        kstat_t *kp, *head;

        if ((kc = kstat_open()) == NULL)
            return 1;
        head = kc->kc_chain;    /* remember the chain head; the loop below moves kc_chain */

        while (kc->kc_chain != NULL && (kp = kstat_lookup(kc, NULL, -1, "net0")) != NULL) {
            printf("module=%s, name=%s, instance=%d, class=%s\n",
                   kp->ks_module, kp->ks_name, kp->ks_instance, kp->ks_class);
            kc->kc_chain = kp->ks_next;    /* continue the search after this match */
        }

        kc->kc_chain = head;    /* restore the head so kstat_close() sees the whole chain */
        kstat_close(kc);
        return 0;
    }

    $ gcc -o ktest -lkstat ktest.c
    $ ./ktest
    module=unix, name=net0, instance=0, class=flow
    module=link, name=net0, instance=0, class=net
    module=link, name=net0, instance=3, class=net
    module=link, name=net0, instance=1, class=net

Well, this method is not documented in the man pages nor in the "Solaris Performance and Tools" book. Even /bin/kstat does not help here: it is a Perl script based on Sun::Solaris::Kstat, which only implements the new() and update() methods. I think we should normally follow the kstat chain to find a statistic, unless we know the module-instance-name triplet for kstat_lookup(). So we could define a kstat_lookup_class() to supplement and replace kstat_lookup(). 
Proof of concept:

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <kstat.h>

    /* Like kstat_lookup(), but only returns a kstat whose class equals ks_class.
     * Note: it advances kc->kc_chain while searching, so the chain head is lost. */
    kstat_t *kstat_lookup_class(kstat_ctl_t *kc, char *ks_module, int ks_instance, char *ks_name, char *ks_class)
    {
        kstat_t *kp;
        int i = 1024;    /* upper bound on the number of candidates examined */

        while (--i >= 0) {
            if ((kp = kstat_lookup(kc, ks_module, ks_instance, ks_name)) == NULL)
                return kp;
            if (strcmp(kp->ks_class, ks_class) == 0)
                return kp;
            kc->kc_chain = kp->ks_next;
            if (kc->kc_chain == NULL) {
                errno = ENOENT;
                return NULL;
            }
        }
        errno = ENOENT;
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        kstat_ctl_t *kc;
        kstat_t *kp;
        char *iface = "net0", *class = "net";

        if (argc == 3) {
            iface = argv[1];
            class = argv[2];
        }
        if ((kc = kstat_open()) == NULL) {
            perror("kstat_open");
            return 1;
        }
        if ((kp = kstat_lookup_class(kc, NULL, -1, iface, class)) != NULL) {
            printf("module=%s, name=%s, instance=%d, class=%s\n",
                   kp->ks_module, kp->ks_name, kp->ks_instance, kp->ks_class);
        } else {
            printf("interface=%s, class=%s: not found!\n", iface, class);
        }
        kstat_close(kc);
        return 0;
    }

Result on Solaris 11:

    $ ./ktest2 net0 net
    module=link, name=net0, instance=0, class=net
    $ ./ktest2 net0 flow
    module=unix, name=net0, instance=0, class=flow

Result on Solaris 10:

    $ ./ktest2 nge0 net
    module=nge, name=nge0, instance=0, class=net
    $ ./ktest2 bge0 net
    module=bge, name=bge0, instance=0, class=net |
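The class-filtered lookup proposed above can also be written as a walk over the chain with a local cursor, which leaves the chain head untouched. The following is only a portable sketch under stated assumptions: model_kstat_t, model_lookup_class() and model_sample_chain() are hypothetical names that imitate the kstat_t fields used in this thread (ks_module, ks_instance, ks_name, ks_class, ks_next); it does not use the real libkstat, and the sample chain simply mirrors the Solaris 11 ktest output quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields discussed in this thread;
 * this is NOT the real <kstat.h> definition. */
typedef struct model_kstat {
    const char *ks_module;
    int ks_instance;
    const char *ks_name;
    const char *ks_class;
    struct model_kstat *ks_next;
} model_kstat_t;

/* Walk the chain with a local cursor so the caller's head pointer is never
 * modified. A NULL module or an instance of -1 acts as a wildcard, as in
 * kstat_lookup(). Returns the first entry whose class matches, or NULL. */
model_kstat_t *model_lookup_class(model_kstat_t *head, const char *module,
                                  int instance, const char *name,
                                  const char *ks_class)
{
    model_kstat_t *kp;

    assert(name != NULL && ks_class != NULL);

    for (kp = head; kp != NULL; kp = kp->ks_next) {
        if (module != NULL && strcmp(kp->ks_module, module) != 0)
            continue;
        if (instance != -1 && kp->ks_instance != instance)
            continue;
        if (strcmp(kp->ks_name, name) != 0)
            continue;
        if (strcmp(kp->ks_class, ks_class) == 0)
            return kp;
    }
    return NULL;
}

/* Sample chain in the same order as the Solaris 11 ktest output. */
model_kstat_t *model_sample_chain(void)
{
    static model_kstat_t nodes[4] = {
        {"unix", 0, "net0", "flow", NULL},
        {"link", 0, "net0", "net", NULL},
        {"link", 3, "net0", "net", NULL},
        {"link", 1, "net0", "net", NULL},
    };
    size_t i;

    for (i = 0; i + 1 < 4; i++)
        nodes[i].ks_next = &nodes[i + 1];
    nodes[3].ks_next = NULL;
    return &nodes[0];
}
```

With this sample chain, asking for name "net0" and class "net" skips the "unix" entry and returns the first "link" entry. Against the real library the same loop would presumably start at kc->kc_chain, but that has not been tested here.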
Comment by Andris Zeila [ 2016 Feb 16 ] |
Thanks for examples! I was not eager to manually iterate through kstat chain (though we are doing it for cpu stats). Currently I'm leaning to try getting the correct kstat module (link) as you described initially, and if that fails - assume this is not solaris 11+ and continue to use the original code. |
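A minimal sketch of the fallback described in this comment: try the "link" module with instance 0 first and, if nothing is found, fall back to the original loose lookup. It is a portable model, not the code committed to Zabbix: model_kstat_t, model_lookup(), model_find_net_kstat() and the two sample chains are hypothetical, and model_lookup() only imitates the wildcard behaviour of kstat_lookup() (NULL module, instance -1).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields used here;
 * this is NOT the real <kstat.h> definition. */
typedef struct model_kstat {
    const char *ks_module;
    int ks_instance;
    const char *ks_name;
    struct model_kstat *ks_next;
} model_kstat_t;

/* Imitates kstat_lookup(): a NULL module and an instance of -1 match anything;
 * the first matching entry in chain order is returned. */
model_kstat_t *model_lookup(model_kstat_t *head, const char *module,
                            int instance, const char *name)
{
    model_kstat_t *kp;

    assert(name != NULL);

    for (kp = head; kp != NULL; kp = kp->ks_next) {
        if (module != NULL && strcmp(kp->ks_module, module) != 0)
            continue;
        if (instance != -1 && kp->ks_instance != instance)
            continue;
        if (strcmp(kp->ks_name, name) == 0)
            return kp;
    }
    return NULL;
}

/* Prefer link:0:<name> (Solaris 11 style); otherwise use the loose lookup
 * (older releases). The choice is made on every call, no state is kept. */
model_kstat_t *model_find_net_kstat(model_kstat_t *head, const char *name)
{
    model_kstat_t *kp = model_lookup(head, "link", 0, name);

    if (kp == NULL)
        kp = model_lookup(head, NULL, -1, name);
    return kp;
}

/* Sample chain shaped like the Solaris 11 output in the description. */
model_kstat_t *model_solaris11_chain(void)
{
    static model_kstat_t nodes[2] = {
        {"unix", 0, "net0", NULL},
        {"link", 0, "net0", NULL},
    };

    nodes[0].ks_next = &nodes[1];
    return &nodes[0];
}

/* Sample chain shaped like the Solaris 10 output in the description. */
model_kstat_t *model_solaris10_chain(void)
{
    static model_kstat_t nodes[1] = {
        {"bge", 0, "bge0", NULL},
    };

    return &nodes[0];
}
```

On the Solaris 11 sample the loose lookup alone would return the "unix" entry, while model_find_net_kstat() returns the "link" one; on the Solaris 10 sample the first attempt fails and the loose lookup finds "bge".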
Comment by Andris Zeila [ 2016 Feb 19 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10372 |
Comment by Andris Mednis [ 2016 Feb 26 ] |
(1) Apparently there is one common problem with get_kstat_named_field(), unrelated to this change. get_kstat_named_field():
Several functions in src/libs/zbxsysinfo/solaris/net.c call get_kstat_named_field() in the same manner. andris RESOLVED in r58849. |
Comment by Dehaen Pierre [ 2016 Feb 26 ] |
You are correct. We are probably lucky most of the time, because what is freed by kstat_close() is the chain "snapshot", not the kstats themselves... at least for the statistics that have not disappeared from the kernel since the close operation, and that is not so common as we are talking about network interfaces... The man page kstat_chain_update(3STAT) says:
Halting a zone or using dladm might remove network interfaces... Of course the interface would have to be removed between the kstat_close() and the statistic read, but then trying to read a deleted kstat might cause some trouble. |
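One way to avoid depending on that luck is to copy the value out while the snapshot is still open and only then close it. The sketch below illustrates the pattern with a hypothetical in-memory model: model_snapshot_t, model_open(), model_close() and model_get_field() are invented for illustration and are neither the libkstat API nor the change made in r58849. The sample value is the obytes64 figure quoted in the description.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a statistics snapshot; NOT the libkstat types. */
typedef struct {
    const char *name;
    uint64_t value;
} model_named_t;

typedef struct {
    model_named_t *fields;
    size_t count;
} model_snapshot_t;

/* Allocate a snapshot holding one sample counter. */
model_snapshot_t *model_open(void)
{
    model_snapshot_t *snap = malloc(sizeof(*snap));

    if (snap == NULL)
        return NULL;
    snap->count = 1;
    snap->fields = malloc(snap->count * sizeof(*snap->fields));
    if (snap->fields == NULL) {
        free(snap);
        return NULL;
    }
    snap->fields[0].name = "obytes64";
    snap->fields[0].value = UINT64_C(319278589464);
    return snap;
}

/* Release the snapshot; pointers into it are invalid afterwards. */
void model_close(model_snapshot_t *snap)
{
    free(snap->fields);
    free(snap);
}

/* Copy the requested value into *out BEFORE closing the snapshot, so the
 * caller never touches snapshot memory. Returns 0 on success, -1 otherwise. */
int model_get_field(const char *field, uint64_t *out)
{
    model_snapshot_t *snap;
    size_t i;
    int ret = -1;

    assert(field != NULL && out != NULL);

    if ((snap = model_open()) == NULL)
        return -1;

    for (i = 0; i < snap->count; i++) {
        if (strcmp(snap->fields[i].name, field) == 0) {
            *out = snap->fields[i].value;    /* copy while the snapshot is alive */
            ret = 0;
            break;
        }
    }

    model_close(snap);
    return ret;
}
```

The caller only ever sees its own uint64_t, so it does not matter what happens to the snapshot, or to the underlying interface, after the function returns.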
Comment by Andris Mednis [ 2016 Mar 02 ] |
(2) On Solaris 11, the proposed change (r58472) searches for statistics in the 'link' module as long as only 'netN' interfaces are involved. Once the loopback interface 'lo0', which resides in its own 'lo' module, is queried, searching in the 'link' module is turned off, even for 'netN' interfaces. andris RESOLVED in r58909. |
Comment by Andris Mednis [ 2016 Mar 02 ] |
Maybe these examples can help.

Example with Solaris 11, interface 'net1' exists in global and non-global zones:

    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep net1
    module: link instance: 0 name: net1 class: net
    module: link instance: 1 name: net1 class: net
    module: net1 instance: 0 name: link class: net
    module: net1 instance: 1 name: link class: net
    module: unix instance: 0 name: testzone/net1 class: flow
    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
    module: lo instance: 0 name: lo0 class: net

From test zone:

    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep net1
    module: link instance: 0 name: net1 class: net
    module: net1 instance: 1 name: link class: net
    module: unix instance: 0 name: net1 class: flow
    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
    module: lo instance: 0 name: lo0 class: net

Solaris 10, one interface e1000g0:

    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep e1000g0
    module: e1000g instance: 0 name: e1000g0 class: net
    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
    module: lo instance: 0 name: lo0 class: net

Solaris 8 global zone:

    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep hme0
    module: hme instance: 0 name: hme0 class: net
    $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
    module: lo instance: 0 name: lo0 class: net

The kstat utility is a Perl script which uses 'sort()' inside. In a C program kstat elements can come in any order. |
Comment by Dehaen Pierre [ 2016 Mar 03 ] |
Only 'netN' interfaces? I have computers with other fancy names like xnf0 for instance when running in a Solaris VM on OVM... |
Comment by Andris Mednis [ 2016 Mar 03 ] |
'netN' was used only as an example; it could be any other interface name, as long as the interface is a member of the 'link' module. Issue (2) was about the problem that querying the 'lo0' interface flips a flag, as if there were no 'link' module anymore. |
Comment by Andris Mednis [ 2016 Mar 10 ] |
Solution back-ported to v.2.2 in development branch svn://svn.zabbix.com/branches/dev/ZBX-10372-22 . wiper reviewed |
Comment by Andris Mednis [ 2016 Mar 11 ] |
Fixed in versions:
|
Comment by Andris Mednis [ 2016 Mar 11 ] |
No changes to documentation |
Comment by Oleksii Zagorskyi [ 2016 Oct 05 ] |
It might cause regression (supposedly) in |