[ZBX-10372] network interface statistics return 0 on Solaris 11 Created: 2016 Feb 10  Updated: 2017 May 30  Resolved: 2016 Mar 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.4.7
Fix Version/s: 2.2.12rc1, 3.0.2rc1, 3.2.0alpha1

Type: Incident report Priority: Major
Reporter: Dehaen Pierre Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: agent, items, solaris
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Solaris 11


Issue Links:
Duplicate
is duplicated by ZBX-6046 net.if doesn't work on solaris 11 zones Closed

 Description   

In src/libs/zbxsysinfo/solaris/net.c, "kstat_lookup(kc, NULL, -1, (char *)name)" is called with neither the module (NULL) nor the instance (-1) specified.

On Solaris 10 this was working because the module was unique (and was the interface name without the number, for instance "bge" for interface "bge0"):

$ kstat -n bge0 -s class -p
bge:0:bge0:class        net
$ kstat -n bge0 -s obytes64 -p
bge:0:bge0:obytes64     1766201164

But on Solaris 11, with its completely new network stack, it does not work anymore because, for one interface, you have statistics in the "unix" module and in the "link" module, and useful statistics are coming from the link module only:

$ kstat -n net0 -s class -p
link:0:net0:class       net
unix:0:net0:class       flow
$ kstat -n net0 -s obytes64 -p
link:0:net0:obytes64    319278589464

...and we are unlucky: the loose kstat_lookup() returns the wrong one, the "unix" one, which does not have the stats we need.

So, for Solaris 11 and later only, line 39 of net.c should look like:

  if (NULL == (kp = kstat_lookup(kc, "link", 0, (char *)name)))

I set the instance number to 0 because, on the global zone, you may get several net0 interfaces in the link kstats: for the global zone (the first instance to be created, so 0) and for some other zones.
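
For illustration only, here is a minimal sketch of why the loose lookup picks the wrong entry. It does not use libkstat: mock_kstat_t, mock_chain and mock_lookup() are hypothetical stand-ins, and the chain order is an assumption taken from the ktest output shown later in this report. The only behaviour modelled is the documented one, where a NULL module or name and an instance of -1 are ignored in the search and the first matching entry is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one kstat chain entry; NOT the real kstat_t. */
typedef struct {
    const char *module;
    int         instance;
    const char *name;
    const char *class_name;
} mock_kstat_t;

/* Assumed chain order, copied from the Solaris 11 ktest output in this report. */
static const mock_kstat_t mock_chain[] = {
    { "unix", 0, "net0", "flow" },
    { "link", 0, "net0", "net"  },
    { "link", 3, "net0", "net"  },
    { "link", 1, "net0", "net"  },
};

/* First-match search; NULL module/name and instance -1 act as wildcards. */
static const mock_kstat_t *mock_lookup(const char *module, int instance, const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(mock_chain) / sizeof(mock_chain[0]); i++) {
        const mock_kstat_t *kp = &mock_chain[i];

        if (module != NULL && strcmp(kp->module, module) != 0) continue;
        if (instance != -1 && kp->instance != instance) continue;
        if (name != NULL && strcmp(kp->name, name) != 0) continue;
        return kp;
    }
    return NULL;
}

/* The loose lookup lands on "unix"; the explicit triplet reaches "link". */
void demo_lookup(void)
{
    assert(strcmp(mock_lookup(NULL, -1, "net0")->module, "unix") == 0);
    assert(strcmp(mock_lookup("link", 0, "net0")->module, "link") == 0);
}
```

With the module and instance given explicitly, the result no longer depends on where the "unix" entry happens to sit in the chain.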



 Comments   
Comment by Andris Zeila [ 2016 Feb 15 ]

I'm wondering if using class name 'net' would not be better (at least compatible with older versions). However there doesn't seem to be an easy way of doing it.

Comment by Dehaen Pierre [ 2016 Feb 15 ]

Using the following ktest.c program you can get all instances:

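#include <stdio.h>
#include <kstat.h>

int main(void) {
  kstat_ctl_t *kc;
  kstat_t     *kp, *head;

  if ((kc = kstat_open()) == NULL) { perror("kstat_open"); return 1; }
  head = kc->kc_chain;   /* remember the real chain head for kstat_close() */
  /* Move kc_chain past each hit so that the next lookup continues from there. */
  while (kc->kc_chain != NULL && (kp = kstat_lookup(kc, NULL, -1, "net0")) != NULL) {
    printf("module=%s, name=%s, instance=%d, class=%s\n", kp->ks_module, kp->ks_name, kp->ks_instance, kp->ks_class);
    kc->kc_chain = kp->ks_next;
  }
  kc->kc_chain = head;   /* restore the head so the whole snapshot is freed */
  kstat_close(kc);
  return 0;
}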
$ gcc -o ktest -lkstat ktest.c
$ ./ktest
module=unix, name=net0, instance=0, class=flow
module=link, name=net0, instance=0, class=net
module=link, name=net0, instance=3, class=net
module=link, name=net0, instance=1, class=net

Well, this method is documented neither in the man pages nor in the "Solaris Performance and Tools" book. Even /bin/kstat is no help: it is a Perl script based on Sun::Solaris::Kstat, which only implements the new() and update() methods. I think we should normally follow the kstat chain to find a statistic, unless we know the module-instance-name triplet for kstat_lookup().

So we could define a kstat_lookup_class() to supplement and replace kstat_lookup(). Proof of concept:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <kstat.h>

/* Like kstat_lookup(), but the kstat must also belong to class ks_class. */
kstat_t *kstat_lookup_class(kstat_ctl_t *kc, char *ks_module, int ks_instance, char *ks_name, char *ks_class) {
  kstat_t     *head = kc->kc_chain;  /* saved so the chain can be restored */
  kstat_t     *kp = NULL;
  int         i = 1024;              /* upper bound on the number of lookups */

  while (--i >= 0 && kc->kc_chain != NULL) {
    if ((kp = kstat_lookup(kc, ks_module, ks_instance, ks_name)) == NULL) break;
    if (strcmp(kp->ks_class, ks_class) == 0) break;
    kc->kc_chain = kp->ks_next;      /* wrong class: continue after this entry */
    kp = NULL;
  }
  kc->kc_chain = head;
  if (kp == NULL) errno = ENOENT;
  return kp;
}

int main(int argc, char *argv[]) {
  kstat_ctl_t  *kc;
  kstat_t      *kp;
  char         *iface = "net0", *class = "net";

  if (argc == 3) { iface = argv[1]; class = argv[2]; }
  if ((kc = kstat_open()) == NULL) { perror("kstat_open"); return 1; }
  if ((kp = kstat_lookup_class(kc, NULL, -1, iface, class)) != NULL) {
    printf("module=%s, name=%s, instance=%d, class=%s\n", kp->ks_module, kp->ks_name, kp->ks_instance, kp->ks_class);
  } else {
    printf("interface=%s, class=%s: not found!\n", iface, class);
  }
  kstat_close(kc);
  return 0;
}

Result on Solaris 11:

$ ./ktest2 net0 net
module=link, name=net0, instance=0, class=net
$ ./ktest2 net0 flow
module=unix, name=net0, instance=0, class=flow

Result on Solaris 10:

$ ./ktest2 nge0 net
module=nge, name=nge0, instance=0, class=net
$ ./ktest2 bge0 net
module=bge, name=bge0, instance=0, class=net
Comment by Andris Zeila [ 2016 Feb 16 ]

Thanks for the examples! I was not eager to manually iterate through the kstat chain (though we are doing it for cpu stats). Currently I'm leaning towards trying to get the correct kstat module (link) as you described initially, and if that fails, assuming this is not Solaris 11+ and continuing to use the original code.
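
The fallback described above could be sketched roughly as follows. This is not the code committed to Zabbix: mock_kstat_t, mock_lookup(), lookup_net_kstat() and both sample chains are hypothetical, and the mock only imitates the first-match, wildcard behaviour of kstat_lookup(). In this sketch the choice between the "link" lookup and the loose lookup is made on every call and nothing is remembered between calls, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one kstat chain entry; NOT the real kstat_t. */
typedef struct {
    const char *module;
    int         instance;
    const char *name;
    const char *class_name;
} mock_kstat_t;

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Sample chains, loosely modelled on the listings in this report. */
static const mock_kstat_t solaris11_chain[] = {
    { "unix", 0, "net0", "flow" },
    { "link", 0, "net0", "net"  },
    { "lo",   0, "lo0",  "net"  },
};
static const mock_kstat_t solaris10_chain[] = {
    { "bge", 0, "bge0", "net" },
    { "lo",  0, "lo0",  "net" },
};

/* First-match search; NULL module and instance -1 act as wildcards. */
static const mock_kstat_t *mock_lookup(const mock_kstat_t *chain, size_t n,
                                       const char *module, int instance, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (module != NULL && strcmp(chain[i].module, module) != 0) continue;
        if (instance != -1 && chain[i].instance != instance) continue;
        if (strcmp(chain[i].name, name) != 0) continue;
        return &chain[i];
    }
    return NULL;
}

/* Try link:0:<name> first; if it is absent, use the original loose lookup. */
const mock_kstat_t *lookup_net_kstat(const mock_kstat_t *chain, size_t n, const char *name)
{
    const mock_kstat_t *kp;

    assert(name != NULL);
    kp = mock_lookup(chain, n, "link", 0, name);
    if (kp == NULL)
        kp = mock_lookup(chain, n, NULL, -1, name);
    return kp;
}
```

For example, lookup_net_kstat(solaris11_chain, COUNT(solaris11_chain), "net0") resolves to the "link" entry, while "lo0" on the same chain and "bge0" on the Solaris 10 chain go through the loose lookup.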

Comment by Andris Zeila [ 2016 Feb 19 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10372

Comment by Andris Mednis [ 2016 Feb 26 ]

(1) There appears to be a general problem with get_kstat_named_field(), unrelated to this change.

get_kstat_named_field():

  • calls kstat_open(),
  • searches for the required data element and gets a pointer to it,
  • calls kstat_close(); "man kstat_open" says: "The kstat_close() function frees all resources that were associated with kc.",
  • returns a pointer to the required data element. After kstat_close() this pointer is stale.

Several functions in src/libs/zbxsysinfo/solaris/net.c call get_kstat_named_field() in the same manner.

andris RESOLVED in r58849.
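
One common way to avoid this kind of stale pointer is to copy the value out before the handle is closed and to return only a status code. The sketch below shows that pattern with hypothetical stand-ins (mock_snapshot_t, mock_open(), mock_close(), get_named_value()) in place of libkstat; it is not the change made in r58849, and the field values are sample data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a kstat snapshot; NOT the real libkstat types. */
typedef struct { const char *name; uint64_t value; } mock_named_t;
typedef struct { mock_named_t *fields; size_t count; } mock_snapshot_t;

/* Allocates a private snapshot, much as kstat_open() would (sample values). */
static mock_snapshot_t *mock_open(void)
{
    static const mock_named_t sample[] = {
        { "obytes64", 1766201164ULL },
        { "rbytes64", 42ULL },
    };
    mock_snapshot_t *s = malloc(sizeof(*s));

    if (s == NULL) return NULL;
    s->count = sizeof(sample) / sizeof(sample[0]);
    s->fields = malloc(sizeof(sample));
    if (s->fields == NULL) { free(s); return NULL; }
    memcpy(s->fields, sample, sizeof(sample));
    return s;
}

/* Frees everything owned by the snapshot, much as kstat_close() would. */
static void mock_close(mock_snapshot_t *s)
{
    free(s->fields);
    free(s);
}

/* Returns 0 and stores the value in *out on success, -1 otherwise.
 * The value is copied while the snapshot is still open, so the caller
 * never receives a pointer into memory that has been freed. */
int get_named_value(const char *field, uint64_t *out)
{
    mock_snapshot_t *s;
    int ret = -1;
    size_t i;

    assert(field != NULL && out != NULL);
    if ((s = mock_open()) == NULL) return -1;
    for (i = 0; i < s->count; i++) {
        if (strcmp(s->fields[i].name, field) == 0) {
            *out = s->fields[i].value;   /* copy by value before closing */
            ret = 0;
            break;
        }
    }
    mock_close(s);
    return ret;
}
```

A caller would declare a uint64_t, pass its address to get_named_value("obytes64", &value), and use the value only when the call returns 0.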

Comment by Dehaen Pierre [ 2016 Feb 26 ]

You are correct. We are probably lucky most of the time, because what kstat_close() frees is the chain "snapshot", not the kstats themselves... at least for statistics that have not disappeared from the kernel since the close operation, which is not so common as we are talking about network interfaces...

The man page kstat_chain_update(3STAT) says:

During normal operation, the kernel creates new kstats and
deletes old ones as various device instances are added and
removed, thereby causing the user's copy of the kstat chain
to become out of date.

Halting a zone or using dladm might remove network interfaces... Of course the interface would have to be removed between the kstat_close() and the statistic read, but then trying to read a deleted kstat might cause some trouble.

Comment by Andris Mednis [ 2016 Mar 02 ]

(2) The proposed change (r58472) on Solaris 11 searches for statistics in the 'link' module as long as only 'netN' interfaces are involved. Once the loopback interface 'lo0', which resides in its own 'lo' module, is queried, searching in the 'link' module is turned off, even for 'netN' interfaces.

andris RESOLVED in r58909.

Comment by Andris Mednis [ 2016 Mar 02 ]

Maybe these examples can help.

Example with Solaris 11, interface 'net1' exists in global and non-global zones:
From global zone:
--------------------------

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep net1
module: link                            instance: 0      name:   net1                            class:    net
module: link                            instance: 1      name:   net1                            class:    net
module: net1                            instance: 0      name:   link                            class:    net
module: net1                            instance: 1      name:   link                            class:    net
module: unix                            instance: 0      name:   testzone/net1                   class:    flow

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
module: lo                              instance: 0      name:   lo0                             class:    net

From test zone:
----------------------

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep net1
module: link                            instance: 0      name:   net1                            class:    net
module: net1                            instance: 1      name:   link                            class:    net
module: unix                            instance: 0      name:   net1                            class:    flow

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
module: lo                              instance: 0      name:   lo0                             class:    net

Solaris 10, one interface e1000g0
From global zone:
--------------------------

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep e1000g0
module: e1000g                          instance: 0      name:   e1000g0                         class:    net

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
module: lo                              instance: 0      name:   lo0                             class:    net

Solaris 8 global zone:
--------------------------

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep hme0
module: hme                             instance: 0      name:   hme0                            class:    net     

$ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
module: lo                              instance: 0      name:   lo0                             class:    net

The kstat utility is a Perl script which uses 'sort()' internally. In a C program the kstat elements can come in any order.

Comment by Dehaen Pierre [ 2016 Mar 03 ]

Only 'netN' interfaces? I have computers with other fancy names like xnf0 for instance when running in a Solaris VM on OVM...

Comment by Andris Mednis [ 2016 Mar 03 ]

'netN' was used only as an example; it could be any other interface name, as long as the interface is a member of the 'link' module. Issue (2) was about querying the 'lo0' interface flipping a flag, as if there were no 'link' module anymore.

Comment by Andris Mednis [ 2016 Mar 10 ]

Solution back-ported to v.2.2 in development branch svn://svn.zabbix.com/branches/dev/ZBX-10372-22 .

wiper reviewed

Comment by Andris Mednis [ 2016 Mar 11 ]

Fixed in versions:

  • pre-2.2.12rc1 r58954,
  • pre-3.0.2rc1 r58957,
  • pre-3.1.0 (trunk) r58959.
Comment by Andris Mednis [ 2016 Mar 11 ]

No changes to documentation

Comment by Oleksii Zagorskyi [ 2016 Oct 05 ]

This change might (supposedly) have caused a regression; see ZBX-11292.
