ZABBIX BUGS AND ISSUES
ZBX-10372

network interface statistics return 0 on Solaris 11

    Details

      Description

      In src/libs/zbxsysinfo/solaris/net.c a "kstat_lookup(kc, NULL, -1, (char *)name)" is done without specifying either the module (NULL) or the instance (-1).

      On Solaris 10 this was working because the module was unique (and was the interface name without the number, for instance "bge" for interface "bge0"):

      $ kstat -n bge0 -s class -p
      bge:0:bge0:class        net
      $ kstat -n bge0 -s obytes64 -p
      bge:0:bge0:obytes64     1766201164
      

      But on Solaris 11, with its completely new network stack, it does not work anymore because, for one interface, you have statistics in the "unix" module and in the "link" module, and useful statistics are coming from the link module only:

      $ kstat -n net0 -s class -p
      link:0:net0:class       net
      unix:0:net0:class       flow
      $ kstat -n net0 -s obytes64 -p
      link:0:net0:obytes64    319278589464
      

      Unfortunately, the loose kstat_lookup() returns the wrong one, the "unix" entry, which does not have the statistics we need.

      So, for Solaris 11 and later only, line 39 of net.c should look like:

        if (NULL == (kp = kstat_lookup(kc, "link", 0, (char *)name)))
      

      I set the instance number to 0 because, on the global zone, you may get several net0 interfaces in the link kstats: for the global zone (the first instance to be created, so 0) and for some other zones.
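The difference between the loose and the specific lookup can be illustrated without libkstat. The following is only a sketch: `struct mock_kstat`, `mock_lookup()` and the two sample entries are hypothetical stand-ins for `kstat_t`, `kstat_lookup()` and the Solaris 11 chain, and the chain order (the "unix" entry first) is assumed from the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for kstat_t, reduced to the identifying fields. */
struct mock_kstat {
	const char *module;
	int instance;
	const char *name;
	const char *class_name;
	const struct mock_kstat *next;
};

/* First-match search: module == NULL and instance == -1 act as wildcards. */
const struct mock_kstat *mock_lookup(const struct mock_kstat *chain,
		const char *module, int instance, const char *name)
{
	assert(name != NULL);	/* this sketch always requires a name */
	for (; chain != NULL; chain = chain->next) {
		if (module != NULL && strcmp(chain->module, module) != 0)
			continue;
		if (instance != -1 && chain->instance != instance)
			continue;
		if (strcmp(chain->name, name) == 0)
			return chain;
	}
	return NULL;
}

/* Sample chain: the "unix" flow entry precedes the "link" net entry. */
const struct mock_kstat sample_link_net0 = { "link", 0, "net0", "net", NULL };
const struct mock_kstat sample_unix_net0 = { "unix", 0, "net0", "flow", &sample_link_net0 };
```

With this sample chain, `mock_lookup(&sample_unix_net0, NULL, -1, "net0")` stops at the "unix" entry, while `mock_lookup(&sample_unix_net0, "link", 0, "net0")` skips it and returns the entry of class "net".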

          Activity

          Andris Zeila added a comment -

          I'm wondering if using class name 'net' would not be better (at least compatible with older versions). However there doesn't seem to be an easy way of doing it.

          Dehaen Pierre added a comment - - edited

          Using the following ktest.c program you can get all instances:

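          #include <stdio.h>
          #include <kstat.h>
          
          int main(void) {
            kstat_ctl_t *kc;
            kstat_t     *kp, *head;
          
            if ((kc = kstat_open()) == NULL) { perror("kstat_open"); return 1; }
            head = kc->kc_chain;              /* remember the real head of the chain */
            while (kc->kc_chain != NULL && (kp = kstat_lookup(kc, NULL, -1, "net0")) != NULL) {
              printf("module=%s, name=%s, instance=%d, class=%s\n", kp->ks_module, kp->ks_name, kp->ks_instance, kp->ks_class);
              kc->kc_chain = kp->ks_next;     /* continue the search after this match */
            }
            kc->kc_chain = head;              /* restore the head so kstat_close() sees the whole chain */
            kstat_close(kc);
            return 0;
          }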
          
          $ gcc -o ktest -lkstat ktest.c
          $ ./ktest
          module=unix, name=net0, instance=0, class=flow
          module=link, name=net0, instance=0, class=net
          module=link, name=net0, instance=3, class=net
          module=link, name=net0, instance=1, class=net
          

          Well, this method is not documented in the man pages nor in the "Solaris Performance and Tools" book. Even /bin/kstat does not use it: it is a Perl script based on Sun::Solaris::Kstat, which only implements the new() and update() methods. I think we should normally follow the kstat chain to find a statistic, unless we know the module-instance-name triplet for kstat_lookup().

          So we could define a kstat_lookup_class() to supplement and replace kstat_lookup(). Proof of concept:

          #include <stdio.h>
          #include <string.h>
          #include <errno.h>
          #include <kstat.h>
          
          /* Like kstat_lookup(), but the returned kstat must also have the given class. */
          kstat_t *kstat_lookup_class(kstat_ctl_t *kc, char *ks_module, int ks_instance, char *ks_name, char *ks_class) {
            kstat_t     *head = kc->kc_chain;   /* saved so the chain can be restored */
            kstat_t     *kp = NULL;
            int         i = 1024;               /* safety limit on the number of candidates */
          
            while (--i >= 0 && kc->kc_chain != NULL) {
              if ((kp = kstat_lookup(kc, ks_module, ks_instance, ks_name)) == NULL) break;
              if (strcmp(kp->ks_class, ks_class) == 0) break;
              kc->kc_chain = kp->ks_next;       /* resume the search after this candidate */
              kp = NULL;
            }
            kc->kc_chain = head;                /* leave the chain as we found it */
            if (kp == NULL) errno = ENOENT;
            return kp;
          }
          
          int main(int argc, char *argv[]) {
            kstat_ctl_t  *kc;
            kstat_t      *kp;
            char         *iface = "net0", *class = "net";
          
            if (argc == 3) { iface = argv[1]; class = argv[2]; }
            if ((kc = kstat_open()) == NULL) { perror("kstat_open"); return 1; }
            if ((kp = kstat_lookup_class(kc, NULL, -1, iface, class)) != NULL) {
              printf("module=%s, name=%s, instance=%d, class=%s\n", kp->ks_module, kp->ks_name, kp->ks_instance, kp->ks_class);
            } else {
              printf("interface=%s, class=%s: not found!\n", iface, class);
            }
            kstat_close(kc);
            return 0;
          }
          

          Result on Solaris 11:

          $ ./ktest2 net0 net
          module=link, name=net0, instance=0, class=net
          $ ./ktest2 net0 flow
          module=unix, name=net0, instance=0, class=flow
          

          Result on Solaris 10:

          $ ./ktest2 nge0 net
          module=nge, name=nge0, instance=0, class=net
          $ ./ktest2 bge0 net
          module=bge, name=bge0, instance=0, class=net
          
          Andris Zeila added a comment -

          Thanks for the examples! I was not eager to manually iterate through the kstat chain (though we are doing it for cpu stats). Currently I'm leaning towards first trying to get the correct kstat module (link) as you described initially and, if that fails, assuming this is not Solaris 11+ and continuing to use the original code.
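A minimal sketch of that fallback order, written so that it compiles without libkstat. The `lookup_fn` callback is a hypothetical stand-in that takes the same module, instance and name arguments as `kstat_lookup()`; `lookup_net_kstat()` and the two stubs are illustrative names and are not taken from the Zabbix sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback with the argument order of kstat_lookup():
 * module (NULL = any), instance (-1 = any), name. Returns NULL when nothing matches. */
typedef const char *(*lookup_fn)(const char *module, int instance, const char *name);

/* Try the Solaris 11 "link" module first, then fall back to the original loose lookup. */
const char *lookup_net_kstat(lookup_fn lookup, const char *name)
{
	const char *found;

	assert(lookup != NULL && name != NULL);
	if ((found = lookup("link", 0, name)) != NULL)
		return found;
	return lookup(NULL, -1, name);
}

/* Stub for a Solaris 11 host: net0 exists in "unix" (found first) and in "link". */
const char *stub_solaris11(const char *module, int instance, const char *name)
{
	if (strcmp(name, "net0") != 0 || (instance != -1 && instance != 0))
		return NULL;
	if (module == NULL || strcmp(module, "unix") == 0)
		return "unix:0:net0";
	return strcmp(module, "link") == 0 ? "link:0:net0" : NULL;
}

/* Stub for a Solaris 10 host: bge0 lives in the driver module "bge". */
const char *stub_solaris10(const char *module, int instance, const char *name)
{
	if (strcmp(name, "bge0") != 0 || (instance != -1 && instance != 0))
		return NULL;
	return (module == NULL || strcmp(module, "bge") == 0) ? "bge:0:bge0" : NULL;
}
```

With the Solaris 11 stub the specific lookup succeeds and the "link" entry is returned; with the Solaris 10 stub the first attempt fails and the loose lookup finds the driver-module entry, as before the change.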

          Andris Zeila added a comment -

          Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10372

          Andris Mednis added a comment - - edited

          (1) Apparently there is one common problem with get_kstat_named_field(), unrelated to this change.

          get_kstat_named_field():

          • calls kstat_open(),
          • searches for the required data element and gets a pointer to it,
          • calls kstat_close() ("man kstat_open" says: "The kstat_close() function frees all resources that were associated with kc."),
          • returns the pointer to the required data element. After kstat_close() this pointer is stale.

          Several functions in src/libs/zbxsysinfo/solaris/net.c call get_kstat_named_field() in the same manner.

          Andris Mednis RESOLVED in r58849.
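One common way to avoid such a stale pointer is to copy the value into caller-owned storage while the handle is still open. The sketch below only illustrates that pattern and is not necessarily what r58849 does: `struct snapshot`, `snapshot_open()`, `snapshot_close()` and `get_obytes64()` are hypothetical stand-ins for the kstat handle and its open/close calls, and the counter value is sample data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kstat snapshot: the handle owns the data it hands out. */
struct snapshot {
	uint64_t *obytes64;
};

struct snapshot *snapshot_open(void)
{
	struct snapshot *s = malloc(sizeof(*s));

	if (s == NULL)
		return NULL;
	if ((s->obytes64 = malloc(sizeof(*s->obytes64))) == NULL) {
		free(s);
		return NULL;
	}
	*s->obytes64 = 1766201164ULL;	/* sample counter value */
	return s;
}

void snapshot_close(struct snapshot *s)
{
	free(s->obytes64);	/* every pointer obtained from s is stale after this */
	free(s);
}

/* Copy the counter into caller-owned storage while the snapshot is still open. */
int get_obytes64(uint64_t *value)
{
	struct snapshot *s;

	assert(value != NULL);
	if ((s = snapshot_open()) == NULL)
		return -1;
	*value = *s->obytes64;	/* copy by value before closing */
	snapshot_close(s);
	return 0;
}
```

The caller receives a plain value and a status code, so nothing it holds depends on memory that was released by the close call.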

          Dehaen Pierre added a comment -

          You are correct. We are probably lucky most of the time because what kstat_close() frees is the chain "snapshot", not the kstats themselves, at least for statistics that have not disappeared from the kernel since the close operation; and such a disappearance is not so common, as we are talking about network interfaces.

          The man page kstat_chain_update(3STAT) says:

          During normal operation, the kernel creates new kstats and
          delete old ones as various device instances are added and
          removed, thereby causing the user's copy of the kstat chain
          to become out of date.

          Halting a zone or using dladm might remove network interfaces... Of course the interface would have to be removed between the kstat_close() and the statistic read, but then trying to read a deleted kstat might cause trouble.

          Andris Mednis added a comment - - edited

          (2) The proposed change (r58472) on Solaris 11 searches for statistics in the 'link' module as long as only 'netN' interfaces are involved. Once the loopback interface 'lo0', which resides in its own 'lo' module, is queried, searching in the 'link' module is turned off, even for 'netN' interfaces.

          Andris Mednis RESOLVED in r58909.

          Andris Mednis added a comment - - edited

          Maybe these examples can help.

          Example with Solaris 11, interface 'net1' exists in global and non-global zones:
          From global zone:
          --------------------------

          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep net1
          module: link                            instance: 0      name:   net1                            class:    net
          module: link                            instance: 1      name:   net1                            class:    net
          module: net1                            instance: 0      name:   link                            class:    net
          module: net1                            instance: 1      name:   link                            class:    net
          module: unix                            instance: 0      name:   testzone/net1                   class:    flow
          
          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
          module: lo                              instance: 0      name:   lo0                             class:    net
          

          From test zone:
          ----------------------

          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep net1
          module: link                            instance: 0      name:   net1                            class:    net
          module: net1                            instance: 1      name:   link                            class:    net
          module: unix                            instance: 0      name:   net1                            class:    flow
          
          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
          module: lo                              instance: 0      name:   lo0                             class:    net
          

          Solaris 10, one interface e1000g0
          From global zone:
          --------------------------

          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep e1000g0
          module: e1000g                          instance: 0      name:   e1000g0                         class:    net
          
          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
          module: lo                              instance: 0      name:   lo0                             class:    net
          

          Solaris 8 global zone:
          --------------------------

          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep hme0
          module: hme                             instance: 0      name:   hme0                            class:    net     
          
          $ kstat | awk '/^module:/ { module=$0;}; /^name:/ { print module,$0;}' | grep lo0
          module: lo                              instance: 0      name:   lo0                             class:    net
          

          The kstat utility is a Perl script that uses 'sort()' internally, so its output is ordered. In a C program the kstat elements can come in any order.
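Because the chain order cannot be relied on, a selection that walks the whole chain and compares candidates is safer than taking the first match. The sketch below assumes, following the issue description, that the lowest instance number of class "net" is the one wanted in the global zone; `struct mock_kstat`, `find_lowest_instance()` and the sample entries are hypothetical and only mimic the fields of `kstat_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for kstat_t, reduced to the identifying fields. */
struct mock_kstat {
	const char *module;
	int instance;
	const char *name;
	const char *class_name;
	const struct mock_kstat *next;
};

/* Walk the whole chain and keep the entry with matching name and class that
 * has the lowest instance number, so the result does not depend on chain order. */
const struct mock_kstat *find_lowest_instance(const struct mock_kstat *chain,
		const char *name, const char *class_name)
{
	const struct mock_kstat *best = NULL;

	assert(name != NULL && class_name != NULL);
	for (; chain != NULL; chain = chain->next) {
		if (strcmp(chain->name, name) != 0 || strcmp(chain->class_name, class_name) != 0)
			continue;
		if (best == NULL || chain->instance < best->instance)
			best = chain;
	}
	return best;
}

/* The four net0 entries from the ktest run, deliberately shuffled:
 * link 3 -> link 1 -> unix 0 -> link 0 */
const struct mock_kstat s_link0 = { "link", 0, "net0", "net", NULL };
const struct mock_kstat s_unix0 = { "unix", 0, "net0", "flow", &s_link0 };
const struct mock_kstat s_link1 = { "link", 1, "net0", "net", &s_unix0 };
const struct mock_kstat s_link3 = { "link", 3, "net0", "net", &s_link1 };
```

Starting from `s_link3`, a search for name "net0" and class "net" ends on the "link" instance 0 entry even though it is last in the chain.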

          Dehaen Pierre added a comment -

          Only 'netN' interfaces? I have computers with other fancy names like xnf0 for instance when running in a Solaris VM on OVM...

          Andris Mednis added a comment - - edited

          'netN' was used only as an example; it could be other interface names, as long as they are members of the 'link' module. Issue (2) was about the problem that querying the 'lo0' interface flips a flag as if there were no 'link' module anymore.

          Andris Mednis added a comment - - edited

          Solution back-ported to v.2.2 in development branch svn://svn.zabbix.com/branches/dev/ZBX-10372-22 .

          Andris Zeila reviewed

          Andris Mednis added a comment -

          Fixed in versions:

          • pre-2.2.12rc1 r58954,
          • pre-3.0.2rc1 r58957,
          • pre-3.1.0 (trunk) r58959.
          Andris Mednis added a comment -

          No changes to documentation

          Oleksiy Zagorskyi added a comment -

          It might cause regression (supposedly) in ZBX-11292


            People

            • Assignee:
              Unassigned
              Reporter:
              Dehaen Pierre