[ZBX-11292] net.if.in/out, net.if.discovery issues on Solaris 10 Created: 2016 Sep 29  Updated: 2017 May 30  Resolved: 2016 Oct 28

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.2.0
Fix Version/s: 2.2.16rc1, 3.0.6rc1, 3.2.2rc1, 3.4.0alpha1

Type: Incident report Priority: Blocker
Reporter: David Angelovich Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: networkmonitoring, notsupported, solaris
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Solaris 10


Issue Links:
Duplicate

 Description   

After upgrading the agent to 3.2.0, we see issues with discovery (some hosts discover no interfaces, others discover correctly).
We are also seeing net.if.in/out return an error.
Running zabbix_agentd as root does not change the behaviour.

Agent 3.0.0:

[user@server01 ~]$ /opt/zabbix/bin/zabbix_get -s localhost -k net.if.discovery
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"igb1"}]}
[user@server01 ~]$ /opt/zabbix/bin/zabbix_get -s localhost -k net.if.in[igb1]
18025317092167

Agent 3.2.0:

[user@server01 ~]$ sudo /opt/zabbix/bin/zabbix_get -s localhost -k net.if.discovery
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"igb1"}]}
[user@server01 ~]$ sudo /opt/zabbix/bin/zabbix_get -s localhost -k net.if.in[igb1]
ZBX_NOTSUPPORTED: Cannot look up interface "igb1" in kernel statistics facility
[user@server01 ~]$ sudo /opt/zabbix/bin/zabbix_get -s localhost -k net.if.out[igb1]
ZBX_NOTSUPPORTED: Cannot look up interface "igb1" in kernel statistics facility

Although it does appear that igb1 is in kstat:

[user@server01 ~]$ kstat | grep igb
module: igb                             instance: 0     
module: igb                             instance: 0     
name:   igb0                            class:    net
module: igb                             instance: 0     
module: igb                             instance: 0     
module: igb                             instance: 1     
module: igb                             instance: 1     
name:   igb1                            class:    net
module: igb                             instance: 1     
module: igb                             instance: 1     
module: igb                             instance: 2     
module: igb                             instance: 2     
module: igb                             instance: 2     
module: igb                             instance: 3     
module: igb                             instance: 3     
module: igb                             instance: 3     

Agent 3.2.0 failing to return interfaces in discovery:

[user@server02 ~]$ /opt/zabbix/bin/zabbix_get -s localhost -k net.if.discovery
{"data":[]}

[user@server02 ~]$ ifconfig -a
lo0:4: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
bnx532002:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 6
        inet 10.0.0.1 netmask ffffff00 broadcast 10.0.0.255

I believe this is somehow related to ZBX-10372.



 Comments   
Comment by Oleksii Zagorskyi [ 2016 Oct 05 ]

I also have the same/similar issue:

With agent 2.4.4, item "net.if.in" worked fine for all interfaces:

[root@pmgzzb01 bin]# ./zabbix_get -s<> -k"net.if.discovery"
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"bnx1"},{"{#IFNAME}":"e1000g0"},{"{#IFNAME}":"e1000g1"},{"{#IFNAME}":"e1000g2"}]}

[root@pmgzzb01 bin]# ./zabbix_get -s<> -k"net.if.in[e1000g0,bytes]"
62420663340

[root@pmgzzb01 bin]# ./zabbix_get -s<> -k"net.if.in[e1000g2,bytes]"
119450027608

But after upgrading to agent 3.2.0, one particular interface stopped working:

./zabbix_get -s<> -k"net.if.discovery"
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"bnx1"},{"{#IFNAME}":"e1000g0"},{"{#IFNAME}":"e1000g1"},{"{#IFNAME}":"e1000g2"}]}
NAME root 11:06:59 /usr/local/sbin # ./zabbix_agentd -t "net.if.in[e1000g0,bytes]"
net.if.in[e1000g0,bytes]                      [u|2821496044068]

NAME root 11:08:33 /usr/local/sbin # ./zabbix_agentd -t "net.if.in[e1000g1,bytes]"
net.if.in[e1000g1,bytes]                      [m|ZBX_NOTSUPPORTED] [Cannot look up interface "e1000g1" in kernel statistics facility]
zabbix_agentd (daemon) (Zabbix) 3.2.0
Revision 62447 13 September 2016, compilation time: Sep 13 2016 17:10:38

uname -a
SunOS NAME 5.10 Generic 150401-38 i86pc i86pc

Agent 3.2 downloaded from http://www.zabbix.com/downloads/3.2.0/zabbix_agents_3.2.0.solaris10.amd64.tar.gz

Comment by Oleksii Zagorskyi [ 2016 Oct 05 ]

David, you showed that agent 3.2.0 for key "net.if.discovery" successfully returns interfaces for "server01" host but does not for host "server02".
Can you comment on that difference?

Comment by David Angelovich [ 2016 Oct 05 ]

I was unable to find a reason for it; however, my understanding of Solaris NIC configuration is pretty hazy.

I've only found two servers (rather, Solaris Containers) where no items are discovered. Happy to pull more info from the servers if you can tell me what commands/output you want.

Comment by Dehaen Pierre [ 2016 Oct 05 ]

David, it is normal that net.if.discovery does not work in a zone for a "shared ip" interface. This is a layer 3 (IP) "interface" without a layer 2 in the zone, and it is at layer 2 that the kernel keeps network interface usage statistics. You can have layer 3 usage, but not per interface.

A physical interface appears in the output of "ifconfig -a" as "igb1" for instance, but a logical (1) interface appears as "igb1:2" for instance.
(1) "logical", not "virtual", see ifconfig(1M): a logical interface on Solaris 10 and below can only be used to add another IP address to a physical interface, while a virtual interface as you can have in Solaris 11 appears as a layer 2 interface.
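A minimal C sketch of that naming rule, assuming (as described above for Solaris 10 `ifconfig -a` output) that a logical interface name is always the physical name plus a colon and a number. The helper `physical_ifname()` is hypothetical and not part of the Zabbix agent; it only illustrates why a logical name such as "bnx532002:1" on server02 has no statistics of its own and maps back to the physical interface "bnx532002".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Zabbix agent): copy the physical part of a
 * Solaris 10 interface name into buf. A logical interface such as "igb1:2" or
 * "bnx532002:1" only adds an IP address to its physical interface, so, per the
 * explanation above, only the part before ':' is tracked in the kernel statistics.
 * Returns 1 if the name was logical, 0 if it was already physical, -1 if buf is too small. */
static int	physical_ifname(const char *ifname, char *buf, size_t size)
{
	const char	*colon = strchr(ifname, ':');
	size_t		len = (NULL != colon) ? (size_t)(colon - ifname) : strlen(ifname);

	if (len >= size)	/* no room for the name plus the terminating '\0' */
		return -1;

	memcpy(buf, ifname, len);
	buf[len] = '\0';

	return (NULL != colon) ? 1 : 0;
}
```

For example, `physical_ifname("bnx532002:1", buf, sizeof(buf))` returns 1 and leaves "bnx532002" in `buf`.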

I raised ZBX-10372, but I must say I'm not using the fix implemented there yet: I'm using an overlay library I wrote as a workaround, as I didn't want to wait for the official fix to become available. The latest stable releases of the agent compiled for Solaris 10 are now available, and I should find some time to upgrade...

Nevertheless, I downloaded agent 3.2.0 (zabbix_agents_3.2.0.solaris10.amd64.tar.gz and its sparc counterpart), untarred it and tried the following (GZ = global zone, NGZ = non-global zone):

  • on Solaris 10:
    • sbin/zabbix_agentd --test 'net.if.discovery': it worked in the GZ and in the NGZ with "ip-type: exclusive", but as expected it did not return any interface in the NGZ with "ip-type: shared";
    • sbin/zabbix_agentd --test 'net.if.in[bge0]': it worked in the GZ, but not in the NGZ where the ip-type was shared;
    • sbin/zabbix_agentd --test 'net.if.in[bge1]': bge1 was used with an exclusive ip-type, and it worked neither in the GZ nor in the NGZ. This is not expected: the problem is that bge1 is instance 1, not instance 0, and the code assumes the instance is always 0.
  • on Solaris 11:
    • sbin/zabbix_agentd --test 'net.if.discovery': it worked in the GZ and in the NGZ;
    • sbin/zabbix_agentd --test 'net.if.in[net0]': it worked in the GZ and in the NGZ (but I'm only using exclusive ip-types).

Comment by Dehaen Pierre [ 2016 Oct 06 ]

Oleksiy, I confirm that, with the fix implemented in ZBX-10372, you must have an issue on Solaris 10 when there are multiple instances of the same kind of interface. Only the first can be queried.

As you can see below, each interface has a different instance number:

# kstat -p bge::bge*:obytes
bge:0:bge0:obytes       2519212624
bge:1:bge1:obytes       122227548
bge:2:bge2:obytes       3470083222

...but the code says:

		/* Assume that interfaces in our zone have instance 0. If instance > 0 then it belongs to other zone */
		/* and should be monitored by Zabbix agent running in that zone where it will have instance number 0. */
		if (0 != kp->ks_instance)
			continue;

That explains your problems.
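The effect of that check can be reproduced with a small mock of the kstat chain. `struct mock_kstat` is a hypothetical stand-in for the `kstat_t` fields the agent reads (on a real system the chain comes from `kstat_open()`), the entries mirror the `kstat -p bge::bge*:obytes` output above, and the class "net" is assumed from the `e1000g` output later in this thread. Only bge0 can be found:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields the agent inspects. */
struct mock_kstat
{
	const char	*ks_module;
	int		ks_instance;
	const char	*ks_name;
	const char	*ks_class;
};

/* Same filtering order as the agent's net.c before the fix: match the name,
 * skip every instance other than 0, accept the first kstat of class "net". */
static const struct mock_kstat	*lookup_instance0_only(const struct mock_kstat *chain, size_t n,
		const char *name)
{
	size_t	i;

	for (i = 0; i < n; i++)
	{
		if (0 != strcmp(name, chain[i].ks_name))	/* network interface name */
			continue;

		if (0 != chain[i].ks_instance)			/* the faulty assumption */
			continue;

		if (0 == strcmp("net", chain[i].ks_class))
			return &chain[i];
	}

	return NULL;	/* not found */
}

/* Global zone chain modelled on "kstat -p bge::bge*:obytes": one instance per NIC. */
static const struct mock_kstat	bge_chain[] = {
	{"bge", 0, "bge0", "net"},
	{"bge", 1, "bge1", "net"},
	{"bge", 2, "bge2", "net"},
};
```

Because the instance number grows with each physical NIC, every interface except the first is skipped, which matches the "Cannot look up interface" results reported for bge1, igb1 and e1000g1.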

To summarize:

  • In the GZ Solaris 10 has the following statistics, presented as "module:instance:name:statistic", see kstat(1M):
    # kstat -p bge:::obytes
    bge:0:bge0:obytes       2548301145
    bge:0:mac:obytes        2548301145
    bge:1:bge1:obytes       122238790
    bge:1:mac:obytes        122238790
    bge:2:bge2:obytes       3478116487
    bge:2:mac:obytes        3478116487
    bge:3:mac:obytes        0
    

    Notes:

    • bge0 is used as "ip-type: shared" in zones, bge1 is used as "ip-type: exclusive" in one zone, bge2 is used in the GZ only.
    • We can get layer 1 statistics (module=bge, name=mac) and layer 2 statistics (module=bge, name=bge1, for instance). There are no layer 2 statistics for bge3 because it is not connected.
    • The instance number begins at 0 and is incremented with each physical interface.
    • Whenever possible, I would use the obytes64 statistic rather than obytes: obytes is a 32-bit counter and wraps around every 2^32 bytes (4 GiB).
  • In an NGZ on Solaris 10, whether exclusive or shared, the same statistics can be queried, but of course the statistics of a shared IP are global to the interface: they cannot be split by logical IP.
  • On Solaris 11, in the GZ:
    # kstat -p igb:::obytes
    igb:0:phys:obytes       0
    igb:1:phys:obytes       0
    igb:2:phys:obytes       0
    igb:3:phys:obytes       3056284643
    igb:4:phys:obytes       3454829594
    igb:5:phys:obytes       0
    igb:6:phys:obytes       0
    igb:7:phys:obytes       0
    # kstat -p link:::obytes
    link:0:net0:obytes      869151843
    link:0:net1:obytes      0
    link:0:net2:obytes      0
    link:0:net3:obytes      0
    link:0:net4:obytes      0
    link:0:net5:obytes      0
    link:0:net6:obytes      0
    link:0:net7:obytes      0
    link:0:net8:obytes      881682997
    link:1:lan0:obytes      587622975
    link:1:net0:obytes      444619752
    link:2:dmz0:obytes      1829404436
    link:3:net0:obytes      1336445929
    link:4:net0:obytes      2449511461
    link:5:net0:obytes      1722410698
    

    Notes:

    • The layer 1 (physical interface driver) statistics use module=igb (the interface/driver type) and name=phys. The instance number is incremented with each interface.
    • The layer 2 (link) names can be changed in Solaris 11 (see dladm(1M)). Statistics can be found with module=link, name=datalink_name. The instance number is apparently 0 for the GZ datalinks, but it may vary for vnics (like those created for an NGZ with exclusive ip-type) to differentiate the statistics.
  • For Solaris 11, in an NGZ, the instance number is 0, even if it was another number in the GZ. In fact, nearly the same statistic can be found with module=datalink_name, name=link:
    # kstat -p link:1:lan0:obytes64 lan0:::obytes; zlogin myzone kstat -p link::lan0:obytes64 lan0:::obytes
    link:1:lan0:obytes64    52208370372
    lan0:1:link:obytes      52208370372
    link:0:lan0:obytes64    52208371593
    lan0:1:link:obytes      52208371593
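To make the module:instance:name:statistic layout concrete, here is a hypothetical C parser for one numeric line of `kstat -p` output (`parse_kstat_line()` and `struct kstat_line` are illustrative names, not part of kstat or Zabbix). It assumes the statistic and value are separated by whitespace and that the value is an integer counter, so lines such as `class net` or the fractional `crtime` value are rejected. Applied to the Solaris 11 example above, it shows the module and name fields swapping between `link:1:lan0` and `lan0:1:link`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one "kstat -p" line:
 * module:instance:name:statistic<whitespace>value */
struct kstat_line
{
	char			module[32];
	int			instance;
	char			name[32];
	char			statistic[32];
	unsigned long long	value;
};

/* Parse a numeric line such as "link:1:lan0:obytes64    52208370372".
 * Returns 1 on success, 0 if the line is not module:instance:name:statistic
 * followed by an integer that runs to the end of the line. */
static int	parse_kstat_line(const char *line, struct kstat_line *out)
{
	int	consumed = 0;

	if (5 != sscanf(line, "%31[^:]:%d:%31[^:]:%31s %llu%n", out->module, &out->instance,
			out->name, out->statistic, &out->value, &consumed))
		return 0;

	/* reject values such as "187.334528519" (crtime) that only partly parse */
	return '\0' == line[consumed] || '\n' == line[consumed];
}
```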
    
Comment by Oleksii Zagorskyi [ 2016 Oct 06 ]

Dehaen, your input is amazing, thank you!

Comment by Andris Mednis [ 2016 Oct 17 ]

Hi!
It seems there are many possible combinations: Solaris versions, interface types, global or non-global zones, shared or exclusive IP, etc.

Can we find a single 'kstat' command which covers all cases?

For example,
kstat -p -c net '::<interface_name>:'

If that works, we have a model for what to do in the Zabbix agent.

Comment by Andris Mednis [ 2016 Oct 20 ]

I propose a very simple fix - delete instance number checking:

Index: src/libs/zbxsysinfo/solaris/net.c
===================================================================
--- src/libs/zbxsysinfo/solaris/net.c   (revision 63182)
+++ src/libs/zbxsysinfo/solaris/net.c   (working copy)
@@ -37,11 +37,6 @@
                if (0 != strcmp(name, kp->ks_name))             /* network interface name */
                        continue;
 
-               /* Assume that interfaces in our zone have instance 0. If instance > 0 then it belongs to other zone */
-               /* and should be monitored by Zabbix agent running in that zone where it will have instance number 0. */
-               if (0 != kp->ks_instance)
-                       continue;
-
                if (0 == strcmp("net", kp->ks_class))
                        break;
        }

It appears to work in Solaris 10 and 11 tests. Can somebody test it on real-world Solaris configurations?
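Pierre's later concern about this patch (does the first match always have instance 0?) can be illustrated with a mock chain. This sketch reproduces the patched lookup (name match, then class "net") over a hypothetical `struct mock_kstat`; the Solaris 11 global-zone entries follow the `net0` example in this thread, but their order is an assumption chosen to show what happens if a vnic instance precedes instance 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields the agent inspects. */
struct mock_kstat
{
	const char	*ks_module;
	int		ks_instance;
	const char	*ks_name;
	const char	*ks_class;
};

/* Lookup with the instance check deleted, as in the proposed patch:
 * the first kstat with a matching name and class "net" wins. */
static const struct mock_kstat	*lookup_first_match(const struct mock_kstat *chain, size_t n,
		const char *name)
{
	size_t	i;

	for (i = 0; i < n; i++)
	{
		if (0 != strcmp(name, chain[i].ks_name))
			continue;

		if (0 == strcmp("net", chain[i].ks_class))
			return &chain[i];
	}

	return NULL;
}

/* Solaris 11 global zone: net0 is a physical datalink (instance 0), and two
 * exclusive-IP zones each have a vnic also named net0 (instances 1 and 2).
 * ASSUMPTION: this chain order is invented for illustration only. */
static const struct mock_kstat	gz_chain[] = {
	{"link", 1, "net0", "net"},
	{"link", 0, "net0", "net"},
	{"link", 2, "net0", "net"},
};
```

With this ordering, the patched lookup would report the counters of a zone's vnic instead of the global zone's net0; whether real chains can be ordered this way is exactly what remains uncertain.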

Comment by Andrei Gushchin (Inactive) [ 2016 Oct 20 ]

Please see the kstat output:

XXXXX root 10:40:57 /usr/local/bin # kstat -p -c net '::e1000g0:'
e1000g:0:e1000g0:brdcstrcv 535115
e1000g:0:e1000g0:brdcstxmt 335994
e1000g:0:e1000g0:class net
e1000g:0:e1000g0:collisions 0
e1000g:0:e1000g0:crtime 187.334528519
e1000g:0:e1000g0:ierrors 0
e1000g:0:e1000g0:ifspeed 100000000
e1000g:0:e1000g0:ipackets 3633596669
e1000g:0:e1000g0:ipackets64 3633596669
e1000g:0:e1000g0:multircv 0
e1000g:0:e1000g0:multixmt 0
e1000g:0:e1000g0:norcvbuf 0
e1000g:0:e1000g0:noxmtbuf 0
e1000g:0:e1000g0:obytes 2970201381
e1000g:0:e1000g0:obytes64 226308500773
e1000g:0:e1000g0:oerrors 0
e1000g:0:e1000g0:opackets 2019264177
e1000g:0:e1000g0:opackets64 2019264177
e1000g:0:e1000g0:rbytes 3913518136
e1000g:0:e1000g0:rbytes64 3658930687032
e1000g:0:e1000g0:snaptime 6135201.07807392
e1000g:0:e1000g0:unknowns 0
XXXXXX root 10:44:03 /usr/local/bin # kstat -p -c net '::e1000g1:'
e1000g:1:e1000g1:brdcstrcv 2538824
e1000g:1:e1000g1:brdcstxmt 164358
e1000g:1:e1000g1:class net
e1000g:1:e1000g1:collisions 0
e1000g:1:e1000g1:crtime 187.53033686
e1000g:1:e1000g1:ierrors 0
e1000g:1:e1000g1:ifspeed 100000000
e1000g:1:e1000g1:ipackets 3902471694
e1000g:1:e1000g1:ipackets64 3902471694
e1000g:1:e1000g1:multircv 0
e1000g:1:e1000g1:multixmt 0
e1000g:1:e1000g1:norcvbuf 0
e1000g:1:e1000g1:noxmtbuf 0
e1000g:1:e1000g1:obytes 792608597
e1000g:1:e1000g1:obytes64 511893716821
e1000g:1:e1000g1:oerrors 0
e1000g:1:e1000g1:opackets 1768643700
e1000g:1:e1000g1:opackets64 1768643700
e1000g:1:e1000g1:rbytes 3645784631
e1000g:1:e1000g1:rbytes64 1287841006135
e1000g:1:e1000g1:snaptime 6135445.85282709
e1000g:1:e1000g1:unknowns 0
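These counters illustrate Pierre's earlier advice to prefer obytes64: for both e1000g0 and e1000g1, each 32-bit counter equals its 64-bit counterpart modulo 2^32 (for example, 226308500773 mod 2^32 = 2970201381 for e1000g0 obytes). A minimal C sketch of that relationship, and of the wrap-safe difference a poller would need if only the 32-bit counter were available; `low32()` and `delta32()` are illustrative names, not Zabbix functions:

```c
#include <stdint.h>

/* The 32-bit counters (obytes, rbytes) carry the low 32 bits of their 64-bit
 * counterparts (obytes64, rbytes64), so they wrap every 2^32 bytes (4 GiB). */
static uint32_t	low32(uint64_t counter64)
{
	return (uint32_t)counter64;
}

/* Difference between two samples of a 32-bit counter. Unsigned subtraction is
 * modulo 2^32, so the result is correct if the counter wrapped at most once
 * between the samples; more than one wrap cannot be detected. */
static uint32_t	delta32(uint32_t prev, uint32_t cur)
{
	return (uint32_t)(cur - prev);
}
```

At 100 Mbit/s (the ifspeed reported above), 2^32 bytes pass in under six minutes of full load, so a 32-bit counter polled less often than that could wrap more than once.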
Comment by Dehaen Pierre [ 2016 Oct 20 ]

Hi,

I'm afraid "kstat -p -c net '::<interface_name>:'" won't work. For example on a Solaris 11 box with the following interfaces:

GZ# dladm
LINK                CLASS     MTU    STATE    OVER
net0                phys      1500   up       --
net1                phys      1500   up       --
zone1/net0          vnic      1500   up       net1
zone2/net0          vnic      1500   up       net1

GZ# kstat -p -c net '::net0:obytes'
link:0:net0:obytes      3869558907
link:1:net0:obytes      859068232
link:2:net0:obytes      4004013184

[...little delay... so don't try to match exact figures...]

zone1# kstat -p -c net '::net0:obytes'
link:0:net0:obytes      861050627

zone2# kstat -p -c net '::net0:obytes'
link:0:net0:obytes      469266626

Explanations:

  • The GZ has 2 physical interfaces with the default datalink names net0 and net1
  • There are 2 NGZ with exclusive IP (vnic based), each with a datalink named net0 in their respective zone.

Well, your kstat command seems to work on Solaris 10 (NGZ and GZ) and on Solaris 11 in an NGZ, but on Solaris 11 in the GZ you would most probably have to take instance number 0, as I guess that instance is set up before the vnics are created on top of it.

Now... I'm not sure simply taking the first instance in the kstat chain (first match) always means taking instance 0! Ideally you should read all the matching instances and compare the instance number found with the previous match (if any), and keep it if it is less than the previous one. Something like:

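	kstat_t		*kpl = NULL;	/* lowest-numbered matching instance found so far */

	for (kp = kc->kc_chain; NULL != kp; kp = kp->ks_next)	/* traverse all kstat chain */
	{
		if (0 != strcmp(name, kp->ks_name))		/* network interface name */
			continue;

		if (NULL != kpl && kp->ks_instance > kpl->ks_instance)	/* keep the smaller instance */
			continue;

		kpl = kp;
	}

	kp = kpl;	/* NULL if no kstat matched the name */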
Comment by Andris Mednis [ 2016 Oct 20 ]

Thanks, pierre4031, for your great help!
Is it ok that

                if (0 == strcmp("net", kp->ks_class))
                        break;

is not included anymore? (I think it is still necessary.)

Comment by Dehaen Pierre [ 2016 Oct 20 ]

You are right, Mednis (or Andris, I'm not sure which is your first name)!

The class check is still necessary (as in kstat -c net...).
Thanks for your work!

PS: I used "kpl" for "kplowest".

Comment by richlv [ 2016 Oct 20 ]

pierre4031, the first name would be "Andris"

Comment by Andris Mednis [ 2016 Oct 21 ]

Added a search for the instance with the smallest number; thanks to pierre4031 for the idea:

static int      get_kstat_named_field(const char *name, const char *field, zbx_uint64_t *field_value)
{
        int             ret = FAIL, min_instance = -1;
        kstat_ctl_t     *kc;
        kstat_t         *kp, *min_kp;
        kstat_named_t   *kn;

        if (NULL == (kc = kstat_open()))
                return FAIL;

        for (kp = kc->kc_chain; NULL != kp; kp = kp->ks_next)   /* traverse all kstat chain */
        {
                if (0 != strcmp(name, kp->ks_name))             /* network interface name */
                        continue;

                if (0 != strcmp("net", kp->ks_class))
                        continue;

                /* find instance with the smallest number */

                if (-1 == min_instance || kp->ks_instance < min_instance)
                {
                        min_instance = kp->ks_instance;
                        min_kp = kp;
                }
        }

        if (-1 != min_instance)
                kp = min_kp;
...
Comment by Andris Mednis [ 2016 Oct 21 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-11292 (for version 2.2, but same changes should apply to newer versions, too).

Comment by Andris Zeila [ 2016 Oct 27 ]

(1) Possible minor optimization: we can stop looking for the instance with the smallest number once we find instance 0.

andris: Thanks for finding it! RESOLVED in r63399.

wiper CLOSED
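Putting the pieces of this thread together (class check kept, smallest instance wins, early exit on instance 0 per item (1)), the selection logic can be sketched as follows. It runs over a hypothetical `struct mock_kstat` array rather than the real `kstat_ctl_t` chain, so it illustrates the approach and is not the code committed in r63399; the entry with class "misc" and the Solaris 11 chain order are invented purely to exercise the class check and the minimum search.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields the agent inspects. */
struct mock_kstat
{
	const char	*ks_module;
	int		ks_instance;
	const char	*ks_name;
	const char	*ks_class;
};

/* Return the class "net" kstat named `name` with the smallest instance number,
 * or NULL if none exists. Stops early at instance 0, which cannot be beaten. */
static const struct mock_kstat	*lookup_min_instance(const struct mock_kstat *chain, size_t n,
		const char *name)
{
	const struct mock_kstat	*min_kp = NULL;
	size_t			i;

	for (i = 0; i < n; i++)
	{
		const struct mock_kstat	*kp = &chain[i];

		if (0 != strcmp(name, kp->ks_name))	/* network interface name */
			continue;

		if (0 != strcmp("net", kp->ks_class))	/* class check kept, as agreed above */
			continue;

		if (NULL == min_kp || kp->ks_instance < min_kp->ks_instance)
		{
			min_kp = kp;

			if (0 == min_kp->ks_instance)	/* optimization (1) */
				break;
		}
	}

	return min_kp;
}

/* Solaris 10 global zone: one instance per bge NIC. */
static const struct mock_kstat	s10_chain[] = {
	{"bge", 0, "bge0", "net"},
	{"bge", 1, "bge1", "net"},
	{"bge", 2, "bge2", "net"},
};

/* Solaris 11 global zone: net0 datalink plus two zone vnics named net0 (hypothetical order). */
static const struct mock_kstat	s11_chain[] = {
	{"link", 1, "net0", "net"},
	{"other", 0, "net0", "misc"},	/* hypothetical non-"net" kstat with the same name */
	{"link", 0, "net0", "net"},
	{"link", 2, "net0", "net"},
};
```

Here bge1 is found as instance 1 on Solaris 10, and net0 in the Solaris 11 global zone resolves to the `link` instance 0 even though another instance and a non-"net" kstat come first.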

Comment by Andris Zeila [ 2016 Oct 27 ]

Successfully tested.

Comment by Andris Mednis [ 2016 Oct 28 ]

Released in:

  • pre-2.2.16rc1 r63412
  • pre-3.0.6rc1 r63413
  • pre-3.2.2rc1 r63414
  • pre-3.3.0 (trunk) r63415

No documentation change required.

Comment by patrik uytterhoeven [ 2017 Feb 15 ]

We have the same issue with the Solaris packages.

When will new packages be released?

Generated at Thu Apr 25 10:15:37 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.