[ZBX-11292] net.if.in/out, net.if.discovery issues on Solaris 10 Created: 2016 Sep 29 Updated: 2017 May 30 Resolved: 2016 Oct 28 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 3.2.0 |
Fix Version/s: | 2.2.16rc1, 3.0.6rc1, 3.2.2rc1, 3.4.0alpha1 |
Type: | Incident report | Priority: | Blocker |
Reporter: | David Angelovich | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | networkmonitoring, notsupported, solaris | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Solaris 10 |
Issue Links: |
|
Description |
After upgrading the agent to 3.2.0, we see issues with discovery (some hosts discover no interfaces, others discover correctly) and with net.if.in/net.if.out.
Agent 3.0.0:
[user@server01 ~]$ /opt/zabbix/bin/zabbix_get -s localhost -k net.if.discovery
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"igb1"}]}
[user@server01 ~]$ /opt/zabbix/bin/zabbix_get -s localhost -k net.if.in[igb1]
18025317092167
Agent 3.2.0:
[user@server01 ~]$ sudo /opt/zabbix/bin/zabbix_get -s localhost -k net.if.discovery
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"igb1"}]}
[user@server01 ~]$ sudo /opt/zabbix/bin/zabbix_get -s localhost -k net.if.in[igb1]
ZBX_NOTSUPPORTED: Cannot look up interface "igb1" in kernel statistics facility
[user@server01 ~]$ sudo /opt/zabbix/bin/zabbix_get -s localhost -k net.if.out[igb1]
ZBX_NOTSUPPORTED: Cannot look up interface "igb1" in kernel statistics facility
Although igb1 does appear to be in kstat:
[user@server01 ~]$ kstat | grep igb
module: igb instance: 0
module: igb instance: 0
name: igb0 class: net
module: igb instance: 0
module: igb instance: 0
module: igb instance: 1
module: igb instance: 1
name: igb1 class: net
module: igb instance: 1
module: igb instance: 1
module: igb instance: 2
module: igb instance: 2
module: igb instance: 2
module: igb instance: 3
module: igb instance: 3
module: igb instance: 3
Agent 3.2.0 failing to return interfaces in discovery:
[user@server02 ~]$ /opt/zabbix/bin/zabbix_get -s localhost -k net.if.discovery
{"data":[]}
[user@server02 ~]$ ifconfig -a
lo0:4: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bnx532002:1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 6
        inet 10.0.0.1 netmask ffffff00 broadcast 10.0.0.255
I believe this is somehow related to |
Comments |
Comment by Oleksii Zagorskyi [ 2016 Oct 05 ] |
I also have the same or a similar issue. On agent 2.4.4, item "net.if.in" was working fine for all interfaces:
[root@pmgzzb01 bin]# ./zabbix_get -s<> -k"net.if.discovery"
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"bnx1"},{"{#IFNAME}":"e1000g0"},{"{#IFNAME}":"e1000g1"},{"{#IFNAME}":"e1000g2"}]}
[root@pmgzzb01 bin]# ./zabbix_get -s<> -k"net.if.in[e1000g0,bytes]"
62420663340
[root@pmgzzb01 bin]# ./zabbix_get -s<> -k"net.if.in[e1000g2,bytes]"
119450027608
But after upgrading it to agent 3.2.0, one particular interface stopped working:
./zabbix_get -s<> -k"net.if.discovery"
{"data":[{"{#IFNAME}":"lo0"},{"{#IFNAME}":"bnx1"},{"{#IFNAME}":"e1000g0"},{"{#IFNAME}":"e1000g1"},{"{#IFNAME}":"e1000g2"}]}
NAME root 11:06:59 /usr/local/sbin # ./zabbix_agentd -t "net.if.in[e1000g0,bytes]"
net.if.in[e1000g0,bytes]                      [u|2821496044068]
NAME root 11:08:33 /usr/local/sbin # ./zabbix_agentd -t "net.if.in[e1000g1,bytes]"
net.if.in[e1000g1,bytes]                      [m|ZBX_NOTSUPPORTED] [Cannot look up interface "e1000g1" in kernel statistics facility]
zabbix_agentd (daemon) (Zabbix) 3.2.0
Revision 62447 13 September 2016, compilation time: Sep 13 2016 17:10:38
uname -a
SunOS NAME 5.10 Generic 150401-38 i86pc i86pc
Agent 3.2 downloaded from http://www.zabbix.com/downloads/3.2.0/zabbix_agents_3.2.0.solaris10.amd64.tar.gz |
Comment by Oleksii Zagorskyi [ 2016 Oct 05 ] |
David, you showed that for key "net.if.discovery" agent 3.2.0 successfully returns interfaces on host "server01" but not on host "server02". |
Comment by David Angelovich [ 2016 Oct 05 ] |
I was unable to find a reason for it; however, my understanding of Solaris NIC configuration is pretty hazy. I've only found two servers (rather, Solaris Containers) where no items are discovered. Happy to pull more info from the servers if you can tell me what commands/output you want. |
Comment by Dehaen Pierre [ 2016 Oct 05 ] |
David, it is normal that net.if.discovery does not work in a zone for a "shared IP" interface. Such an interface is a layer 3 (IP) "interface" with no layer 2 in the zone, and it is at layer 2 that the kernel keeps network interface usage statistics. You can get layer 3 usage, but not per interface. A physical interface appears in the output of "ifconfig -a" as, for instance, "igb1", whereas a logical (1) interface appears as, for instance, "igb1:2". I'm at the origin of
Nevertheless, I downloaded agent 3.2.0 (zabbix_agents_3.2.0.solaris10.amd64.tar.gz and the sparc package), untarred them and tried the following:
|
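To make the physical versus logical distinction concrete, here is a small C sketch that classifies interface names as printed by "ifconfig -a". The helpers is_logical_interface() and physical_interface_name() are hypothetical, not part of the Zabbix agent, and the sketch assumes a logical interface is always written as "<physical>:<n>", as in "igb1:2" or "bnx532002:1" above. Per the explanation above, recovering the physical name does not by itself make per-interface counters available inside a shared-IP zone, since the layer 2 statistics are not kept in that zone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a logical interface carries a ":<n>" suffix, */
/* e.g. "igb1:2" or "lo0:4"; a physical one such as "igb1" does not. */
static bool	is_logical_interface(const char *ifname)
{
	return NULL != strchr(ifname, ':');
}

/* Hypothetical helper: copy the physical part of an interface name   */
/* ("bnx532002:1" -> "bnx532002") into buf, including the terminator. */
/* Returns false if buf is too small to hold the result.             */
static bool	physical_interface_name(const char *ifname, char *buf, size_t bufsize)
{
	size_t	len = strcspn(ifname, ":");	/* length up to ':' or end */

	if (len + 1 > bufsize)
		return false;

	memcpy(buf, ifname, len);
	buf[len] = '\0';

	return true;
}
```

For a physical name without a suffix, physical_interface_name() simply copies the whole name.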
Comment by Dehaen Pierre [ 2016 Oct 06 ] |
Oleksiy, I confirm that, with the fix implemented in
As you can see below, each interface has a different instance number:
# kstat -p bge::bge*:obytes
bge:0:bge0:obytes	2519212624
bge:1:bge1:obytes	122227548
bge:2:bge2:obytes	3470083222
...but the code says:
	/* Assume that interfaces in our zone have instance 0. If instance > 0 then it belongs to other zone */
	/* and should be monitored by Zabbix agent running in that zone where it will have instance number 0. */
	if (0 != kp->ks_instance)
		continue;
That explains your problems. To summarize:
|
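The effect of that check can be reproduced without a Solaris box. The C sketch below models the kstat chain as a hand-built linked list; struct mock_kstat and lookup_instance0_only() are hypothetical stand-ins that carry only the kstat_t fields the loop reads (ks_instance, ks_name, ks_class, ks_next, plus ks_module for readability). With a chain like the bge one shown above, bge0 is found but bge1 is skipped because its instance number is 1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the few kstat_t fields the agent's loop reads. */
struct mock_kstat
{
	const char		*ks_module;
	int			ks_instance;
	const char		*ks_name;
	const char		*ks_class;
	struct mock_kstat	*ks_next;
};

/* Same logic as the 3.2.0 loop quoted above: match the name, skip every   */
/* instance other than 0, then require class "net". NULL means not found. */
static struct mock_kstat	*lookup_instance0_only(struct mock_kstat *chain, const char *name)
{
	struct mock_kstat	*kp;

	for (kp = chain; NULL != kp; kp = kp->ks_next)
	{
		if (0 != strcmp(name, kp->ks_name))
			continue;

		if (0 != kp->ks_instance)	/* rejects bge1, igb1, e1000g1, ... */
			continue;

		if (0 == strcmp("net", kp->ks_class))
			break;
	}

	return kp;
}
```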
Comment by Oleksii Zagorskyi [ 2016 Oct 06 ] |
Dehaen, your input is amazing, thank you! |
Comment by Andris Mednis [ 2016 Oct 17 ] |
Hi! Can we find a single 'kstat' command which covers all cases? For example,
If that works, we have a model of what to do in the Zabbix agent. |
Comment by Andris Mednis [ 2016 Oct 20 ] |
I propose a very simple fix: delete the instance number check.
Index: src/libs/zbxsysinfo/solaris/net.c
===================================================================
--- src/libs/zbxsysinfo/solaris/net.c	(revision 63182)
+++ src/libs/zbxsysinfo/solaris/net.c	(working copy)
@@ -37,11 +37,6 @@
 		if (0 != strcmp(name, kp->ks_name))	/* network interface name */
 			continue;
 
-		/* Assume that interfaces in our zone have instance 0. If instance > 0 then it belongs to other zone */
-		/* and should be monitored by Zabbix agent running in that zone where it will have instance number 0. */
-		if (0 != kp->ks_instance)
-			continue;
-
 		if (0 == strcmp("net", kp->ks_class))
 			break;
 	}
Apparently it works in our Solaris 10 and 11 tests. Can somebody test it in real-world Solaris configurations? |
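A sketch of the proposed fix against a mock chain: with the instance check removed, the first entry whose name matches and whose class is "net" is returned, whatever its instance number. struct mock_kstat and lookup_first_match() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields used by the lookup. */
struct mock_kstat
{
	const char		*ks_module;
	int			ks_instance;
	const char		*ks_name;
	const char		*ks_class;
	struct mock_kstat	*ks_next;
};

/* The patched loop: name match plus class "net", no instance filter. */
/* Returns the first qualifying entry in chain order, or NULL.        */
static struct mock_kstat	*lookup_first_match(struct mock_kstat *chain, const char *name)
{
	struct mock_kstat	*kp;

	for (kp = chain; NULL != kp; kp = kp->ks_next)
	{
		if (0 != strcmp(name, kp->ks_name))	/* network interface name */
			continue;

		if (0 == strcmp("net", kp->ks_class))
			break;
	}

	return kp;
}
```

The trade-off of this version is that its result depends on chain order whenever several kstats share the same name.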
Comment by Andrei Gushchin (Inactive) [ 2016 Oct 20 ] |
Please see the kstat output:
XXXXX root 10:40:57 /usr/local/bin # kstat -p -c net '::e1000g0:'
e1000g:0:e1000g0:brdcstrcv	535115
e1000g:0:e1000g0:brdcstxmt	335994
e1000g:0:e1000g0:class	net
e1000g:0:e1000g0:collisions	0
e1000g:0:e1000g0:crtime	187.334528519
e1000g:0:e1000g0:ierrors	0
e1000g:0:e1000g0:ifspeed	100000000
e1000g:0:e1000g0:ipackets	3633596669
e1000g:0:e1000g0:ipackets64	3633596669
e1000g:0:e1000g0:multircv	0
e1000g:0:e1000g0:multixmt	0
e1000g:0:e1000g0:norcvbuf	0
e1000g:0:e1000g0:noxmtbuf	0
e1000g:0:e1000g0:obytes	2970201381
e1000g:0:e1000g0:obytes64	226308500773
e1000g:0:e1000g0:oerrors	0
e1000g:0:e1000g0:opackets	2019264177
e1000g:0:e1000g0:opackets64	2019264177
e1000g:0:e1000g0:rbytes	3913518136
e1000g:0:e1000g0:rbytes64	3658930687032
e1000g:0:e1000g0:snaptime	6135201.07807392
e1000g:0:e1000g0:unknowns	0
XXXXXX root 10:44:03 /usr/local/bin # kstat -p -c net '::e1000g1:'
e1000g:1:e1000g1:brdcstrcv	2538824
e1000g:1:e1000g1:brdcstxmt	164358
e1000g:1:e1000g1:class	net
e1000g:1:e1000g1:collisions	0
e1000g:1:e1000g1:crtime	187.53033686
e1000g:1:e1000g1:ierrors	0
e1000g:1:e1000g1:ifspeed	100000000
e1000g:1:e1000g1:ipackets	3902471694
e1000g:1:e1000g1:ipackets64	3902471694
e1000g:1:e1000g1:multircv	0
e1000g:1:e1000g1:multixmt	0
e1000g:1:e1000g1:norcvbuf	0
e1000g:1:e1000g1:noxmtbuf	0
e1000g:1:e1000g1:obytes	792608597
e1000g:1:e1000g1:obytes64	511893716821
e1000g:1:e1000g1:oerrors	0
e1000g:1:e1000g1:opackets	1768643700
e1000g:1:e1000g1:opackets64	1768643700
e1000g:1:e1000g1:rbytes	3645784631
e1000g:1:e1000g1:rbytes64	1287841006135
e1000g:1:e1000g1:snaptime	6135445.85282709
e1000g:1:e1000g1:unknowns	0 |
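The output above uses the "kstat -p" layout module:instance:name:statistic, followed by whitespace and the value, which is why the same statistic can appear under several instance numbers. Below is a C sketch of a parser for one such line; struct kstat_p_line, parse_kstat_p_line() and the 64-byte field limit are assumptions for illustration. It accepts only unsigned integer values, so lines such as "class net" or the fractional crtime/snaptime entries are rejected.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define KSTAT_FIELD_LEN	64	/* assumed upper bound for module, name and statistic */

struct kstat_p_line
{
	char			module[KSTAT_FIELD_LEN];
	int			instance;
	char			name[KSTAT_FIELD_LEN];
	char			statistic[KSTAT_FIELD_LEN];
	unsigned long long	value;
};

/* Copy the non-empty range [start, end) into dst; fail if it does not fit. */
static bool	copy_field(char *dst, const char *start, const char *end)
{
	size_t	len = (size_t)(end - start);

	if (0 == len || KSTAT_FIELD_LEN <= len)
		return false;

	memcpy(dst, start, len);
	dst[len] = '\0';

	return true;
}

/* Parse one "module:instance:name:statistic<TAB>value" line as printed by */
/* "kstat -p". Only unsigned integer values are accepted.                  */
static bool	parse_kstat_p_line(const char *line, struct kstat_p_line *out)
{
	const char	*c1, *c2, *c3, *sep, *vstart;
	char		*end;
	long		inst;

	/* locate the three colons separating module, instance, name and statistic */
	if (NULL == (c1 = strchr(line, ':')) || NULL == (c2 = strchr(c1 + 1, ':')) ||
			NULL == (c3 = strchr(c2 + 1, ':')))
	{
		return false;
	}

	/* the statistic name runs up to the first space or tab */
	sep = c3 + 1 + strcspn(c3 + 1, " \t");

	if ('\0' == *sep)
		return false;

	if (!copy_field(out->module, line, c1) || !copy_field(out->name, c2 + 1, c3) ||
			!copy_field(out->statistic, c3 + 1, sep))
	{
		return false;
	}

	/* the instance field must consist of decimal digits only */
	if (!isdigit((unsigned char)c1[1]))
		return false;

	inst = strtol(c1 + 1, &end, 10);

	if (end != c2)
		return false;

	out->instance = (int)inst;

	/* skip the separator, then require an unsigned integer up to end of line */
	vstart = sep + strspn(sep, " \t");

	if (!isdigit((unsigned char)*vstart))
		return false;

	out->value = strtoull(vstart, &end, 10);

	return '\0' == *end || '\n' == *end;
}
```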
Comment by Dehaen Pierre [ 2016 Oct 20 ] |
Hi, I'm afraid "kstat -p -c net '::<interface_name>:'" won't work. For example, on a Solaris 11 box with the following interfaces:
GZ# dladm
LINK        CLASS  MTU   STATE  OVER
net0        phys   1500  up     --
net1        phys   1500  up     --
zone1/net0  vnic   1500  up     net1
zone2/net0  vnic   1500  up     net1
GZ# kstat -p -c net '::net0:obytes'
link:0:net0:obytes	3869558907
link:1:net0:obytes	859068232
link:2:net0:obytes	4004013184
[...little delay... so don't try to match exact figures...]
zone1# kstat -p -c net '::net0:obytes'
link:0:net0:obytes	861050627
zone2# kstat -p -c net '::net0:obytes'
link:0:net0:obytes	469266626
Explanations:
Well, your kstat command seems to work on Solaris 10 in an NGZ and the GZ, and on Solaris 11 in an NGZ, but on Solaris 11 in the GZ you would most probably have to take instance number 0 as, I guess, it will be set up before the vnics are created on it. Now... I'm not sure that simply taking the first instance in the kstat chain (first match) always means taking instance 0! Ideally you should read all the matching instances, compare each instance number found with the previous match (if any), and keep it if it is less than the previous one. Something like:
	kstat_t	*kpl = NULL;	/* lowest-instance match so far */

	for (kp = kc->kc_chain; NULL != kp; kp = kp->ks_next)	/* traverse all kstat chain */
	{
		if (0 != strcmp(name, kp->ks_name))	/* network interface name */
			continue;

		if (NULL != kpl && kp->ks_instance > kpl->ks_instance)
			continue;

		kpl = kp;
	}

	kp = kpl; |
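The concern about chain order can be illustrated with a mock chain in which the instance 0 entry is not first. The order used here is hypothetical (nothing in this report shows how libkstat actually orders the chain), and struct mock_kstat, first_match() and lowest_instance_match() are illustrative names. lowest_instance_match() follows the suggestion above, starts with kpl set to NULL, and keeps the "net" class filter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields used by the lookup. */
struct mock_kstat
{
	const char		*ks_module;
	int			ks_instance;
	const char		*ks_name;
	const char		*ks_class;
	struct mock_kstat	*ks_next;
};

/* First name + class match in chain order (the "delete the check" fix). */
static struct mock_kstat	*first_match(struct mock_kstat *chain, const char *name)
{
	struct mock_kstat	*kp;

	for (kp = chain; NULL != kp; kp = kp->ks_next)
	{
		if (0 == strcmp(name, kp->ks_name) && 0 == strcmp("net", kp->ks_class))
			return kp;
	}

	return NULL;
}

/* Keep the matching entry with the lowest instance number seen so far. */
static struct mock_kstat	*lowest_instance_match(struct mock_kstat *chain, const char *name)
{
	struct mock_kstat	*kp, *kpl = NULL;

	for (kp = chain; NULL != kp; kp = kp->ks_next)
	{
		if (0 != strcmp(name, kp->ks_name) || 0 != strcmp("net", kp->ks_class))
			continue;

		if (NULL != kpl && kp->ks_instance > kpl->ks_instance)
			continue;

		kpl = kp;
	}

	return kpl;
}
```

For a chain ordered link:2:net0, link:0:net0, link:1:net0, first_match() returns the instance 2 entry, while lowest_instance_match() returns the instance 0 entry.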
Comment by Andris Mednis [ 2016 Oct 20 ] |
Thanks, pierre4031, for your great help! Is there a reason why the check
	if (0 == strcmp("net", kp->ks_class))
		break;
is not included any more? (I think it is still necessary.) |
Comment by Dehaen Pierre [ 2016 Oct 20 ] |
You are right, Mednis (or Andris, I'm not sure about your first name)! The class check is still necessary (as in kstat -c net...). PS: I used "kpl" for "kplowest". |
Comment by richlv [ 2016 Oct 20 ] |
pierre4031, the first name would be "Andris" |
Comment by Andris Mednis [ 2016 Oct 21 ] |
Added searching for the instance with the smallest number, thanks to pierre4031 for the idea:
static int	get_kstat_named_field(const char *name, const char *field, zbx_uint64_t *field_value)
{
	int		ret = FAIL, min_instance = -1;
	kstat_ctl_t	*kc;
	kstat_t		*kp, *min_kp;
	kstat_named_t	*kn;

	if (NULL == (kc = kstat_open()))
		return FAIL;

	for (kp = kc->kc_chain; NULL != kp; kp = kp->ks_next)	/* traverse all kstat chain */
	{
		if (0 != strcmp(name, kp->ks_name))	/* network interface name */
			continue;

		if (0 != strcmp("net", kp->ks_class))
			continue;

		/* find instance with the smallest number */
		if (-1 == min_instance || kp->ks_instance < min_instance)
		{
			min_instance = kp->ks_instance;
			min_kp = kp;
		}
	}

	if (-1 != min_instance)
		kp = min_kp;
... |
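For reference, here is a self-contained C sketch of the same selection logic followed by the field lookup. On a real system the chosen kstat would be read with kstat_read() and the counter located with kstat_data_lookup(); here a static array of hypothetical struct mock_named entries stands in for the named data, and the hypothetical get_named_field() returns 0/-1 instead of the agent's SUCCEED/FAIL. min_kp starts as NULL so the found/not-found decision does not rely on min_instance alone.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one kstat_named_t entry. */
struct mock_named
{
	const char		*name;
	unsigned long long	value;
};

/* Hypothetical stand-in for kstat_t plus its already-read named data. */
struct mock_kstat
{
	const char		*ks_module;
	int			ks_instance;
	const char		*ks_name;
	const char		*ks_class;
	const struct mock_named	*data;
	size_t			ndata;
	struct mock_kstat	*ks_next;
};

/* Pick the "net" kstat called `name` with the smallest instance number, */
/* then copy its `field` counter into *value. Returns 0 or -1.           */
static int	get_named_field(struct mock_kstat *chain, const char *name, const char *field,
		unsigned long long *value)
{
	int			min_instance = -1;
	struct mock_kstat	*kp, *min_kp = NULL;
	size_t			i;

	for (kp = chain; NULL != kp; kp = kp->ks_next)
	{
		if (0 != strcmp(name, kp->ks_name) || 0 != strcmp("net", kp->ks_class))
			continue;

		/* find instance with the smallest number */
		if (-1 == min_instance || kp->ks_instance < min_instance)
		{
			min_instance = kp->ks_instance;
			min_kp = kp;
		}
	}

	if (NULL == min_kp)
		return -1;

	/* linear search standing in for kstat_data_lookup() */
	for (i = 0; i < min_kp->ndata; i++)
	{
		if (0 == strcmp(field, min_kp->data[i].name))
		{
			*value = min_kp->data[i].value;
			return 0;
		}
	}

	return -1;
}
```

Using the Solaris 11 global-zone obytes figures from earlier in this issue (instances 0, 1 and 2 of net0), the lookup returns the instance 0 value, 3869558907, regardless of where that entry sits in the chain.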
Comment by Andris Mednis [ 2016 Oct 21 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-11292 (for version 2.2, but same changes should apply to newer versions, too). |
Comment by Andris Zeila [ 2016 Oct 27 ] |
(1) Possible minor optimization: we can stop looking for the instance with the smallest number once we find 0.
andris Thanks for finding it! RESOLVED in r63399.
wiper CLOSED |
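Item (1) can be sketched as follows, assuming (as the comment implies) that instance numbers are never negative, so an instance 0 match cannot be beaten and the scan may stop there. struct mock_kstat and lowest_instance_early_exit() are hypothetical; the visited counter exists only to show how much of the chain is walked.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kstat_t fields used by the lookup. */
struct mock_kstat
{
	const char		*ks_module;
	int			ks_instance;
	const char		*ks_name;
	const char		*ks_class;
	struct mock_kstat	*ks_next;
};

/* Lowest-instance search that stops as soon as instance 0 is matched. */
/* *visited receives the number of chain entries examined.            */
static struct mock_kstat	*lowest_instance_early_exit(struct mock_kstat *chain, const char *name,
		int *visited)
{
	struct mock_kstat	*kp, *min_kp = NULL;

	*visited = 0;

	for (kp = chain; NULL != kp; kp = kp->ks_next)
	{
		(*visited)++;

		if (0 != strcmp(name, kp->ks_name) || 0 != strcmp("net", kp->ks_class))
			continue;

		if (NULL == min_kp || kp->ks_instance < min_kp->ks_instance)
		{
			min_kp = kp;

			if (0 == min_kp->ks_instance)	/* nothing can be smaller: stop */
				break;
		}
	}

	return min_kp;
}
```

On a hypothetical chain lo0(0), net0(1), net0(0), net0(2), looking up net0 stops after the third entry instead of walking the whole chain.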
Comment by Andris Zeila [ 2016 Oct 27 ] |
Successfully tested. |
Comment by Andris Mednis [ 2016 Oct 28 ] |
Released in:
No documentation change required. |
Comment by patrik uytterhoeven [ 2017 Feb 15 ] |
We have the same issue with the Solaris packages. When will new packages be released? |