[ZBX-7426] snmp checks fail with failed: first network error, wait for 15 seconds Created: 2013 Nov 22 Updated: 2022 Oct 08 Resolved: 2014 Aug 02 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 2.2.0 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | sles | Assignee: | Unassigned |
Resolution: | Duplicate | Votes: | 6 |
Labels: | retry, snmp | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
centos 6.4 |
Attachments: | screenshot-1.jpg | ||||||||
Issue Links: |
|
Description |
Hello! There are messages in the server log:

[root@zabbix zabbix]# grep "Khokhryaki ABK room 130 UPS 2200" zabbix_server.log
6840:20131122:054359.552 SNMP agent item [upsAdvBatteryCapacity] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
6937:20131122:054400.590 SNMP agent item [upsAdvInputLineFailCause] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
7060:20131122:054415.207 resuming SNMP agent checks on host [Khokhryaki ABK room 130 UPS 2200]: connection restored
6827:20131122:063504.611 SNMP agent item [upsAdvOutputCurrent] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
6978:20131122:063505.612 SNMP agent item [upsAdvOutputLoad] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
7061:20131122:063520.581 resuming SNMP agent checks on host [Khokhryaki ABK room 130 UPS 2200]: connection restored
6837:20131122:064537.468 SNMP agent item [upsBasicBatteryStatus] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
7055:20131122:064552.802 resuming SNMP agent checks on host [Khokhryaki ABK room 130 UPS 2200]: connection restored
6844:20131122:070129.039 SNMP agent item [upsAdvBatteryCapacity] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
6796:20131122:070130.106 SNMP agent item [upsAdvInputLineFailCause] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
7053:20131122:070145.585 resuming SNMP agent checks on host [Khokhryaki ABK room 130 UPS 2200]: connection restored
6837:20131122:072029.159 SNMP agent item [upsAdvBatteryCapacity] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
7055:20131122:072044.904 resuming SNMP agent checks on host [Khokhryaki ABK room 130 UPS 2200]: connection restored
6907:20131122:072529.707 SNMP agent item [upsAdvBatteryCapacity] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
6923:20131122:072530.044 SNMP agent item [upsAdvInputLineFailCause] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
7061:20131122:072545.188 resuming SNMP agent checks on host [Khokhryaki ABK room 130 UPS 2200]: connection restored
6894:20131122:075559.451 SNMP agent item [upsAdvBatteryCapacity] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
6869:20131122:075600.478 SNMP agent item [upsAdvInputLineFailCause] on host [Khokhryaki ABK room 130 UPS 2200] failed: first network error, wait for 15 seconds
7061:20131122:075615.297 resuming SNMP agent checks on host [Khokhryaki ABK room 130 UPS 2200]: connection restored

and no data is retrieved for several hosts. For one particular host I created a simple script on the same machine which checks the same value every minute:

#!/bin/sh
date >>/var/log/snmpgettest
snmpget -v1 -c public 192.168.46.42 1.3.6.1.4.1.318.1.1.1.3.2.5.0 >>/var/log/snmpgettest

It has never failed in the last 24 hours, so it looks like this is a Zabbix bug... Thank you! |
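To see how often each item fails and how long the host stays unpolled, the log excerpt above can be summarized with a small script. The following is only a sketch: the function name `summarize` and the regular expressions are illustrative assumptions based on the log lines quoted in this report (including the quoted item/host style that appears in a later comment), not part of Zabbix.

```python
import re
from collections import Counter
from datetime import datetime

# Patterns derived from the log lines quoted in this issue (an assumption,
# not an official description of the Zabbix log format). Item and host may
# be wrapped in [brackets] or "quotes".
ERROR_RE = re.compile(
    r'^\s*\d+:(\d{8}:\d{6}\.\d{3}) SNMP agent item [\["](.+?)[\]"] '
    r'on host [\["](.+?)[\]"] failed: first network error'
)
RESTORED_RE = re.compile(
    r'^\s*\d+:(\d{8}:\d{6}\.\d{3}) resuming SNMP agent checks '
    r'on host [\["](.+?)[\]"]: connection restored'
)


def parse_ts(text):
    """Parse a timestamp such as 20131122:054359.552."""
    return datetime.strptime(text, "%Y%m%d:%H%M%S.%f")


def summarize(lines):
    """Return (error count per (host, item), list of (host, outage seconds))."""
    errors = Counter()
    down_since = {}   # host -> time of the first error in the current outage
    outages = []
    for line in lines:
        m = ERROR_RE.match(line)
        if m:
            ts, item, host = m.groups()
            errors[(host, item)] += 1
            down_since.setdefault(host, parse_ts(ts))
            continue
        m = RESTORED_RE.match(line)
        if m:
            ts, host = m.groups()
            start = down_since.pop(host, None)
            if start is not None:
                outages.append((host, (parse_ts(ts) - start).total_seconds()))
    return errors, outages
```

Calling `summarize(open("zabbix_server.log"))` would give per-item error counts and the length of each gap between the first error and the "connection restored" message.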
Comments |
Comment by Aleksandrs Saveljevs [ 2013 Nov 25 ] |
I used to get the same kind of behavior when I was running a UDP-intensive network application alongside Zabbix server. Once I stopped that application, the error no longer appeared. So Zabbix probably fails because the UDP request it sends is dropped along the path and, since The reason snmpget does not fail is that it retries 5 times by default if getting the value fails. Try repeating the same test with the "-r 0" option added to the snmpget invocation. |
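The suggested "-r 0" test can be sketched as follows. It builds the same snmpget call as the reporter's script but disables retries, so a single lost UDP packet shows up as a failure. The helper names `build_snmpget_cmd` and `probe` are hypothetical; `-r` (retries) and `-t` (timeout in seconds) are standard Net-SNMP command-line options, and the 1-second timeout is an assumed value.

```python
import subprocess


def build_snmpget_cmd(host, oid, community="public", retries=0, timeout=1):
    """Build an snmpget command line with an explicit retry count and timeout."""
    return [
        "snmpget", "-v1", "-c", community,
        "-r", str(retries),   # 0 = give up after the first unanswered request
        "-t", str(timeout),   # seconds to wait for each request
        host, oid,
    ]


def probe(host, oid, **kwargs):
    """Run snmpget once; return True if it exited successfully."""
    result = subprocess.run(
        build_snmpget_cmd(host, oid, **kwargs),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For example, `probe("192.168.46.42", "1.3.6.1.4.1.318.1.1.1.3.2.5.0")` run once a minute, with failures logged, would show whether individual requests go unanswered when retries are off.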
Comment by sles [ 2013 Nov 29 ] |
Hello! I'd like to add retries to zabbix |
Comment by Aleksandrs Saveljevs [ 2013 Nov 29 ] |
Adding retries to Zabbix is trivial: you should wait until If you wish to patch Zabbix server as a workaround in the meantime, you can change "session.retries = 0" and "session.timeout = ..." in src/zabbix_server/poller/checks_snmp.c, although I have tried that and it did not help in my scenario. Have you performed the test again with snmpget by adding "-r 0" to the command line? |
Comment by sles [ 2013 Nov 29 ] |
Thank you, I'll try to patch. Just added -r0, will inform about results after 1-2 days. |
Comment by sles [ 2013 Nov 30 ] |
Well, changing session.retries = 5 doesn't help. |
Comment by Aleksandrs Saveljevs [ 2013 Dec 02 ] |
In my case, changing "session.retries = 3" did not help either. When I investigated the problem a bit, tcpdump showed that Zabbix sends request packets, but there is no response. Log on SNMP device shows that in those cases it sometimes drops outgoing UDP packets (i.e., IF-MIB::ifOutDiscards.1 increases), but sometimes it does not. So the problem might be on the device side, where it limits request or response rate, but I have not found such a setting yet. |
Comment by sles [ 2013 Dec 03 ] |
Anyway, I get far fewer such errors in the log after increasing retries... |
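A rough way to see why retries reduce (but do not necessarily eliminate) these errors: if each request/response round trip were lost independently with probability p, a check with r retries would fail only when all r + 1 attempts are lost. This is a simplified model with an assumed independence of losses; bursty drops or device-side rate limiting, as described in the comments above, violate it, which may be why retries did not help in every case. The function name is illustrative.

```python
def all_attempts_lost(loss_rate, retries):
    """Probability that the first attempt and every retry are lost,
    assuming each attempt is lost independently with the same probability."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if retries < 0:
        raise ValueError("retries must not be negative")
    return loss_rate ** (retries + 1)
```

Under this model, a 2% loss rate means about one check in fifty fails with no retries, while with five retries a failure becomes extremely unlikely.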
Comment by Cristian Vasquez Lucic [ 2013 Dec 03 ] |
Hi guys, I'm getting the same issue with all my hosts. I have Cacti and Zabbix on the same machine. Can you point me to where I can find the "src/zabbix_server/poller/checks_snmp.c" file? I'm running Zabbix 2.2 on a CentOS 6.4 64-bit machine with MySQL and I can't find this file or "session.retries = 0" anywhere. The real issue is that all my Zabbix graphs are incomplete, but all my Cacti graphs look fine. |
Comment by Aleksandrs Saveljevs [ 2013 Dec 04 ] |
You can download Zabbix sources from http://www.zabbix.com/download.php or check them out from Subversion repository at svn://svn.zabbix.com. |
Comment by Przemek [ 2013 Dec 18 ] |
Hi guys, |
Comment by Vlad Ciobancai [ 2013 Dec 31 ] |
I have the same problem on Zabbix 2.2.1. I would like to know if there will be a fix for this problem, because it is very annoying. |
Comment by Corey Shaw [ 2014 Jan 16 ] |
Just a thought, but there may be a few of you who are seeing this error because your Zabbix poller processes are just busy and you simply need more of them. I'd suggest reading and implementing the advice here => http://blog.zabbix.com/monitoring-how-busy-zabbix-processes-are/457/ before blaming this on a bug (which it might legitimately be, but pollers should be checked first). |
Comment by sles [ 2014 Jan 17 ] |
They have been checked. Not all pollers are busy. |
Comment by Ali HBB [ 2014 Mar 01 ] |
Same Problem Here |
Comment by Vlad Ciobancai [ 2014 Mar 01 ] |
Hey, for me the problem disappeared after the snmpd application on the application nodes (we use 6 application servers with RHEL 5.10) was restarted. |
Comment by diego serrano [ 2014 May 26 ] |
Hello! 5989:20140526:130943.953 SNMP agent item "hrStorageUsed[C:\ Label: Serial Number 60b133af]" on host "xxx" failed: first network error, wait for 15 seconds There is not a network problem. Regards |
Comment by Vlad Ciobancai [ 2014 May 26 ] |
Please update the Zabbix agents and Zabbix server to the latest version, 2.2.3. They fixed this bug: https://support.zabbix.com/browse/ZBXNEXT-98 and for me it is working without any problems. |
Comment by jean-marc CHORIER [ 2014 Jul 04 ] |
Hi, FYI |
Comment by Vlad Ciobancai [ 2014 Jul 04 ] |
Hi Jean, can you paste the errors that you received in the zabbix_server log? |
Comment by Raimonds Treimanis [ 2014 Jul 22 ] |
I'm getting the same error on a regular basis. Most of my items are SNMPv2 (monitoring Cisco routers). I also noticed that the number of busy pollers is much higher than it should be during those error periods, although far from 100%. You can see it in the attached graph. Note that nothing in the config was changed. At 22:20 errors suddenly started and kept appearing until 10:45, when I restarted zabbix-proxy. After a restart they just disappear, only to return after some random period of time. |
Comment by Oleksii Zagorskyi [ 2014 Aug 02 ] |
Having recently investigated SNMP at length and now reread this thread, I can say that it is all about network errors and lost UDP packets. Note that starting from 2.2.3 it can look a bit different, and Raimonds' last comment fully confirms my issue report ZBX-8528. I don't think we need to continue the discussion here; the problem is clear. |
Comment by Oleksii Zagorskyi [ 2014 Aug 02 ] |
Well, I'm closing this issue as duplicate. |