[ZBX-21716] Agent 2 throws ‘first network error’ on a regular basis Created: 2022 Sep 30 Updated: 2022 Nov 08 |
|
Status: | Reopened |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 6.0.9, 6.2.2, 6.2.3 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Trivial |
Reporter: | Jeffrey Descan | Assignee: | Victor Breda Credidio |
Resolution: | Unresolved | Votes: | 3 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix Agent2 (v6.0.x, v6.2.x) on Linux (Debian) and Windows |
Attachments: |
Description |
Comments |
Comment by Dmitrijs Lamberts [ 2022 Oct 03 ] |
Please be advised that this section of the tracker is for bug reports only. The case you have submitted cannot be qualified as one (it looks more like a network issue), so please reach out to [email protected] for commercial support (https://zabbix.com/support) or consultancy services. Alternatively, you can use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for your understanding. |
Comment by Jeffrey Descan [ 2022 Oct 03 ] |
This is not an actual network error; we've verified all firewalls in between, and they are not causing any network timeouts. To prove this, we ran a high-level network verification in multiple steps.
Passive connection from proxy to host
root@proxy001:~# telnet web012.ctr 20050
Trying x.x.x.43...
Connected to web012.ctr.
Escape character is '^]'.
^]q
telnet> q
Connection closed.
SSH + active connection from host to proxy
root@proxy001:~# ssh x.x.x.43
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.
Last login: Mon Oct 3 08:03:20 2022 from 172.29.63.48
[email protected]:~# telnet 172.29.63.48 10051
Trying 172.29.63.48...
Connected to 172.29.63.48.
Escape character is '^]'.
^]q
telnet> q
Connection closed.
Zabbix Get executions:
root@proxy001:~# date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version';
Mon 03 Oct 2022 08:06:49 AM CEST
6.2.3
root@proxy001:~# date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version';
Mon 03 Oct 2022 08:06:50 AM CEST
6.2.3
root@proxy001:~# date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version';
Mon 03 Oct 2022 08:06:51 AM CEST
6.2.3
root@proxy001:~# date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version';
Mon 03 Oct 2022 08:06:52 AM CEST
6.2.3
root@proxy001:~# date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version';
Mon 03 Oct 2022 08:06:57 AM CEST
6.2.3
root@proxy001:~# date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version';
Mon 03 Oct 2022 08:09:10 AM CEST
6.2.3
root@proxy001:~# date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version';
Mon 03 Oct 2022 08:10:46 AM CEST
6.2.3
Repeated zabbix_get (several times per second):
root@proxy001:~# while true; do date && zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk --tls-psk-identity <PSK> --tls-psk-file <PSK-FILE> -k 'agent.version'; sleep 0.3; done
Mon 03 Oct 2022 08:11:13 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:13 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:13 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:14 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:14 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:14 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:15 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:15 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:15 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:16 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:16 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:16 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:17 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:17 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:17 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:17 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:18 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:18 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:18 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:19 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:19 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:19 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:20 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:20 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:20 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:21 AM CEST
6.2.3
Mon 03 Oct 2022 08:11:21 AM CEST
6.2.3
^C
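A loop like the one above prints a timestamp and the agent version for every probe, so a single failed probe is easy to miss in a long run. The following is a minimal sketch of a variant that labels each probe OK or FAIL and prints a failure count at the end. It assumes a POSIX shell and that zabbix_get exits with a non-zero status when a request fails (worth verifying for your version). The function name `probe_loop` and the `PSK_IDENTITY`/`PSK_FILE` variables are hypothetical placeholders, not part of Zabbix.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command COUNT times, INTERVAL seconds apart,
# and label each attempt OK or FAIL based on the command's exit status.
# Usage: probe_loop COUNT INTERVAL COMMAND [ARGS...]
# The body runs in a subshell "( ... )" so its variables do not leak.
probe_loop() (
  count=$1
  interval=$2
  shift 2
  i=0
  fails=0
  while [ "$i" -lt "$count" ]; do
    i=$((i + 1))
    # Capture stdout and stderr; "if" tests the probe's exit status.
    if out=$("$@" 2>&1); then
      printf '%s OK %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$out"
    else
      printf '%s FAIL %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$out"
      fails=$((fails + 1))
    fi
    sleep "$interval"
  done
  printf 'failures: %d/%d\n' "$fails" "$count"
  # Return 0 only if every probe succeeded.
  [ "$fails" -eq 0 ]
)

# Example (PSK_IDENTITY and PSK_FILE are placeholders you must set):
# probe_loop 200 0.3 zabbix_get -s x.x.x.43 -p 20050 --tls-connect psk \
#   --tls-psk-identity "$PSK_IDENTITY" --tls-psk-file "$PSK_FILE" -k agent.version
```

Keying on the exit status rather than on the printed version keeps the helper generic, so the same loop can wrap telnet-style or other connectivity checks.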
This is not a fluke, nor a network timeout on our end: we always see it at the moment a massive number of collectors are being started and stopped. We have this on more than 1000 hosts in our environment, all running Agent2 and traversing different network flows (all of which have been checked).
We hope you can look into this, as it is a big showstopper for us.
If we need to provide any other data or run some tests, please let me know. |
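With the error spread over so many hosts, it can help to quantify how often each host hits it. A rough sketch, assuming each proxy log line that mentions "first network error" also contains a fragment of the form `on host "<name>"`; that log format is an assumption here, so check it against your own proxy log before trusting the counts. The function name is hypothetical, and the default path matches the `LogFile` setting in the proxy configuration quoted further down this ticket.

```shell
#!/bin/sh
# Hypothetical helper: count "first network error" log lines per host.
# Assumption (unverified): matching lines contain: on host "<name>"
# Usage: count_first_network_errors [LOGFILE]
count_first_network_errors() {
  # Keep only the relevant lines (fixed-string match).
  grep -F 'first network error' "${1:-/var/log/zabbix/zabbix_proxy.log}" |
    # Extract the quoted host name; drop lines without one.
    sed -n 's/.* on host "\([^"]*\)".*/\1/p' |
    # Count occurrences per host, most affected hosts first.
    sort | uniq -c | sort -rn
}
```

Comparing the per-host counts with the timestamps of collector restarts would show whether the errors cluster around those moments, as described above.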
Comment by Victor Breda Credidio [ 2022 Oct 04 ] |
Hi Jeffrey! Can you confirm whether this is happening with only one proxy, what resources it has (memory, CPU, disk type), and what VPS (values per second) it is handling (you can check that under Administration -> Proxies)? |
Comment by Jeffrey Descan [ 2022 Oct 04 ] |
Hey Victor, thanks for reproducing this. We're seeing this behaviour on all of our proxies; we have 20 proxies running at this point. All proxies have the same hardware specifications:
The Zabbix proxy config:
LogFile=/var/log/zabbix/zabbix_proxy.log
LogSlowQueries=3000
LogRemoteCommands=1
TLSConnect=psk
ProxyLocalBuffer=24
ProxyOfflineBuffer=24
StartPollers=700
StartIPMIPollers=1
StartPollersUnreachable=350
StartPingers=100
StartHTTPPollers=30
StartSNMPTrapper=1
CacheSize=8G
HistoryIndexCacheSize=2G
EnableRemoteCommands=1
ConfigFrequency=60
PidFile=/run/zabbix/zabbix_proxy.pid
SocketDir=/run/zabbix
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
SSHKeyLocation=/etc/zabbix/.ssh
Timeout=30
DBHost=localhost
DBName=zabbix
DBUser=zabbix
StartPreprocessors=40
StartDiscoverers=5
StartTrappers=10
StartDBSyncers=16
UnreachableDelay=45
UnavailableDelay=180
HistoryCacheSize=2G
DBSocket=/run/mysqld/mysqld.sock
All running on 6.2.3:
[email protected]:~# dpkg -l | grep -i zabbix
ii zabbix-agent2 1:6.2.3-1+debian11 amd64 Zabbix network monitoring solution - agent
ii zabbix-get 1:6.2.3-1+debian11 amd64 Zabbix network monitoring solution - get
ii zabbix-proxy-mysql 1:6.2.3-1+debian11 amd64 Zabbix network monitoring solution - proxy (MySQL)
ii zabbix-release 1:6.2-2+debian11 all Zabbix official repository configuration
ii zabbix-sql-scripts 1:6.2.3-1+debian11 all Zabbix network monitoring solution - sql-scripts
Our MySQL backend configuration:
port=3306
admin_address=127.0.0.1
admin_port=33062
create_admin_listener_thread=OFF
log_output=file
slow_query_log=OFF
long_query_time=5
log_slow_rate_limit=100
log_slow_rate_type=query
log_slow_verbosity=full
log_slow_admin_statements=ON
log_slow_slave_statements=ON
slow_query_log_always_write_time=1
slow_query_log_use_global_control=all
performance_schema=OFF
innodb_monitor_enable=all
userstat=1
skip_name_resolve=1
server_id=2887597872
log_slave_updates
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_expire_logs_seconds=7200
default_authentication_plugin=mysql_native_password
skip_log_bin
innodb_buffer_pool_instances=8
innodb_buffer_pool_size=1536M
innodb_log_file_size=192M
key_buffer_size=0
thread_cache_size=15
Some screenshots attached:
Please let me know if you need any other information.
Kind regards |
Comment by Jeffrey Descan [ 2022 Oct 13 ] |
Hi Victor, do you have an update yet? Jeffrey |
Comment by Victor Breda Credidio [ 2022 Oct 26 ] |
Hi Jeffrey! Really sorry for the late response. In my lab environment everything is working correctly, so I suspect this could be related to your infrastructure. Even though my environment is very small compared to yours, all the Zabbix components are working as expected. Best regards, |