[ZBX-12620] Windows Agent becomes unavailable from time to time Created: 2017 Aug 28 Updated: 2017 Aug 30 Resolved: 2017 Aug 30 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 3.2.7, 3.4.0 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Major |
Reporter: | Alex P | Assignee: | Unassigned |
Resolution: | Duplicate | Votes: | 0 |
Labels: | agent | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Server: Agent: |
Attachments: |
Issue Links: |
|
Description |
From time to time we get the subject trigger on our Windows servers (see attached). Nothing specific in the logs. The servers are alive at those moments, there are no network issues, and data keeps being collected from them. Please advise. |
Comments |
Comment by Vladislavs Sokurenko [ 2017 Aug 29 ] |
This looks like a support request. For available options please see http://zabbix.org/wiki/Getting_help. No indication of a bug. Closing as Won't Fix. |
Comment by Alex P [ 2017 Aug 29 ] |
it is NOT a support request. It IS a bug. |
Comment by Vladislavs Sokurenko [ 2017 Aug 29 ] |
is it version 3.4.1 ? could it be |
Comment by Vladislavs Sokurenko [ 2017 Aug 29 ] |
It looks like agent is unreachable due to network issues. |
Comment by Alex P [ 2017 Aug 29 ] |
Server - 3.4.1 |
Comment by Alex P [ 2017 Aug 29 ] |
It looks like it, but it is NOT. I can reach the server at that time. |
Comment by Alex P [ 2017 Aug 29 ] |
again, the ping is completely fine, the servers are in the same room and VLAN; no network issues.
> ping wpcaldc02
PING wpcaldc02.wpninc.com (10.200.60.12) 56(84) bytes of data.
64 bytes from wpcaldc02.wpninc.com (10.200.60.12): icmp_seq=1 ttl=128 time=0.254 ms
64 bytes from wpcaldc02.wpninc.com (10.200.60.12): icmp_seq=2 ttl=128 time=0.266 ms
64 bytes from wpcaldc02.wpninc.com (10.200.60.12): icmp_seq=3 ttl=128 time=0.278 ms
The server load is zero (see attached). |
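A successful ICMP ping shows that the host is up, but it does not show that the agent answers on its TCP port (ListenPort=10050 in this setup). The sketch below issues a passive agent.ping request directly. It assumes the agent accepts the ZBXD-framed request (signature, flags byte, 8-byte little-endian length, then the item key); the function names are hypothetical, and the agent only answers hosts listed in its Server= setting.

```python
import socket
import struct

HEADER = b"ZBXD\x01"  # protocol signature plus flags byte (assumed framing)


def build_request(key: str) -> bytes:
    """Frame an item key as a passive-check request."""
    payload = key.encode("utf-8")
    return HEADER + struct.pack("<Q", len(payload)) + payload


def parse_response(raw: bytes) -> str:
    """Strip the 13-byte header and return the value as text."""
    if raw[:5] != HEADER:
        raise ValueError("not a ZBXD response")
    (length,) = struct.unpack("<Q", raw[5:13])
    return raw[13:13 + length].decode("utf-8")


def agent_ping(host: str, port: int = 10050, timeout: float = 5.0) -> str:
    """Ask the agent for agent.ping over TCP; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request("agent.ping"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # agent closes the connection after replying
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Running agent_ping("wpcaldc02") from the Zabbix server at the moment the trigger fires would separate "host reachable" from "agent answering"; a timeout here while ping succeeds points at the agent rather than the network.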
Comment by Alex P [ 2017 Aug 29 ] |
and no, it is NOT normal.. |
Comment by Vladislavs Sokurenko [ 2017 Aug 29 ] |
Are the items that are still being collected active items? Or is it only the passive agent.ping item that fails, while the others, also passive, keep working? |
Comment by Rostislav Palivoda [ 2017 Aug 29 ] |
Please provide steps to reproduce the issue reliably, ask the community for support, or subscribe to the Zabbix support service. |
Comment by Alex P [ 2017 Aug 29 ] |
All items are passive. |
Comment by Alex P [ 2017 Aug 29 ] |
@palivoda, will provide steps to reproduce. No need to close the ticket without the investigation. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 29 ] |
Could you please provide the following information?
|
Comment by Alex P [ 2017 Aug 30 ] |
glebs.ivanovskis, the host uses a mix of the official "Template OS Windows", which is linked to "Template App Zabbix Agent", and a few third-party templates. |
Comment by Alex P [ 2017 Aug 30 ] |
Agent configuration files
zabbix_agentd.conf:
LogFile=C:\srv\ZabbixAgent\log\zabbix_agentd.log
LogFileSize=5
### Option: DebugLevel
# 0 - basic information about starting and stopping of Zabbix processes
# 1 - critical information
# 2 - error information
# 3 - warnings
# 4 - for debugging (produces lots of information)
# DebugLevel=3
EnableRemoteCommands=0
LogRemoteCommands=1
UnsafeUserParameters=0
Server=zabbix...
ServerActive=
ListenPort=10050
# ListenIP=0.0.0.0
StartAgents=1
Hostname=wpcaldc02
BufferSend=60
BufferSize=200
Timeout=20
Include=C:\srv\ZabbixAgent\conf\conf.d\
conf.d/zbx-win-envmon.conf:
UserParameter=system.discovery[*],%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_win_system_discovery.vbs $1
UserParameter=quota[*],%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_win_quota.vbs $1 $2
UserParameter=server.domain,%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_user_domain.vbs
UserParameter=server.roles,%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_server_role.vbs
UserParameter=server.serial,%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_server_serialnumber.vbs
# add to the Task Manager, interval 12-24 hours
# C:\srv\ZabbixAgent\scripts\zabbix_wus_update_all.bat
# C:\srv\ZabbixAgent\scripts\zabbix_wus_update_crit.bat
UserParameter=wu.all,type C:\srv\ZabbixAgent\log\zabbix_wus_update_all.log
UserParameter=wu.crit,type C:\srv\ZabbixAgent\log\zabbix_wus_update_crit.log
conf.d/zbx-win-hwinfo.conf:
# wmic csproduct get vendor,name,identifyingnumber
UserParameter=system.hw.product-name,for /f "usebackq tokens=* skip=1" %a in (`WMIC csproduct get name`) do @echo:%a
UserParameter=system.hw.vendor,for /f "usebackq tokens=* skip=1" %a in (`WMIC csproduct get vendor`) do @echo:%a
UserParameter=system.hw.serial,for /f "usebackq tokens=* skip=1" %a in (`WMIC csproduct get identifyingnumber`) do @echo:%a
conf.d/zbx-win-swinfo.conf:
#UserParameter=vbs.softwareinventoryList[*],cscript /nologo C:\srv\ZabbixAgent\scripts\zbx-softwareinventory.vbs
UserParameter=system.sw.packages,%systemroot%\system32\windowspowershell\v1.0\powershell.exe -nologo C:\srv\ZabbixAgent\scripts\zbx-softwareinventory.ps1 |
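The configuration above combines StartAgents=1 with Timeout=20 and several script-based UserParameters. As a rough illustration of why that combination may matter, here is a toy queueing model, assuming each passive-check worker serves one request at a time; the function name and the sample durations are hypothetical, not measurements from this host.

```python
import heapq


def queue_delays(checks, workers):
    """Return how long each check waits before a worker picks it up.

    checks: list of (arrival_second, duration_seconds), sorted by arrival.
    workers: number of passive-check workers (StartAgents in the config).
    """
    free_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    delays = []
    for arrival, duration in checks:
        # The check starts when it has arrived and the earliest worker is free.
        start = max(arrival, heapq.heappop(free_at))
        delays.append(start - arrival)
        heapq.heappush(free_at, start + duration)
    return delays


# Hypothetical load: a script that runs for the full 20 s, then two quick checks.
checks = [(0, 20), (1, 1), (2, 1)]
print(queue_delays(checks, workers=1))
print(queue_delays(checks, workers=3))
```

In this model a single worker makes the quick checks wait behind the slow script, while three workers let them start immediately. Whether that wait is long enough for the server to record a network error depends on the server-side Timeout, which is not shown in this ticket.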
Comment by Vladislavs Sokurenko [ 2017 Aug 30 ] |
Is it the real parameter? Or do you actually have something else there? |
Comment by Alex P [ 2017 Aug 30 ] |
vso, it is something else of course |
Comment by Alex P [ 2017 Aug 30 ] |
here is an example for another Windows Server: another domain controller, so all the configuration and templates are the same as above.
Logs on the Zabbix server:
2017-08-29T19:46:09-06:00 wpmon02 zabbix_server[10818]: Zabbix agent item "net.if.in[WAN Miniport (Network Monitor)-Kaspersky Lab NDIS 6 Filter-0000]" on host "wpdc01" failed: first network error, wait for 15 seconds
2017-08-29T19:46:24-06:00 wpmon02 zabbix_server[10871]: Zabbix agent item "perf_counter["\DNS\Database Node Memory",300]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:46:59-06:00 wpmon02 zabbix_server[10915]: Zabbix agent item "vfs.fs.size[E:,free]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:47:17-06:00 wpmon02 zabbix_server[10862]: Zabbix agent item "net.tcp.listen[636]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:47:35-06:00 wpmon02 zabbix_server[10871]: Zabbix agent item "service_state[DHCPServer]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:47:53-06:00 wpmon02 zabbix_server[10867]: Zabbix agent item "service_state[LanmanWorkstation]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:08-06:00 wpmon02 zabbix_server[10869]: Zabbix agent item "service_state[WINS]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:26-06:00 wpmon02 zabbix_server[10895]: Zabbix agent item "service_state[AppHostSvc]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:43-06:00 wpmon02 zabbix_server[10871]: Zabbix agent item "service_state[Winmgmt]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:58-06:00 wpmon02 zabbix_server[10873]: Zabbix agent item "service_state[Winmgmt]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:49:13-06:00 wpmon02 zabbix_server[10913]: temporarily disabling Zabbix agent checks on host "wpdc01": host unavailable
2017-08-29T19:50:28-06:00 wpmon02 zabbix_server[10897]: enabling Zabbix agent checks on host "wpdc01": host became available
2017-08-30T04:09:53-06:00 wpmon02 zabbix_server[10772]: item "wpdc01:system.sw.packages" became not supported: Timeout while executing a shell script.
2017-08-30T04:19:52-06:00 wpmon02 zabbix_server[10773]: item "wpdc01:system.sw.packages" became supported
We rebooted dc01 last night around 19:46-19:48, so the first lines are fine. Then according to the logs, the connection got restored and data started to be collected (which is true). But according to alerts on the dashboard, it says:
2017-08-29 19:52:00 PROBLEM wpdc01 Zabbix agent on wpdc01 is unreachable for 10 minutes 15h 17m 27s
vso: does increasing StartAgents help? Please increase it to 3 and restart the agent, then see if the issue persists. |
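To compare what the server log records with what the dashboard shows, the "temporarily disabling" and "enabling" entries can be paired up per host. The following is a minimal sketch that assumes the syslog-style line layout seen in the excerpt above (ISO timestamp, log host, zabbix_server[pid]); the regular expression and the function name are illustrative, not part of Zabbix.

```python
import re
from datetime import datetime

# Matches only the disable/enable availability messages, one per log line.
EVENT = re.compile(
    r'^(?P<ts>\S+) \S+ zabbix_server\[\d+\]: '
    r'(?P<what>temporarily disabling|enabling) Zabbix agent checks '
    r'on host "(?P<host>[^"]+)"'
)


def outages(lines):
    """Yield (host, down_at, up_at) for each disable/enable pair in the log."""
    down = {}  # host -> time its checks were disabled
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue
        ts = datetime.fromisoformat(m["ts"])
        if m["what"] == "enabling":
            if m["host"] in down:
                yield m["host"], down.pop(m["host"]), ts
        else:
            down[m["host"]] = ts
```

Applied to the two availability lines quoted above, this pairs 19:49:13 with 19:50:28, an outage of 75 seconds by the server's own account, which is the figure to hold against the alert duration reported on the dashboard.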
Comment by Alex P [ 2017 Aug 30 ] |
vso, no, but the server restart did. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 30 ] |
Dear alexpr, thank you for exhaustive information! To me this starts to look very much like Can you show "500 latest values" for agent.ping? What are other Zabbix agent (passive!) checks on this host? vso it looks more and more as |
Comment by Alex P [ 2017 Aug 30 ] |
glebs.ivanovskis, the latest data for agent.ping is attached, both in graph and numerical views. |
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 30 ] |
Thanks! There is indeed a gap from August 24 to August 30. You say:
But 3.4.1 was released on Monday, August 28. Have you restarted server after upgrade? |
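The gap in agent.ping history mentioned here can also be located programmatically from exported values. Below is a small sketch, assuming the sample times are available as epoch seconds and that the trigger uses a 10-minute window, as the alert name suggests; the function name and the sample data are hypothetical.

```python
def find_gaps(timestamps, max_gap):
    """Return (start, end) pairs where consecutive samples are more than max_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical samples: one per minute, with a 13-minute hole after t=120.
print(find_gaps([0, 60, 120, 900, 960], max_gap=600))
```

Any interval this returns is a period in which a no-data style trigger with the same window would be expected to stay in PROBLEM state.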
Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 30 ] |
Closing as Duplicate of |
Comment by Alex P [ 2017 Aug 30 ] |
yes, I did restart it after the upgrade. |