[ZBX-12620] Windows Agent becomes unavailable from time to time Created: 2017 Aug 28  Updated: 2017 Aug 30  Resolved: 2017 Aug 30

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.2.7, 3.4.0
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Alex P Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: agent
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Server:
zabbix-server-pgsql 1:3.4.1-1+jessie amd64

Agent:
Starting Zabbix Agent ... Zabbix 3.4.0 (revision 71462) on MS Windows Server 2012 R2 Std


Attachments: PNG File zabbix_agent_win.png     PNG File zabbix_agent_win_2.png     PNG File zabbix_agent_win_3.png     PNG File zabbix_agent_win_4.png     PNG File zabbix_agent_win_5_trigger setup.png     PNG File zabbix_agent_win_6_item setup.png     PDF File zabbix_agent_win_7_latest data.pdf     PDF File zabbix_agent_win_8_agent_ping-500.pdf     PNG File zabbix_agent_win_8_agent_ping-graph.png    
Issue Links:
Duplicate
duplicates ZBX-12626 Poller thread stops working after scr... Closed

 Description   

From time to time we get the trigger named in the subject on our Windows servers (see attached). There is nothing specific in the logs.

The servers are alive at those moments, there are no network issues, and data keeps being collected from them.

Please advise.



 Comments   
Comment by Vladislavs Sokurenko [ 2017 Aug 29 ]

This looks like a support request. For available options please see http://zabbix.org/wiki/Getting_help.

No indication of a bug. Closing as Won't Fix.

Comment by Alex P [ 2017 Aug 29 ]

It is NOT a support request. It IS a bug.
Please do not close it; investigate instead.

Comment by Vladislavs Sokurenko [ 2017 Aug 29 ]

Is it version 3.4.1? Could it be ZBX-12549?

Comment by Vladislavs Sokurenko [ 2017 Aug 29 ]

It looks like the agent is unreachable due to network issues.

Comment by Alex P [ 2017 Aug 29 ]

Server - 3.4.1
Agent - 3.4.0

Comment by Alex P [ 2017 Aug 29 ]

It looks like it, but it is NOT. I can reach the server at that time.
Moreover, the host in question has been marked as "unreachable" for 4 days already. If I go to "Latest data" for that host, there is data collected from it, i.e. Zabbix Server collects the information from it perfectly well.
vso: It is possible that the server cannot ping the agent while the agent still sends data in active mode; that is normal.

Comment by Alex P [ 2017 Aug 29 ]

again, the ping is completely fine, the servers are in the same room and VLAN; no network issues.

> ping wpcaldc02
PING wpcaldc02.wpninc.com (10.200.60.12) 56(84) bytes of data.
64 bytes from wpcaldc02.wpninc.com (10.200.60.12): icmp_seq=1 ttl=128 time=0.254 ms
64 bytes from wpcaldc02.wpninc.com (10.200.60.12): icmp_seq=2 ttl=128 time=0.266 ms
64 bytes from wpcaldc02.wpninc.com (10.200.60.12): icmp_seq=3 ttl=128 time=0.278 ms

The server load is zero (see attached).

Comment by Alex P [ 2017 Aug 29 ]

And no, it is NOT normal.

Comment by Vladislavs Sokurenko [ 2017 Aug 29 ]

Are the items that are collected active items, or does only the agent.ping passive item fail while the others are also passive?

Comment by Rostislav Palivoda [ 2017 Aug 29 ]

Please provide steps to reproduce the issue reliably, ask the community for support, or subscribe to the Zabbix support service.

Comment by Alex P [ 2017 Aug 29 ]

All items are passive.
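
For readers unfamiliar with passive checks: the server opens a TCP connection to the agent's ListenPort (10050 in the configuration posted in this ticket), sends an item key such as agent.ping, and reads the reply. The sketch below shows one way to do that by hand in Python. The framing used here ("ZBXD", a flags byte of 0x01, then an 8-byte little-endian length) is this editor's understanding of the Zabbix wire format, and the function names are hypothetical, not part of any Zabbix tool. The agent only answers hosts listed in its Server= parameter, so the network part would have to run from the Zabbix server host.

```python
import socket
import struct

ZBX_HEADER = b"ZBXD\x01"  # protocol signature plus flags byte (assumed layout)


def pack_zabbix(payload: bytes) -> bytes:
    """Prefix a payload with the signature and an 8-byte little-endian length."""
    return ZBX_HEADER + struct.pack("<Q", len(payload)) + payload


def unpack_zabbix(packet: bytes) -> bytes:
    """Strip the header from a packet and return the payload."""
    if packet[:5] != ZBX_HEADER:
        raise ValueError("missing ZBXD header")
    (length,) = struct.unpack("<Q", packet[5:13])
    body = packet[13:13 + length]
    if len(body) != length:
        raise ValueError("truncated packet")
    return body


def passive_check(host: str, key: str, port: int = 10050, timeout: float = 5.0) -> str:
    """Send one passive item request and return the agent's reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(pack_zabbix(key.encode()))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # agent closes the connection after replying
                break
            chunks.append(chunk)
    return unpack_zabbix(b"".join(chunks)).decode()
```

Calling passive_check("wpcaldc02", "agent.ping") from the Zabbix server host would be expected to return "1" when the agent answers. A timeout or a refused connection would point at the agent's listener rather than at the network path that ICMP ping exercises.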

Comment by Alex P [ 2017 Aug 29 ]

@palivoda, I will provide steps to reproduce. There is no need to close the ticket without an investigation.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 29 ]

Could you please provide the following information?

  • Configuration of the trigger. Is it a regular trigger from official Template App Zabbix Agent?
  • Configuration of items involved. agent.ping in case of official template.
  • Latest Data for these items.

Comment by Alex P [ 2017 Aug 30 ]

glebs.ivanovskis, the host uses a mix of the official "Template OS Windows", which is linked to "Template App Zabbix Agent", and a few third-party templates.
All requested screenshots are attached. The host latest data is attached in PDF format.

Comment by Alex P [ 2017 Aug 30 ]

Agent configuration files

zabbix_agentd.conf:


LogFile=C:\srv\ZabbixAgent\log\zabbix_agentd.log
LogFileSize=5

### Option: DebugLevel
#	0 - basic information about starting and stopping of Zabbix processes
#	1 - critical information
#	2 - error information
#	3 - warnings
#	4 - for debugging (produces lots of information)
# DebugLevel=3

EnableRemoteCommands=0
LogRemoteCommands=1
UnsafeUserParameters=0

Server=zabbix...
ServerActive=
ListenPort=10050
# ListenIP=0.0.0.0

StartAgents=1

Hostname=wpcaldc02

BufferSend=60
BufferSize=200
Timeout=20


Include=C:\srv\ZabbixAgent\conf\conf.d\

conf.d/zbx-win-envmon.conf:

UserParameter=system.discovery[*],%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_win_system_discovery.vbs $1 
UserParameter=quota[*],%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_win_quota.vbs $1 $2 
UserParameter=server.domain,%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_user_domain.vbs 
UserParameter=server.roles,%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_server_role.vbs 
UserParameter=server.serial,%systemroot%\system32\cscript.exe /nologo /T:30 C:\srv\ZabbixAgent\scripts\zabbix_server_serialnumber.vbs

# add to the Task Manager, interval 12-24 hours
# C:\srv\ZabbixAgent\scripts\zabbix_wus_update_all.bat
# C:\srv\ZabbixAgent\scripts\zabbix_wus_update_crit.bat
UserParameter=wu.all,type C:\srv\ZabbixAgent\log\zabbix_wus_update_all.log
UserParameter=wu.crit,type C:\srv\ZabbixAgent\log\zabbix_wus_update_crit.log

conf.d/zbx-win-hwinfo.conf:

# wmic csproduct get vendor,name,identifyingnumber
UserParameter=system.hw.product-name,for /f "usebackq tokens=* skip=1" %a in (`WMIC csproduct get name`) do @echo:%a
UserParameter=system.hw.vendor,for /f "usebackq tokens=* skip=1" %a in (`WMIC csproduct get vendor`) do @echo:%a
UserParameter=system.hw.serial,for /f "usebackq tokens=* skip=1" %a in (`WMIC csproduct get identifyingnumber`) do @echo:%a

conf.d/zbx-win-swinfo.conf:

#UserParameter=vbs.softwareinventoryList[*],cscript /nologo C:\srv\ZabbixAgent\scripts\zbx-softwareinventory.vbs
UserParameter=system.sw.packages,%systemroot%\system32\windowspowershell\v1.0\powershell.exe -nologo C:\srv\ZabbixAgent\scripts\zbx-softwareinventory.ps1

Comment by Vladislavs Sokurenko [ 2017 Aug 30 ]

Is this the real parameter, or do you actually have something else there?
Server=zabbix...

Comment by Alex P [ 2017 Aug 30 ]

vso, it is something else of course

Comment by Alex P [ 2017 Aug 30 ]

Here is an example from another Windows server. It is also a domain controller, so the configuration and templates are the same as above.

Logs on the Zabbix server:

2017-08-29T19:46:09-06:00 wpmon02 zabbix_server[10818]: Zabbix agent item "net.if.in[WAN Miniport (Network Monitor)-Kaspersky Lab NDIS 6 Filter-0000]" on host "wpdc01" failed: first network error, wait for 15 seconds
2017-08-29T19:46:24-06:00 wpmon02 zabbix_server[10871]: Zabbix agent item "perf_counter["\DNS\Database Node Memory",300]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:46:59-06:00 wpmon02 zabbix_server[10915]: Zabbix agent item "vfs.fs.size[E:,free]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:47:17-06:00 wpmon02 zabbix_server[10862]: Zabbix agent item "net.tcp.listen[636]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:47:35-06:00 wpmon02 zabbix_server[10871]: Zabbix agent item "service_state[DHCPServer]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:47:53-06:00 wpmon02 zabbix_server[10867]: Zabbix agent item "service_state[LanmanWorkstation]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:08-06:00 wpmon02 zabbix_server[10869]: Zabbix agent item "service_state[WINS]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:26-06:00 wpmon02 zabbix_server[10895]: Zabbix agent item "service_state[AppHostSvc]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:43-06:00 wpmon02 zabbix_server[10871]: Zabbix agent item "service_state[Winmgmt]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:48:58-06:00 wpmon02 zabbix_server[10873]: Zabbix agent item "service_state[Winmgmt]" on host "wpdc01" failed: another network error, wait for 15 seconds
2017-08-29T19:49:13-06:00 wpmon02 zabbix_server[10913]: temporarily disabling Zabbix agent checks on host "wpdc01": host unavailable
2017-08-29T19:50:28-06:00 wpmon02 zabbix_server[10897]: enabling Zabbix agent checks on host "wpdc01": host became available
2017-08-30T04:09:53-06:00 wpmon02 zabbix_server[10772]: item "wpdc01:system.sw.packages" became not supported: Timeout while executing a shell script.
2017-08-30T04:19:52-06:00 wpmon02 zabbix_server[10773]: item "wpdc01:system.sw.packages" became supported

We rebooted dc01 last night around 19:46-19:48, so the first lines are expected. According to the logs, the connection was then restored and data collection resumed (which is true). But the dashboard still shows this alert:

2017-08-29 19:52:00				PROBLEM		wpdc01	Zabbix agent on wpdc01 is unreachable for 10 minutes	  15h 17m 27s
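
The mismatch described here (the server log reports the host as available again, yet the trigger stays in PROBLEM) can be checked mechanically by extracting the availability transitions from the server log. The following is a minimal Python sketch that assumes the syslog-style line format quoted above; the function names are hypothetical and the sample lines are copied from this ticket.

```python
import re
from datetime import datetime

# Matches only the "temporarily disabling" / "enabling" lines of the server log.
LINE_RE = re.compile(
    r'^(?P<ts>\S+) \S+ zabbix_server\[\d+\]: '
    r'(?P<event>temporarily disabling|enabling) Zabbix agent checks '
    r'on host "(?P<host>[^"]+)"'
)


def availability_events(lines):
    """Return (timestamp, host, state) tuples for availability transitions."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            state = "unavailable" if m["event"].startswith("temporarily") else "available"
            events.append((datetime.fromisoformat(m["ts"]), m["host"], state))
    return events


def downtime_windows(events):
    """Pair each 'unavailable' event with the next 'available' event per host."""
    open_since = {}
    windows = []
    for ts, host, state in events:
        if state == "unavailable":
            open_since.setdefault(host, ts)
        elif host in open_since:
            windows.append((host, open_since.pop(host), ts))
    return windows


SAMPLE_LOG = [
    '2017-08-29T19:48:58-06:00 wpmon02 zabbix_server[10873]: Zabbix agent item "service_state[Winmgmt]" on host "wpdc01" failed: another network error, wait for 15 seconds',
    '2017-08-29T19:49:13-06:00 wpmon02 zabbix_server[10913]: temporarily disabling Zabbix agent checks on host "wpdc01": host unavailable',
    '2017-08-29T19:50:28-06:00 wpmon02 zabbix_server[10897]: enabling Zabbix agent checks on host "wpdc01": host became available',
]

if __name__ == "__main__":
    for host, start, end in downtime_windows(availability_events(SAMPLE_LOG)):
        print(host, start.isoformat(), end.isoformat(), (end - start).total_seconds())
```

For the lines above this yields a single window for wpdc01 that closes at 19:50:28, before the 19:52:00 PROBLEM event. That is consistent with the report: the host-level availability recovered while the agent.ping-based trigger did not.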

vso: Does increasing StartAgents help? Please increase it to 3, restart the agent, and see whether the issue persists.

Comment by Alex P [ 2017 Aug 30 ]

vso, no, but the server restart did.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 30 ]

Dear alexpr, thank you for the exhaustive information! To me this starts to look very much like ZBX-12251. vso, how can we prove or disprove it? By grepping the server's log for "ignoring query"?

Can you show "500 latest values" for agent.ping? What are the other Zabbix agent (passive!) checks on this host?

vso: It looks more and more like ZBX-12549, or some strange variation due to StartAgents being 1. It looks almost as if it cannot get to some items, as if there is no queue.

Comment by Alex P [ 2017 Aug 30 ]

glebs.ivanovskis, the latest data for agent.ping is attached, both in graph and numerical views.
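
A gap in such an export can also be located programmatically rather than by eye. The sketch below is a hedged Python illustration: the timestamps are hypothetical placeholders, not the values from the attached PDF, and the 10-minute threshold is an assumption taken from the trigger name quoted earlier in this ticket.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_interval):
    """Return (previous, next) pairs of consecutive samples further apart than max_interval."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_interval]


# Hypothetical agent.ping sample times, for illustration only.
samples = [
    datetime(2017, 8, 24, 10, 0),
    datetime(2017, 8, 24, 10, 1),
    datetime(2017, 8, 30, 9, 0),
    datetime(2017, 8, 30, 9, 1),
]

gaps = find_gaps(samples, timedelta(minutes=10))
```

Each returned pair marks the last value before a gap and the first value after it, which is the information needed to compare against server restarts or upgrades.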

Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 30 ]

Thanks! There is indeed a gap from August 24 to August 30. You say:

Server:
zabbix-server-pgsql 1:3.4.1-1+jessie amd64

But 3.4.1 was released on Monday, August 28. Have you restarted the server after the upgrade?

Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 30 ]

Closing as Duplicate of ZBX-12549.

Comment by Alex P [ 2017 Aug 30 ]

Yes, I did restart it after the upgrade.
