-
Problem report
-
Resolution: Cannot Reproduce
-
Critical
-
None
-
6.0.10, 6.2.4, 6.4.0beta3
-
None
We found that an active agent returns wrong value - 45 instead of 4348391.
Here is a debug capture for trapper processes on zabbix server side:
319660:20221026:110909.805 trapper got '{"request":"agent data","session":"935a94f2c514ecc8a16846c035fa7007","data":[ {"host":"somename.example.com","key":"log.count[E:\\SiteScope\\logs\\RunMonitor.log,\"(?i)timeout|timed out\",,,skip]","value":"0","lastlogsize":469578,"id":16523,"clock":1666796944,"ns":735725500}, {"host":"somename.example.com","key":"vfs.fs.size[C:,pfree]","value":"71.231177","id":16524,"clock":1666796944,"ns":736378500}, {"host":"somename.example.com","key":"vfs.fs.size[D:,pfree]","value":"99.826274","id":16525,"clock":1666796944,"ns":736779600}, {"host":"somename.example.com","key":"vfs.fs.size[E:,pfree]","value":"96.167602","id":16526,"clock":1666796944,"ns":737179400}, {"host":"somename.example.com","key":"vfs.file.size[E:\\some{HOST.HOST}_SIS_Config.log]","value":"45","id":16527,"clock":1666796945,"ns":752000000}],"clock":1666796949,"ns":806334100}'
We see that returned for the key was "value":"45"
a minute later we see how agent is updating list of active checks, where is our key, this is fine:
319660:20221026:111008.715 trapper got '{"request":"active checks","host":"somename.example.com","host_metadata":"OS:Windows;Apps:SIS43264;Type:Primary"}' 319660:20221026:111008.715 In get_hostid_by_host() host:'somename.example.com' metadata:'OS:Windows;Apps:SIS43264;Type:Primary' 319660:20221026:111008.715 query [txnlev:0] [select h.hostid,h.status,h.tls_accept,h.tls_issuer,h.tls_subject,h.tls_psk_identity,a.host_metadata,a.listen_ip,a.listen_dns,a.listen_port,a.flags from hosts h left join autoreg_host a on a.proxy_hostid is null and a.host=h.host where h.host='somename.example.com' and h.status in (0,1) and h.flags<>2 and h.proxy_hostid is null] 319660:20221026:111008.719 End substitute_simple_macros_impl() data:'E:\some\somename.example.com_SIS_Config.log' 319660:20221026:111008.719 End of substitute_key_macros_impl():SUCCEED data:'vfs.file.size[E:\some\somename.example.com_SIS_Config.log]' 319660:20221026:111008.720 send_list_of_active_checks_json() sending [{"response":"success","data":[ {"key":"system.uptime","delay":1800,"lastlogsize":0,"mtime":0}, {"key":"agent.version","delay":1800,"lastlogsize":0,"mtime":0},{"key":"vfs.fs.discovery","delay":3600,"lastlogsize":0,"mtime":0}, ... cut 6 log.count items ... {"key":"vfs.file.regexp[E:\\some.log,\"Total monitors: ([\\d]+)\",,,,\\1]","delay":200,"lastlogsize":0,"mtime":0}, {"key":"vfs.file.size[E:\\some\\somename.example.com_SIS_Config.log]","key_orig":"vfs.file.size[E:\\some{HOST.HOST}_SIS_Config.log]","delay":600,"lastlogsize":0,"mtime":0}, {"key":"service.discovery","delay":300,"lastlogsize":0,"mtime":0}, {"key":"vfs.fs.size[C:,pfree]","delay":1800,"lastlogsize":0,"mtime":0}, {"key":"vfs.fs.size[D:,pfree]","delay":1800,"lastlogsize":0,"mtime":0}, {"key":"vfs.fs.size[E:,pfree]","delay":1800,"lastlogsize":0,"mtime":0}, {"key":"agent.ping","delay":200,"lastlogsize":0,"mtime":0}, {"key":"service.info[SiteScope,state]","delay":200,"lastlogsize":0,"mtime":0} ]}]
In the same time, when testing the key locally using passive checks way, returned value is correct:
C:\Zabbix\bin>zabbix_agentd.exe -t vfs.file.size[E:\some\somename.example.com_SIS_Config.log] vfs.file.size[E:\some\somename.example.com_SIS_Config.log] [u|4348391] C:\Zabbix\bin>zabbix_get.exe -s 127.0.0.1 -k vfs.file.size[E:\some\somename.example.com_SIS_Config.log] 4348391
Agent version:
C:\Zabbix\bin>zabbix_agentd.exe -V zabbix_agentd Win64 (service) (Zabbix) 6.0.0 Revision 5203d2ea7d 14 February 2022, compilation time: Feb 14 2022 12:57:08
After we restarted zabbix agent, the active check item started to return correct value - 4348391.
I could not find related fixes in 6.0 branch, so this issue looks fresh one to me.
It happens sometimes/rarely and on a few different Windows servers.
It's not possible to predict when it happens again.