[ZBX-21703] Zabbix Agent2 is no longer retrieving Windows perfmon counters after a period of time Created: 2022 Sep 28 Updated: 2025 Mar 03 |
Status: | Reopened |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 6.2.3 |
Fix Version/s: | 6.0.28rc1, 6.4.13rc1, 7.0.0beta2, 7.0 (plan) |
Type: | Problem report | Priority: | Minor |
Reporter: | Stijn De Doncker | Assignee: | Michael Veksler |
Resolution: | Unresolved | Votes: | 50 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Windows Server 2019 Standard with Version 10.0.17663. |
Team: |
Sprint: | Sprint 103 (Aug 2023), Sprint 104 (Sep 2023), Sprint 105 (Oct 2023), Sprint 106 (Nov 2023), Sprint 107 (Dec 2023), S2401, S24-W6/7, S24-W8/9, S24-W10/11 |
Story Points: | 1 |
Description |
Comments |
Comment by Stijn De Doncker [ 2022 Sep 28 ] |
Running Zabbix server version: v6.2.3 Running Zabbix proxy version: v6.2.3 |
Comment by Aigars Kadikis [ 2022 Oct 07 ] |
Thank you for the investigation so far. 1) When the agent stops reporting, try to configure this item key:
wmi.getall[root\cimv2,"select * from win32_perfformatteddata_perfdisk_physicaldisk"]
Set the type of information to Text and, to avoid filling the database with text, set the update interval to 1h or more. Does the item work? 2) Your comment:
This is quite a bold statement. Would you mind gathering some statistics on which Windows versions the performance counters keep working on in the long run? Use this item key to collect the Windows build number: wmi.get[root\CIMV2,SELECT Version FROM Win32_OperatingSystem] 3) Another item that would give a useful picture is the total number of unsupported items on the host. This would show how many minutes, hours or days the agent worked perfectly fine after boot-up. Maybe there is a correlation, I do not know. Make sure the Windows host has the item zabbix[host,,items_unsupported] and attach a trigger that fires when the number changes drastically. When the number of unsupported items grows large (we will then have a precise timestamp of when it happened), we need to look at the agent log and calculate the time between service start-up and the increase in unsupported items. |
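Step 3 comes down to subtracting two log timestamps. The Python sketch below shows one way to compute that difference. It is not a Zabbix tool, and it rests on several assumptions:

- The agent2 log uses the layout quoted elsewhere in this thread (`2023/03/16 11:21:36.026779 message`).
- A line containing `Starting Zabbix Agent 2` marks the service start.
- The first line containing `is not supported` or `collector failed` marks the first failure.

`time_to_first_failure` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Assumed agent2 timestamp layout, e.g. "2023/03/16 11:21:36.026779"
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"
TS_LEN = len("2023/03/16 11:21:36.026779")  # 26 characters

# Substrings assumed to mark a failed check or collector run
FAILURE_MARKERS = ("is not supported", "collector failed")


def time_to_first_failure(lines: Iterable[str]) -> Optional[timedelta]:
    """Return the delay between the latest agent start and the next failure line."""
    started: Optional[datetime] = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # not a timestamped agent2 log line
        message = line[TS_LEN:]
        if "Starting Zabbix Agent 2" in message:
            started = ts  # a restart resets the measurement
        elif started is not None and any(m in message for m in FAILURE_MARKERS):
            return ts - started
    return None
```

Applied to the 6.2.8 log posted later in this thread, the start at 11:21:36.026779 and the first `plugin 'Cpu' collector failed` at 11:27:00.606302 give roughly 5 minutes 25 seconds.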
Comment by Jeffrey Descan [ 2022 Oct 11 ] |
Thanks for your response, Aigars. The items were created this weekend as you requested. We’ve encountered the issue and have timestamps to share. Attached you can find the agent log at DebugLevel 5 for the host referred to here as db021; for this host we have the best logs in terms of timing. The problem triggered at this timestamp. I’ve created 2 triggers: one fires when the change of this item is >= 10, and one when the number of unsupported items is >= 15. When the perf_counter_en items become unsupported, we still see WMI data coming in properly, see attachment ‘zbx21703_db021_troubleshooting_items’. This is the graph of the unsupported items (from 10/10/2022) for host 'db021': see attachment ‘zbx21703_db021_unsupported_items’. At 12:42 - 12:43 the number of unsupported items rose a little, and around 13:09 it jumped even further; then we saw all the `The system cannot find message text for message number 0x%1 in the message file for %2.` errors and the items became unsupported. We also see that at 12:42 some items keep returning ‘No data to return’, followed by timeouts when fetching data and finally `The system cannot find message text for message number 0x%1 in the message file for %2.`. Slightly before that, Perflib logs an error: see attachment ‘zbx21703_db021_perflib_eventviewer.png’. Some other hosts have this issue as well: see attachment ‘zbx21703_all_hosts_problems.png’. Another host for which I could fetch some logs is ‘mta069’. This is its graph of unsupported items (from 11/10/2022): see attachment ‘zbx21703_mta069_unsupported’. Attached:
|
Comment by Stijn De Doncker [ 2022 Oct 20 ] |
Hi Aigars, Do you have an update so far? Stijn |
Comment by Bartosz Mickiewicz (Inactive) [ 2022 Oct 21 ] |
Hi,
2022/10/11 07:40:46.039445 Detected performance counter with negative denominator the second time after retry, giving up...
2022/10/11 07:40:46.039445 [Cpu] cannot obtain CPU#1 utilization counter value: A counter with a negative denominator value was detected.
2022/10/11 07:40:46.039445 Detected performance counter with negative denominator, retrying in 1 second
2022/10/11 07:40:47.051021 Detected performance counter with negative denominator the second time after retry, giving up...
2022/10/11 07:40:47.051271 [Cpu] cannot obtain CPU#2 utilization counter value: A counter with a negative denominator value was detected.
2022/10/11 07:40:47.051336 Detected performance counter with negative denominator, retrying in 1 second
What can be done: after the agent stops polling data, please execute the following command on the server: lodctr /q
It lists all the performance counter providers; just look for disabled ones. Best regards, |
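The log excerpt above suggests a fixed sequence: read the counter, wait one second, retry once, then give up. A minimal Python sketch of that pattern follows. It assumes the read function raises an exception for the negative-denominator status, presumably what PDH reports as `PDH_CALC_NEGATIVE_DENOMINATOR`. `NegativeDenominatorError` and `read_with_one_retry` are hypothetical names, not agent2 code.

```python
import time
from typing import Callable


class NegativeDenominatorError(Exception):
    """Hypothetical stand-in for a PDH negative-denominator status."""


def read_with_one_retry(
    read: Callable[[], float],
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> float:
    """Read a counter; on a negative denominator wait once and retry, then give up."""
    try:
        return read()
    except NegativeDenominatorError:
        log(f"Detected performance counter with negative denominator, retrying in {delay_s:g} second")
    sleep(delay_s)  # give the counter time to produce a fresh sample
    try:
        return read()
    except NegativeDenominatorError:
        log("Detected performance counter with negative denominator the second time after retry, giving up...")
        raise  # caller turns this into "cannot obtain ... counter value"
```

`sleep` and `log` are injectable only so the behaviour can be exercised without waiting.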
Comment by Jeffrey Descan [ 2022 Oct 26 ] |
Hey Bartosz, We've performed some tests in the meantime, as you requested. When the agent starts to report "The system cannot find message text for message number 0x%1 in the message file for %2.", executing commands in a new PowerShell session on the server still works. However, Zabbix seems to fail within the process it is currently running in (at least for performance counters). An example:
zabbix_get -s x.x.x.x -p 20050 --tls-connect psk --tls-psk-identity psk-agent --tls-psk-file /tmp/psk -k 'perf_counter_en["\System\Threads"]'
ZBX_NOTSUPPORTED: The system cannot find message text for message number 0x%1 in the message file for %2.
Either the above error is thrown or a timeout occurs:
zabbix_get -s x.x.x.x -p 20050 --tls-connect psk --tls-psk-identity psk-agent --tls-psk-file /tmp/psk -k 'perf_counter_en["\System\Threads"]'
Now the same check executed on the host itself, at the same moment the error above is returned:
PS C:\Program Files\Zabbix Agent 2> Get-Counter -Counter "\System\Threads"
Timestamp            CounterSamples
---------            --------------
26/10/2022 8:08:00   \\<HOSTNAME>\system\threads : 10758
PS C:\Program Files\Zabbix Agent 2> .\zabbix_agent2.exe -t perf_counter_en["\System\Threads"]
perf_counter_en[\System\Threads]   [s|10790.000000]
To me it looks like nothing is wrong with the server itself; I see 2 takeaways from this:
Given these takeaways, I do not think involving Microsoft is needed, as the normal way of obtaining performance counters still works at the moment the issue occurs.
On some servers where we see this issue, the [BITS] Performance Counters are disabled. Unfortunately, we also see this on other servers that are not encountering the issue at the moment.
PS C:\Program Files\Zabbix Agent 2> lodctr /q
[BITS] Performance Counters (Disabled)
    DLL Name: C:\Windows\System32\bitsperf.dll
    Open Procedure: PerfMon_Open
    Collect Procedure: PerfMon_Collect
    Close Procedure: PerfMon_Close
    First Counter ID: 0x00002018 (8216)
    Last Counter ID: 0x00002028 (8232)
    First Help ID: 0x00002019 (8217)
    Last Help ID: 0x00002029 (8233)
So the 'resolution' to get the counters working again is restarting Zabbix Agent2. Once that is done, all performance counter data collection starts working again, but the lodctr output remains in exactly the same state for days (even after restarting the BITS service). At this point it does not look related to me. Please let us know if we can perform more tests for this issue.
Kind regards Jeffrey |
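To compare many servers, the `lodctr /q` output shown in the comment above can be scanned for disabled providers. The Python sketch below does this under two assumptions: every provider header looks like the `[BITS] Performance Counters (Disabled)` line in that sample, and `lodctr` is on the PATH of the Windows host. `disabled_providers` and `query_lodctr` are hypothetical helpers.

```python
import re
import subprocess
from typing import List

# Assumed header shape, taken from the sample: "[BITS] Performance Counters (Disabled)"
DISABLED_HEADER = re.compile(r"^\s*\[([^\]]+)\].*\(Disabled\)\s*$")


def disabled_providers(lodctr_output: str) -> List[str]:
    """Return the bracketed provider names whose header line ends in (Disabled)."""
    names: List[str] = []
    for line in lodctr_output.splitlines():
        match = DISABLED_HEADER.match(line)
        if match:
            names.append(match.group(1))
    return names


def query_lodctr() -> str:
    """Run `lodctr /q` (Windows only) and return its text output."""
    result = subprocess.run(["lodctr", "/q"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a Windows host, `disabled_providers(query_lodctr())` would list the disabled providers. As noted above, a disabled BITS provider did not correlate with the failures, so treat the result as context rather than a diagnosis.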
Comment by Aigars Kadikis [ 2022 Oct 27 ] |
Jeffrey, thank you for the update. 1) The plugin's data collection is described (code-wise) in: There is some error checking and validation built in. I have noticed that our current error message "system cannot find message text for message number 0x%1 in the message file for %2" (or any part of it) is not included in the code. This makes me think that this message comes from a Windows component. This is not a big conclusion; I just wanted to make this statement. 2) Agent2 is plugin-based. Each plugin has its own characteristics. Configure in zabbix_agent2.conf:
StatusPort=1025
After that, we can access plugin stats at http://127.0.0.1:1025/status. There is a section that generates output like:
[WindowsPerfMon]
active: true
capacity: 0/100
tasks: 22
Please enable "StatusPort=1025", restart agent2 and apply agent2-generate-WindowsPerfMon-stats.xml |
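The status page can also be polled outside Zabbix. The Python sketch below assumes the plain-text `[WindowsPerfMon]` block shown above, with `key: value` lines and `capacity` written as `used/limit`. The helper names are hypothetical. `fetch_status` only works once `StatusPort=1025` is set and agent2 has been restarted.

```python
import urllib.request
from typing import Dict, Tuple


def parse_plugin_section(status_text: str, plugin: str = "WindowsPerfMon") -> Dict[str, str]:
    """Collect the `key: value` lines that follow "[plugin]" up to the next section."""
    fields: Dict[str, str] = {}
    inside = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            inside = line[1:-1] == plugin  # entering or leaving the wanted section
            continue
        if inside and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def capacity_usage(fields: Dict[str, str]) -> Tuple[int, int]:
    """Split a "used/limit" capacity string into two integers."""
    used, _, limit = fields["capacity"].partition("/")
    return int(used), int(limit)


def fetch_status(port: int = 1025) -> str:
    """Fetch the agent2 status page from the local StatusPort."""
    url = f"http://127.0.0.1:{port}/status"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Logging `capacity_usage` and `tasks` over time, next to the unsupported-items graph, would show whether the plugin's capacity saturates when the counters fail. Whether it does is an open question here, not something this thread established.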
Comment by Jeffrey Descan [ 2022 Oct 28 ] |
Hey Aigars, Thanks for the update. We've deployed the config on the hosts where we see the problem most often. The template is included in our debug template. If we see the issue again, we will report our findings together with the data from the PerfMon statistics template.
I noticed that the template and your link refer to Zabbix 5.0; just flagging that everything here runs on 6.2.
Kind regards |
Comment by Jeffrey Descan [ 2022 Nov 03 ] |
Hey Aigars, We have some data to share (some of it more useful than the rest) and some remarks. 1) I was unable to keep the item type as 'Zabbix Agent (active)', so I switched it to passive.
2) After applying/linking the template, we have 2 cases that give some data from the status template you provided:
Some screenshots with the data are attached. I hope this provides some extra insight. Please let me know if we can provide any more data. Kind regards |
Comment by Jeffrey Descan [ 2022 Nov 03 ] |
Hey Aigars I got another host, with new screenshots attached. Kind regards |
Comment by Stijn De Doncker [ 2022 Nov 09 ] |
Hi Aigars, Do you have an update so far? Stijn |
Comment by Stijn De Doncker [ 2022 Nov 16 ] |
Hi Aigars, Any update on this? Stijn |
Comment by Stijn De Doncker [ 2022 Nov 25 ] |
Hi Aigars, Any update on this? Stijn |
Comment by Aigars Kadikis [ 2023 Jan 24 ] |
Hi Stijn, https://support.zabbix.com/browse/ZBX-20356 is related to this issue since it contains log lines similar to those in this issue: 2022/11/07 16:41:22.588500 plugin 'WindowsPerfMon' collector failed: No data to return. The public release comes out on January 31. If the issue (the agent completely stops collecting performance counter data) happens once a week, start testing now: zabbix_agent2-6.0.13rc1-windows-amd64-openssl-static.zip It's fine to use a 6.0.x agent together with a 6.2.x server. Fingers crossed! |
Comment by Aigars Kadikis [ 2023 Mar 14 ] |
Kindly reopen the case if you encounter issues with agent2 6.0.14 or 6.2.8. The issue should be fixed by the improvements in https://support.zabbix.com/browse/ZBX-20356. |
Comment by Marcel Walter [ 2023 Mar 16 ] |
Hello, we just switched to Zabbix Agent 2 6.2.8 and immediately ran into this issue. Here's the log:
2023/03/16 11:21:36.026779 Starting Zabbix Agent 2 (6.2.8)
2023/03/16 11:21:36.028635 OpenSSL library (OpenSSL 3.0.8 7 Feb 2023) initialized
2023/03/16 11:21:36.028635 using configuration file: C:\Program Files\Zabbix Agent 2\zabbix_agent2.conf
2023/03/16 11:21:36.028635 using plugin 'Agent' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.028635 using plugin 'Ceph' (built-in) providing following interfaces: exporter, runner, configurator
2023/03/16 11:21:36.028635 using plugin 'Cpu' (built-in) providing following interfaces: exporter, collector, runner
2023/03/16 11:21:36.028635 using plugin 'DNS' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.034723 using plugin 'File' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.034723 using plugin 'Log' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.034723 using plugin 'MQTT' (built-in) providing following interfaces: watcher, configurator
2023/03/16 11:21:36.034723 using plugin 'Memcached' (built-in) providing following interfaces: exporter, runner, configurator
2023/03/16 11:21:36.034723 using plugin 'Memory' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.034723 using plugin 'Modbus' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.034723 using plugin 'Mysql' (built-in) providing following interfaces: exporter, runner, configurator
2023/03/16 11:21:36.034723 using plugin 'NetIf' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.034723 using plugin 'Oracle' (built-in) providing following interfaces: exporter, runner, configurator
2023/03/16 11:21:36.034723 using plugin 'Proc' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.034723 using plugin 'Redis' (built-in) providing following interfaces: exporter, runner, configurator
2023/03/16 11:21:36.034723 using plugin 'Registry' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.034723 using plugin 'Smart' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 using plugin 'Swap' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'SystemRun' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 using plugin 'TCP' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 using plugin 'UDP' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 using plugin 'Uname' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'Uptime' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'Users' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 using plugin 'VFSDir' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'VMemory' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'VfsFs' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'WebCertificate' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 using plugin 'WebPage' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 using plugin 'WindowsEventlog' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 lowering the plugin WindowsPerfInstance capacity to 1 as the configured capacity 100 exceeds limits
2023/03/16 11:21:36.035234 using plugin 'WindowsPerfInstance' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'WindowsPerfMon' (built-in) providing following interfaces: exporter, collector, runner
2023/03/16 11:21:36.035234 using plugin 'WindowsServices' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'Wmi' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'ZabbixAsync' (built-in) providing following interfaces: exporter
2023/03/16 11:21:36.035234 using plugin 'ZabbixStats' (built-in) providing following interfaces: exporter, configurator
2023/03/16 11:21:36.035234 lowering the plugin ZabbixSync capacity to 1 as the configured capacity 100 exceeds limits
2023/03/16 11:21:36.035234 using plugin 'ZabbixSync' (built-in) providing following interfaces: exporter
2023/03/16 11:21:38.505445 Plugin communication protocol version is 6.2.7
2023/03/16 11:21:39.006682 Zabbix Agent2 hostname: [APEX-ONE]
2023/03/16 11:21:41.052658 [101] no active checks on server [zabbix-02.klinikum-os.net:10051]: host [APEX-ONE] not found
2023/03/16 11:23:41.629285 [VFSDir] failed to walk dir with path C:\Program Files\Zabbix Agent\zabbix_agentd.d\
2023/03/16 11:27:00.606302 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:28:01.594230 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:29:02.600942 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:30:03.600690 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:31:04.588239 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:32:05.604979 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:33:06.604315 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:35:59.631189 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:37:07.609609 [WindowsPerfInstance] Cannot refresh object cache: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:37:29.603536 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:38:07.618157 check 'perf_instance_en.discovery[PhysicalDisk]' is not supported: Cannot find object: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:38:59.613817 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:40:29.601948 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:41:59.598221 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:42:04.648234 plugin 'WindowsPerfMon' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:43:09.689268 check 'perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk sec/Write",60]' is not supported: The specified object was not found on the computer.
2023/03/16 11:43:29.605604 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:44:59.603562 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:46:29.610021 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:47:59.594059 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:49:10.605796 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:49:29.618093 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:50:11.599105 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:50:59.613331 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:51:12.612933 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:52:13.604102 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:52:14.726156 check 'perf_counter_en["\Memory\Page Faults/sec"]' is not supported: Unable to connect to the specified computer or the computer is offline.
2023/03/16 11:53:14.741981 check 'perf_counter_en["\Paging file(_Total)\% Usage"]' is not supported: Unable to connect to the specified computer or the computer is offline.
2023/03/16 11:54:14.757310 check 'perf_counter_en["\Memory\Cache Bytes"]' is not supported: Unable to connect to the specified computer or the computer is offline.
2023/03/16 11:54:14.757310 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:55:14.761158 check 'perf_counter_en["\System\Context Switches/sec"]' is not supported: Unable to connect to the specified computer or the computer is offline.
2023/03/16 11:55:29.617985 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:56:14.769694 check 'perf_counter_en["\Processor Information(_total)\% DPC Time"]' is not supported: Unable to connect to the specified computer or the computer is offline.
2023/03/16 11:56:59.608942 check 'system.uptime' is not supported: The system cannot find message text for message number 0x%1 in the message file for %2.
2023/03/16 11:57:14.777343 check 'perf_counter_en["\Processor Information(_total)\% Interrupt Time"]' is not supported: Unable to connect to the specified computer or the computer is offline. |
Comment by Jeffrey Descan [ 2023 Apr 12 ] |
@Aigars Kadikis we can confirm that we still have this issue and it is not resolved. Additional observation: when this behaviour starts, the proxies get more load on the unreachable pollers. Our workaround is to restart the services every day, but this is neither comfortable nor acceptable. |
Comment by Volodymyr Markovsky [ 2023 Apr 22 ] |
I'm seeing the same issue on agent version 6.2.6. |
Comment by Aigars Kadikis [ 2023 Apr 24 ] |
vmarkovsky, please use a supported version of agent2: 6.0.16 or 6.4.1. The 6.0 LTS release is compatible with the 6.2 branch.
To all: on the server where the counters break most often, please install zabbix_agentd in parallel and apply the "Windows by Zabbix agent active" template. My hypothesis is that the C agent does something extra that triggers some sort of performance counter cache reload.
jeffdesc, could you try to run two Zabbix agents (zabbix_agent2 and zabbix_agentd) on the same Windows server? Use a Windows server where the performance counter registry gets corrupted most frequently. For zabbix_agentd we can set:
# turn off passive checks so it does not conflict with port 10050
StartAgents=0
# set unique hostname
Hostname=AnotherNameWhichDoesNotConflict
Install the secondary Zabbix agent in parallel:
"c:\zabbix\zabbix_agentd.exe" -c "c:\zabbix\zabbix_agentd.conf" --install
net start "zabbix agent"
Create a new host named "AnotherNameWhichDoesNotConflict" and apply "Windows by Zabbix agent active".
Do the Zabbix agent 2 performance counters still break while Zabbix agent 1 runs in parallel? |
Comment by Robin Roevens [ 2023 Jun 16 ] |
This morning I upgraded our Windows Zabbix agents (classic C version) from 6.2.9 to 6.4.3 and started seeing exactly the same problem as described here, only with Zabbix agent rather than Agent2 (all Windows clients here run agent 1). Many, close to all, hosts start complaining that active checks are no longer working some time after the agent is (re)started, exactly as described here. In this case, the log shows the following, about 3 hours after the agent (re)start:
10888:20230616:132357.874 agent #0 started [main process]
8936:20230616:132357.875 agent #1 started [collector]
10080:20230616:132357.876 agent #2 started [listener #1]
8900:20230616:132357.881 agent #3 started [listener #2]
11524:20230616:132357.882 agent #4 started [listener #3]
8904:20230616:132357.883 agent #5 started [active checks #1]
9224:20230616:132357.884 agent #6 started [active checks #2]
9224:20230616:162359.525 active check "perf_instance_en.discovery[PhysicalDisk]" is not supported:
9224:20230616:162801.309 active check "perf_counter_en["\Memory\Pages/sec"]" is not supported: Cannot obtain performance information from collector.
9224:20230616:162801.313 active check "system.cpu.util" is not supported: Performance counter is not ready.
Zabbix server no longer receives any active checks from those agents (most of our checks are set to active).
Passive checks still work, except for performance counter checks, which time out:
> zabbix_get -s windowshost -k 'agent.ping'
1
$ zabbix_get -s windowshost -k 'perf_counter_en["\System\Threads"]'
zabbix_get [111705]: Timeout while executing operation
On the host itself, however, they still work:
> Get-Counter -Counter "\System\Threads"
Timestamp             CounterSamples
---------             --------------
16/06/2023 17:10:54   \\windowshost\system\threads : 2001
After a restart of the Zabbix agent, active checks and performance counters work again for some time. Most servers with this problem run Windows Server 2019 Datacenter 17763.1.amd64fre.rs5_release.180914-1434 Build 17763.XXXXX. However, I also see some machines running Windows Server 2016 Datacenter 14393.5921.amd64fre.rs1_release.230504-1649 Build 14393.XXXX being affected.
|
Comment by Robin Roevens [ 2023 Jun 19 ] |
I have now downgraded Zabbix agent 6.4.3 back to 6.2.9 without rebooting any machine. All performance counter problems went away, and the agents are again happily monitoring the Windows performance counters. So for now I will have to stay on the unsupported 6.2.9, as the supported 6.4.3 agents seem to contain a major problem in this area. Shouldn't the priority of this bug be raised from minor to major? It seems to affect both agent 1 and agent 2 and effectively breaks the agent on many Windows machines, making them impossible to monitor while the agent misbehaves. |
Comment by Onkel Titus [ 2023 Jun 25 ] |
I agree. It's a big problem. With Agent2 6.4.3, the performance counters stop working and become unsupported shortly after installation. |
Comment by Vladislavs Sokurenko [ 2023 Jul 05 ] |
Does it help if perf_instance_en.discovery is not used? Does the problem appear only when some specific checks are added? |
Comment by Stijn De Doncker [ 2023 Jul 05 ] |
@Vladislavs Sokurenko,
We see this behavior with all checks based on Windows performance counters: not only OS performance counters but also, e.g., IIS performance counters; see also |
Comment by Vladislavs Sokurenko [ 2023 Jul 05 ] |
Yes, the question is whether some checks affect others; that's why we are interested in knowing whether it happens only with a specific configuration. |
Comment by Lasse Osterild [ 2023 Jul 19 ] |
I see a similar issue; it only happens with "system.cpu.util" and only on VMs with more than 2 vCPUs. I have a single VM with 2 vCPUs and it has never happened on that one. Tested with Agent 2 v6.4.2, v6.4.3 and v6.4.4 and the standard Windows template. The agents are in passive mode; active is disabled.
1350870:20230719:073157.161 item "serverA.acme.tld:system.cpu.util" became not supported: No data available.
2012R2 Datacenter 6.3.9600, 20 x vCPU
2016 Datacenter 10.0.14393, 16 x vCPU
2016 Datacenter 10.0.14393, 8 x vCPU
As others have reported, restarting the agent helps for a short while, so perhaps after X number of Y it fails to insert into a map or slice, or it has something to do with timing or timeouts, though I wouldn't expect an agent restart to help with that. I've set DebugLevel=5 on one of the agents and will report back once I have something. /Lasse |
Comment by Vladislavs Sokurenko [ 2023 Aug 08 ] |
Could you please check whether disabling the perf_instance_en.discovery and perf_instance.discovery items helps? There is a strong suspicion that the problem lies with that item. |
Comment by Stijn De Doncker [ 2023 Aug 25 ] |
@Vladislavs Sokurenko, We disabled perf_instance_en.discovery a week ago. The number of first network errors has halved and has remained stable so far. Most |
Comment by Vladislavs Sokurenko [ 2023 Aug 25 ] |
Thank you for confirming this. We should perform perf_instance_en.discovery under a mutex in Zabbix agent 2 and see if it helps, as this Windows library call might not be thread-safe. |
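Running a call "under a mutex" means only one thread may be inside it at any time, so a library call that is not thread-safe is never entered concurrently. The snippet below is a minimal Python illustration of that idea, not the agent2 change itself. `_pdh_lock` and `call_serialized` are hypothetical names. The function passed in stands for whatever performs `perf_instance_en.discovery`.

```python
import threading
from typing import Any, Callable, TypeVar

T = TypeVar("T")

# Hypothetical process-wide lock guarding every call into the
# (possibly non-thread-safe) performance counter library.
_pdh_lock = threading.Lock()


def call_serialized(fn: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Run fn while holding the shared lock, so at most one caller is inside it."""
    with _pdh_lock:
        return fn(*args, **kwargs)
```

The trade-off is latency: every caller waits for the one in progress. That is why later builds in this thread also look at how long the lock is held.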
Comment by Vladislavs Sokurenko [ 2023 Aug 25 ] |
Regarding IIS perf counters, please see: |
Comment by Vladislavs Sokurenko [ 2023 Aug 28 ] |
For the record, what kind of error was experienced with perf_instance_en.discovery enabled? |
Comment by Michael Veksler [ 2023 Aug 30 ] |
Hi stijndd, Be so kind as to test zabbix_agent2-x64-v64-dbg1-reopen-query.7z. If you confirm that this approach works, we will apply these changes. |
Comment by Robin Roevens [ 2023 Aug 30 ] |
Please don't overlook the fact that this problem is also seen in classic Zabbix Agent 6.4 as reported in comment-805074 |
Comment by Michael Veksler [ 2023 Aug 31 ] |
Hi robinr , The problem with win perfCounters and classic Zabbix Agent we are now solving in Be so kind to test Agent from comment |
Comment by Jeffrey Descan [ 2023 Aug 31 ] |
Hi @Michael Veksler, I am a member of Stijn's team and am jumping in on this. Could you please confirm which Zabbix Agent2 version this debug agent is built on? How stable is this agent? |
Comment by Michael Veksler [ 2023 Aug 31 ] |
Hi jeffdesc, The agent zabbix_agent2-x64-v64-dbg1-reopen-query.7z The agent version is stable. |
Comment by Jeffrey Descan [ 2023 Aug 31 ] |
Thanks Michael. Unfortunately we're still on Zabbix server 6.2.4; will installing a 6.4 agent cause any issues? |
Comment by Michael Veksler [ 2023 Aug 31 ] |
There were no changes in the protocol from 6.2 to 6.4. If you install 6.4 there shouldn't be any issue. (only all fixes |
Comment by Jeffrey Descan [ 2023 Sep 05 ] |
@Michael Veksler I've installed it on a couple of problematic hosts. I'll follow up on these hosts and report back within a week, as this behavior takes some time to build up. |
Comment by Jeffrey Descan [ 2023 Sep 13 ] |
Hi @Michael Veksler, As promised, here is an update with our findings. It seems that for some hosts this fix was an improvement, whereas for others it had no impact at all. In 'Screenshot 2023-09-13 at 16.04.47' you can see the decrease and subsequent increase in the proxy logs. The filter was: ((message: "perf_counter*" or message: "perf_counter_en*") and message: "first network error") To make it more tangible, I have made an overview of our hosts and the number of errors each one produced; please see 'zbx21703_hosts_error_sept.png'. Due to the nature of our customers and the content of the logs, I cannot upload them here in plain text. For which hosts would you like to receive logs (if any), together with the proxy logs? I can provide them through email or any other medium. Looking forward to your response. Jeffrey |
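The proxy-log filter in this comment can be reproduced offline to build the per-host overview. The Python sketch below assumes proxy lines of the form `Zabbix agent item "<key>" on host "<host>" failed: first network error, wait for N seconds`. That format is an assumption, so check it against your own proxy log first. `first_network_errors_by_host` is a hypothetical helper. Keys starting with `perf_counter` cover both `perf_counter` and `perf_counter_en`, mirroring the `perf_counter*` wildcard.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed proxy log shape; the lazy key match tolerates quotes inside item keys
# such as perf_counter_en["\System\Threads"].
ITEM_HOST = re.compile(r'item "(?P<key>.+?)" on host "(?P<host>[^"]+)"')


def first_network_errors_by_host(lines: Iterable[str]) -> Counter:
    """Count "first network error" lines for perf_counter* items, per host."""
    counts: Counter = Counter()
    for line in lines:
        if "first network error" not in line:
            continue
        match = ITEM_HOST.search(line)
        if match and match.group("key").startswith("perf_counter"):
            counts[match.group("host")] += 1
    return counts
```

`first_network_errors_by_host(open("zabbix_proxy.log")).most_common(10)` would list the ten noisiest hosts. The file name there is only an example.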
Comment by Michael Veksler [ 2023 Sep 14 ] |
Hi jeffdesc, |
Comment by Michael Veksler [ 2023 Sep 15 ] |
Hi jeffdesc.
/I will prepare a new dev build next week/ |
Comment by Vladislavs Sokurenko [ 2023 Sep 18 ] |
Unreachable error could be due to |
Comment by Michael Veksler [ 2023 Sep 18 ] |
Hi jeffdesc, Be so kind as to test the agent zabbix_agent2-x64-v64-dbg2-reopen-query_timeout-impr.7z. The agent version is stable. |
Comment by Michael Veksler [ 2023 Oct 02 ] |
Hi stijndd, |
Comment by Michael Veksler [ 2023 Oct 05 ] |
Hi stijndd, Be so kind as to send me the proxy log (the agent log was not enough). |
Comment by Michael Veksler [ 2023 Oct 25 ] |
Hi stijndd, Be so kind as to test the agent zabbix_agent2-x64-v64-dbg3-mutex-split.7z. The agent version is stable and includes all recent improvements. The main idea of this build is to check the hypothesis that the perfCounter collector occasionally runs for an unpredictably long time. We divided the 'lock' into two parts: the first lock for collecting data and the second lock for returning the collected data. |
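One way to picture the split: the slow collection holds one lock for its whole duration, while a second lock guards only the brief publish and read of results. An export then never waits behind a long-running query. The Python sketch below illustrates that idea under those assumptions. `SplitLockCollector` and its methods are hypothetical and do not reproduce the actual agent2 code.

```python
import threading
from typing import Callable, Dict


class SplitLockCollector:
    """Hypothetical collector: one lock serializes collection, another guards results."""

    def __init__(self, query: Callable[[], Dict[str, float]]):
        self._query = query                    # slow call into the counter library
        self._collect_lock = threading.Lock()  # held for the whole (possibly long) query
        self._result_lock = threading.Lock()   # held only to publish or read results
        self._results: Dict[str, float] = {}

    def collect(self) -> None:
        with self._collect_lock:
            fresh = self._query()              # may be slow; exporters are not blocked
            with self._result_lock:
                self._results = dict(fresh)

    def export(self, key: str) -> float:
        with self._result_lock:                # never waits for an in-flight query
            return self._results[key]
```

While a collection is stuck, exports keep returning the previous values instead of timing out.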
Comment by Michael Veksler [ 2023 Nov 13 ] |
Thanks for the good news. |
Comment by Michael Veksler [ 2023 Nov 15 ] |
Hi stijndd, Please send me the agent2 log from the web146.*.cloud host. |
Comment by Michael Veksler [ 2023 Nov 15 ] |
Hi stijndd, Is node web146 checked via passive checks? And what timeout is set? |
Comment by Michael Veksler [ 2023 Nov 23 ] |
Hi stijndd, Be so kind as to test the agent zabbix_agent2-x64-v64-dbg4-global_mutex_remove.7z. The main idea of this build is to remove all global locks from Export(), eliminating all potential hang-ups when returning data to the server. |
Comment by Michael Veksler [ 2023 Dec 11 ] |
Hi stijndd, Be so kind as to test the agent zabbix_agent2-x64-v64-dbg5-Errorlogs.7z. This agent version is stable and has no functional changes apart from additional logging. The main idea of this build is to test the hypothesis that approximately once per hour we really do lose connectivity with the agent, and the proxy is telling the truth. We have added log messages that are written to zabbix_agent2.log if execution time exceeds 1 second. After the test, be so kind as to send me the proxy and agent logs. |
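The threshold logging described above can be sketched as a small Python context manager. This is an illustration only: the name warn_if_slow, the message format, and the injectable clock/report hooks are assumptions made for the example, not the agent's real logging.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("agent_timing_sketch")


@contextmanager
def warn_if_slow(label, threshold=1.0, clock=time.monotonic, report=None):
    """Report when the wrapped block runs longer than `threshold` seconds.

    `clock` and `report` are injectable so the behaviour can be tested without waiting.
    """
    report = report or log.warning
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed > threshold:
            report("%s took %.3f s (threshold %.1f s)", label, elapsed, threshold)
```

Wrapping each exporter call this way leaves fast calls silent and leaves a timestamped trace for the slow ones, which can then be lined up against the proxy's "first network error" entries.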
Comment by Nechaev Aleksey [ 2023 Dec 27 ] |
Hi @Michael Veksler, We have a similar problem with the IIS collector. Zabbix Server: 6.4.10. Used file: zabbix_agent2-x64-v64-dbg5-Errorlogs.7z |
Comment by Michael Veksler [ 2024 Jan 10 ] |
Be so kind as to test the agent zabbix_agent2-x64-v64-dbg6-removePdhPath.7z. The agent version is stable, but it is for debugging purposes only (it contains a lot of debugging code). We found a piece of code that works with PDH without holding the lock; we refactored it and moved this logic under the lock. If you see that the behavior has not changed, disable all "Windows: CPU*" items and repeat the test. P.S. Thanks for the latest log files. |
Comment by Nechaev Aleksey [ 2024 Jan 11 ] |
Hi Michael Veksler, I replaced the Zabbix agent. If the errors come back, I will disable the Windows: CPU items and restart the agent. |
Comment by Nechaev Aleksey [ 2024 Jan 24 ] |
Hi Michael Veksler, Any update on this? Nechaev Aleksey |
Comment by Michael Veksler [ 2024 Jan 24 ] |
Hi @Nechaev Aleksey, We are still preparing to test the new version, but I have a few additional questions:
|
Comment by Nechaev Aleksey [ 2024 Jan 24 ] |
Hi Michael Veksler,
1. We have many Windows servers, with approximately 180 - 320 counters each.
2. All intervals are the defaults from the templates "IIS by Zabbix agent" and "Windows by Zabbix agent" (or its active variant).
3. Timeout: the Zabbix Agent 2 default of 3 seconds. |
Comment by Michael Veksler [ 2024 Jan 31 ] |
Hi @Stijn De Doncker and @Nechaev Aleksey, Long story short... More detailed information: As a workaround for the proxy error "first network error, wait for 45 seconds", set the agent timeout to 28 seconds. This lets the server receive the agent's response with the item error 'No data available' (which is true in principle). P.S. Most of the improvements tested during this R&D will be applied to production. |
Comment by Vladislavs Sokurenko [ 2024 Feb 08 ] |
We need to allow a bigger capacity then; currently it is just 100. If we allow 500, it should cover this scenario. |
Comment by Mickael Martin [ 2024 Feb 08 ] |
Hi, same issue on a 6.4.4 agent in passive mode, with around 30 perf_counter items and the "perf_instance_en.discovery[PhysicalDisk]" LLD rule. If you want tests or logs, do not hesitate to ask. |
Comment by Vladislavs Sokurenko [ 2024 Feb 08 ] |
Please try increasing StartAgents to 30 mma |
Comment by Mickael Martin [ 2024 Feb 08 ] |
I got this message:
C:\Program Files\Zabbix Agent 2>zabbix_agent2.exe -c zabbix_agent2.conf -f
zabbix_agent2 [4024]: ERROR: Cannot assign configuration: invalid parameter StartAgents at line 520: unknown parameter
We are in passive mode only. |
Comment by Alexey Kochmarskiy [ 2024 Feb 23 ] |
We are experiencing the same problem. In our case we are monitoring Hyper-V VMs. Each Hyper-V host hosts ~50 VMs. Each VM has ~15 perf counters (CPU, RAM, DISK, NET; passive, 60 s interval) and 3 perf_instance_en.discovery rules (HV VP CPU, NET, DISK; 10 m interval). So, in total, each Hyper-V host ends up with ~750 perf_counter items at a 60 s interval (~12.5 per second) and 150 perf_instance_en.discovery items at a 10 m interval (0.25 per second).
We started with the classic Zabbix agent. It simply bottlenecked on TCP connections and stopped responding altogether. We moved to Agent2 (6.4.11), which handled TCP connections well but started bottlenecking internally on perf_instance_en.discovery, since Plugins.WindowsPerfInstance.System.Capacity is limited to 1. So I increased the perf_instance_en.discovery interval to 20 m and all perf_counter intervals to 2 m. It helps: the unreachable poller data collector stays around 40% busy. We still see "The system cannot find message text for message number 0x%1 in the message file for %2." on some systems in the agent2 log, so we are trying to increase the interval on perf_counter items and/or move some of them to active mode.
After all, as I see it, this is clearly a bottleneck in both Agent2 perf_counter and perf_instance_en.discovery. The agent should allow many more requests in parallel, but Plugins.WindowsPerfMon.System.Capacity is hard-limited to 100. Yes, you could say that the system responds slowly, but slow is fine. We should also be able to increase timeouts throughout the entire agent-server communication pipeline.
Also, I want to use active checks with LLD host prototypes, but I can't. A host prototype's host name can never match the host name in zabbix_agent2.conf, because it must be unique on the Zabbix server, and we cannot specify multiple Hostname or HostnameItem values in zabbix_agent2.conf. This creates a bottleneck on the server side around poller processes, because I have to use passive checks only.
Also I don't think that our use case is some extreme situation, 50 VMs on one Hyper-V host isn't that much.
P.S. This is not a minor priority. |
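The request rates quoted in this comment can be checked with a few lines of Python. The figures (50 VMs, 15 counters and 3 discovery rules per VM, 60 s / 10 m intervals, later 2 m / 20 m) come straight from the comment; request_rate is a hypothetical helper name.

```python
# Figures from the comment above, for one Hyper-V host (one agent).
VMS_PER_HOST = 50
COUNTERS_PER_VM = 15
DISCOVERIES_PER_VM = 3


def request_rate(items, interval_s):
    """Average requests per second when `items` items are each polled every `interval_s` seconds."""
    return items / interval_s


counters = VMS_PER_HOST * COUNTERS_PER_VM        # 750 perf_counter items
discoveries = VMS_PER_HOST * DISCOVERIES_PER_VM  # 150 perf_instance_en.discovery items

print(request_rate(counters, 60), request_rate(discoveries, 600))    # original intervals: 12.5 0.25
print(request_rate(counters, 120), request_rate(discoveries, 1200))  # after the change: 6.25 0.125
```

Doubling both intervals halves the average request rate the agent has to absorb.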
Comment by Andrejs Kozlovs [ 2024 Mar 04 ] |
Available in:
|
Comment by Michael Veksler [ 2024 Mar 05 ] |
Hi @Alexey Kochmarskiy, The maximum capacity limit will be increased in WindowsPerfInstance capacity will remain 1.
Additional questions:
- Which item are you using: perf_instance_en.discovery or perf_instance.discovery?
- How long does a *discovery call last for each set of counters? |
Comment by Alexey Kochmarskiy [ 2024 Mar 05 ] |
Hi @Michael Veksler, I'm using only perf_instance_en.discovery. If it differs from perf_instance.discovery in terms of counter retrieval performance, I can test it.
"how long does *discovery call last for each set of counters ?" In the Zabbix web interface, from clicking "Test" on the discovery item to the result being shown, it doesn't take longer than 4 seconds regardless of counter type. "Hyper-V Hypervisor Virtual Processor" returns 100-200 instances and "LogicalDisk" returns 3-4; the timing seems the same. |
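Combining these timings with the figures from the earlier comment (150 discovery items per Hyper-V host, capacity 1 for WindowsPerfInstance) gives a rough worst-case estimate of how busy the single slot is. This assumes calls are spread evenly and treats the ~4 s measured via "Test" as the cost of each discovery call (it also includes the web round trip, so it is an upper bound); slot_utilisation is a hypothetical helper.

```python
def slot_utilisation(calls, interval_s, seconds_per_call, capacity=1):
    """Offered load per worker slot: (calls per second * seconds per call) / slots.

    Values at or above 1.0 mean the slots cannot keep up and requests queue.
    """
    return (calls / interval_s) * seconds_per_call / capacity


# 150 discovery calls per Hyper-V host, up to ~4 s each, WindowsPerfInstance capacity 1.
print(slot_utilisation(150, 600, 4))   # every 10 m: 1.0 (saturated)
print(slot_utilisation(150, 1200, 4))  # every 20 m: 0.5
```

In this worst case the single slot is busy for the whole 10-minute window and has no headroom for any extra delay; doubling the interval halves the load.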
Comment by zeta12 [ 2024 May 14 ] |
Agent updated to 6.0.28. The problem recurred |
Comment by Stefan Sieber [ 2024 May 14 ] |
I can confirm this with Zabbix Agent 2 6.4.14: we are also seeing problems with performance counters, and the agent occasionally stops. |
Comment by Mickael Martin [ 2024 Jun 19 ] |
Same issue with 6.4.15 today ;-/
2024/06/18 09:22:36.316058 Starting Zabbix Agent 2 (6.4.15)
2024/06/18 09:22:36.319297 OpenSSL library (OpenSSL 3.0.11 19 Sep 2023) initialized
2024/06/18 09:22:36.319297 using configuration file: C:\Program Files\Zabbix Agent 2\zabbix_agent2.conf
2024/06/18 09:22:36.319633 using plugin 'Agent' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.319633 using plugin 'Ceph' (built-in) providing following interfaces: exporter, runner, configurator
2024/06/18 09:22:36.319633 using plugin 'Cpu' (built-in) providing following interfaces: exporter, collector, runner
2024/06/18 09:22:36.320148 using plugin 'DNS' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322186 using plugin 'File' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322186 using plugin 'Log' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322186 using plugin 'MQTT' (built-in) providing following interfaces: watcher, configurator
2024/06/18 09:22:36.322186 using plugin 'Memcached' (built-in) providing following interfaces: exporter, runner, configurator
2024/06/18 09:22:36.322186 using plugin 'Memory' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322186 using plugin 'Modbus' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'Mysql' (built-in) providing following interfaces: exporter, runner, configurator
2024/06/18 09:22:36.322697 using plugin 'NetIf' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'Oracle' (built-in) providing following interfaces: exporter, runner, configurator
2024/06/18 09:22:36.322697 using plugin 'Proc' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'Redis' (built-in) providing following interfaces: exporter, runner, configurator
2024/06/18 09:22:36.322697 using plugin 'Registry' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'Smart' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'Sw' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'Swap' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'SystemRun' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'TCP' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'UDP' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'Uname' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'Uptime' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'Users' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'VFSDir' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'VMemory' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'VfsFs' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'WebCertificate' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'WebPage' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 using plugin 'WindowsEventlog' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 lowering the plugin WindowsPerfInstance capacity to 1 as the configured capacity 100 exceeds limits
2024/06/18 09:22:36.322697 using plugin 'WindowsPerfInstance' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'WindowsPerfMon' (built-in) providing following interfaces: exporter, collector, runner
2024/06/18 09:22:36.322697 using plugin 'WindowsServices' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'Wmi' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'ZabbixAsync' (built-in) providing following interfaces: exporter
2024/06/18 09:22:36.322697 using plugin 'ZabbixStats' (built-in) providing following interfaces: exporter, configurator
2024/06/18 09:22:36.322697 lowering the plugin ZabbixSync capacity to 1 as the configured capacity 100 exceeds limits
2024/06/18 09:22:36.322697 using plugin 'ZabbixSync' (built-in) providing following interfaces: exporter
2024/06/18 09:22:37.651625 Plugin communication protocol version is 6.4.0
2024/06/18 09:22:37.651625 Zabbix Agent2 hostname: [******************]
2024/06/18 09:27:02.028065 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2.
2024/06/18 09:27:02.028065 plugin 'Cpu': time spent in collector task 60.008753 s exceeds collecting interval 1 s
2024/06/18 09:28:02.044708 [WindowsPerfMon] error while removing counter '\PhysicalDisk(0 C:)\Current Disk Queue Length': The system cannot find message text for message number 0x%1 in the message file for %2.
2024/06/18 09:28:03.037778 plugin 'Cpu' collector failed: The system cannot find message text for message number 0x%1 in the message file for %2. |
Comment by Michael Veksler [ 2024 Aug 13 ] |
Hello everyone, Please help me with testing the new zabbix_agent2.exe from ZBX-24672 |
Comment by Michael Veksler [ 2024 Aug 14 ] |
Hi stijndd, |
Comment by Tomass Janis Bross [ 2024 Sep 18 ] |
As a temporary workaround, an automatic service restart on failure can be configured at the Windows level. 1. Open Run and type "services.msc" |
Comment by Marcel Renner [ 2024 Sep 18 ] |
tbross: We have the same problem. I don't think your solution will work, because the Zabbix agent service doesn't crash; instead, it continues to run. Currently, we run a local PowerShell script on each machine every 2 hours that reads the last lines of the log and restarts the agent if certain strings appear in it (like the error in this ticket). Whether they all relate to the error discussed here, I have no idea, but these were the strings that were often logged when the agent stopped working.
|
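The decision part of such a watchdog can be sketched in Python (the commenter uses PowerShell, and the restart itself is left to whatever mechanism is already in place). The trigger list is an assumption: it holds only the PDH error text that appears in agent logs quoted in this issue, and the helper names tail and needs_restart are hypothetical.

```python
from collections import deque

# Hypothetical trigger list: the PDH error text seen in agent logs quoted in this issue.
# Extend it with whatever strings your own logs show when the agent stops collecting.
TRIGGERS = (
    "The system cannot find message text for message number 0x%1",
)


def tail(path, n=200):
    """Return the last `n` lines of a text file without loading it all into a list."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return list(deque(f, maxlen=n))


def needs_restart(lines, triggers=TRIGGERS):
    """True if any of the given log lines contains one of the trigger strings."""
    return any(trigger in line for line in lines for trigger in triggers)
```

Keeping the decision in a pure function over lines makes it easy to check against saved log excerpts before letting it restart anything.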
Comment by Michael Veksler [ 2024 Sep 18 ] |
stijndd, please clarify which errors are still present?
I assume "first network error". If so, what is the value of the Plugins.Stat.System.Capacity parameter in the configuration file? |
Comment by Michael Veksler [ 2024 Sep 20 ] |
Hi stijndd, It seems to me I found the reason for the test failure on your side. There was a redundant lock protecting "perf_instance_en.discovery". I implemented an RWLock: a write lock for "perf_instance_en.discovery" and read locks for the other PDH calls. This means that perf_instance_en.discovery can "sometimes" fail with "first network error", because a long gathering of all perfCounter values can block the execution of perf_instance_en.discovery. Anyway, be so kind as to test zabbix_agent2-x64-v64-dev2.7z from
Taikocuya, please test the dev2 build of Agent2 from |
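A minimal Python sketch of how a readers/writer lock can hold up discovery: while counter reads hold the lock in shared mode, an exclusive request (standing in for perf_instance_en.discovery) has to wait, and with a timeout it gives up. The RWLock class is a simple reader-preferring lock written only for this illustration (Python has no built-in one); the agent's Go implementation may behave differently. Go's sync.RWMutex, for instance, blocks new readers once a writer is waiting.

```python
import threading


class RWLock:
    """Reader-preferring readers/writer lock, written only for this illustration."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        # Shared mode: many counter reads may hold the lock at once.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self, timeout=None):
        """Exclusive mode; returns False if `timeout` seconds pass while the lock is still held."""
        with self._cond:
            acquired = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout
            )
            if acquired:
                self._writer = True
            return acquired

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

In this sketch, counter reads would wrap each PDH call in acquire_read/release_read and discovery would use acquire_write; as long as reads keep overlapping, the reader count never drops to zero and discovery keeps waiting.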
Comment by Michael Veksler [ 2024 Sep 20 ] |
Hi stijndd, additional thought.
|
Comment by Michael Veksler [ 2024 Oct 07 ] |
Hi stijndd, Do you have any news? |
Comment by Michael Veksler [ 2024 Oct 08 ] |
Hi stijndd, Thanks for the info about the dev2 build. |
Comment by Michael Veksler [ 2024 Oct 09 ] |
Hi stijndd, Thanks for your help. We will finalize and merge current changes and return to the "first network error" problem in 25. |
Comment by Cesar Inacio Martins [ 2024 Nov 14 ] |
Hi, here we have the same issue. We updated our agent from 6.4.10 to 6.4.19 and the problem still occurs. With debug level 4 I don't see any useful information to help... Is there a way to gather more details to try to identify the problem? Here, out of 86 hosts, 9 are in this situation. What is weird is that sometimes, very rarely, they collect a single value...
2024/11/14 14:43:40.864139 plugin WindowsPerfMon: executing collector task
2024/11/14 14:43:40.902473 direct exporter task expired for key 'perf_counter_en["\PhysicalDisk(0 C:)\% Idle Time",60]' error: 'No data available.'
2024/11/14 14:43:40.902795 executing direct exporter task for key 'perf_counter_en["\PhysicalDisk(0 C:)\Disk Writes/sec",60]'
2024/11/14 14:43:40.902795 executed direct exporter task for key 'perf_counter_en["\PhysicalDisk(0 C:)\Disk Writes/sec",60]'
2024/11/14 14:43:40.902795 executing direct exporter task for key 'perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk sec/Write",60]'
2024/11/14 14:43:40.903324 executed direct exporter task for key 'perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk sec/Write",60]'
2024/11/14 14:43:40.903365 sending passive check response: ZBX_NOTSUPPORTED: 'No data available.' to '101.60.0.5'
2024/11/14 14:43:41.859369 plugin Cpu: executing collector task
2024/11/14 14:43:41.859369 plugin WindowsPerfMon: executing collector task
2024/11/14 14:43:41.902729 direct exporter task expired for key 'perf_counter_en["\PhysicalDisk(0 C:)\Disk Writes/sec",60]' error: 'No data available.'
2024/11/14 14:43:41.902729 sending passive check response: ZBX_NOTSUPPORTED: 'No data available.' to '101.60.0.5'
2024/11/14 14:43:41.903299 direct exporter task expired for key 'perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk sec/Write",60]' error: 'No data available.'
2024/11/14 14:43:41.903299 sending passive check response: ZBX_NOTSUPPORTED: 'No data available.' to '101.60.0.5'
2024/11/14 14:43:42.858563 plugin Cpu: executing collector task
2024/11/14 14:43:42.858563 plugin WindowsPerfMon: executing collector task
2024/11/14 14:43:42.970830 connection established using TLSv1.2 PSK-AES128-CBC-SHA |
Comment by Michael Veksler [ 2024 Nov 15 ] |
Hi @Cesar Inacio Martins, Please test zabbix_agent2-x64-v64-dev2.7z from |
Comment by Cesar Inacio Martins [ 2024 Nov 18 ] |
Hi MVekslers, Just did: changed from 6.4.19 to 6.4.18rc1 (the version you referred to). The agent is still not collecting. I didn't check the agent log with a higher debug level; if needed, please let me know. Just for illustration, this was the last data they were able to collect |
Comment by Michael Veksler [ 2024 Nov 20 ] |
Hi @Cesar Inacio Martins, I think the root of the issue is the same as described here: "agent2 v64 will internally rerun tasks that return an empty result until the timeout expires, and this behavior can lead to 'first network error'". Could you therefore check the problematic hosts with agent v70 (from my dev build zabbix_agent2-x64-v70-dev2.7z or the official download)? |
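The behavior quoted above can be modelled in a few lines of Python: a task that keeps returning nothing is retried until the timeout is used up, so every poll of such an item costs the full timeout before the agent answers. The function name, the 1 s retry delay, and the injectable clock/sleep are assumptions for illustration; the real agent2 logic is Go code and is not reproduced here.

```python
import time


def run_until_timeout(task, timeout_s, retry_delay_s=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Hypothetical model: re-run `task` while it returns None, until `timeout_s` is used up.

    Returns (result, elapsed_seconds); a result of None stands for 'No data available.'
    """
    start = clock()
    while True:
        result = task()
        elapsed = clock() - start
        # Stop on real data, or when another retry would overrun the timeout.
        if result is not None or elapsed + retry_delay_s > timeout_s:
            return result, elapsed
        sleep(retry_delay_s)
```

With the default 3 s timeout, an empty counter keeps the passive request open for the full 3 s instead of failing at once; whether the proxy then records 'No data available' or a network error depends on whether its own wait outlasts the agent's.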
Comment by Cesar Inacio Martins [ 2024 Nov 21 ] |
Hi MVekslers, I tried it, with no luck. We used the official download. I restarted twice; the first time, it ran for a few minutes without collecting anything. I have the log, and I don't see any errors or warnings in it. Is there something specific to look for? I can share the whole log with you privately, since it contains sensitive data.
|
Comment by Michael Veksler [ 2024 Dec 12 ] |
Hi ceinmart, Please test the dev3 build from |
Comment by daniel safro [ 2024 Dec 30 ] |
Hey, we have the same issue in production: over 100 Windows assets monitored with Zabbix-agent2, agents are 7.0.6. Restarting the agent fixes the issue. I will test and report back whether lodctr /q does anything or whether an agent restart is mandatory. See the attached logs: |
Comment by Michael Veksler [ 2025 Jan 17 ] |
New changes discussed here have been merged into 6.0.38rc1 and 7.0.9rc1 ( |