Type: Problem report
Affects Version/s: 4.0.13
Component/s: Agent (G)
Sprint: Sprint 58 (Nov 2019), Sprint 59 (Dec 2019), Sprint 60 (Jan 2020), Sprint 61 (Feb 2020), Sprint 62 (Mar 2020)
Zabbix agent on Windows platforms randomly re-reads a log written by a custom Java-based application. This makes log items unreliable.
Windows 10 tablets.
NTFS file system
Zabbix Agent 4.0.3
Log written by a Java application.
The server and agent keep the trace of a monitored log's size and last modification time (for logrt) in two counters. Additionally:
The agent also internally uses inode numbers (on UNIX/GNU/Linux), file indexes (on Microsoft Windows) and MD5 sums of the first 512 log file bytes for improving decisions when logfiles get truncated and rotated.
On UNIX/GNU/Linux systems it is assumed that the file systems where log files are stored report inode numbers, which can be used to track files.
On Microsoft Windows Zabbix agent determines the file system type the log files reside on and uses:
On NTFS file systems 64-bit file indexes.
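To make the tracked attributes described above concrete, here is a minimal Python sketch that collects a comparable fingerprint for a log file. It is not the agent's code; the names LogFingerprint and fingerprint are hypothetical and exist only for illustration. It relies on Python's os.stat, whose st_ino field holds the inode number on UNIX/GNU/Linux and the file index on Microsoft Windows when available.

```python
import hashlib
import os
from dataclasses import dataclass

PREFIX_LEN = 512  # number of leading bytes hashed, per the description above


@dataclass(frozen=True)
class LogFingerprint:
    """Hypothetical container for the attributes used to track a log file."""
    size: int         # file size in bytes
    mtime: float      # last modification time, seconds since the epoch
    file_id: int      # inode number (UNIX) or file index (Windows)
    prefix_md5: str   # MD5 of the first PREFIX_LEN bytes (or fewer if the file is shorter)


def fingerprint(path: str) -> LogFingerprint:
    """Collect size, mtime, file identity and a prefix checksum for one file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        prefix = f.read(PREFIX_LEN)
    return LogFingerprint(
        size=st.st_size,
        mtime=st.st_mtime,
        file_id=st.st_ino,
        prefix_md5=hashlib.md5(prefix).hexdigest(),
    )
```

Capturing such a fingerprint before and after a suspected repeated read may show which attribute changed (size, modification time, file index, or the first 512 bytes) when the Java application writes to the log.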
File size and modification times captured by the same agent:
Steps to reproduce:
Not known (yet).
As far as the monitoring can detect, the log is not rotated and its timestamps are not modified, so no repeated log reads should happen.
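The expectation can be written down as a small check. The following Python sketch is only an illustration of that reasoning under stated assumptions, not the agent's actual decision logic; Snapshot and should_continue_from_offset are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of what was observed for the log at one check."""
    size: int       # file size in bytes
    mtime: float    # last modification time, seconds since the epoch
    file_id: int    # inode number or Windows file index


def should_continue_from_offset(prev: Snapshot, curr: Snapshot) -> bool:
    """Return True when nothing suggests the log was rotated or truncated.

    Assumption: a file with the same identity that has not shrunk and whose
    modification time has not gone backwards is the same log, so reading
    should resume from the last processed offset instead of starting over.
    """
    same_file = curr.file_id == prev.file_id
    not_truncated = curr.size >= prev.size
    not_older = curr.mtime >= prev.mtime
    return same_file and not_truncated and not_older
```

Under this assumption a repeated read would only be justified when at least one of the three conditions fails between two consecutive checks.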
Although the log file in question is not a candidate for a logrt item (the file name remains the same after rotation, and a timestamp is appended to the rotated file afterwards), here are the item keys that completely eliminate the repeated reads in the given example:
Apparently the copytruncate option adds an additional layer of verification of whether the log file is new, resulting in much better handling. The expected outcome of this task would be the same level of detection in normal log items too, and additionally an explanation (if known) of what could be improved in the Java application to eliminate this behavior.
See the attached screenshot, which illustrates the number of hourly repeated log reads across 88 devices before (with log items) and after the logrt copytruncate workaround was introduced.