  1. ZABBIX BUGS AND ISSUES
  2. ZBX-24097

logrt() fails on idle log file - item set to error state


    • Type: Problem report
    • Resolution: Unresolved
    • Priority: Trivial
    • Affects Version/s: 6.0.26

      Opening this one as a follow-up to ZBX-24037 where our initial assessment of what happens was incorrect.

      Steps to reproduce:

          Create an item with function set to logrt(). This implies an active check. Set it to match some text in the log. Set check time to something reasonable like 5 min.
          Rotate the target log file so that the newly created log file is empty.
          Add some text to the log file (does not matter if there will be a match on the text or not).
          Let the check run once. Observe that the item is in the "supported" state.
          Wait for 3-5 more consecutive checks by Zabbix Agent without adding data to the log file.
          Observe the item getting into "error" state with the message: "Item is set to error: Cannot obtain directory information: [2] No such file or directory"
          Touch the log file.
          Observe that on the next check the item becomes "supported" again (until the next few checks, when everything repeats itself).
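The file-side part of these steps can be scripted for experiments on a scratch file. The following is a minimal sketch, not part of the original report: it uses a temporary directory and a hypothetical file name, and it backdates the modification time instead of waiting. Whether a backdated timestamp triggers the same agent behaviour as a genuinely idle file has not been verified here.

```python
import os
import tempfile
import time
from pathlib import Path


def make_idle(path: Path, idle_seconds: int) -> None:
    """Backdate atime/mtime so the file looks untouched for idle_seconds."""
    past = time.time() - idle_seconds
    os.utime(path, (past, past))


def touch(path: Path) -> None:
    """Same effect as `touch`: set atime/mtime to the current time."""
    os.utime(path, None)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "php-fpm-error.log"  # hypothetical scratch file

    log.write_text("")                     # freshly rotated, empty log
    with log.open("a") as fh:              # add some text (match or not)
        fh.write("some text\n")

    make_idle(log, 20 * 60)                # simulate ~20 idle minutes
    idle_before = time.time() - log.stat().st_mtime

    touch(log)                             # the "touch the log file" step
    idle_after = time.time() - log.stat().st_mtime

    print(f"idle before touch: {idle_before:.0f}s, after: {idle_after:.0f}s")
```

Pointing `log` at a file that an agent actually monitors would be needed to observe the item state changes themselves.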

      Result:

      Item is set to error: "Cannot obtain directory information: [2] No such file or directory" after the log file has been idle for some time.

      Expected:

      No matter how long the file stays idle, it should not go into an error state.

      Further:

      We can now clarify that the observed misbehavior - a logrt() item getting into error state with the message "Cannot obtain directory information: [2] No such file or directory" - happens not because the log file is of zero size after rotation, but because the log file has been idle for more than 15-20 minutes (i.e. the log file timestamp has remained unchanged for more than a few consecutive checks by Zabbix Agent). The item returns to normal state if the log file timestamp changes to the current time (e.g., via touch), but after the log stays idle for some more time, it again falls into error state with the same message. This is consistent with logs that are busy (updated once every minute or so), which never get into the error state. Note that it is the timestamp of the file that matters, not whether there are any newly added lines or whether such lines match the configured regular expression for log content.
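To see which log files currently fall into the idle window described above, the modification-time age can be checked directly. This is a rough sketch under assumptions: the 15-minute threshold is taken from the lower bound mentioned above, and the `idle_logs` helper, the `*.log` glob and the scratch files are all hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def idle_logs(directory, threshold_seconds, now=None):
    """Return (name, age_in_seconds) for *.log files idle longer than the threshold."""
    now = time.time() if now is None else now
    result = []
    for path in sorted(Path(directory).glob("*.log")):
        age = now - path.stat().st_mtime
        if age > threshold_seconds:
            result.append((path.name, age))
    return result


with tempfile.TemporaryDirectory() as tmp:
    busy = Path(tmp) / "busy.log"
    idle = Path(tmp) / "idle.log"
    busy.write_text("recent line\n")
    idle.write_text("old line\n")

    # Backdate one file by 20 minutes to stand in for an idle log.
    past = time.time() - 20 * 60
    os.utime(idle, (past, past))

    names = [name for name, _ in idle_logs(tmp, 15 * 60)]
    print(names)
```

Comparing this list against the items in error state on a real host would show whether the correlation with file age holds there as well.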

      The Zabbix item is as simple as it can be (we only want the current log to be processed and not any pre-rotated ones; we're perfectly happy to get the complete line and need no filtering on it; everything is expected to be ASCII; we don't want maxlines as the logged data volume is low etc.):

      logrt[/var/log/nginx/test.example.com/php-fpm-error.log,^.*Fatal error.*$,,,skip,,,,]
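For readability, the positional parameters of the key above can be labeled with the names that appear in the documentation anchor for logrt (file_regexp, regexp, encoding, maxlines, mode, output, maxdelay, options, persistent_dir). The sketch below is only an illustration: the naive comma split works here solely because this particular key has no quoted parameters and no commas inside values, and it is not how Zabbix itself parses keys.

```python
# Parameter names as they appear in the documentation anchor for logrt.
PARAM_NAMES = [
    "file_regexp", "regexp", "encoding", "maxlines", "mode",
    "output", "maxdelay", "options", "persistent_dir",
]

key = "logrt[/var/log/nginx/test.example.com/php-fpm-error.log,^.*Fatal error.*$,,,skip,,,,]"

# Strip the "logrt[" prefix and the trailing "]" and split on commas.
# Assumption: no quoting and no embedded commas in this specific key.
inner = key[len("logrt["):-1]
params = dict(zip(PARAM_NAMES, inner.split(",")))

for name, value in params.items():
    print(f"{name:15} {value!r}")
```

Only file_regexp, regexp and mode are set; every other position is left empty, so the agent defaults apply.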

      A few more notes:

      • We run this on Agent version 2. The host OS is Linux (RHEL, although that should not matter). SELinux is in permissive mode. 
      • The items are created based on LLD from a template that is attached to a dozen or so hosts. Log files in question are PHP FPM log files, placed in the logging directory of each hosted web site - all deployed automatically following the same layout - so there is no way the Zabbix item error can be due to a misconfiguration (of either Zabbix or the log location or permissions) caused by human error.
      • The fact that the log files in question briefly become "supported" when their timestamp changes confirms that everything with the path and access permissions on the log file is correct and that the error is somewhere in Zabbix. 
      • The documentation on logrt() is somewhat ambiguous (see https://www.zabbix.com/documentation/current/en/manual/config/items/itemtypes/zabbix_agent#logrtfile-regexpregexpencodingmaxlinesmodeoutputmaxdelayoptionspersistent-dir ): on the one hand, it shows the arguments as a comma-separated list without any quotes anywhere - but in the provided examples the file path is quoted in one and unquoted in another, while the log text matching pattern is always quoted; why is that? On the other hand, even though the comma seems to be an argument delimiter for logrt(), both the comma and the quote are perfectly valid characters in a Unix filename (in fact, everything except the directory separator and the NUL byte is) - so how do you tell where the pattern for the file path and name ends and where the text matching regular expression begins? Your manual says absolutely nothing about the need to escape commas, quotes etc. in the filename. Not that we have quotes and commas anywhere in the filename or the log text matching pattern, but still.
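On the quoting question in the last note, the sketch below shows one plausible reading of the general item key format, stated here as an assumption rather than confirmed behaviour: an unquoted parameter may not contain a comma or a closing square bracket and may not start with a double quote, a space or an opening square bracket, while a quoted parameter is wrapped in double quotes with embedded double quotes backslash-escaped. The helper names are hypothetical and edge cases such as trailing backslashes are not handled.

```python
def quote_param(value: str) -> str:
    """Quote one item key parameter only if it cannot be written unquoted.

    Assumed rules (to be checked against the item key format documentation):
    quoting is needed for a comma or ']' anywhere, or a leading '"', ' ' or '['.
    """
    needs_quotes = (
        "," in value
        or "]" in value
        or value.startswith(('"', " ", "["))
    )
    if not needs_quotes:
        return value
    return '"' + value.replace('"', '\\"') + '"'


def build_key(name: str, params: list) -> str:
    """Assemble name[param1,param2,...] from already-separated parameters."""
    return name + "[" + ",".join(quote_param(p) for p in params) + "]"


# The key from this report needs no quoting at all under these rules.
plain = build_key("logrt", [
    "/var/log/nginx/test.example.com/php-fpm-error.log",
    "^.*Fatal error.*$", "", "", "skip", "", "", "", "",
])
print(plain)

# A hypothetical path containing a comma would have to be quoted.
print(build_key("logrt", ["/var/log/odd,name.log", "error"]))
```

Under this reading the file path and the pattern are separated by the first comma that sits outside double quotes, which would explain why the documentation examples quote some parameters and not others.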

            aigars.kadikis Aigars Kadikis
            assen.totin Assen Totin