ZABBIX BUGS AND ISSUES / ZBX-22813

Zabbix server | LLD | loss of data because of OOM (?)


    • Type: Problem report
    • Resolution: Incomplete
    • Priority: Trivial

      This is a bug report. I will be providing logs and other pieces of information next.

      I am running Zabbix 6.4.0 on Kubernetes and I have imposed a limit on its memory.

      The server handles about 2000 hosts and >20000 items, all discovered by LLD. All LLD discovery rules are set to retain undiscovered items for 30 days.

      For some reason, at some point the server simply deletes all discovered hosts and items in a housekeeping job that (according to the log) takes about 400 seconds to execute (regular housekeeping usually takes just a few seconds).

      I am still investigating this issue, but I fear it might be linked to the CPU or RAM limits imposed on the Pod. Hence the title of this post.
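
      One way to test the OOM suspicion is to read the kernel's counters from inside the server container. This is a minimal sketch, assuming the node uses cgroup v2, where the `memory.events` file carries an `oom_kill` counter. The path `/sys/fs/cgroup/memory.events` is an assumption about the container layout, and the helper names are made up for illustration.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_memory_events(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def oom_kills(path: str = "/sys/fs/cgroup/memory.events") -> Optional[int]:
    """Return the oom_kill counter, or None if the file cannot be read.

    The default path is an assumption (cgroup v2 mounted at /sys/fs/cgroup).
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return parse_memory_events(text).get("oom_kill")


if __name__ == "__main__":
    print("oom_kill counter:", oom_kills())
```

      A non-zero counter would support the OOM theory. A zero counter would point back at the database errors in the update that follows.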

      Has anyone had similar symptoms? This is quite a bad bug, I must say.

      UPDATE

      Here is the log I see in the server container: "invalid discovery rule ID [51085]" (the same message appears for most of the rules)

      and before that:

      ```
      244:20230517:065155.162 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: deadlock detected
      DETAIL: Process 1680 waits for ShareLock on transaction 2741830; blocked by process 1683.
      Process 1683 waits for ShareLock on transaction 2741132; blocked by process 1680.
      HINT: See server log for query details.
      CONTEXT: while deleting tuple (36,3) in relation "item_rtdata"
      SQL statement "DELETE FROM ONLY "public"."item_rtdata" WHERE $1 OPERATOR(pg_catalog.=) "itemid""
      [delete from functions where (itemid in
      ```
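
      The DETAIL lines above already name both sides of the deadlock. A small sketch that pulls out who waits on whom, assuming the message wording matches the excerpt above; the function name is made up for illustration.

```python
import re

# Matches lines such as:
# "Process 1680 waits for ShareLock on transaction 2741830; blocked by process 1683."
DEADLOCK_RE = re.compile(
    r"Process (\d+) waits for (\w+) on transaction (\d+); blocked by process (\d+)\."
)


def parse_deadlock(detail: str):
    """Return (waiter_pid, lock_mode, transaction_id, blocker_pid) tuples."""
    return [
        (int(waiter), mode, int(txn), int(blocker))
        for waiter, mode, txn, blocker in DEADLOCK_RE.findall(detail)
    ]


sample = (
    "DETAIL: Process 1680 waits for ShareLock on transaction 2741830; "
    "blocked by process 1683.\n"
    "Process 1683 waits for ShareLock on transaction 2741132; "
    "blocked by process 1680."
)

for waiter, mode, txn, blocker in parse_deadlock(sample):
    print(f"pid {waiter} waits for {mode} on txn {txn}, blocked by pid {blocker}")
```

      Here the two backend processes block each other, so the next step would be to find in the PostgreSQL server log which statements those two PIDs were running.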

      If I try to create more LLD rules, it seems that they are not executed. I see no useful logs in the server process, only these two things:

      • postgres DB log: `LOG: could not receive data from client: Connection reset by peer`
      • zabbix server:

       
      Code:
      ...
      271:20230517:073822.065 server #35 started
      7:20230517:073822.068 "zabbix-server-..." node started in "active" mode
      272:20230517:073822.068 server #36 started
      273:20230517:073822.072 server #37 started
      Bad operator (INTEGER): At line 73 in /var/lib/mibs/ietf/SNMPv2-PDU
      243:20230517:073823.368 thread started
      243:20230517:073823.368 thread started
      243:20230517:073823.368 thread started
      This is the only log I see ...
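
      To line up the server log with the PostgreSQL log, it can help to turn the line prefix into a real timestamp. A minimal sketch, assuming the `PID:YYYYMMDD:HHMMSS.mmm message` layout seen in the excerpts above; the function name is made up for illustration, and lines without that prefix (such as the MIB parser warning) are returned as `None`.

```python
import re
from datetime import datetime

# Prefix layout assumed from the excerpts: "<pid>:<YYYYMMDD>:<HHMMSS>.<mmm> <message>"
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")


def parse_line(line: str):
    """Return (pid, timestamp, message), or None if the prefix is missing."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    pid, date, time, millis, message = match.groups()
    stamp = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    stamp = stamp.replace(microsecond=int(millis) * 1000)
    return int(pid), stamp, message


if __name__ == "__main__":
    print(parse_line("271:20230517:073822.065 server #35 started"))
```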

            tbross Tomass Janis Bross
            db100 db100