Problem report
Resolution: Incomplete
Trivial
This is a bug report. I will provide logs and further details next.
I am running Zabbix 6.4.0 on Kubernetes and have set a memory limit on it.
The server handles about 2000 hosts and more than 20000 items, all created by low-level discovery (LLD) rules. All LLD rules are configured to keep undiscovered items for 30 days.
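As the setting is described here, an item that stops being discovered should only be removed once 30 days have passed since it was last discovered. The minimal Python sketch below models that rule to show why a bulk deletion of items that are still being discovered is unexpected. The function and variable names are hypothetical and are not Zabbix internals. The sketch also does not claim which Zabbix process actually performs the removal.

```python
from datetime import datetime, timedelta

# Hypothetical model of "keep undiscovered items for 30 days";
# the names below are illustrative, not Zabbix internals.
RETENTION = timedelta(days=30)

def should_delete(last_discovered: datetime, now: datetime,
                  retention: timedelta = RETENTION) -> bool:
    """Return True only once the full retention period has elapsed
    since the discovery rule last reported the item."""
    return now - last_discovered >= retention

now = datetime(2023, 5, 17, 6, 51)
# An item rediscovered an hour ago must be kept.
print(should_delete(now - timedelta(hours=1), now))   # False
# An item lost 31 days ago is eligible for removal.
print(should_delete(now - timedelta(days=31), now))   # True
```

Under this rule, none of the items that were still being discovered should have been deleted. That is what makes the mass removal described below look like a bug rather than normal retention behaviour.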
At some point, for no apparent reason, the server deletes all discovered hosts and items in a single housekeeping run that, according to the log, takes about 400 seconds to complete (a regular housekeeping run usually takes only a few seconds).
I am still investigating this issue, but I fear it might be linked to the CPU or RAM limits imposed on the pod, hence the title of this post.
Has anyone seen similar symptoms? This is quite a serious bug, I must say.
—
UPDATE
Here is what I see in the server container log: "invalid discovery rule ID [51085]" (the same message appears for most of the rules), and before that:
```
244:20230517:065155.162 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: deadlock detected
DETAIL: Process 1680 waits for ShareLock on transaction 2741830; blocked by process 1683.
Process 1683 waits for ShareLock on transaction 2741132; blocked by process 1680.
HINT: See server log for query details.
CONTEXT: while deleting tuple (36,3) in relation "item_rtdata"
SQL statement "DELETE FROM ONLY "public"."item_rtdata" WHERE $1 OPERATOR(pg_catalog.=) "itemid""
[delete from functions where (itemid in
```
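The two DETAIL lines in the log above describe a classic deadlock. Process 1680 waits on a transaction held by 1683, while 1683 waits on a transaction held by 1680. The Python sketch below is only an illustration of reading that log and is not how PostgreSQL's own deadlock detector is implemented. It parses those lines into a wait-for graph, in which each waiting process points to the process blocking it, and then follows the edges until it finds a cycle. The log does not say which Zabbix processes 1680 and 1683 belong to, so the sketch does not attempt to identify them.

```python
import re

# DETAIL lines copied verbatim from the PostgreSQL error above.
detail = """Process 1680 waits for ShareLock on transaction 2741830; blocked by process 1683.
Process 1683 waits for ShareLock on transaction 2741132; blocked by process 1680."""

# Wait-for graph: waiting process -> process that blocks it.
pattern = re.compile(
    r"Process (\d+) waits for \w+ on transaction \d+; blocked by process (\d+)")
waits_for = {int(waiter): int(blocker) for waiter, blocker in pattern.findall(detail)}

def find_cycle(graph):
    """Follow wait-for edges from each process; revisiting a node means a deadlock."""
    for start in graph:
        path, node = [], start
        while node in graph and node not in path:
            path.append(node)
            node = graph[node]
        if node in path:
            return path[path.index(node):]
    return None

print(waits_for)              # {1680: 1683, 1683: 1680}
print(find_cycle(waits_for))  # [1680, 1683]
```

A cycle of length two means neither transaction can proceed. PostgreSQL resolves this by aborting one of them, which is the "query failed" error the Zabbix server logs.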
If I try to create more LLD rules, they do not seem to be executed, and I see no useful logs from the server process, only these two things:
- postgres DB log: `LOG: could not receive data from client: Connection reset by peer`
- Zabbix server:
```
...
271:20230517:073822.065 server #35 started
7:20230517:073822.068 "zabbix-server-..." node started in "active" mode
272:20230517:073822.068 server #36 started
273:20230517:073822.072 server #37 started
Bad operator (INTEGER): At line 73 in /var/lib/mibs/ietf/SNMPv2-PDU
243:20230517:073823.368 thread started
243:20230517:073823.368 thread started
243:20230517:073823.368 thread started
```
This is the only log output I see.