[ZBX-25881] Zabbix 6.0.29 Database Postgresql 16 crashes on querying table hosts (OUT OF MEMORY) Created: 2025 Jan 15  Updated: 2025 Jan 15  Resolved: 2025 Jan 15

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Franco Tadeu Ferraciolli Assignee: Zabbix Development Team
Resolution: Won't fix Votes: 0
Labels: crash, database
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Hardware:
Memory 4GB
CPU: 2X
Disk: aws gp3

AWS EC2 instance type t4g.medium
Operating system: Ubuntu 22.04.4 LTS
Processor Architecture: ARM64
Zabbix Server version: 6.0.29
Database: PostgreSQL 16.3
Frontend: nginx/1.18.0

Deployment type: All-in-one (Frontend, DB and server hosted in the same machine)

No proxies installed.

Required server performance, new values per second 7.87


Attachments: PNG File image-2025-01-15-13-03-52-612.png     PNG File image-2025-01-15-13-05-44-715.png     File zabbix_server.conf    

 Description   

Steps to reproduce:

I don't have exact steps to reproduce, but my deployment mostly uses external checks and HTTP agent items. It has fewer than 50 hosts and runs on a low-spec VPS. All Zabbix components are installed on the same machine.

Result:

Crash moment screenshot (Zabbix Server Log)

2918636:20250115:123032.642 [Z3001] connection to database 'zabbix' failed: [0] connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: the database system is shutting down
2918636:20250115:123033.611 [Z3001] connection to database 'zabbix' failed: [0] connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: the database system is shutting down
2918636:20250115:123034.653 [Z3001] connection to database 'zabbix' failed: [0] connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: the database system is shutting down
2918636:20250115:123035.727 [Z3001] connection to database 'zabbix' failed: [0] connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: the database system is shutting down
2918636:20250115:123036.690 [Z3001] connection to database 'zabbix' failed: [0] connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: the database system is shutting down
2918721:20250115:123037.152 [Z3001] connection to database 'zabbix' failed: [0] connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: the database system is shutting down
2918721:20250115:123037.153 database is down: reconnecting in 10 seconds
2918732:20250115:123037.210 [Z3001] connection to database 'zabbix' failed: [0] connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: the database system is shutting down

 

OS System Logs

 

root@localhost:/home# dmesg -T | grep -E -i "oom|killed"
[Wed Jan 15 12:29:13 2025] zabbix_agentd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[Wed Jan 15 12:29:13 2025]  oom_kill_process+0x24c/0x3e0
[Wed Jan 15 12:29:13 2025]  __alloc_pages_may_oom+0x130/0x208
[Wed Jan 15 12:29:13 2025] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[Wed Jan 15 12:29:13 2025] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=zabbix-agent.service,mems_allowed=0,global_oom,task_memcg=/system.slice/system-postgresql.slice/[email protected],task=postgres,pid=1556742,uid=115
[Wed Jan 15 12:29:13 2025] Out of memory: Killed process 1556742 (postgres) total-vm:230048kB, anon-rss:9716kB, file-rss:2176kB, shmem-rss:140288kB, UID:115 pgtables:440kB oom_score_adj:0 
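
The counters in the final kernel line can be broken down to see what kind of memory the killed backend actually held. The following is a minimal sketch, not part of the original report: the helper name parse_oom_line is hypothetical, and the input is the kernel line copied from the dmesg output above.

```python
import re

# Kernel line copied from the dmesg output above.
OOM_LINE = (
    "Out of memory: Killed process 1556742 (postgres) total-vm:230048kB, "
    "anon-rss:9716kB, file-rss:2176kB, shmem-rss:140288kB, UID:115 "
    "pgtables:440kB oom_score_adj:0"
)


def parse_oom_line(line: str) -> dict:
    """Return the pid, process name, and every kB counter from an OOM kill line."""
    head = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if head is None:
        raise ValueError("not an OOM kill line")
    # Collect fields of the form name:<digits>kB, e.g. anon-rss:9716kB.
    counters = {key: int(val) for key, val in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(head.group(1)), "name": head.group(2), **counters}


info = parse_oom_line(OOM_LINE)
# Resident memory is the sum of anonymous, file-backed, and shared pages.
rss_total_kb = info["anon-rss"] + info["file-rss"] + info["shmem-rss"]
shmem_share = info["shmem-rss"] / rss_total_kb
print(f"{info['name']} pid={info['pid']} rss={rss_total_kb} kB, shmem share={shmem_share:.0%}")
```

Read this way, most of the resident memory of the killed process is shared memory (shmem-rss). For a PostgreSQL backend that is usually the part of the shared buffer pool it has touched rather than memory private to the process, so the backend picked by the OOM killer was probably not the main memory consumer on the machine.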

 

 

PostgreSQL log (at the moment of the crash)

 

2025-01-15 12:26:17.943 UTC [3503028] lzabbix@zabbix ERROR:  invalid input syntax for type bigint: "zabbix" at character 354
2025-01-15 12:26:17.943 UTC [3503028] lzabbix@zabbix STATEMENT:  SELECT p.* FROM problem p WHERE p.source='0' AND p.object='0' AND p.eventid IN (369980150,369980189,6830901425,6830920664,6830>
2025-01-15 12:26:20.349 UTC [1556397] LOG:  checkpoint starting: time
2025-01-15 12:26:26.964 UTC [3503120] lzabbix@zabbix ERROR:  invalid input syntax for type bigint: "zabbix" at character 77
2025-01-15 12:26:26.964 UTC [3503120] lzabbix@zabbix STATEMENT:  SELECT p.* FROM problem p WHERE p.source='0' AND p.object='0' AND p.eventid='zabbix' AND p.r_eventid IS NULL
2025-01-15 12:28:10.567 UTC [1556397] LOG:  checkpoint complete: wrote 1101 buffers (6.7%); 0 WAL file(s) added, 0 removed, 0 recycled; write=110.206 s, sync=0.003 s, total=110.219 s; sync fi>
2025-01-15 12:30:26.383 UTC [1556396] LOG:  server process (PID 1556742) was terminated by signal 9: Killed
2025-01-15 12:30:26.383 UTC [1556396] DETAIL:  Failed process was running: select h.hostid,h.host,h.name,h.status,h.discover,hi.inventory_mode,h.custom_interfaces from hosts h,host_discovery >
2025-01-15 12:30:26.443 UTC [1556396] LOG:  terminating any other active server processes
2025-01-15 12:30:26.986 UTC [3505910] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:26.992 UTC [3505909] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:27.000 UTC [3505911] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:27.019 UTC [3505907] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:27.032 UTC [3505906] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:27.039 UTC [3505908] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:27.049 UTC [1556396] LOG:  all server processes terminated; reinitializing
2025-01-15 12:30:27.123 UTC [3505913] LOG:  database system was interrupted; last known up at 2025-01-15 12:28:10 UTC
2025-01-15 12:30:27.138 UTC [1556396] LOG:  received fast shutdown request
2025-01-15 12:30:27.191 UTC [3505916] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:27.194 UTC [3505913] LOG:  database system was not properly shut down; automatic recovery in progress
2025-01-15 12:30:27.197 UTC [3505917] lzabbix@zabbix FATAL:  the database system is in recovery mode
2025-01-15 12:30:27.207 UTC [3505913] LOG:  redo starts at 52/DA9E6530
2025-01-15 12:30:27.217 UTC [3505919] lzabbix@zabbix FATAL:  the database system is shutting down
2025-01-15 12:30:27.347 UTC [3505920] lzabbix@zabbix FATAL:  the database system is shutting down

 

 

 

 

The Zabbix frontend remained unavailable until PostgreSQL was restarted manually.
 

Memory utilization graph from another Zabbix instance monitoring this machine:

 

Configuration Syncer at the moment of crash:

 

Expected:
The configuration syncer completes as expected, with no crash while it runs.

*Zabbix Server configuration file:*
zabbix_server.conf



 Comments   
Comment by Alexander Vladishev [ 2025 Jan 15 ]

First, you need to check the PostgreSQL server settings, especially those related to memory. The settings might not match the total amount of memory available in the system.

Second, for such a small installation, you have allocated too much memory for HistoryIndexCacheSize, TrendCacheSize, and likely CacheSize. It depends on the number of items in the system. I would suggest the following sizes:

  • CacheSize=128M
  • TrendCacheSize=16M
  • HistoryIndexCacheSize=16M

Additionally, there is no point in having such a large number of processes. Reduce these parameters to 2-4:
StartPollers, StartPreprocessors, StartPollersUnreachable, StartHistoryPollers, StartTrappers, StartTimers, StartEscalators, StartAlerters.

Using the "Zabbix server health" template, you can perform the final tuning of these parameters.
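
The suggestions above can be turned into a quick check against a configuration file. This is only a sketch under stated assumptions: the sample configuration text is hypothetical (the attached zabbix_server.conf is not reproduced in this ticket), the helper names are made up for illustration, and the limits are the values suggested in this comment, not official Zabbix defaults.

```python
# Limits taken from the comment above; they are suggestions for this small setup.
SUGGESTED_CACHE = {"CacheSize": "128M", "TrendCacheSize": "16M", "HistoryIndexCacheSize": "16M"}
PROCESS_PARAMS = [
    "StartPollers", "StartPreprocessors", "StartPollersUnreachable", "StartHistoryPollers",
    "StartTrappers", "StartTimers", "StartEscalators", "StartAlerters",
]
MAX_PROCESSES = 4  # upper end of the suggested 2-4 range

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(value: str) -> int:
    """Convert a size such as '128M' to bytes; plain numbers are taken as bytes."""
    value = value.strip()
    if value and value[-1].upper() in UNITS:
        return int(value[:-1]) * UNITS[value[-1].upper()]
    return int(value)


def parse_conf(text: str) -> dict:
    """Parse Key=Value lines, skipping blanks and '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        params[key.strip()] = val.strip()
    return params


def review(params: dict) -> list:
    """List the parameters that exceed the suggested limits."""
    findings = []
    for key, suggested in SUGGESTED_CACHE.items():
        if key in params and to_bytes(params[key]) > to_bytes(suggested):
            findings.append(f"{key}={params[key]} exceeds suggested {suggested}")
    for key in PROCESS_PARAMS:
        if key in params and int(params[key]) > MAX_PROCESSES:
            findings.append(f"{key}={params[key]} exceeds suggested maximum of {MAX_PROCESSES}")
    return findings


# Hypothetical values, for illustration only.
SAMPLE_CONF = """
# sample zabbix_server.conf fragment
CacheSize=512M
TrendCacheSize=16M
StartPollers=50
StartTrappers=3
"""

findings = review(parse_conf(SAMPLE_CONF))
for finding in findings:
    print(finding)
```

Parameters that are absent from the file are skipped rather than compared against built-in defaults, since those defaults are not stated in this ticket.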

Comment by Alexander Vladishev [ 2025 Jan 15 ]

Please note that this section of the tracker is intended only for reporting Zabbix bugs.
The case you submitted cannot be qualified as such. With this in mind, we are closing this ticket. Thank you for your understanding.

Comment by Franco Tadeu Ferraciolli [ 2025 Jan 15 ]

After some research, I am considering changing these PostgreSQL settings:

-- shared_buffers and max_connections only take effect after a PostgreSQL restart.
ALTER SYSTEM SET shared_buffers = '1GB';
ALTER SYSTEM SET max_connections = 500;
-- work_mem can be applied with a configuration reload.
ALTER SYSTEM SET work_mem = '36MB';
SELECT pg_reload_conf();
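
As a rough sanity check, these values can be compared with the 4GB of memory listed in the environment. The sketch below is an illustration only and rests on a simplifying assumption: every connection is active and uses exactly one work_mem allocation. In practice a single query can use several such allocations, and other memory consumers (maintenance_work_mem, the Zabbix server caches, the operating system) are ignored here.

```python
# Values from the environment section and the proposed settings above.
TOTAL_RAM_MB = 4 * 1024
SHARED_BUFFERS_MB = 1024   # shared_buffers = '1GB'
WORK_MEM_MB = 36           # work_mem = '36MB'
MAX_CONNECTIONS = 500      # max_connections = 500


def estimate_mb(shared_buffers_mb: int, work_mem_mb: int, connections: int) -> int:
    """Simplified estimate: shared buffers plus one work_mem allocation per connection."""
    return shared_buffers_mb + work_mem_mb * connections


estimate = estimate_mb(SHARED_BUFFERS_MB, WORK_MEM_MB, MAX_CONNECTIONS)
ratio = estimate / TOTAL_RAM_MB
print(f"estimated demand: {estimate} MB, about {ratio:.1f}x the {TOTAL_RAM_MB} MB of RAM")
```

Under that assumption the combination could ask for several times the memory the machine has if all connections were busy at once, which is the kind of mismatch between settings and available memory described in the first comment.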
 

Thanks for the response.

Generated at Fri Mar 14 18:03:40 EET 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.