[ZBX-23610] Zabbix Postgres DB crash Created: 2023 Oct 26  Updated: 2023 Oct 27  Resolved: 2023 Oct 27

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 6.4.7
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Ivan Duart Assignee: Zabbix Support Team
Resolution: Won't fix Votes: 0
Labels: database
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File zabbix-postgres.log    

 Description   

Steps to reproduce:

  1. When the Zabbix housekeeper tries to delete old data, the database crashes

 

I have attached the PostgreSQL log, where you can see the errors and restarts.



 Comments   
Comment by Alex Kalimulin [ 2023 Oct 27 ]

This is hardly a Zabbix problem; it may be a corrupted file system or damaged DB files.

Comment by Ivan Duart [ 2023 Oct 27 ]

For more info: when I truncate the 5 history tables, it works again until the next housekeeper run.
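
For reference, that workaround boils down to something like the following (a sketch assuming the default Zabbix schema, where the five history tables are history, history_uint, history_str, history_log and history_text, and a database named zabbix):

# truncate the five Zabbix history tables in one statement
# (table and database names assume the stock Zabbix schema)
psql -d zabbix -c "TRUNCATE TABLE history, history_uint, history_str, history_log, history_text;"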

Comment by Alex Kalimulin [ 2023 Oct 27 ]

What platform and PostgreSQL version?

Can you reproduce the crash by exporting the existing DB and importing it into a newly created DB, preferably on another disk?
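
Something along these lines would do it (a sketch; the zabbix database name, the zabbix_test target DB and the /mnt/otherdisk path are placeholders):

# dump the existing DB in custom format, writing to a different disk
pg_dump -Fc -d zabbix -f /mnt/otherdisk/zabbix.dump
# create a fresh DB and restore the dump into it
createdb zabbix_test
pg_restore -d zabbix_test /mnt/otherdisk/zabbix.dump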

Comment by Edgar Akhmetshin [ 2023 Oct 27 ]
2023-10-26 05:33:14.359 UTC [1817214] FATAL:  the database system is in recovery mode
2023-10-26 05:33:14.897 UTC [1817181] LOG:  database system was not properly shut down; automatic recovery in progress
2023-10-26 05:33:14.904 UTC [1817181] LOG:  redo starts at 8D2/23E783F8
2023-10-26 05:33:14.905 UTC [1817181] LOG:  invalid record length at 8D2/23E81508: wanted 24, got 0
2023-10-26 05:33:14.905 UTC [1817181] LOG:  redo done at 8D2/23E814E0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-10-26 05:33:14.918 UTC [1817182] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-10-26 05:33:14.995 UTC [1817182] LOG:  checkpoint complete: wrote 12 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.068 s, sync=0.004 s, total=0.078 s; sync files=11, longest=0.001 s, average=0.001 s; distance=36 kB, estimate=36 kB
2023-10-26 05:33:14.999 UTC [1] LOG:  database system is ready to accept connections
2023-10-26 05:33:23.747 UTC [1817227] PANIC:  corrupted item lengths: total 8480, available space 7552
2023-10-26 05:33:23.747 UTC [1817227] STATEMENT:  delete from history_uint where itemid=877561 and ctid = any(array(select ctid from history_uint where itemid=877561 limit 10000))
2023-10-26 05:33:23.836 UTC [1] LOG:  server process (PID 1817227) was terminated by signal 6: Aborted
2023-10-26 05:33:23.836 UTC [1] DETAIL:  Failed process was running: delete from history_uint where itemid=877561 and ctid = any(array(select ctid from history_uint where itemid=877561 limit 10000))
2023-10-26 05:33:23.837 UTC [1] LOG:  terminating any other active server processes
2023-10-26 05:33:23.849 UTC [1] LOG:  all server processes terminated; reinitializing
2023-10-26 05:33:24.304 UTC [1817248] LOG:  database system was interrupted; last known up at 2023-10-26 05:33:14 UTC
2023-10-26 05:33:24.305 UTC [1817251] FATAL:  the database system is in recovery mode
2023-10-26 05:33:24.352 UTC [1817252] FATAL:  the database system is in recovery mode
2023-10-26 05:33:24.353 UTC [1817253] FATAL:  the database system is in recovery mode
2023-10-26 05:33:24.354 UTC [1817254] FATAL:  the database system is in recovery mode
2023-10-26 05:33:24.355 UTC [1817255] FATAL:  the database system is in recovery mode
2023-10-26 05:33:24.356 UTC [1817256] FATAL:  the database system is in recovery mode
2023-10-26 05:33:24.359 UTC [1817257] FATAL:  the database system is in recovery mode
2023-10-26 05:33:24.829 UTC [1817248] LOG:  database system was not properly shut down; automatic recovery in progress
2023-10-26 05:33:24.837 UTC [1817248] LOG:  redo starts at 8D2/23E81580
2023-10-26 05:33:24.849 UTC [1817248] LOG:  unexpected pageaddr 8D1/C7000000 in log segment 00000001000008D200000024, offset 0
2023-10-26 05:33:24.849 UTC [1817248] LOG:  redo done at 8D2/23FFC810 system usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.01 s
2023-10-26 05:33:24.860 UTC [1817249] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-10-26 05:33:24.940 UTC [1817249] LOG:  checkpoint complete: wrote 230 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.068 s, sync=0.006 s, total=0.083 s; sync files=28, longest=0.002 s, average=0.001 s; distance=1530 kB, estimate=1530 kB

This is not related to Zabbix; try updating PostgreSQL to fix the database bug causing the issue.

> For more info: when I truncate the 5 history tables, it works again until the next housekeeper run.

This only means that the crash is reproducible more often under load, nothing more.
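
Before upgrading, the heap corruption behind the PANIC above can be double-checked with PostgreSQL's contrib amcheck module (a sketch assuming PostgreSQL 14 or newer, where verify_heapam() can scan heap tables; the zabbix database name is a placeholder):

# install the extension and scan the table from the failing DELETE
psql -d zabbix -c "CREATE EXTENSION IF NOT EXISTS amcheck;"
psql -d zabbix -c "SELECT * FROM verify_heapam('history_uint');"

If verify_heapam() reports errors, the data files themselves are damaged, and reloading from a dump or restoring from a backup is the usual way out.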

Please be advised that this section of the tracker is for bug reports only. The case you have submitted cannot be qualified as one, so please reach out to [email protected] for commercial support (https://zabbix.com/support) or consultancy services. Alternatively, you can use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for understanding.
