[ZBX-23610] Zabbix Postgres DB crash Created: 2023 Oct 26 Updated: 2023 Oct 27 Resolved: 2023 Oct 27 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 6.4.7 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | Ivan Duart | Assignee: | Zabbix Support Team |
Resolution: | Won't fix | Votes: | 0 |
Labels: | database | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: |
Description |
Steps to reproduce:
I have attached the PostgreSQL log, which shows the errors and restarts. |
Comments |
Comment by Alex Kalimulin [ 2023 Oct 27 ] |
This is hardly a Zabbix problem; it may be a corrupted file system or damaged DB files. |
Comment by Ivan Duart [ 2023 Oct 27 ] |
For more information: when I truncate the five history tables, it works again until the next housekeeper run. |
Comment by Alex Kalimulin [ 2023 Oct 27 ] |
Which platform and PostgreSQL version are you using? Can you reproduce the crash by exporting the existing DB and importing it into a newly created DB, preferably on another disk? |
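The export/import test suggested above could be sketched roughly as follows. This is a minimal sketch, not a procedure from the ticket: the database names `zabbix` and `zabbix_copy` and the dump path are assumptions, and the helper names `dump_restore_commands` and `render` are hypothetical. It only builds the `pg_dump`, `createdb` and `pg_restore` command lines and does not run them; placing the new database on another disk (for example, in a separate cluster or tablespace) is not covered here.

```python
import shlex


def dump_restore_commands(source_db, target_db, dump_path):
    """Return the export/import commands as argument lists.

    Nothing is executed here; the caller decides how and where to run them.
    """
    return [
        # Export the existing database in PostgreSQL's custom archive format.
        ["pg_dump", "--format=custom", "--file", dump_path, source_db],
        # Create the empty database that will receive the import.
        ["createdb", target_db],
        # Import the archive into the newly created database.
        ["pg_restore", "--dbname", target_db, dump_path],
    ]


def render(command):
    """Render one argument list as a shell-safe string for review."""
    return shlex.join(command)


if __name__ == "__main__":
    # Assumed names: "zabbix" (source), "zabbix_copy" (target), and the path.
    for cmd in dump_restore_commands("zabbix", "zabbix_copy", "/tmp/zabbix.dump"):
        print(render(cmd))
```

If the export step itself fails while reading one of the history tables, that would be consistent with the damaged-files hypothesis; if the crash no longer occurs against the imported copy, the original data files become the more likely culprit.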
Comment by Edgar Akhmetshin [ 2023 Oct 27 ] |
2023-10-26 05:33:14.359 UTC [1817214] FATAL: the database system is in recovery mode
2023-10-26 05:33:14.897 UTC [1817181] LOG: database system was not properly shut down; automatic recovery in progress
2023-10-26 05:33:14.904 UTC [1817181] LOG: redo starts at 8D2/23E783F8
2023-10-26 05:33:14.905 UTC [1817181] LOG: invalid record length at 8D2/23E81508: wanted 24, got 0
2023-10-26 05:33:14.905 UTC [1817181] LOG: redo done at 8D2/23E814E0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-10-26 05:33:14.918 UTC [1817182] LOG: checkpoint starting: end-of-recovery immediate wait
2023-10-26 05:33:14.995 UTC [1817182] LOG: checkpoint complete: wrote 12 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.068 s, sync=0.004 s, total=0.078 s; sync files=11, longest=0.001 s, average=0.001 s; distance=36 kB, estimate=36 kB
2023-10-26 05:33:14.999 UTC [1] LOG: database system is ready to accept connections
2023-10-26 05:33:23.747 UTC [1817227] PANIC: corrupted item lengths: total 8480, available space 7552
2023-10-26 05:33:23.747 UTC [1817227] STATEMENT: delete from history_uint where itemid=877561 and ctid = any(array(select ctid from history_uint where itemid=877561 limit 10000))
2023-10-26 05:33:23.836 UTC [1] LOG: server process (PID 1817227) was terminated by signal 6: Aborted
2023-10-26 05:33:23.836 UTC [1] DETAIL: Failed process was running: delete from history_uint where itemid=877561 and ctid = any(array(select ctid from history_uint where itemid=877561 limit 10000))
2023-10-26 05:33:23.837 UTC [1] LOG: terminating any other active server processes
2023-10-26 05:33:23.849 UTC [1] LOG: all server processes terminated; reinitializing
2023-10-26 05:33:24.304 UTC [1817248] LOG: database system was interrupted; last known up at 2023-10-26 05:33:14 UTC
2023-10-26 05:33:24.305 UTC [1817251] FATAL: the database system is in recovery mode
2023-10-26 05:33:24.352 UTC [1817252] FATAL: the database system is in recovery mode
2023-10-26 05:33:24.353 UTC [1817253] FATAL: the database system is in recovery mode
2023-10-26 05:33:24.354 UTC [1817254] FATAL: the database system is in recovery mode
2023-10-26 05:33:24.355 UTC [1817255] FATAL: the database system is in recovery mode
2023-10-26 05:33:24.356 UTC [1817256] FATAL: the database system is in recovery mode
2023-10-26 05:33:24.359 UTC [1817257] FATAL: the database system is in recovery mode
2023-10-26 05:33:24.829 UTC [1817248] LOG: database system was not properly shut down; automatic recovery in progress
2023-10-26 05:33:24.837 UTC [1817248] LOG: redo starts at 8D2/23E81580
2023-10-26 05:33:24.849 UTC [1817248] LOG: unexpected pageaddr 8D1/C7000000 in log segment 00000001000008D200000024, offset 0
2023-10-26 05:33:24.849 UTC [1817248] LOG: redo done at 8D2/23FFC810 system usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 0.01 s
2023-10-26 05:33:24.860 UTC [1817249] LOG: checkpoint starting: end-of-recovery immediate wait
2023-10-26 05:33:24.940 UTC [1817249] LOG: checkpoint complete: wrote 230 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.068 s, sync=0.006 s, total=0.083 s; sync files=28, longest=0.002 s, average=0.001 s; distance=1530 kB, estimate=1530 kB
This is not related to Zabbix. Try updating PostgreSQL to fix the database bug that is causing the issue.
This only means that the crash is reproduced more often under load, nothing more. Please be advised that this section of the tracker is for bug reports only. The case you have submitted cannot be qualified as one, so please reach out to [email protected] for commercial support (https://zabbix.com/support) or consultancy services. Alternatively, you can use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for your understanding. |
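For readers who want to pull the crash out of a flattened log like the one quoted in this ticket, the following is a minimal sketch. It assumes the line prefix seen in that log (timestamp with milliseconds, time zone, `[pid]`, severity); other `log_line_prefix` settings would need a different pattern. The names `ENTRY_START`, `split_entries`, `panics_with_statements` and `SAMPLE` are hypothetical, and the sample reuses three entries from the log above.

```python
import re

# Assumed prefix: "YYYY-MM-DD HH:MM:SS.mmm TZ [pid] SEVERITY:".
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\w+) \[(\d+)\] ([A-Z]+):"
)


def split_entries(flat_log):
    """Split a flattened log into (timestamp, pid, severity, message) tuples."""
    matches = list(ENTRY_START.finditer(flat_log))
    entries = []
    for i, m in enumerate(matches):
        # An entry's message runs until the next entry prefix (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat_log)
        message = flat_log[m.end():end].strip()
        entries.append((m.group(1), int(m.group(3)), m.group(4), message))
    return entries


def panics_with_statements(entries):
    """Pair each PANIC entry with the STATEMENT entry that directly follows it
    from the same backend process, if there is one."""
    results = []
    for i, (timestamp, pid, severity, message) in enumerate(entries):
        if severity != "PANIC":
            continue
        statement = None
        if i + 1 < len(entries):
            _, next_pid, next_severity, next_message = entries[i + 1]
            if next_pid == pid and next_severity == "STATEMENT":
                statement = next_message
        results.append((timestamp, pid, message, statement))
    return results


# Three consecutive entries copied from the log in this ticket.
SAMPLE = (
    "2023-10-26 05:33:14.999 UTC [1] LOG: database system is ready to accept connections "
    "2023-10-26 05:33:23.747 UTC [1817227] PANIC: corrupted item lengths: "
    "total 8480, available space 7552 "
    "2023-10-26 05:33:23.747 UTC [1817227] STATEMENT: delete from history_uint "
    "where itemid=877561 and ctid = any(array(select ctid from history_uint "
    "where itemid=877561 limit 10000))"
)

if __name__ == "__main__":
    for timestamp, pid, message, statement in panics_with_statements(split_entries(SAMPLE)):
        print(timestamp, pid, message)
        print("  statement:", statement)
```

Matching on the full prefix, including the milliseconds and the bracketed PID, keeps timestamps embedded inside messages (such as "last known up at 2023-10-26 05:33:14 UTC") from being mistaken for the start of a new entry.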