[ZBX-13362] Housekeeper potentially deleting the wrong row when PostgreSQL partitioning is used Created: 2018 Jan 18 Updated: 2024 Apr 10 Resolved: 2018 Mar 14 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 3.4.6 |
Fix Version/s: | 3.0.16rc1, 3.4.8rc1, 4.0.0alpha5, 4.0 (plan) |
Type: | Problem report | Priority: | Minor |
Reporter: | Raymond Tau | Assignee: | Sergejs Paskevics |
Resolution: | Fixed | Votes: | 1 |
Labels: | housekeeper, partitioning, postgresql | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Database: PostgreSQL 9.5 |
Team: | Team C |
Sprint: | Sprint 27, Sprint 28, Sprint 29 |
Story Points: | 1 |
Description |
When Housekeeper is enabled and MaxHousekeeperDelete is not 0 (defaults to 5000), it is found that SQLs like this are issued, to cleanup deleted items: delete from history_uint where ctid = any(array(select ctid from history_uint where itemid=36109 limit 5000)) However, if partitioning is done, such as https://www.zabbix.org/wiki/Higher_performant_partitioning_in_PostgreSQL, the ctid may not be unique across inherited tables, demonstrated in http://sqlfiddle.com/#!17/b9e7d/3/0, preparing the test table as below: CREATE OR REPLACE FUNCTION main_insert_trigger() RETURNS TRIGGER AS 'BEGIN IF (NEW.id < 10) THEN INSERT INTO child_1 VALUES (NEW.*); ELSIF (NEW.id >= 10 and NEW.id < 20) THEN INSERT INTO child_2 VALUES (NEW.*); ELSE RETURN NEW; END IF; RETURN NULL; END;' LANGUAGE plpgsql; create table main (id INTEGER, value INTEGER); create table child_1 (check (id < 10)) INHERITS (main); create table child_2 (check (id >=10 and id < 20)) INHERITS (main); CREATE TRIGGER insert_main_trigger BEFORE INSERT ON main FOR EACH ROW EXECUTE PROCEDURE main_insert_trigger(); insert into main values (1,1),(10,2),(100,3); Which select tableoid, ctid,* from main; would produce result like: tableoid ctid id value 17361 (0,1) 100 3 17364 (0,1) 1 1 17368 (0,1) 10 2 |
Comments |
Comment by Raymond Tau [ 2018 Jan 18 ] |
The SQL should have been originated from src/zabbix_server/housekeeper/housekeeper.c, function DBdelete_from_table. |
Comment by Marc [ 2018 Jan 19 ] |
Good catch! |
Comment by Raymond Tau [ 2018 Jan 22 ] |
Consider one of the suggested method to tackle high housekeeper usage is using table partitioning, and I think it would be rather easy to fix the problem (by adding the same filter to the outer delete statement, the low cost of tid scan would make any additional filter cost nothing), I wish they would be willing to fix it. For example, the SQL mentioned could be changed to: delete from history_uint where (itemid=36109) and ctid = any(array(select ctid from history_uint where itemid=36109 limit 5000)) |
Comment by Viktors Tjarve [ 2018 Feb 16 ] |
Testing with more than 6 mil. values in history table: # explain analyze delete from history where ctid=any(array(select ctid from history where (itemid=23676) limit 1000000)); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Delete on history (cost=24653.59..24693.69 rows=10 width=6) (actual time=7287.174..7287.174 rows=0 loops=1) InitPlan 1 (returns $0) -> Limit (cost=0.00..24653.58 rows=1000000 width=6) (actual time=0.125..2122.481 rows=1000000 loops=1) -> Seq Scan on history history_1 (cost=0.00..290320.90 rows=11776014 width=6) (actual time=0.122..1075.796 rows=1000000 loops=1) Filter: (itemid = 23676) -> Tid Scan on history (cost=0.01..40.11 rows=10 width=6) (actual time=2765.850..3790.410 rows=1000000 loops=1) TID Cond: (ctid = ANY ($0)) Planning time: 0.163 ms Execution time: 7289.960 ms (9 rows) # explain analyze delete from history where (itemid=23676) and ctid=any(array(select ctid from history where (itemid=23676) limit 1000000)); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Delete on history (cost=24653.59..24693.72 rows=9 width=6) (actual time=7967.702..7967.702 rows=0 loops=1) InitPlan 1 (returns $0) -> Limit (cost=0.00..24653.58 rows=1000000 width=6) (actual time=9.761..2277.569 rows=1000000 loops=1) -> Seq Scan on history history_1 (cost=0.00..290320.90 rows=11776014 width=6) (actual time=9.758..1217.197 rows=1000000 loops=1) Filter: (itemid = 23676) -> Tid Scan on history (cost=0.01..40.14 rows=9 width=6) (actual time=2956.973..4110.316 rows=1000000 loops=1) TID Cond: (ctid = ANY ($0)) Filter: (itemid = 23676) Planning time: 0.173 ms Execution time: 7969.142 ms (10 rows) # explain analyze delete from history where (itemid=23676) and ctid=any(array(select ctid from history limit 1000000)); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Delete on history (cost=18936.30..18976.42 rows=9 width=6) (actual time=7320.094..7320.094 rows=0 loops=1) InitPlan 1 (returns $0) -> Limit (cost=0.00..18936.29 rows=1000000 width=6) (actual time=0.188..1964.917 rows=1000000 loops=1) -> Seq Scan on history history_1 (cost=0.00..256462.32 rows=13543432 width=6) (actual time=0.185..922.297 rows=1000000 loops=1) -> Tid Scan on history (cost=0.01..40.14 rows=9 width=6) (actual time=2615.487..3730.859 rows=1000000 loops=1) TID Cond: (ctid = ANY ($0)) Filter: (itemid = 23676) Planning time: 0.163 ms Execution time: 7322.085 ms (9 rows) No significant increase in execution time with the introduced changes. |
Comment by Viktors Tjarve [ 2018 Feb 16 ] |
Successfully tested. |
Comment by Sergejs Paskevics [ 2018 Mar 05 ] |
Implemented:
|
Comment by Sergejs Paskevics [ 2018 Mar 12 ] |
Implemented:
|