[ZBX-12975] Foreign key constraint fails in events Created: 2017 Nov 02  Updated: 2024 Apr 10  Resolved: 2017 Dec 01

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: None
Affects Version/s: 3.2.9, 3.4.4, 4.0.0alpha1, 4.0 (plan)
Fix Version/s: 3.2.11rc1, 3.4.5rc1, 4.0.0alpha1

Type: Problem report Priority: Trivial
Reporter: Frank Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
caused by ZBX-12434 No one clear missed objects from problem Closed
Team: Team A
Team: Team A
Sprint: Sprint 20, Sprint 21, Sprint 22
Story Points: 0.25

 Description   

After upgrading to 3.4.3 (from 3.2.x) I noticed the following in the logs happening:

  1505:20171102:115356.574 [Z3005] query failed: [1452] Cannot add or update a child row: a foreign key constraint fails (`zabbix`.`event_recovery`, CONSTRAINT `c_event_recovery_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE) [insert into event_recovery (eventid,r_eventid,correlationid,c_eventid,userid) values (103966,127948,null,null,null),(125859,127948,null,null,null),(89485,127948,null,null,null),(108432,127948,null,null,null),(1061995,127948,null,null,null),(116210,127948,null,null,null),(127405,127948,null,null,null),(112889,127948,null,null,null),(102583,127948,null,null,null),(93945,127948,null,null,null);

This happens somewhere between 0 and 23 times a day. Can I somehow find and get rid of these corrupt entries?



 Comments   
Comment by Vladislavs Sokurenko [ 2017 Nov 02 ]

Could you please show create table problem;
show create table event_recovery;
Have you performed partitionning?
What are your housekeeper settings?

Comment by Frank [ 2017 Nov 03 ]

Certainly:

CREATE TABLE `problem` (
  `eventid` bigint(20) unsigned NOT NULL,
  `source` int(11) NOT NULL DEFAULT '0',
  `object` int(11) NOT NULL DEFAULT '0',
  `objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `clock` int(11) NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  `r_eventid` bigint(20) unsigned DEFAULT NULL,
  `r_clock` int(11) NOT NULL DEFAULT '0',
  `r_ns` int(11) NOT NULL DEFAULT '0',
  `correlationid` bigint(20) unsigned DEFAULT NULL,
  `userid` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`eventid`),
  KEY `problem_1` (`source`,`object`,`objectid`),
  KEY `problem_2` (`r_clock`),
  KEY `c_problem_2` (`r_eventid`),
  CONSTRAINT `c_problem_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE,
  CONSTRAINT `c_problem_2` FOREIGN KEY (`r_eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

We do not use partitioning in our database.

In regards to housekeeper:

  • Trigger data / internal data: Enabled, 60d
  • Services: Enabled and storage period 365d
  • History: Enabled and depending on templates
  • Trends: Enabled, overridden and storage period set to 90d.

Others are all set to 30d and are all enabled.

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

Did you see any other messages such as database down, in server log file ?

Comment by Frank [ 2017 Nov 03 ]

During the upgrade or normally? Since this particular thing keeps on occurring on a daily basis.

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

on a daily basis, it looks like event is generated when trigger enters problem state, but then when trigger goes to OK state, the problem event is already gone.

Did you delete events manually ?

Comment by Frank [ 2017 Nov 03 ]

Some may have been removed in the past in order to (attempt to) fix this problem.

After the upgrade from 3.2 => 3.4 I also noticed a few "problems" that were there, dating back to April 2017.
Based on the housekeeper, these entries should have been removed a long time ago.

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

Do you use global correlation ?

Comment by Frank [ 2017 Nov 03 ]

No, we haven't configured that (yet).

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

Did you perform some manual changes to problem table in the past ?

When issue occurs again is it same ids in error ? Please try performing query similar to this below based on your ids from error message, please put each eventid that is in error message.

select * from problem where eventid in (103966, 125859, 103966, 125859, 89485, 108432, 1061995, 116210, 127405, 112889, 102583, 93945);
Comment by Frank [ 2017 Nov 03 ]

The table structure was never modified (like any other tables).
The mentioned query results in the following output:

+---------+--------+--------+----------+------------+-----------+-----------+---------+------+---------------+--------+
| eventid | source | object | objectid | clock      | ns        | r_eventid | r_clock | r_ns | correlationid | userid |
+---------+--------+--------+----------+------------+-----------+-----------+---------+------+---------------+--------+
|   89485 |      0 |      0 |    49935 | 1504488612 | 436087519 |      NULL |       0 |    0 |          NULL |   NULL |
|   93945 |      0 |      0 |    49935 | 1504920618 | 632504518 |      NULL |       0 |    0 |          NULL |   NULL |
|  102583 |      0 |      0 |    49935 | 1506044037 | 404912788 |      NULL |       0 |    0 |          NULL |   NULL |
|  103966 |      0 |      0 |    49935 | 1506303237 | 857141626 |      NULL |       0 |    0 |          NULL |   NULL |
|  108432 |      0 |      0 |    49935 | 1506897259 |  82800191 |      NULL |       0 |    0 |          NULL |   NULL |
|  112889 |      0 |      0 |    49935 | 1507772065 |  78279481 |      NULL |       0 |    0 |          NULL |   NULL |
|  116210 |      0 |      0 |    49935 | 1508290469 | 994276139 |      NULL |       0 |    0 |          NULL |   NULL |
|  125859 |      0 |      0 |    49935 | 1509418427 | 385058366 |      NULL |       0 |    0 |          NULL |   NULL |
|  127405 |      0 |      0 |    49935 | 1509580433 |  33610867 |      NULL |       0 |    0 |          NULL |   NULL |
+---------+--------+--------+----------+------------+-----------+-----------+---------+------+---------------+--------+

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

And now please do

select * from events where eventid in (103966, 125859, 103966, 125859, 89485, 108432, 1061995, 116210, 127405, 112889, 102583, 93945);
Comment by Frank [ 2017 Nov 03 ]

This produces:

+---------+--------+--------+----------+------------+-------+--------------+-----------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        |
+---------+--------+--------+----------+------------+-------+--------------+-----------+
|   89485 |      0 |      0 |    49935 | 1504488612 |     1 |            1 | 436087519 |
|   93945 |      0 |      0 |    49935 | 1504920618 |     1 |            1 | 632504518 |
|  102583 |      0 |      0 |    49935 | 1506044037 |     1 |            1 | 404912788 |
|  103966 |      0 |      0 |    49935 | 1506303237 |     1 |            1 | 857141626 |
|  108432 |      0 |      0 |    49935 | 1506897259 |     1 |            1 |  82800191 |
|  112889 |      0 |      0 |    49935 | 1507772065 |     1 |            1 |  78279481 |
|  116210 |      0 |      0 |    49935 | 1508290469 |     1 |            1 | 994276139 |
|  125859 |      0 |      0 |    49935 | 1509418427 |     1 |            1 | 385058366 |
|  127405 |      0 |      0 |    49935 | 1509580433 |     1 |            1 |  33610867 |
+---------+--------+--------+----------+------------+-------+--------------+-----------+

vso I have updated queries to your full list, please try again.

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

So I see that eventid 1061995 is missing from problem table, it is also missing from events table.
Is it possible that someone else created such problem entry and deleted it later ?

Comment by Frank [ 2017 Nov 03 ]

It's possible that these entries were removed to resolve this particular issue some time ago. So, in order to fix this we'll probably need an elaborate query that removes entries from the event_recovery table in case they are missing from problem and/or events?

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

What is your HousekeepingFrequency ?

Comment by Frank [ 2017 Nov 03 ]

It's set to the default value (HousekeepingFrequency=1)

Comment by Vladislavs Sokurenko [ 2017 Nov 03 ]

You can try disabling housekeeping and see if it helps, then it will be at least clear that it has something to do with housekeeper. Another option is to disable housekeeping of events, can you please confirm if issue is gone then ?

Do you add/remove triggers frequently ?
How often do you reload cache ?

Comment by Frank [ 2017 Nov 06 ]

Triggers are being added once in a while, generally once every few weeks. What cache do you mean exactly?

Comment by Vladislavs Sokurenko [ 2017 Nov 06 ]

I mean CacheUpdateFrequency

Comment by Frank [ 2017 Nov 06 ]

That value is not explicitly configured, so it's using the defaults (CacheUpdateFrequency=60).

Comment by Vladislavs Sokurenko [ 2017 Nov 06 ]

Could you please confirm that you deleted some triggers before the issue occurs ? It looks like there is possibility that trigger is deleted but still kept in cache so when trigger recovers, then it will try to recover deleted trigger, and if housekeeper executes at this specific moment right before create event_recovery then foreign key constraint error happens.

Comment by Frank [ 2017 Nov 06 ]

The last errors referring to foreign key constraint seem to have stopped last Thursday.
It may very well have been cache, but that would only make sense if we make daily changes which we do not.

vso do you use low level discovery ? Could some triggers get deleted due to not being discovered anymore ?

Comment by Frank [ 2017 Nov 13 ]

vso Yes, we use quite a bit of LLD in our environment.

Comment by Andris Zeila [ 2017 Nov 17 ]

Successfully tested

Comment by Vladislavs Sokurenko [ 2017 Nov 17 ]

Fixed in:

  • pre-3.2.11rc1 r74720
  • pre-3.4.5rc1 r74722
  • pre-4.0.0alpha1 (trunk) r74723
Comment by Frank [ 2018 Dec 20 ]

It appears that this problem has resurfaced in 4.0:

 

20261:20181220:110711.401 [Z3005] query failed: [1452] Cannot add or update a child row: a foreign key constraint fails (`zabbix`.`event_recovery`, CONSTRAINT `c_event_recovery_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE) [insert into event_recovery (eventid,r_eventid,correlationid,c_eventid,userid) values (1059545,1010517,null,null,null),(1062270,1010518,null,null,null);
]
20261:20181220:110713.425 [Z3005] query failed: [1452] Cannot add or update a child row: a foreign key constraint fails (`zabbix`.`event_recovery`, CONSTRAINT `c_event_recovery_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE) [insert into event_recovery (eventid,r_eventid,correlationid,c_eventid,userid) values (1059441,1010519,null,null,null);
]
20261:20181220:110741.807 [Z3005] query failed: [1452] Cannot add or update a child row: a foreign key constraint fails (`zabbix`.`event_recovery`, CONSTRAINT `c_event_recovery_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE) [insert into event_recovery (eventid,r_eventid,correlationid,c_eventid,userid) values (1059545,1010520,null,null,null),(1062270,1010521,null,null,null);
]
20262:20181220:110743.817 [Z3005] query failed: [1452] Cannot add or update a child row: a foreign key constraint fails (`zabbix`.`event_recovery`, CONSTRAINT `c_event_recovery_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE) [insert into event_recovery (eventid,r_eventid,correlationid,c_eventid,userid) values (1059441,1010522,null,null,null);

It doesn't seem to cause issues but is there any way to fix this?

Generated at Sat Apr 20 14:27:21 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.