Type: Problem report
Resolution: Fixed
Priority: Trivial
Fix Version/s: 7.2.11rc1, 7.4.1rc1, 8.0.0alpha1
Affects Version/s: 7.2.5
Component/s: Server (S)
Labels:
- database
- triggers
Environment:
AlmaLinux 9.5

Sprint:
S25-W18/19, S25-W20/21, S25-W24/25, S25-W28/29
Story Points:
0.5

Our environment:

2 dedicated servers, 16 cores/64 GB RAM, running Zabbix server

1 VM for database replication

We are running a MariaDB SQL server with a Patroni layer in front of it.

Problem:

We experienced a deadlock on node 1, and HA switched over to node 2.

/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.667 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [commit;]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.667 ERROR: rollback without transaction. Please report it to Zabbix Team.
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.667 === Backtrace: ===
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 10: /usr/sbin/zabbix_server: ha manager(zbx_backtrace+0x41) [0x55c5d4b5cb71]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 9: /usr/sbin/zabbix_server: ha manager(zbx_dbconn_rollback+0x10b) [0x55c5d4b44a6b]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 8: /usr/sbin/zabbix_server: ha manager(+0x258aa4) [0x55c5d4990aa4]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 7: /usr/sbin/zabbix_server: ha manager(ha_manager_thread+0x42a) [0x55c5d4992faa]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 6: /usr/sbin/zabbix_server: ha manager(zbx_ha_start+0x6d) [0x55c5d49949ed]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 5: /usr/sbin/zabbix_server: ha manager(MAIN_ZABBIX_ENTRY+0x9e8) [0x55c5d480cb28]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 4: /usr/sbin/zabbix_server: ha manager(zbx_daemon_start+0x145) [0x55c5d4b5dc75]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 3: /usr/sbin/zabbix_server: ha manager(main+0x3f5) [0x55c5d4801bb5]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 2: /lib64/libc.so.6(+0x29590) [0x7fb1a3629590]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 1: /lib64/libc.so.6(__libc_start_main+0x80) [0x7fb1a3629640]
/var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 0: /usr/sbin/zabbix_server: ha manager(_start+0x25) [0x55c5d4808f55]

Once the second node started running, we observed duplicate primary key errors:

307243:20250428:151039.021 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value,name,severity) values (222916608,0,0,3518246,1745845835,718767,1,'/var/log: Disk space is low (used > 80%)',2);.]
307243:20250428:151039.021 In dbconn_get_cached_nextid() table:'event_tag' num:12
307243:20250428:151039.021 End of dbconn_get_cached_nextid() table:'event_tag' [33167419:33167430]
307243:20250428:151039.021 query [txnlev:1] [insert into event_tag (eventtagid,eventid,tag,value) values (33167419,222916608,'scope','availability'),(33167420,222916608,'scope','capacity'),(33167421,222916608,'component','storage'),(33167422,222916608,'filesystem','/var/log'),(33167423,222916608,'discovery','vmware_custom'),(33167424,222916608,'env','TEST'),(33167425,222916608,'contract','P0124001096'),(33167426,222916608,'SLA','N-00-00-00-00'),(33167427,222916608,'path','path/path/path'),(33167428,222916608,'vm_id','vm-xxxxxx'),(33167429,222916608,'class','os'),(33167430,222916608,'target','linux');.]
307243:20250428:151039.021 [Z3008] query failed due to primary key constraint: [1062] Duplicate entry '33167419' for key 'PRIMARY'

The Zabbix server has an ID in its cache that is not free; a row with that ID already exists in the database:

MariaDB [zabbix]> select * from event_tag where eventtagid = 33167419;
+------------+-----------+--------+-----------+
| eventtagid | eventid   | tag    | value     |
+------------+-----------+--------+-----------+
|   33167419 | 226515294 | target | cisco-ios |
+------------+-----------+--------+-----------+
1 row in set (0.001 sec)

Last IDs in the database at the time of writing:

MariaDB [zabbix]> select * from event_tag order by eventtagid desc limit 10;
+------------+-----------+-----------+--------------+
| eventtagid | eventid   | tag       | value        |
+------------+-----------+-----------+--------------+
|   33175870 | 222949120 | target    | generic      |
|   33175869 | 222949120 | class     | network      |
|   33175868 | 222949120 | component | network      |
|   33175867 | 222949120 | component | health       |
|   33175866 | 222949120 | scope     | performance  |
|   33175865 | 222949120 | scope     | availability |
|   33175864 | 222949100 | target    | generic      |
|   33175863 | 222949100 | class     | network      |
|   33175862 | 222949100 | component | network      |
|   33175861 | 222949100 | component | health       |
+------------+-----------+-----------+--------------+
10 rows in set (0.000 sec)
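
The log line and the query result above can be cross-checked mechanically. The following is a minimal sketch, not part of Zabbix: the function names parse_cached_range and range_is_stale are hypothetical, and the regular expression assumes the debug line format exactly as it appears in this report.

```python
import re

# Assumed format, copied from the debug log in this report:
# End of dbconn_get_cached_nextid() table:'event_tag' [33167419:33167430]
RANGE_RE = re.compile(
    r"End of dbconn_get_cached_nextid\(\) table:'(?P<table>\w+)' "
    r"\[(?P<first>\d+):(?P<last>\d+)\]"
)


def parse_cached_range(line):
    """Return (table, first_id, last_id) from a log line, or None if it does not match."""
    m = RANGE_RE.search(line)
    if m is None:
        return None
    return m.group("table"), int(m.group("first")), int(m.group("last"))


def range_is_stale(first_id, db_max_id):
    """A handed-out range is stale if it starts at or below the highest ID already stored."""
    return first_id <= db_max_id


line = ("307243:20250428:151039.021 End of dbconn_get_cached_nextid() "
        "table:'event_tag' [33167419:33167430]")
table, first, last = parse_cached_range(line)
# 33175870 is the highest eventtagid returned by the query above.
print(table, first, last, range_is_stale(first, 33175870))
# prints: event_tag 33167419 33167430 True
```

With the values from this report, the whole cached range lies below the current maximum eventtagid, which matches the duplicate key error.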

Tracing the log back to the code: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/libs/zbxdb/dbmisc.c#109

As far as I can read the code, once the value is cached it is no longer compared against the database.
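
To illustrate the suspected failure mode, here is a toy model of an ID cache that is seeded from the database once and never rechecked. It is a sketch under stated assumptions, not the Zabbix implementation: CachedIdAllocator, insert, DuplicateKeyError and the two writers are hypothetical, the database is a plain dict of sets, and the model makes no claim about how the cache on node 2 actually became stale.

```python
class DuplicateKeyError(Exception):
    """Stands in for MariaDB error 1062 in this toy model."""


class CachedIdAllocator:
    """Hands out IDs from a per-table counter that is read from the database only once."""

    def __init__(self, db):
        self.db = db      # dict: table name -> set of used IDs (stand-in for the database)
        self.last = {}    # table name -> last ID this allocator handed out

    def next_ids(self, table, num):
        if table not in self.last:
            # First use: seed the cache from the current maximum ID in the table.
            self.last[table] = max(self.db[table], default=0)
        first = self.last[table] + 1
        self.last[table] += num   # the database is never consulted again
        return list(range(first, first + num))


def insert(db, table, ids):
    """Insert IDs, failing like a primary key constraint on the first duplicate."""
    for i in ids:
        if i in db[table]:
            raise DuplicateKeyError(f"Duplicate entry '{i}' for key 'PRIMARY'")
    db[table].update(ids)


def demo():
    db = {"event_tag": {1, 2, 3}}
    writer_a = CachedIdAllocator(db)
    writer_b = CachedIdAllocator(db)
    insert(db, "event_tag", writer_b.next_ids("event_tag", 2))  # b gets 4,5 and caches 5
    insert(db, "event_tag", writer_a.next_ids("event_tag", 2))  # a seeds from 5, gets 6,7
    try:
        insert(db, "event_tag", writer_b.next_ids("event_tag", 2))  # b still thinks 6,7 are free
    except DuplicateKeyError as e:
        return str(e)
    return None


print(demo())  # prints: Duplicate entry '6' for key 'PRIMARY'
```

In this model, any writer whose cache was seeded before another writer inserted rows will keep handing out IDs that are already taken until its cache is rebuilt, which would be consistent with a restart or switch-back clearing the errors.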

Steps to reproduce:

1. Set up an HA environment.
2. Let node 1 die ungracefully, in a way it is not designed to.
3. Observe the errors on node 2. In this case we increased the log level of several components.
4. Switching back to the other node with systemctl fixes the problem.

Result:
Duplicate primary key errors.
Expected:

See attached log/text file for more debugging info.

Attachments:
- host_for_HA_test.yaml (2025 Jul 09 18:13, 0.7 kB, Sergejs Olonkins)
- zabbix_crash.log (2025 Apr 29 10:05, 6 kB, Arjo)
- zabbix_server_HA_duplicatehistory_str_pkey.log (2025 Jul 09 18:11, 3.57 MB, Sergejs Olonkins)

Issue Links:
causes ZBX-26909 Zabbix Server Crashes (Closed)

Assignee: Sergejs Boidenko
Reporter: Arjo
Team: Team B
Votes: 0
Watchers: 7

Created: 2025 Apr 29 10:05
Updated: 2025 Aug 25 17:20
Resolved: 2025 Jul 14 13:14

Estimated: Not Specified
Remaining: Not Specified
Logged: 67h
