ZABBIX BUGS AND ISSUES / ZBX-26370

HA: standby node does not update its cached next id


    • Type: Problem report
    • Resolution: Unresolved
    • Priority: Trivial
    • Fix Version/s: None
    • Affects Version/s: 7.2.5
    • Component/s: Server (S)
    • Environment: AlmaLinux 9.5
    • Sprint: S25-W18/19, S25-W20/21

      Our environment:

      2 dedicated servers, 16 cores / 64 GB RAM, running Zabbix server.

      1 VM for database replication.

      We are running a MariaDB SQL server with a Patroni layer in front of it.

      Problem:

      We experienced a deadlock on node 1; HA switched over to node 2.

      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.667 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [commit;]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.667 ERROR: rollback without transaction. Please report it to Zabbix Team.
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.667 === Backtrace: ===
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 10: /usr/sbin/zabbix_server: ha manager(zbx_backtrace+0x41) [0x55c5d4b5cb71]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 9: /usr/sbin/zabbix_server: ha manager(zbx_dbconn_rollback+0x10b) [0x55c5d4b44a6b]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 8: /usr/sbin/zabbix_server: ha manager(+0x258aa4) [0x55c5d4990aa4]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 7: /usr/sbin/zabbix_server: ha manager(ha_manager_thread+0x42a) [0x55c5d4992faa]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 6: /usr/sbin/zabbix_server: ha manager(zbx_ha_start+0x6d) [0x55c5d49949ed]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 5: /usr/sbin/zabbix_server: ha manager(MAIN_ZABBIX_ENTRY+0x9e8) [0x55c5d480cb28]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 4: /usr/sbin/zabbix_server: ha manager(zbx_daemon_start+0x145) [0x55c5d4b5dc75]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 3: /usr/sbin/zabbix_server: ha manager(main+0x3f5) [0x55c5d4801bb5]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 2: /lib64/libc.so.6(+0x29590) [0x7fb1a3629590]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 1: /lib64/libc.so.6(__libc_start_main+0x80) [0x7fb1a3629640]
      /var/log/zabbix/zabbix_server.log:2888270:20250428:054630.668 0: /usr/sbin/zabbix_server: ha manager(_start+0x25) [0x55c5d4808f55]

      Once the second node started running, we observed duplicate primary key errors: 


      307243:20250428:151039.021 query [txnlev:1] [insert into events (eventid,source,object,objectid,clock,ns,value,name,severity) values (222916608,0,0,3518246,1745845835,718767,1,'/var/log: Disk space is low (used > 80%)',2);.]
      307243:20250428:151039.021 In dbconn_get_cached_nextid() table:'event_tag' num:12
      307243:20250428:151039.021 End of dbconn_get_cached_nextid() table:'event_tag' [33167419:33167430]
      307243:20250428:151039.021 query [txnlev:1] [insert into event_tag (eventtagid,eventid,tag,value) values (33167419,222916608,'scope','availability'),(33167420,222916608,'scope','capacity'),(33167421,222916608,'component','storage'),(33167422,222916608,'filesystem','/var/log'),(33167423,222916608,'discovery','vmware_custom'),(33167424,222916608,'env','TEST'),(33167425,222916608,'contract','P0124001096'),(33167426,222916608,'SLA','N-00-00-00-00'),(33167427,222916608,'path','path/path/path'),(33167428,222916608,'vm_id','vm-xxxxxx'),(33167429,222916608,'class','os'),(33167430,222916608,'target','linux');.]
      307243:20250428:151039.021 [Z3008] query failed due to primary key constraint: [1062] Duplicate entry '33167419' for key 'PRIMARY'

      The Zabbix server has an ID in its cache that is not actually free.


      MariaDB [zabbix]> select * from event_tag where eventtagid = 33167419;
      +------------+-----------+--------+-----------+
      | eventtagid | eventid   | tag    | value     |
      +------------+-----------+--------+-----------+
      |   33167419 | 226515294 | target | cisco-ios |
      +------------+-----------+--------+-----------+
      1 row in set (0.001 sec)

      Last ID in the database at the time of writing:


      MariaDB [zabbix]> select * from event_tag order by eventtagid desc limit 10;
      +------------+-----------+-----------+--------------+
      | eventtagid | eventid   | tag       | value        |
      +------------+-----------+-----------+--------------+
      |   33175870 | 222949120 | target    | generic      |
      |   33175869 | 222949120 | class     | network      |
      |   33175868 | 222949120 | component | network      |
      |   33175867 | 222949120 | component | health       |
      |   33175866 | 222949120 | scope     | performance  |
      |   33175865 | 222949120 | scope     | availability |
      |   33175864 | 222949100 | target    | generic      |
      |   33175863 | 222949100 | class     | network      |
      |   33175862 | 222949100 | component | network      |
      |   33175861 | 222949100 | component | health       |
      +------------+-----------+-----------+--------------+
      10 rows in set (0.000 sec)


      Tracing the log back to the code (https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/libs/zbxdb/dbmisc.c#109):

      As far as I can read the code, the cached value is never compared against the database again once it has been cached.

       

      Steps to reproduce:

      1. Set up an HA environment
      2. Let node 1 die in a way it is not designed to (e.g. an unexpected crash)
      3. Observe the errors on node 2
        1. In this case we increased the log level of several components
      4. Switching back to the other node with systemctl fixes the problem

      Result:
      Duplicate primary key errors.
      Expected:
      No duplicate primary key errors; the node taking over should refresh its cached next IDs from the database.
      See attached log/text file for more debugging info.

            Assignee: Sergejs Boidenko (sboidenko)
            Reporter: Arjo (avelderman)
            Votes: 0
            Watchers: 4


                Estimated: Not Specified
                Remaining: Not Specified
                Logged: 11h