[ZBX-5225] Frequent "Lock wait timeout exceeded; try restarting transaction" on zabbix_server. LLD related ? Created: 2012 Jun 21  Updated: 2017 May 30  Resolved: 2013 Apr 25

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.0, 2.0.1
Fix Version/s: 2.0.6rc1, 2.1.0

Type: Incident report Priority: Critical
Reporter: Alf Solli Assignee: Unassigned
Resolution: Fixed Votes: 11
Labels: database, deadlock, lld
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MySQL 5.5.25 on Suse Enterprise linux, zabbix_server on OpenSUSE 12.1 (both 64Bit)


Attachments: Text File my.cnf.txt     Text File mysql_slow_query_log.txt     Text File zabbix_server.conf.txt    
Issue Links:
Duplicate
duplicates ZBX-6303 deadlocks in ORA-00060 in new zabbix ... Closed
is duplicated by ZBX-5272 Database deadlocks on PostgreSQL 9.1.... Closed
is duplicated by ZBX-5269 periodically in the logs: query faile... Closed
is duplicated by ZBX-6011 query failed: [0] PGRES_FATAL_ERROR:E... Closed

 Description   

On a fresh 2.0.0 installation using only built-in templates (SNMP devices), I frequently get messages like these in /tmp/zabbix_server.log:
11858:20120621:173328.785 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update items set lastclock=1340292671,lastns=654346890,prevvalue=lastvalue,lastvalue='ET-WLB01' where itemid=23471;
11859:20120621:173457.775 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update items set lastclock=1340290488,lastns=545936090,prevvalue=lastvalue,lastvalue='30.000000' where itemid=23328;

Even more often: 11787:20120621:173328.800 slow query: 104.092488 sec, "update items set name='Free inodes on $1 (percentage)',key_='vfs.fs.inode[/,pfree]',type=0,value_type=0,data_type=0,delay=60,delay_flex='',history=7,trends=365,trapper_hosts='',units='%',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=null,params='',ipmi_sensor='',snmp_community='',snmp_oid='',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='',interfaceid=7,flags=4 where itemid=23696;
update item_discovery set key_='vfs.fs.inode[{#FSNAME},pfree]',lastcheck=1340292704,ts_delete=0 where itemid=23696 and parent_itemid=23507;

All devices were added using Discovery on a few subnets, adding anything that responds and linking it to the SNMP Device template. Currently 271 devices are added.
(Note: I also disabled everything except IfAlias, which is updated only once per day, to lower traffic.)

Zabbix server runs on a VM with 6 2.5 GHz Xeon cores and 2 GB memory.
MySQL runs on dedicated iron with 8 2.5 GHz Xeon cores and 8 GB memory, with /var/lib/mysql mounted on a 4-disk RAID5 volume of 10k RPM SAS disks.
That should be able to do some damage, but I'm seeing close to 2k QPS on MySQL on average.

Attaching slow query log, zabbix_server.conf and my.cnf.



 Comments   
Comment by Alf Solli [ 2012 Jun 24 ]

Ok, update:
Tried downgrading to MySQL 5.0; no difference, but I kept it running there.

I noticed that a lot of the SQL queries causing timeouts were trying to update IfAdminStatus, which is an item I specifically disabled in the template.
The change didn't take effect on all my hosts though, so I removed all hosts added before I edited the template (SNMP interfaces), and voilà: problem solved.

Then I just waited for the discovery rule and actions to be applied for that particular subnet, and sure enough, all hosts were re-added, low-level discovered and my setup ran fine for ~12 hours.

So thinking all was good, I started adding a couple more subnets (~200 snmp devices), linking all discovered units to the same template.

All devices are found, added to groups and linked to the SNMP Device template, and the following is the first abnormal message in zabbix_server.log before the timeout messages reappear and all sorts of sh*t hit the fan:

update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937385;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937386;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937387;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937388;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937389;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937390;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937391;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937392;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937393;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937394;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937395;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937396;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937397;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937398;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937399;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937400;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937401;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937402;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937403;
update item_discovery set ts_delete=1343088313 where itemdiscoveryid=937404;

This is what's spamming my log from now on:

update item_discovery set key_='ifAdminStatus[{#SNMPVALUE}]',lastcheck=1340497450,ts_delete=0 where itemid=228102 and parent_itemid=227822;
update items set name='Admin status of interface $1',key_='ifAdminStatus[null0]',type=4,value_type=3,data_type=0,delay=60,delay_flex='50/1-7,00:00-24:00',history=7,trends=365,trapper_hosts='',units='',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=11,params='',ipmi_sensor='',snmp_community='{$SNMP_COMMUNITY}',snmp_oid='IF-MIB::ifAdminStatus.27',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='The desired state of the interface.',interfaceid=879,flags=4 where itemid=228104;
update item_discovery set key_='ifAdminStatus[{#SNMPVALUE}]',lastcheck=1340497450,ts_delete=0 where itemid=228104 and parent_itemid=227822;
update items set name='Admin status of interface $1',key_='ifAdminStatus[system0]',type=4,value_type=3,data_type=0,delay=60,delay_flex='50/1-7,00:00-24:00',history=7,trends=365,trapper_hosts='',units='',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=11,params='',ipmi_sensor='',snmp_community='{$SNMP_COMMUNITY}',snmp_oid='IF-MIB::ifAdminStatus.28',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='The desired state of the interface.',interfaceid=879,flags=4 where itemid=228105;
update item_discovery set key_='ifAdminStatus[{#SNMPVALUE}]',lastcheck=1340497450,ts_delete=0 where itemid=228105 and parent_itemid=227822;
update items set name='Admin status of interface $1',key_='ifAdminStatus[downstream0]',type=4,value_type=3,data_type=0,delay=60,delay_flex='50/1-7,00:00-24:00',history=7,trends=365,trapper_hosts='',units='',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=11,params='',ipmi_sensor='',snmp_community='{$SNMP_COMMUNITY}',snmp_oid='IF-MIB::ifAdminStatus.29',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='The desired state of the interface.',interfaceid=879,flags=4 where itemid=228106;
update item_discovery set key_='ifAdminStatus[{#SNMPVALUE}]',lastcheck=1340497450,ts_delete=0 where itemid=228106 and parent_itemid=227822;
update items set name='Admin status of interface $1',key_='ifAdminStatus[loopback0]',type=4,value_type=3,data_type=0,delay=60,delay_flex='50/1-7,00:00-24:00',history=7,trends=365,trapper_hosts='',units='',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=11,params='',ipmi_sensor='',snmp_community='{$SNMP_COMMUNITY}',snmp_oid='IF-MIB::ifAdminStatus.30',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='The desired state of the interface.',interfaceid=879,flags=4 where itemid=228107;
update item_discovery set key_='ifAdminStatus[{#SNMPVALUE}]',lastcheck=1340497450,ts_delete=0 where itemid=228107 and parent_itemid=227822;
"
1954:20120624:024807.146 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update ids set nextid=nextid+29 where nodeid=0 and table_name='items' and field_name='itemid']
1954:20120624:024807.146 slow query: 51.532908 sec, "update ids set nextid=nextid+29 where nodeid=0 and table_name='items' and field_name='itemid'"
zabbix_server [1954]: ERROR file:db.c,line:1412 Something impossible has just happened.

At this point, I have no clue what to do or what's causing it, other than suspecting it's related to Discovery.
Guess that wasn't much help, but if anyone has any tips, I'd appreciate hearing about them.

Comment by Renato Ramonda [ 2012 Jul 05 ]

As you can see from the duplicate bugs, this seems to affect PostgreSQL 9.1.4 too.

Comment by dimir [ 2012 Jul 06 ]

Would it be possible to also attach zabbix_server.log? DebugLevel=4 would be great, but if that's not possible, DebugLevel=3 would still be helpful.

Comment by Renato Ramonda [ 2012 Jul 16 ]

I can confirm that disabling discovery solves the problem.

Comment by Oleksii Zagorskyi [ 2012 Jul 16 ]

We know that the LLD engine performs a LOT of DB updates that are not strictly required, which I suppose is the root of the problem.
Maybe it could be improved? I.e. do not update DB records when it's not required? (A minimal sketch of the idea is shown below.)

<richlv> do we have a separate issue about that? if not, one should be created

zalex_ua I think this issue is very suitable for the problem I described. If the devs suggest creating a separate issue, I'll do that.
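
To illustrate the idea only (this is not what the server does in 2.0.x), here is a minimal SQL sketch based on one of the item updates from the log above; the actual 2.0.6 fix described further down skips issuing the UPDATE entirely when nothing changed:

-- Hypothetical sketch: compare before writing, so an item whose discovered
-- key is unchanged is not rewritten on every LLD run.
update items
   set key_='ifAdminStatus[null0]'
 where itemid=228104
   and key_<>'ifAdminStatus[null0]';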

Comment by Alexander Vladishev [ 2012 Aug 09 ]

Related issue: ZBX-4878

Comment by Alexey [ 2012 Dec 19 ]

Guys, what do you need to fix it? The bug has been open for a long time.

Comment by Guy Van Sanden [ 2012 Dec 20 ]

I did some tuning to MySQL and these settings made the problem go away:

max_connections = 400
query_cache_limit = 1M
query_cache_size = 128M
max_heap_table_size = 256M
innodb_lock_wait_timeout=500
innodb_buffer_pool_size=3000M
innodb_flush_method=O_DIRECT

Notably, innodb_lock_wait_timeout made the difference.
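
For reference, a quick way to verify and apply that key setting without a full restart (innodb_lock_wait_timeout should be a dynamic variable on MySQL 5.5; new connections pick up the GLOBAL value, the value 500 is just the one from the list above):

SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';
SET GLOBAL innodb_lock_wait_timeout = 500;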

Comment by Chris Reeves [ 2013 Feb 05 ]

We are also seeing this issue with Zabbix 2.0.2 and PostgreSQL 8.4 (both running on Debian 6).

Recently we've started to see frequent deadlocks, which are making the system largely unusable - often these deadlocks involve item discovery, as noted by a previous commenter.

Here's a snippet from zabbix-server.log:

  1636:20130205:085708.902 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  deadlock detected
DETAIL:  Process 1659 waits for ShareLock on transaction 26507712; blocked by process 1729.
Process 1729 waits for ShareLock on transaction 26507984; blocked by process 1659.
HINT:  See server log for query details.
 [update items set name='Used disk space on $1',key_='vfs.fs.size[C:\\,used]',type=0,value_type=3,data_type=0,delay=60,delay_flex='',history=7,trends=365,trapper_hosts='',units='B',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=null,params='',ipmi_sensor='',snmp_community='',snmp_oid='',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='',interfaceid=73,flags=4 where itemid=28926;
update item_discovery set key_='vfs.fs.size[{#FSNAME},used]',lastcheck=1360054627,ts_delete=0 where itemid=28926 and parent_itemid=28632;
update items set name='Used disk space on $1',key_='vfs.fs.size[E:\\,used]',type=0,value_type=3,data_type=0,delay=60,delay_flex='',history=7,trends=365,trapper_hosts='',units='B',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=null,params='',ipmi_sensor='',snmp_community='',snmp_oid='',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='',interfaceid=73,flags=4 where itemid=29422;
update item_discovery set key_='vfs.fs.size[{#FSNAME},used]',lastcheck=1360054627,ts_delete=0 where itemid=29422 and parent_itemid=28632;
]

And the corresponding snippet from postgresql-8.4-main.log:

2013-02-05 08:57:08 GMT ERROR:  deadlock detected
2013-02-05 08:57:08 GMT DETAIL:  Process 1659 waits for ShareLock on transaction 26507712; blocked by process 1729.
        Process 1729 waits for ShareLock on transaction 26507984; blocked by process 1659.
        Process 1659: update items set name='Used disk space on $1',key_='vfs.fs.size[C:\\,used]',type=0,value_type=3,data_type=0,delay=60,delay_flex='',history=7,trends=365,trapper_hosts='',units='B',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=null,params='',ipmi_sensor='',snmp_community='',snmp_oid='',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='',interfaceid=73,flags=4 where itemid=28926;
        update item_discovery set key_='vfs.fs.size[{#FSNAME},used]',lastcheck=1360054627,ts_delete=0 where itemid=28926 and parent_itemid=28632;
        update items set name='Used disk space on $1',key_='vfs.fs.size[E:\\,used]',type=0,value_type=3,data_type=0,delay=60,delay_flex='',history=7,trends=365,trapper_hosts='',units='B',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=null,params='',ipmi_sensor='',snmp_community='',snmp_oid='',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpa
        Process 1729: update items set lastclock=1360054567,lastns=441864580,prevvalue=lastvalue,lastvalue='1' where itemid=23287;
        update items set lastclock=1360054571,lastns=326074725,prevvalue=lastvalue,lastvalue='1' where itemid=23291;
        update items set lastclock=1360054572,lastns=854848589,prevvalue=lastvalue,lastvalue='256' where itemid=23292;
        update items set lastclock=1360054574,lastns=459605438,prevorgvalue='506003581',prevvalue=lastvalue,lastvalue='2639' where itemid=23294;
        update items set lastclock=1360054575,lastns=339624171,prevvalue=lastvalue,lastvalue='1.151250' where itemid=23295;
        update items set lastclock=1360054576,lastns=973642146,prevvalue=lastvalue,lastvalue='1.095000' where itemid=23296;
        update items set lastclock=1360054577,lastns=61672013,prevvalue=lastvalue,lastvalue='1.112500' where itemid=23297;
        update items set lastclock=1360054578,lastns=36596477,prevorgvalue='1104252186',prevvalue=lastvalue,lastvalue='5508' where itemid=23298;
        update items set lastclock=1360054563,lastns=387421557,prevvalue=lastvalu
2013-02-05 08:57:08 GMT HINT:  See server log for query details.
2013-02-05 08:57:08 GMT STATEMENT:  update items set name='Used disk space on $1',key_='vfs.fs.size[C:\\,used]',type=0,value_type=3,data_type=0,delay=60,delay_flex='',history=7,trends=365,trapper_hosts='',units='B',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=null,params='',ipmi_sensor='',snmp_community='',snmp_oid='',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='',interfaceid=73,flags=4 where itemid=28926;
        update item_discovery set key_='vfs.fs.size[{#FSNAME},used]',lastcheck=1360054627,ts_delete=0 where itemid=28926 and parent_itemid=28632;
        update items set name='Used disk space on $1',key_='vfs.fs.size[E:\\,used]',type=0,value_type=3,data_type=0,delay=60,delay_flex='',history=7,trends=365,trapper_hosts='',units='B',multiplier=0,delta=0,formula='1',logtimefmt='',valuemapid=null,params='',ipmi_sensor='',snmp_community='',snmp_oid='',port='',snmpv3_securityname='',snmpv3_securitylevel=0,snmpv3_authpassphrase='',snmpv3_privpassphrase='',authtype=0,username='',password='',publickey='',privatekey='',description='',interfaceid=73,flags=4 where itemid=29422;
        update item_discovery set key_='vfs.fs.size[{#FSNAME},used]',lastcheck=1360054627,ts_delete=0 where itemid=29422 and parent_itemid=28632;

We can get an idea of how frequently this is happening by grepping the most recent postgres logfile for "deadlock":

2013-02-03 06:40:58 GMT ERROR:  deadlock detected
2013-02-03 07:34:48 GMT ERROR:  deadlock detected
2013-02-03 08:04:43 GMT ERROR:  deadlock detected
2013-02-03 08:34:59 GMT ERROR:  deadlock detected
2013-02-03 08:38:20 GMT ERROR:  deadlock detected
2013-02-03 09:34:04 GMT ERROR:  deadlock detected
2013-02-03 09:34:04 GMT ERROR:  deadlock detected
2013-02-03 11:01:33 GMT ERROR:  deadlock detected
2013-02-03 11:34:22 GMT ERROR:  deadlock detected
2013-02-03 12:47:05 GMT ERROR:  deadlock detected
2013-02-03 13:32:29 GMT ERROR:  deadlock detected
2013-02-03 14:40:10 GMT ERROR:  deadlock detected
2013-02-03 15:32:47 GMT ERROR:  deadlock detected
2013-02-03 16:32:09 GMT ERROR:  deadlock detected
2013-02-03 16:38:19 GMT ERROR:  deadlock detected
2013-02-03 22:36:28 GMT ERROR:  deadlock detected
2013-02-03 22:36:30 GMT ERROR:  deadlock detected
2013-02-03 23:39:20 GMT ERROR:  deadlock detected
2013-02-04 00:37:03 GMT ERROR:  deadlock detected
2013-02-04 00:56:51 GMT ERROR:  deadlock detected
2013-02-04 02:57:55 GMT ERROR:  deadlock detected
2013-02-04 03:34:08 GMT ERROR:  deadlock detected
2013-02-04 04:38:47 GMT ERROR:  deadlock detected
2013-02-04 05:38:57 GMT ERROR:  deadlock detected
2013-02-04 05:38:58 GMT ERROR:  deadlock detected
2013-02-04 09:38:02 GMT ERROR:  deadlock detected
2013-02-04 12:00:01 GMT ERROR:  deadlock detected
2013-02-04 12:33:59 GMT ERROR:  deadlock detected
2013-02-04 13:32:20 GMT ERROR:  deadlock detected
2013-02-04 13:37:33 GMT ERROR:  deadlock detected
2013-02-04 13:50:17 GMT ERROR:  deadlock detected
2013-02-04 14:39:38 GMT ERROR:  deadlock detected
2013-02-04 15:39:18 GMT ERROR:  deadlock detected
2013-02-04 16:33:42 GMT ERROR:  deadlock detected
2013-02-04 17:37:17 GMT ERROR:  deadlock detected
2013-02-04 19:37:54 GMT ERROR:  deadlock detected
2013-02-04 19:50:21 GMT ERROR:  deadlock detected
2013-02-04 20:37:14 GMT ERROR:  deadlock detected
2013-02-04 21:39:23 GMT ERROR:  deadlock detected
2013-02-04 21:41:38 GMT ERROR:  deadlock detected
2013-02-05 00:36:33 GMT ERROR:  deadlock detected
2013-02-05 05:58:43 GMT ERROR:  deadlock detected
2013-02-05 08:57:08 GMT ERROR:  deadlock detected
Comment by Anton Aksola [ 2013 Feb 14 ]

I'm also seeing this with MySQL 5.5.27 and Zabbix 2.0.4

I'll try Guy Van Sanden's workaround.

Comment by Anton Aksola [ 2013 Feb 15 ]

Here is some data from our environment:

Innotop
=======

When   Load  Cxns     QPS    Slow   Se/In/Up/De%  QCacheHit  KCacheHit  BpsIn    BpsOut 
Now    0.00  161      1.60k  0      69/ 0/29/ 0      22.45%    100.00%  281.01k  358.13k
Total  0.00    1.46k  1.87k  7.53k  41/ 0/58/ 0      90.84%    100.00%  337.20k    1.30M

Cmd    ID      State               User    Host           DB      Time   Query                                                    
Query   14283  Updating            zabbix  localhost      zabbix  00:49  update items set lastclock=1360917672,lastns=336481966,pr
Query   14280  Updating            zabbix  localhost      zabbix  00:46  update items set lastclock=1360917671,lastns=106638034,pr
Query   14282  Updating            zabbix  localhost      zabbix  00:46  update items set lastclock=1360917673,lastns=539945615,pr
Query   14286  Updating            zabbix  localhost      zabbix  00:46  update items set lastclock=1360917668,lastns=654048890,pr
Query   14287  Updating            zabbix  localhost      zabbix  00:46  update items set lastclock=1360917675,lastns=491710979,pr
Query   14288  Updating            zabbix  localhost      zabbix  00:46  update items set lastclock=1360917675,lastns=145399825,pr
Query   14289  Updating            zabbix  localhost      zabbix  00:46  update items set lastclock=1360917669,lastns=784594499,pr
Query   14281  Updating            zabbix  localhost      zabbix  00:44  update items set lastclock=1360917677,lastns=386003654,pr
Query   14274  Updating            zabbix  localhost      zabbix  00:43  update items set name='Administrative state uplink1, IP i
Query   14081  Writing to net      zabbix  localhost      zabbix  00:00  select i.itemid,i.hostid,h.proxy_hostid,i.type,i.data_typ
Query   14095  Copying to tmp tab  zabbix  localhost      zabbix  00:00  select distinct t.triggerid,t.expression from triggers t,
Query   14102  Writing to net      zabbix  localhost      zabbix  00:00  select distinct i.itemid from items i,item_discovery id w
Query   14215  Copying to tmp tab  zabbix  localhost      zabbix  00:00  select distinct i.itemid from items i,item_discovery id w
Query   14226  Copying to tmp tab  zabbix  localhost      zabbix  00:00  select distinct t.triggerid,t.expression from triggers t,
Query   14277  Copying to tmp tab  zabbix  localhost      zabbix  00:00  select distinct t.triggerid,t.expression from triggers t,
Press any key to continue


INNODB ENGINE STATUS
====================
...
---TRANSACTION 977AC509, ACTIVE 102 sec starting index read
mysql tables in use 3, locked 0
274 lock struct(s), heap size 31160, 22239 row lock(s), undo log entries 14816
MySQL thread id 14223, OS thread handle 0x7ffdf9c91700, query id 116167938 localhost zabbix Copying to tmp table
select distinct t.triggerid,t.expression from triggers t,functions f,items i where t.triggerid=f.triggerid and f.itemid=i.itemid and i.hostid=10126 and t.description='{HOST.NAME}: Interface OperState is DOWN while AdminState is up GigabitEthernet2/28 ({ITEM.VALUE})' and t.triggerid<>25672
Trx read view will not see trx with id >= 977AC50A, sees < 977ABC48
---TRANSACTION 977AC31F, ACTIVE 118 sec starting index read
mysql tables in use 3, locked 0
286 lock struct(s), heap size 47544, 22003 row lock(s), undo log entries 14656
MySQL thread id 14102, OS thread handle 0x7ffdf8d96700, query id 116167563 localhost zabbix Copying to tmp table
select distinct t.triggerid,t.expression from triggers t,functions f,items i where t.triggerid=f.triggerid and f.itemid=i.itemid and i.hostid=10124 and t.description='{HOST.NAME}: Interface OperState is DOWN while AdminState is up GigabitEthernet4/5 ({ITEM.VALUE})' and t.triggerid<>24456
Trx read view will not see trx with id >= 977AC320, sees < 977ABC48
---TRANSACTION 977AC182, ACTIVE 131 sec starting index read
mysql tables in use 3, locked 0
269 lock struct(s), heap size 31160, 22288 row lock(s), undo log entries 14848
MySQL thread id 14215, OS thread handle 0x7ffbbdd9d700, query id 116167845 localhost zabbix Copying to tmp table
select distinct t.triggerid,t.expression from triggers t,functions f,items i where t.triggerid=f.triggerid and f.itemid=i.itemid and i.hostid=10125 and t.description='{HOST.NAME}: Interface OperState is DOWN while AdminState is up Vlan206 ({ITEM.VALUE})' and t.triggerid<>120022
Trx read view will not see trx with id >= 977AC183, sees < 977ABC48
...

It seems that these transactions (SELECTs) are locking a considerable number of rows in the items table, thus blocking UPDATE queries from both the poller and parallel LLD processes. At least this has a very undesired effect with LLD: items are being marked for deletion.
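
For anyone who wants to confirm which transaction is blocking which, here is a diagnostic sketch (not part of Zabbix) using the InnoDB information_schema tables available in MySQL 5.5:

select r.trx_mysql_thread_id as waiting_thread,
       r.trx_query           as waiting_query,
       b.trx_mysql_thread_id as blocking_thread,
       b.trx_query           as blocking_query
  from information_schema.innodb_lock_waits w
  join information_schema.innodb_trx r on r.trx_id = w.requesting_trx_id
  join information_schema.innodb_trx b on b.trx_id = w.blocking_trx_id;
-- Each row shows a waiting statement (e.g. an items UPDATE from a poller)
-- together with the transaction currently holding the lock it needs.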

Comment by Anton Aksola [ 2013 Feb 21 ]

After disabling the LLD batch operation as a single transaction, the problem went away. So I basically commented out DBbegin() and DBcommit() in lld.c. This also gives better performance, at least on MySQL, as the history syncer process is no longer blocked by LLD holding locks on the items table. So basically we are committing every update operation right after the statement is executed (autocommit mode).

I will see if this has any undesired side effects. One would think there are none, as LLD is the only process that updates item parameters.

After 24 hours I haven't seen a single "Lock wait timeout exceeded" message in zabbix_server.log and no side effects so far. So at least in my case, LLD was the cause of the DB deadlocks.

Edit: my initial LLD problem (actually discovered items being marked for deletion) vanished as well.

Comment by Alf Solli [ 2013 Feb 22 ]

Care to give some details on which lines you commented out? All instances of DBbegin() and DBcommit()?

I adjusted the discovery frequency to the absolute maximum (864000 or something) to make it less of a problem. But now, with ~1000 devices, the slow queries are getting more and more frequent.

Still no negative impacts? I'm curious enough to try this on our production servers, or at least replicate the whole thing.

Anyone else had success with this?

- Alf
Comment by Anton Aksola [ 2013 Feb 25 ]

Alf,

diff -uNrp zabbix-2.0.4/src/libs/zbxdbhigh/lld.c zabbix-2.0.4nblpatches/src/libs/zbxdbhigh/lld.c 
--- zabbix-2.0.4/src/libs/zbxdbhigh/lld.c       2012-12-08 13:09:15.000000000 +0200
+++ zabbix-2.0.4nblpatches/src/libs/zbxdbhigh/lld.c     2013-02-20 10:17:40.000000000 +0200
@@ -2236,7 +2236,7 @@ void      DBlld_process_discovery_rule(zbx_ui
        if (0 == hostid)
                goto clean;
 
-       DBbegin();
+       //DBbegin();
 
        error = zbx_strdup(error, "");
 
@@ -2314,7 +2314,7 @@ error:
 
        DBexecute("%s", sql);
 
-       DBcommit();
+       //DBcommit();
 clean:
        zbx_free(error);
        zbx_free(db_error);

No negative effects so far but I don't think this will resolve the issue completely. According to my logs, deadlocks are still happening but much more rarely.

Comment by Dimitri Bellini [ 2013 Mar 07 ]

I have found the same issue on my installation:
RHEL 5.7 X64 - MySQL 5.5.28 - 64GB RAM - DB on SAN Storage (Dual Path) - DB using Partitioning Table for History

As most have noticed, this problem seems related to DB row locking during item updates.

Zabbix_server.log


5590:20130307:090652.499 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update items set lastclock=1362643489,lastns=849268000,prevvalue=lastvalue,lastvalue='0.001129' where itemid=23269;
update items set lastclock=1362643490,lastns=834822000,prevvalue=lastvalue,lastvalue='0.069760' where itemid=23270;
update items set lastclock=1362643489,lastns=847835000,prevorgvalue='0',prevvalue=lastvalue,lastvalue='0' where itemid=23449;
update items set lastclock=1362643490,lastns=410521000,prevorgvalue='0',prevvalue=lastvalue,lastvalue='0' where itemid=23450;
update items set lastclock=1362643489,lastns=849696000,prevvalue=lastvalue,lastvalue='486273024' where itemid=23509;
update items set lastclock=1362643490,lastns=481507000,prevvalue=lastvalue,lastvalue='enabled(1)' where itemid=62209;
update items set lastclock=1362643490,lastns=812825000,prevvalue=lastvalue,lastvalue='enabled(1)' where itemid=62210;


The strange thing is that in my case this happens nearly every hour, and the history syncer goes to 100% usage for a few minutes.
I have tried tweaking every parameter in the MySQL configuration file, but without result.
If anyone has a suggestion, please help.
Regards

Comment by Dimitri Bellini [ 2013 Mar 08 ]

Very strange: after I changed these two parameters in the MySQL configuration file (my.cnf), I have not seen the "lock wait" error on update.


max_allowed_packet = 256M
innodb_lock_wait_timeout = 500


But I still see a history syncer peak every hour.

Comment by Dimitri Bellini [ 2013 Mar 11 ]

The "Lock wait timeout on update" is not fixed... After some days they came back on my zabbix_server.log.

Comment by Alexander Vladishev [ 2013 Mar 22 ]

Fixed in the development branch svn://svn.zabbix.com/branches/dev/ZBX-5225

Comment by dimir [ 2013 Mar 26 ]

Successfully tested. Please review my small changes in r34653.

Comment by Alexander Vladishev [ 2013 Mar 27 ]

Fixed in versions pre-2.0.6 r34655 and pre-2.1.0 (trunk) r34664.

Comment by richlv [ 2013 Mar 27 ]

(1) let's mention this in whatsnew (one commit message said "each prototype will be processed in own transaction" - anything else?)

sasha RESOLVED

https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew206?&#daemon_improvements

<richlv> ooooh, that was a really great description. i reordered it and added some notes on the test runs from the comment here - please review

sasha Thanks! CLOSED

Comment by dimir [ 2013 Mar 28 ]

(2) [S] Split one big transaction for the whole discovery rule into a transaction for each item/trigger/graph prototype.

RESOLVED by me and sasha in r34667:r34694.

CLOSED

Comment by Alexander Vladishev [ 2013 Mar 28 ]

Processing of each prototype in its own transaction is available in versions pre-2.0.6 r34696 and pre-2.1.0 (trunk) r34697.

Comment by Alexander Vladishev [ 2013 Mar 28 ]

What's new:

  1. Each item, trigger or graph prototype is processed in its own transaction. No more deadlocks! (See the sketch below.)
  2. Better processing of item prototypes.
    • Already discovered items will be updated only if something changed.
    • Only changed fields will be updated.
    • Decreased number and size of SQL queries.
    • Better validation of fields after resolving LLD macros:
      • validity of UTF-8 sequences
      • validity of field lengths
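
In transaction terms, the first point boils down to something like the following simplified sketch (hypothetical item ids, not the SQL the server actually generates):

-- 2.0.5: one transaction spans the whole discovery rule, so row locks on
-- items/item_discovery are held until every prototype has been processed.
begin;
update items set key_='ifAdminStatus[eth0]' where itemid=228104;
update items set key_='ifOperStatus[eth0]' where itemid=228110;
update item_discovery set lastcheck=1340497450 where parent_itemid=227822;
commit;

-- 2.0.6: one transaction per item/trigger/graph prototype, so locks are
-- released between prototypes and pollers/history syncers can get in between.
begin;
update items set key_='ifAdminStatus[eth0]' where itemid=228104;
update item_discovery set lastcheck=1340497450 where itemid=228104 and parent_itemid=227822;
commit;
begin;
update items set key_='ifOperStatus[eth0]' where itemid=228110;
update item_discovery set lastcheck=1340497450 where itemid=228110 and parent_itemid=227822;
commit;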

Tests:

One discovery rule with 8 item prototypes. Received 256 new records.

2.0.5

Action                                             | Time, sec | Number of SQL statements | Total size of SQL statements, bytes
first discovery (creating of 8 * 256 = 2048 items) | 1.272079  | 8290                     | 1668647
next discovery (updating)                          | 3.657499  | 8204                     | 2445575
next discovery (changed key in item prototype)     | 10.474360 | 10250                    | 2703623

2.0.5 with the fix

Action                                             | Time, sec | Number of SQL statements | Total size of SQL statements, bytes
first discovery (creating of 8 * 256 = 2048 items) | 0.353674  | 129                      | 888988
next discovery (updating)                          | 0.112492  | 33                       | 6807
next discovery (changed key in item prototype)     | 0.731188  | 49                       | 350631
Comment by Alexander Vladishev [ 2013 Apr 25 ]

(3) Broken support of SNMPv3 SHA/AES authentication in trunk when creating items based on item prototypes.

sasha Fixed in r35273. CLOSED
