[ZBX-15438] Zabbix unresponsive after '[Z3005] query failed: [1205] Lock wait timeout exceeded' Created: 2019 Jan 11  Updated: 2019 Jan 11  Resolved: 2019 Jan 11

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Server (S)
Affects Version/s: 4.0.2
Fix Version/s: None

Type: Incident report Priority: Blocker
Reporter: Erik De Neve Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: database, lld
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Galera MariaDB cluster with 3 nodes (2 masters and 1 arbitrator).
Physical machines (192 GB RAM, SSD disks, Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz)
Running Ubuntu 18.04, MariaDB 10.1, Zabbix 4.0.2.
Many SNMP checks (network devices) and some external checks.
Intensive use of LLD.


Attachments: File mariadb.conf     File zabbix_server.conf     File zabbix_server.log    

 Description   

 

We have run into the same issue multiple times. Zabbix server has not crashed, but data is no longer collected and the frontend is blocked. The log always shows the same kind of queries failing, and always during housekeeper execution.

Please note that we use LLD intensively (SNMP and external checks).

The problem seems to start a few seconds (about 30 s) after the housekeeper is executed.
We always see errors when updating item_discovery (see the attached zabbix_server.log).

e.g.

22017:20190110:130525.841 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update item_discovery set lastcheck=1547121824 where (itemid between 616377 and 616391 or itemid between 669993 and 670010 or itemid between 733218 and 733223 or itemid between 735766 and 735771 or itemid between 774180 and 774221 or itemid in (616338,616339,616340,616359,616360,616361));
]
 22017:20190110:130525.841 slow query: 101.629963 sec, "update item_discovery set lastcheck=1547121824 where (itemid between 616377 and 616391 or itemid between 669993 and 670010 or itemid between 733218 and 733223 or itemid between 735766 and 735771 or itemid between 774180 and 774221 or itemid in (616338,616339,616340,616359,616360,616361));
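
Entries like the two above can be pulled out of zabbix_server.log with a short script, which makes it easier to count the failures and see how long each query waited. The following is a minimal Python sketch, assuming every entry starts with the `PID:YYYYMMDD:HHMMSS.mmm` prefix seen in the excerpt. The function name `parse_entries` and the regular expressions are illustrative and not part of Zabbix.

```python
import re
from datetime import datetime

# Prefix as seen in the excerpt: PID:YYYYMMDD:HHMMSS.mmm followed by the message.
PREFIX = re.compile(r"^\s*(\d+):(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")
# "[Z3005] query failed: [errno] error text [sql"
FAILED = re.compile(r"\[Z3005\] query failed: \[(\d+)\] ([^\[]*)\[(.*)$")
# 'slow query: <seconds> sec, "<sql>'
SLOW = re.compile(r'slow query: ([\d.]+) sec, "(.*)$')


def parse_entries(lines):
    """Yield one dict per 'query failed' or 'slow query' log line."""
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # continuation lines such as a lone "]" are skipped
        pid, day, clock, millis, message = m.groups()
        when = datetime.strptime(day + clock, "%Y%m%d%H%M%S").replace(
            microsecond=int(millis) * 1000)
        failed = FAILED.search(message)
        slow = SLOW.search(message)
        if failed:
            yield {"pid": int(pid), "time": when, "kind": "failed",
                   "errno": int(failed.group(1)),
                   "error": failed.group(2).strip(),
                   "sql": failed.group(3).strip()}
        elif slow:
            yield {"pid": int(pid), "time": when, "kind": "slow",
                   "seconds": float(slow.group(1)),
                   "sql": slow.group(2).strip()}
```

Calling `list(parse_entries(open("zabbix_server.log")))` and filtering on `errno == 1205` would then list every lock wait timeout with its timestamp and statement.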

Zabbix stops collecting data and the frontend stops responding; we have to restart MariaDB and Zabbix to get back to a stable state.

The issue is difficult to reproduce because it happens at random times (3 times in the last 3 weeks already).
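
Because the incidents are rare, the suspected link to the housekeeper could be checked against the log after each occurrence. The sketch below, in Python, measures how long after the most recent housekeeper start each lock wait timeout was logged. It assumes the server writes a line containing `executing housekeeper` when a housekeeping cycle starts; that marker, like the function name `delays_after_housekeeper`, is an assumption to verify against your own log.

```python
import re
from datetime import datetime

# Log prefix PID:YYYYMMDD:HHMMSS.mmm; only the date, time and message are captured.
STAMP = re.compile(r"^\s*\d+:(\d{8}):(\d{6})\.\d{3}\s+(.*)$")

# Assumption: this text marks the start of a housekeeping cycle in the log.
HOUSEKEEPER_MARK = "executing housekeeper"
LOCK_TIMEOUT_MARK = "[Z3005] query failed: [1205]"


def delays_after_housekeeper(lines):
    """Return, for each lock wait timeout, the seconds elapsed since the
    most recent housekeeper start seen earlier in the log."""
    last_start = None
    delays = []
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip continuation lines without a timestamp
        when = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
        message = m.group(3)
        if HOUSEKEEPER_MARK in message:
            last_start = when
        elif LOCK_TIMEOUT_MARK in message and last_start is not None:
            delays.append((when - last_start).total_seconds())
    return delays
```

Note that the error is logged when the lock wait expires, not when the query starts blocking. In the excerpt the statement ran for about 101 seconds, so that duration would have to be subtracted to estimate when the blocking began.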



 Comments   
Comment by Arturs Lontons [ 2019 Jan 11 ]

Hi,
Thank you for reporting the issue.

The root cause of the issue seems to be related to database performance. There are some additional steps that you could take to improve DB query performance, for example enabling large pages, increasing the LLD update interval, or implementing DB partitioning. Feel free to use our forums at https://www.zabbix.com/forum to ask for database performance tuning advice for your specific environment.

Since this is not a Zabbix bug report, I will be closing the ticket.
