Steps to reproduce:
- This is a clean install starting at 6.0.3 LTS that has had the point upgrades done. It is a distributed VM deployment. 2 separate frontend VM's, 2 server VM's with the new built in HA feature and a single mariadb VM. All VM's are Ubuntu 20.04.3 LTS.
- I had seen periodic slow query messages in the server log since I started the setup and hadn't found it to be problem, but over the last day the system became unusable. We had 2 cores and the mariadb was maxing both. We bumped up to 4 cores and then it was maxing all 4 cores.
- Slow queries were 500+ seconds long. I will include some processlist outputs in attachments.
- My system is running nearly 80,000 items. new values per second 192.95
- I did some searching and found a thread referencing a database table that was missing a key, so we checked our create tables statements and they all had keys in them. (I created them using "/usr/share/doc/zabbix-sql-scripts/mysql/server.sql.gz" per the install instructions)
- The longest queries we found were on the triggers, functions, and graphs tables. While they all had keys they did not have a composite index. With no other changes besides creating the index on the table, query times on the tables went from 500+ seconds to 2-3 seconds.
- This query was running for 43 seconds at the time of checking:
select distinct g.graphid,g.name,g.width,g.height,g.yaxismin,g.yaxismax,g.show_work_period,g.show_triggers,g.graphtype,g.show_legend,g.show_3d,g.percent_left,g.percent_right,g.ymin_type,g.ymin_itemid,g.ymax_type,g.ymax_itemid,g.discover from graphs g,graphs_items gi,items i,item_discovery id where g.graphid=gi.graphid and gi.itemid=i.itemid and i.itemid=id.itemid and id.parent_itemid=43199
- We added the command "create index graphs_items_i_2 on graphs_items(graphid,itemid);" and now that query takes 0.001 seconds with no other changes.
- We made similar changes on functions and triggers tables but I can't find in my logs the exact commands we ran to give for examples.
I would like to know if this was the proper fix for this? Should there be a different approach? While the massively slow queries are now resolved for the 3 worst offending tables, it appears there may be other tables that are facing simliar issues. trigger_depends and items are my persistent slow queries now in the 3-5 second range.
I'm not a database expert and so I'm not sure what relevant data I should include. Please let me know what other outputs are needed.