[ZBXNEXT-6921] Use primary keys for historical tables (PoC) Created: 2021 Sep 17  Updated: 2025 Feb 20  Resolved: 2022 Feb 14

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: 6.0.0alpha7, 6.0 (plan)

Type: Change Request Priority: Trivial
Reporter: Rostislav Palivoda Assignee: Dmitrijs Goloscapovs
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot from 2021-11-17 12-53-09.png    
Issue Links:
Causes
causes ZBX-20603 Typo in PK migration guide Closed
causes ZBX-22695 Message "query failed due to primary ... Closed
causes ZBX-20186 Incorrect result displayed on Search ... Closed
caused by ZBXNEXT-3089 history* primary keys Closed
Duplicate
is duplicated by ZBX-20254 Zabbix problem with Percona XtraDB Cl... Closed
is duplicated by ZBX-23444 [Z3008] query failed due to primary k... Closed
Team: Team A
Sprint: Sprint 80 (Sep 2021), Sprint 81 (Oct 2021), Sprint 82 (Nov 2021), Sprint 83 (Dec 2021), Sprint 84 (Jan 2022), Sprint 85 (Feb 2022)
Story Points: 2

 Description   
  1. Zabbix must use primary keys for all tables, including the history* tables
  2. The new schema will be used only for fresh Zabbix 6.0 (and later) installations
    1. The older schema will be kept when upgrading to Zabbix 6.0
  3. Existing database queries must be reviewed to take advantage of the primary keys
    1. For example: the function that calculates the last value for a given itemid
  4. Zabbix server syncer processes must gracefully handle duplicate data, without data loss, when a transaction that writes data to history fails
    1. Possible solution: if a transaction fails due to a "primary key violation", split the data into two chunks and retry, repeating as needed. Alternatively, check for duplicates by selecting the data from the database first
  5. We must make sure that the new schema works fine with PostgreSQL, MySQL, MariaDB and Oracle, and does not create any issues with existing partitioning schemas or TimescaleDB logic
  6. We must research the benefits of using primary keys vs. the older schema and document the findings at least for MySQL and PostgreSQL: performance gains, disk space savings
    1. This must be done on a large data set
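The split-and-retry idea for handling primary key violations can be sketched as follows. This is a minimal illustration, not Zabbix server code; `DuplicateKeyError` and `insert_txn` are hypothetical stand-ins for the real database layer:

```python
class DuplicateKeyError(Exception):
    """Stand-in for a DB driver's primary key violation error."""


def flush(batch, insert_txn):
    """Write a batch of history rows in one transaction.

    If the transaction fails on a primary key violation, split the
    batch into two halves and retry each half recursively.  Only a
    single-row batch that still violates the key (a true duplicate)
    is dropped, so no other data is lost.  Returns the number of
    rows actually written.
    """
    if not batch:
        return 0
    try:
        insert_txn(batch)
        return len(batch)
    except DuplicateKeyError:
        if len(batch) == 1:
            return 0  # a genuine duplicate row: skip just this one
        mid = len(batch) // 2
        return flush(batch[:mid], insert_txn) + flush(batch[mid:], insert_txn)
```

In the worst case this costs O(log n) extra transactions per duplicate, which is acceptable since duplicates are expected only after a failed/retried sync.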


 Comments   
Comment by Vladislavs Sokurenko [ 2021 Oct 06 ]

Original bug report:
ZBXNEXT-3089

Comment by Dmitrijs Goloscapovs [ 2021 Oct 12 ]

Test results

| DB | Setup | Data size | Index size | Sync time | Record count | Selects (3 queries) | 1010000 inserts (last-day partition) |
|---|---|---|---|---|---|---|---|
| MariaDB 10.4.21 | Index + Partitioning | 15262 | 8342 | 1813.06 | 262393891 | 70.86, 147.13, 135.07 | 1m14.036s |
| MariaDB 10.4.21 | PK + Partitioning | 14618 | 0 | 1330.55 | 262391437 | 56.59, 113.18, 109.13 | 1m9.996s |
| MariaDB 10.6.4 | Index + Partitioning | 15184 | 8336 | 1802.19 | 262392533 | 69.33, 144.17, 129.78 | 1m13.141s |
| MariaDB 10.6.4 | PK + Partitioning | 14620 | 0 | 1317.32 | 262392217 | 55.34, 105.51, 110.97 | 1m9.083s |
| MySQL 8.0.26 | Index + Partitioning | 15667 | 8317 | 2193.11 | 262393746 | 79.37, 162.44, 143.60 | 1m27.083s |
| MySQL 8.0.26 | PK + Partitioning | 14856 | 0 | 1798.21 | 262398902 | 69.53, 139.64, 118.36 | 1m24.426s |
| MariaDB 10.4 | Index w/o Partitioning | 15512 | 6906 | 1873.41 | 262394526 | 84.13, 195.46, 169.24 | 1m10.562s |
| MariaDB 10.4 | PK w/o Partitioning | 13688.97 | 0 | 1341.2 | 262394469 | 82.79, 176.25, 157.73 | 1m7.387s |
| PostgreSQL 14.0 | Index + Partitioning | 24GB | 7.3GB | 1373.16 | 262325176 | 35550.794 ms, 36656.102 ms, 71350.073 ms | 58.765s |
| PostgreSQL 14.0 | PK + Partitioning | 12GB | 11GB | 1312.29 | 262556197 | 36320.581 ms, 37214.731 ms, 71134.220 ms | 59.385s |
| PG 13.4 + TSDB 2.4.2 | Index, no compression | 13GB | 10GB | 1738.21 | 262141305 | 34968ms, 35734ms, 36152ms | 5m15.600s |
| PG 13.4 + TSDB 2.4.2 | PK, no compression | 13GB | 14GB | 1717.43 | 262155334 | 37345ms, 38572ms, 38703ms | 5m43.560s |

 

Data was first sent to the proxy (with the server down), and then the total history sync time was collected.

Selects used:

select * from history_uint where clock < 1631836800;
select * from history_uint where clock > 1631836800 and clock < 1631923200;
select * from history_uint where clock > 1631836800 and clock < 1632009600;

Inserts contained exactly the same queries for each DB test (resembling normal history data, not random).
Queries were executed without SQL_NO_CACHE.

Where partitioning was used, tables were partitioned by clock, with one partition per day (24 hours / 86400 seconds). The data resembled normal history data, without itemid-clock-ns duplicates.

Primary key was configured as follows: 

PRIMARY KEY (`itemid`,`clock`,`ns`)
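As an illustration only (column list simplified, not the exact Zabbix 6.0 DDL), a MySQL-flavoured `history_uint` table combining this primary key with the daily range partitioning described above could look like:

```sql
-- Illustrative sketch, not the official Zabbix 6.0 schema.
CREATE TABLE history_uint (
    itemid BIGINT UNSIGNED NOT NULL,
    clock  INT NOT NULL DEFAULT 0,
    value  BIGINT UNSIGNED NOT NULL DEFAULT 0,
    ns     INT NOT NULL DEFAULT 0,
    PRIMARY KEY (itemid, clock, ns)
)
PARTITION BY RANGE (clock) (
    -- one partition per day (86400 seconds); boundaries are example timestamps
    PARTITION p2021_09_17 VALUES LESS THAN (1631923200),
    PARTITION p2021_09_18 VALUES LESS THAN (1632009600)
);
```

Note that MySQL requires the partitioning column (`clock`) to be part of every unique key, which the chosen primary key satisfies.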

 
Only relative differences between runs on the same DB version should be taken into account.
 

Comment by Dmitrijs Goloscapovs [ 2021 Nov 05 ]

Available in versions:

Comment by Marco Hofmann [ 2021 Nov 13 ]

So if I understand correctly: when I upgrade our Zabbix server, which has existed since 2.0, from 5.4 to 6.0, nothing will change, right?

Comment by Vladislavs Sokurenko [ 2021 Nov 13 ]

That is right, starko; history tables can only be upgraded manually to use primary keys.

Comment by Marco Hofmann [ 2021 Nov 15 ]

What impact does that have for customers who upgrade to 6.0 instead of re-installing? I'm no DBA, but if I read this table correctly, it would become a little bit faster, but not tremendously so?

(Debian 11 bullseye, MariaDB 10.5, no Partitioning)

Comment by Alexei Vladishev [ 2021 Nov 16 ]

starko, Zabbix will not switch to primary keys automatically when upgrading from earlier releases; it must be done manually. If you install a fresh Zabbix 6.0, the DB schema will have primary keys for historical tables. Depending on the DB engine, you may expect better performance and lower (up to 30%) disk usage.

Comment by Marco Hofmann [ 2021 Nov 16 ]

alexei Thank you very much for the clarification!

The last time something fundamental changed in the DB schema (the switch to the new default charset utf8 and collation utf8_bin for MySQL & MariaDB), the Zabbix server database upgrade process didn't touch the schema. Instead, Zabbix provided a conversion script via ZBX-17357, which resulted in the following documentation article that was extremely useful to us: https://www.zabbix.com/documentation/current/manual/appendix/install/db_charset_coll

Will there be a similar conversion script, for customers who want to follow the primary key schema change, after they've upgraded to 6.0 LTS?

Comment by Alexei Vladishev [ 2021 Nov 16 ]

starko, I should have mentioned it earlier: the database conversion scripts will be provided for all supported DB engines. They will be part of the official documentation and/or release notes.
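Purely as an illustration of the general shape such a manual conversion can take on MySQL (this is not the official script; back up the database and follow the official guide once it is published), the table can be recreated with the primary key and the data copied over while skipping duplicates:

```sql
-- Illustrative sketch only, not the official Zabbix conversion script.
RENAME TABLE history_uint TO history_uint_old;

CREATE TABLE history_uint (
    itemid BIGINT UNSIGNED NOT NULL,
    clock  INT NOT NULL DEFAULT 0,
    value  BIGINT UNSIGNED NOT NULL DEFAULT 0,
    ns     INT NOT NULL DEFAULT 0,
    PRIMARY KEY (itemid, clock, ns)
);

-- INSERT IGNORE silently skips rows that would violate the new
-- primary key (itemid-clock-ns duplicates).
INSERT IGNORE INTO history_uint
    SELECT itemid, clock, value, ns FROM history_uint_old;

DROP TABLE history_uint_old;
```

The same pattern applies to the other history* tables; column lists differ per table.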

Comment by Marco Hofmann [ 2021 Nov 16 ]

Wow, that's great news, thank you very much!

Comment by Dimitri Bellini [ 2021 Nov 17 ]

@Alexei Very interesting topic! Do you think we can have more details about these numbers:

There seems to be a huge difference in ingestion time for Postgres+TSDB, and MariaDB seems faster for "select" queries compared to Postgres (without TSDB).

Thanks so much

Comment by Alexei Vladishev [ 2021 Nov 17 ]

dimitri.bellini , take these benchmarks with a grain of salt as the configuration of the databases and hardware was different.

Comment by Dimitri Bellini [ 2021 Nov 17 ]

@Alexei ok thanks

Comment by Rainer Stumbaum [ 2021 Nov 23 ]

So what is the current recommendation for operating larger Zabbix installations?
Postgres+TSDB or MySQL?
Why would you run these tests on different hardware/different configuration? To give a bad impression for Postgres+TSDB?

When my colleague saw these results he bashed me for migrating away from MySQL to TimescaleDB last year, with huge pain points (https://support.zabbix.com/browse/ZBX-16347?focusedCommentId=416975&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-416975 ...).

Sometimes it seems to me your "test environment" while developing consists of monitoring the laptop running your zabbix container and the switch port in front of it.

Comment by Alexei Vladishev [ 2021 Nov 23 ]

rstumbaum, I can assure you that we have a whole range of test platforms available, from beefy bare-metal servers to the full range of AWS/GCP/Azure server instances and K8S clusters. The tests we ran here had only one goal: to discover any drawbacks of having primary keys and to identify possible benefits. They were not aimed at benchmarking the performance of various storage engines.

There is no silver bullet when it comes to selecting an ideal database. It always depends on your priorities (speed vs. maintainability vs. HA options, etc.).

Comment by Dmitrijs Goloscapovs [ 2022 Jan 11 ]

Available in versions:

Documentation updated:

Comment by dimir [ 2022 Jan 11 ]

The primary keys upgrade guide does not mention removing the CSV files.

Also, will this work for users that use partitions? If not, will we provide any help to them?

Comment by MArk [ 2022 Feb 05 ]

In "3 Database upgrade to primary keys", section "Improving performance", both external links (MySQL 5.7, MySQL 8.0) point to "Not Found" content.

Comment by MArk [ 2022 Feb 05 ]

Also, the "3 Database upgrade to primary keys" document does not mention backing up the database first.

Comment by Martins Valkovskis [ 2022 Feb 09 ]

markfree, the issues you mentioned have been fixed. Thank you.

Generated at Fri Apr 11 12:25:30 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.