[ZBXNEXT-6921] Use primary keys for historical tables (PoC) Created: 2021 Sep 17  Updated: 2025 Feb 20  Resolved: 2022 Feb 14

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: 6.0.0alpha7, 6.0 (plan)

Type: Change Request Priority: Trivial
Reporter: Rostislav Palivoda Assignee: Dmitrijs Goloscapovs
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot from 2021-11-17 12-53-09.png    
Issue Links:
Causes
causes ZBX-20603 Typo in PK migration guide Closed
causes ZBX-22695 Message "query failed due to primary ... Closed
causes ZBX-20186 Incorrect result displayed on Search ... Closed
caused by ZBXNEXT-3089 history* primary keys Closed
Duplicate
is duplicated by ZBX-20254 Zabbix problem with Percona XtraDB Cl... Closed
is duplicated by ZBX-23444 [Z3008] query failed due to primary k... Closed
Team: Team A
Sprint: Sprint 80 (Sep 2021), Sprint 81 (Oct 2021), Sprint 82 (Nov 2021), Sprint 83 (Dec 2021), Sprint 84 (Jan 2022), Sprint 85 (Feb 2022)
Story Points: 2

 Description   
  1. Zabbix must use primary keys for all tables, including the history* tables
  2. The new schema will be used only for fresh Zabbix 6.0 (and later) installations
    1. The older schema will be kept when upgrading to Zabbix 6.0
  3. Existing database queries must be reviewed to take advantage of the primary keys
    1. For example: the function that calculates the last value for a given itemid
  4. Zabbix server syncer processes must gracefully handle duplicate data, without data loss, when a transaction that writes data to history fails
    1. Possible solution: if a transaction fails due to a "primary key violation", split the data into two chunks and retry, repeating as needed. Alternatively, check for duplicates by selecting the data from the database first
  5. We must make sure that the new schema works fine with PostgreSQL, MySQL, MariaDB and Oracle, and does not create any issues with existing partitioning schemas or TimescaleDB logic
  6. We must research the benefits of using primary keys vs. the older schema and document the findings at least for MySQL and PostgreSQL: performance gains, disk space savings
    1. This must be done on a large data set
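The split-and-retry idea for handling primary key violations can be sketched as follows. This is a minimal illustration, not Zabbix server code; `DuplicateKeyError` and `insert_txn` are hypothetical stand-ins for the real database layer:

```python
class DuplicateKeyError(Exception):
    """Stand-in for a DB driver's primary key violation error."""


def flush(batch, insert_txn):
    """Write a batch of history rows in one transaction.

    If the transaction fails on a primary key violation, split the
    batch into two halves and retry each half recursively.  Only a
    single-row batch that still violates the key (a true duplicate)
    is dropped, so no other data is lost.  Returns the number of
    rows actually written.
    """
    if not batch:
        return 0
    try:
        insert_txn(batch)
        return len(batch)
    except DuplicateKeyError:
        if len(batch) == 1:
            return 0  # a genuine duplicate row: skip just this one
        mid = len(batch) // 2
        return flush(batch[:mid], insert_txn) + flush(batch[mid:], insert_txn)
```

In the worst case this costs O(log n) extra transactions per duplicate, which is acceptable since duplicates are expected only after a failed/retried sync.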


 Comments   
Comment by Vladislavs Sokurenko [ 2021 Oct 06 ]

Original bug report:
ZBXNEXT-3089

Comment by Dmitrijs Goloscapovs [ 2021 Oct 12 ]

Test results

| DB | Setup | Data size | Index size | Sync time | Record count | Selects (3 queries) | 1010000 inserts (last-day partition) |
|---|---|---|---|---|---|---|---|
| MariaDB 10.4.21 | Index + Partitioning | 15262 | 8342 | 1813.06 | 262393891 | 70.86, 147.13, 135.07 | 1m14.036s |
| MariaDB 10.4.21 | PK + Partitioning | 14618 | 0 | 1330.55 | 262391437 | 56.59, 113.18, 109.13 | 1m9.996s |
| MariaDB 10.6.4 | Index + Partitioning | 15184 | 8336 | 1802.19 | 262392533 | 69.33, 144.17, 129.78 | 1m13.141s |
| MariaDB 10.6.4 | PK + Partitioning | 14620 | 0 | 1317.32 | 262392217 | 55.34, 105.51, 110.97 | 1m9.083s |
| MySQL 8.0.26 | Index + Partitioning | 15667 | 8317 | 2193.11 | 262393746 | 79.37, 162.44, 143.60 | 1m27.083s |
| MySQL 8.0.26 | PK + Partitioning | 14856 | 0 | 1798.21 | 262398902 | 69.53, 139.64, 118.36 | 1m24.426s |
| MariaDB 10.4 | Index w/o Partitioning | 15512 | 6906 | 1873.41 | 262394526 | 84.13, 195.46, 169.24 | 1m10.562s |
| MariaDB 10.4 | PK w/o Partitioning | 13688.97 | 0 | 1341.2 | 262394469 | 82.79, 176.25, 157.73 | 1m7.387s |
| PostgreSQL 14.0 | Index + Partitioning | 24GB | 7.3GB | 1373.16 | 262325176 | 35550.794 ms, 36656.102 ms, 71350.073 ms | 58.765s |
| PostgreSQL 14.0 | PK + Partitioning | 12GB | 11GB | 1312.29 | 262556197 | 36320.581 ms, 37214.731 ms, 71134.220 ms | 59.385s |
| PG 13.4 + TSDB 2.4.2 | Index, no compression | 13GB | 10GB | 1738.21 | 262141305 | 34968ms, 35734ms, 36152ms | 5m15.600s |
| PG 13.4 + TSDB 2.4.2 | PK, no compression | 13GB | 14GB | 1717.43 | 262155334 | 37345ms, 38572ms, 38703ms | 5m43.560s |

 

Data was first sent to the proxy (with the server down), and then the total history sync time was collected.

Selects used:

select * from history_uint where clock < 1631836800;
select * from history_uint where clock > 1631836800 and clock < 1631923200;
select * from history_uint where clock > 1631836800 and clock < 1632009600;

Inserts contained exactly the same queries for each DB test (resembling normal history data, not random).
Queries were executed without SQL_NO_CACHE.

Where partitioning was used, tables were partitioned by clock, with one partition per day (24 hours / 86400 seconds). The data resembled normal history data, without itemid-clock-ns duplicates.

Primary key was configured as follows: 

PRIMARY KEY (`itemid`,`clock`,`ns`)
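As an illustration only (column list simplified, not the exact Zabbix 6.0 DDL), a MySQL-flavoured `history_uint` table combining this primary key with the daily range partitioning described above could look like:

```sql
-- Illustrative sketch, not the official Zabbix 6.0 schema.
CREATE TABLE history_uint (
    itemid BIGINT UNSIGNED NOT NULL,
    clock  INT NOT NULL DEFAULT 0,
    value  BIGINT UNSIGNED NOT NULL DEFAULT 0,
    ns     INT NOT NULL DEFAULT 0,
    PRIMARY KEY (itemid, clock, ns)
)
PARTITION BY RANGE (clock) (
    -- one partition per day (86400 seconds); boundaries are example timestamps
    PARTITION p2021_09_17 VALUES LESS THAN (1631923200),
    PARTITION p2021_09_18 VALUES LESS THAN (1632009600)
);
```

Note that MySQL requires the partitioning column (`clock`) to be part of every unique key, which the chosen primary key satisfies.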

 
Only relative differences between runs on the same DB version should be taken into account.
 

Comment by Dmitrijs Goloscapovs [ 2021 Nov 05 ]

Available in versions:

Comment by Marco Hofmann [ 2021 Nov 13 ]

So if I understand correctly: when I upgrade our Zabbix server, which has existed since 2.0, from 5.4 to 6.0, nothing will change, right?

Comment by Vladislavs Sokurenko [ 2021 Nov 13 ]

That is right, starko; history tables can only be upgraded manually to use primary keys.

Comment by Marco Hofmann [ 2021 Nov 15 ]

What impact does that have for customers who upgrade to 6.0 instead of re-installing? I'm no DBA, but if I read this table correctly, it would become a little bit faster, but not tremendously so?

(Debian 11 bullseye, MariaDB 10.5, no Partitioning)

Comment by Alexei Vladishev [ 2021 Nov 16 ]

starko, Zabbix will not switch to primary keys automatically when upgrading from earlier releases; it must be done manually. If you install a fresh Zabbix 6.0, the DB schema will have primary keys for historical tables. Depending on the DB engine, you may expect better performance and lower (up to 30%) disk usage.

Comment by Marco Hofmann [ 2021 Nov 16 ]

alexei Thank you very much for the clarification!

The last time something fundamental changed in the DB schema (the switch to the new default charset utf8 and collation utf8_bin for MySQL & MariaDB), the Zabbix server database upgrade process didn't touch the schema. Instead, Zabbix provided a conversion script via ZBX-17357, which resulted in the following documentation article that was extremely useful to us: https://www.zabbix.com/documentation/current/manual/appendix/install/db_charset_coll

Will there be a similar conversion script, for customers who want to follow the primary key schema change, after they've upgraded to 6.0 LTS?

Comment by Alexei Vladishev [ 2021 Nov 16 ]

starko, I should have mentioned it earlier: the database conversion scripts will be provided for all supported DB engines. They will be part of the official documentation and/or release notes.
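Purely as an illustration of the general shape such a manual conversion can take on MySQL (this is not the official script; back up the database and follow the official guide once it is published), the table can be recreated with the primary key and the data copied over while skipping duplicates:

```sql
-- Illustrative sketch only, not the official Zabbix conversion script.
RENAME TABLE history_uint TO history_uint_old;

CREATE TABLE history_uint (
    itemid BIGINT UNSIGNED NOT NULL,
    clock  INT NOT NULL DEFAULT 0,
    value  BIGINT UNSIGNED NOT NULL DEFAULT 0,
    ns     INT NOT NULL DEFAULT 0,
    PRIMARY KEY (itemid, clock, ns)
);

-- INSERT IGNORE silently skips rows that would violate the new
-- primary key (itemid-clock-ns duplicates).
INSERT IGNORE INTO history_uint
    SELECT itemid, clock, value, ns FROM history_uint_old;

DROP TABLE history_uint_old;
```

The same pattern applies to the other history* tables; column lists differ per table.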

Comment by Marco Hofmann [ 2021 Nov 16 ]

Wow, that's great news, thank you very much!

Comment by Dimitri Bellini [ 2021 Nov 17 ]

@Alexei Very interesting topic! Do you think we can have more details about these numbers:

There seems to be a huge difference in ingestion time for Postgres+TSDB, and MariaDB seems faster for "select" queries compared to Postgres (without TSDB).

Thanks so much

Comment by Alexei Vladishev [ 2021 Nov 17 ]

dimitri.bellini , take these benchmarks with a grain of salt as the configuration of the databases and hardware was different.

Comment by Dimitri Bellini [ 2021 Nov 17 ]

@Alexei ok thanks

Comment by Rainer Stumbaum [ 2021 Nov 23 ]

So what is the current recommendation for operating larger Zabbix installations?
Postgres+TSDB or MySQL?
Why would you run these tests on different hardware/different configuration? To give a bad impression for Postgres+TSDB?

When my colleague saw these results he bashed me for migrating away from MySQL to TimescaleDB last year, with huge pain points (https://support.zabbix.com/browse/ZBX-16347?focusedCommentId=416975&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-416975 ...).

Sometimes it seems to me your "test environment" while developing consists of monitoring the laptop running your zabbix container and the switch port in front of it.

Comment by Alexei Vladishev [ 2021 Nov 23 ]

rstumbaum, I can assure you that we have a whole range of test platforms available, from beefy bare-metal servers to the full range of AWS/GCP/Azure server instances and K8S clusters. The tests we ran here had only one goal: to discover any drawbacks of having primary keys and to identify possible benefits. They were not aimed at benchmarking the performance of various storage engines.

There is no silver bullet when it comes to selecting an ideal database. It always depends on your priorities (speed vs. maintainability vs. HA options, etc.).

Comment by Dmitrijs Goloscapovs [ 2022 Jan 11 ]

Available in versions:

Documentation updated:

Comment by dimir [ 2022 Jan 11 ]

The primary keys upgrade guide does not mention removing the CSV files.

Also, will this work for users that use partitions? If not, will we provide any help to them?

Comment by MArk [ 2022 Feb 05 ]

In "3 Database upgrade to primary keys", section "Improving performance", both external links (MySQL 5.7, MySQL 8.0) point to "Not Found" content.

Comment by MArk [ 2022 Feb 05 ]

Also, the "3 Database upgrade to primary keys" document does not mention backing up the database first.

Comment by Martins Valkovskis [ 2022 Feb 09 ]

markfree, the issues you mentioned have been fixed. Thank you.

Generated at Fri Apr 11 12:25:30 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.