[ZBXNEXT-714] need scalable alternative for the history and items tables Created: 2011 Mar 24 Updated: 2025 Nov 03 |
|
| Status: | Open |
| Project: | ZABBIX FEATURE REQUESTS |
| Component/s: | Server (S) |
| Affects Version/s: | 1.8.2 |
| Fix Version/s: | None |
| Type: | Change Request | Priority: | Major |
| Reporter: | Will Lowe | Assignee: | Alexei Vladishev |
| Resolution: | Unresolved | Votes: | 108 |
| Labels: | cassandra, performance, storage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux, Postgresql |
||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
We have hundreds of monitored servers with thousands of checks in total. The size of the items and various history tables in the Zabbix database is a major scalability problem for us – we've got it running on a very fast RAID array with 10+ disks, but a postgres autovacuum of the items table makes the server almost unusable. Long-term the amount of data we can store in those tables will limit whether we can continue to use Zabbix. Has any thought been given to a more scalable storage mechanism? Some ideas:
|
| Comments |
| Comment by Raymond Kuiper [ 2011 Mar 24 ] |
|
Partitioning of the larger tables will also help in reducing the time it takes to restore a zabbix DB from backup in case of disasters (parallel import). |
| Comment by Daniel Santos [ 2011 Mar 24 ] |
|
This feature will really provide GREAT performance gains in PostgreSQL query planning. Vote for it! |
| Comment by Daniel Santos [ 2011 Mar 24 ] |
|
Maybe this thread could help... http://www.zabbix.com/forum/showthread.php?p=57018 |
| Comment by richlv [ 2011 Mar 24 ] |
|
what's preventing you from using partitioning ? and could this feature request then be said to be about alternative data storage options ? |
| Comment by Will Lowe [ 2011 Mar 24 ] |
|
Hmm. When I look at my schema, I see a single history table and a single items table. They are so big that I can't even get a real size with SELECT count(*) – the best I can do is an estimate: zabbix=# SELECT (reltuples)::integer FROM pg_class WHERE relkind = 'r' AND relname = 'history'; ... I was looking for a way to break this up into several smaller tables. E.g. history_2011_01, history_2011_02. You're saying that's already supported? I couldn't find it in the documentation. |
| Comment by richlv [ 2011 Apr 21 ] |
|
not in individual tables, but partitions - see http://dev.mysql.com/doc/refman/5.5/en/partitioning.html for mysql or relevant documentation for other databases |
| Comment by Will Lowe [ 2011 Apr 22 ] |
|
We're using postgresql, which doesn't have great native support for this type of partitioning. I was hoping you were telling me that Zabbix could do it itself. |
| Comment by richlv [ 2011 Jul 27 ] |
|
|
| Comment by Will Lowe [ 2013 Jan 04 ] |
|
Perhaps it would be worthwhile to consider http://graphite.wikidot.com/ as a replacement graphing backend. It's getting lots of traction, scales well, and does a pretty nice job. |
| Comment by Adam Kowalik [ 2013 Oct 11 ] |
|
PostgreSQL also supports partitioning – in fact, that's why I'm using it instead of MySQL; see here: https://www.zabbix.org/wiki/Docs/howto/zabbix2_postgresql_autopartitioning |
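For readers following the partitioning discussion: the trigger-based approach in the wiki link above predates native support, but PostgreSQL 10+ offers declarative range partitioning. Below is a minimal sketch that generates the kind of monthly partition DDL asked about earlier in the thread; the table and column layout are illustrative, not Zabbix's exact schema:

```python
from datetime import date

def monthly_partition_ddl(parent: str, year: int, month: int) -> str:
    """Build CREATE TABLE DDL for one monthly range partition,
    named like history_2011_01 as suggested earlier in the thread."""
    start = date(year, month, 1)
    end = date(year + (month == 12), month % 12 + 1, 1)
    name = f"{parent}_{year:04d}_{month:02d}"
    return (
        f"CREATE TABLE {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )

# The parent table would be declared once, e.g.:
#   CREATE TABLE history (itemid bigint, clock timestamptz, value numeric)
#     PARTITION BY RANGE (clock);
# (Zabbix actually stores clock as an epoch integer, so real partition
#  bounds would be integers; dates are used here for readability.)
ddl = monthly_partition_ddl("history", 2011, 1)
```

Tools such as pg_partman automate creating partitions like these on a schedule, which removes the manual maintenance the commenters below object to.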
| Comment by richlv [ 2013 Oct 31 ] |
|
ZBXNEXT-1971 talks about opentsdb |
| Comment by Stefan [ 2013 Nov 04 ] |
|
maybe this is an option? |
| Comment by Vadym Kalsin [ 2014 Feb 27 ] |
|
My idea is to store the latest data in Zabbix (for the trigger functionality), but store all historical data in Graphite. |
| Comment by Marc [ 2014 May 21 ] |
|
Won't ZBXNEXT-714 be covered by |
| Comment by Mahdi Hedhli [ 2014 Sep 03 ] |
|
+1 for OpenTSDB and Graphite |
| Comment by David Parker [ 2014 Dec 13 ] |
|
ZBXNEXT-2640 was closed as a duplicate of this bug. It proposes to solve the backend storage problem by adding an optional call to send data to an AMQ-type queue at the point in the Zabbix server logic where data is being prepared for insertion into SQL. This single addition is effectively a modular storage solution – you can route the data pretty much wherever you want once it's in e.g. RabbitMQ. It makes OpenTSDB, Graphite, and just about anything else you can think of an almost trivial addition from the user perspective, and it reduces the Zabbix code-complexity "load" to dealing with one single API (the AMQ API). I believe RabbitMQ is a good choice here, but others would work as well. I believe the "hook" would live somewhere in src/libs/zbxdbcache/dbcache.c in the dc_add_history_<type> functions. If the target is RabbitMQ, the C client library is this: http://alanxz.github.io/rabbitmq-c/docs/0.5.0/. Implementation would be to add optional configuration to the server config file sufficient to describe a connection to an AMQ exchange. If that configuration section is populated, the code establishes the connection and then pushes data through it. Data would be in text format (could be just comma-delimited to keep overhead down on the Zabbix side). I believe this would not be hard to implement. |
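The comma-delimited text format proposed above can be sketched as a serializer/parser pair. This is a hypothetical illustration in Python (the real hook would be C code in dbcache.c, and the field order here is my own choice, not anything from the Zabbix source):

```python
def serialize_history(itemid: int, clock: int, ns: int, value: float) -> bytes:
    """Render one history record as a comma-delimited line for an AMQ queue."""
    return f"{itemid},{clock},{ns},{value}\n".encode()

def parse_history(line: bytes) -> tuple:
    """Consumer side: split a line back into typed fields for any backend."""
    itemid, clock, ns, value = line.decode().rstrip("\n").split(",")
    return int(itemid), int(clock), int(ns), float(value)

# A consumer draining the queue could route records to OpenTSDB,
# Graphite, or anything else without Zabbix knowing about it.
record = serialize_history(23296, 1418428800, 0, 0.15)
```

Keeping the wire format this simple is what makes the "one single API" argument work: every downstream backend only has to understand one trivial line format.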
| Comment by Corey Shaw [ 2015 Feb 03 ] |
|
I believe that a lot of the ideas that have been presented here are fantastic. More database storage options are definitely needed, both for storing data AND for Zabbix reading it back out. Something lacking here, though, is the concept of storage "modules". I've heard it alluded to in #zabbix and such, but figured I'd point it out here. There really needs to be a storage framework that people can write pluggable modules for. For instance, let's say this framework allows me to provide my own plugin (or use one someone else has written) that tells Zabbix how to store data in OpenTSDB. This same plugin could tell Zabbix how to read data back out of that datastore. To improve on this further, it should be possible for a custom plugin to support sharding of the datastore. Why couldn't someone write into the MySQL module the ability for Zabbix to read/write data from/to multiple servers? If all Zabbix cares about is function calls to the plugin that provide or store the data, it should be trivial to support that from the Zabbix perspective. The difficulty lies in the pluggable module. Zabbix SIA could provide a base set of modules as they see fit (e.g. MySQL, Postgres, etc.) and also rely on the community to provide them. Obviously this is not a minor change and would require a significant amount of work. It is probably possible to get an interim (and let me stress the word "interim") solution a bit more quickly. ZBXNEXT-2640 has the concept of adding a "hook" into the current Zabbix code to publish the data it collects to an AMQP bus (like RabbitMQ) – this was mentioned in the previous comment by David Parker. This would provide the ability to shove the data off to some other datastore of choice without completely redesigning the Zabbix storage code as it stands now.
While it wouldn't give Zabbix the ability to read data from the arbitrary datastore, it would allow people to offload some work from the main Zabbix database, thereby freeing up resources (which the Zabbix database is desperately in want of). |
| Comment by Alexander Simenduev [ 2015 Feb 06 ] |
|
I think InfluxDB (http://influxdb.com) would be a very reasonable alternative for item storage; it has a SQL-like query engine. |
| Comment by Corey Shaw [ 2015 Feb 06 ] |
|
One of the current major issues with InfluxDB is that its clustering is experimental at best, so it would suffer from scalability concerns. If it was simply another plugin for a storage framework (like I mentioned in my last comment), that'd be great! |
| Comment by David Parker [ 2015 Feb 07 ] |
|
As you mentioned, having output go through an AMQ (e.g. RabbitMQ) as in ZBXNEXT-2640 is a fast interim path toward that. If everything gets shoved into the queue, it can go wherever the user wants it to go. The effect is much like having a pluggable storage backend, albeit write-only. |
| Comment by Marc [ 2015 Feb 07 ] |
|
An implementation in the sense of Then there would be an abstraction layer that allows using almost any thinkable storage backend. |
| Comment by Oleksii Zagorskyi [ 2015 Sep 16 ] |
|
|
| Comment by Vadim Nesterov [ 2015 Sep 16 ] |
|
When Alexei was in Moscow, he agreed to make something like a middleware layer with support for different storage engines as plugins. |
| Comment by Thierry Sallé [ 2015 Nov 23 ] |
|
Hi ! Regards, Thierry. |
| Comment by Marc [ 2016 Feb 12 ] |
|
In reference to |
| Comment by richlv [ 2016 Jul 28 ] |
|
|
| Comment by Peter Gervai [ 2017 Feb 22 ] |
|
I generally don't like the solutions which suggest manual sharding, partitioning or other manual hacks. Just to name a few, older and newer, all of them possibly better than *SQL: DalmatinerDB, Apache Kudu, OpenTSDB (these are all automagically clustered); Influx, Prometheus, Riak TS, KairosDB, Graphite/Whisper, or even Elasticsearch; and some I don't even know: Druid, Blueflood, Atlas, Chronix, Hawkular, Warp10, Heroic, BTrDB, metrictank... I personally see the first two as good candidates for myself, but they are very similar from a usage standpoint. All of them are great at storing and retrieving time series. |
| Comment by Glebs Ivanovskis (Inactive) [ 2017 Feb 22 ] |
|
A fresh related issue: ZBXNEXT-3661. |
| Comment by Stefan [ 2017 May 05 ] |
|
Hello, just some suggestions:
|
| Comment by Peter Gervai [ 2017 May 18 ] |
|
@stefan krüger: You believe that, and many of us disagree. An RDBMS is not good for storing (sometimes many hundreds of millions of) simple, unstructured time series data points, and even less for retrieving them fast. It doesn't really matter how you try to tweak the schema, you always need lots of dirty (manual) work to make it at least somewhat bearable, and it never will be good. In contrast, the clustered time series databases usually provide you with a working, distributed, load-balanced and high-performance backend without any fiddling necessary. Retrieval times are (often) magnitudes lower than an RDBMS. I understand that some people believe the solution is not to keep historical data; they delete everything older than a week and are happy. But then we (those who need the history for years) would need a parallel system, gathering the same data but storing it, then another to retrieve and visualise it. Kind of defeats the purpose. I haven't browsed the code but probably all tasks can be somehow traced back to |
| Comment by Rostislav Palivoda (Inactive) [ 2018 Apr 11 ] |
|
Data export to JSON files available in 4.0 alpha5 - |
| Comment by Rostislav Palivoda (Inactive) [ 2019 Sep 25 ] |
|
Elasticsearch added in
Timescale support added in
Does it still make sense to keep the request open? Please comment with a case description. |
| Comment by Glebs Ivanovskis [ 2021 Aug 08 ] |
Export files became "second class" and aren't really a match for what is requested in this ticket. |
| Comment by Sergey [ 2021 Dec 14 ] |
|
Will VictoriaMetrics be supported as a storage backend? |
| Comment by Alexei Vladishev [ 2021 Dec 15 ] |
|
serrrios , VictoriaMetrics will definitely be an option once we have history API implemented. |
| Comment by LivreAcesso.Pro [ 2023 Mar 23 ] |
|
What about a better schema to improve data density? The order of the columns matters: |
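For context on why column order affects data density: PostgreSQL aligns fixed-width columns the way a C compiler aligns struct members, so placing wide columns first avoids padding bytes. A small illustration using Python's native struct alignment (byte counts are for a typical x86-64 build; PostgreSQL's row layout follows the same principle, though not byte-for-byte):

```python
import struct

# Native ('@') mode applies C-struct alignment padding between members.
# smallint (h), bigint (q), integer (i): the bigint forces 6 bytes of padding.
narrow_first = struct.calcsize("@hqi")  # 2 + 6 (pad) + 8 + 4 = 20 bytes on x86-64
# bigint, integer, smallint: widest first, no padding needed.
wide_first = struct.calcsize("@qih")    # 8 + 4 + 2 = 14 bytes
```

Across billions of history rows, a few padding bytes per row adds up to a measurable difference in table and index size.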
| Comment by LivreAcesso.Pro [ 2023 Mar 23 ] |
| Comment by François Saysset [ 2023 Apr 03 ] |
|
what do you think of clickhouse for storing large historical data ? |
| Comment by LivreAcesso.Pro [ 2023 Jun 24 ] |
|
Can we test VictoriaMetrics as an alternative to SQLite? |
| Comment by Peter Gervai [ 2023 Jun 25 ] |
|
I have lost my patience in recent years waiting for a solution, since an RDBMS clearly does not fit the case. Since we use VictoriaMetrics for about everything metric-related, and it handles about 20 billion entries on a single non-clustered virtual machine rather well (both disk usage and response time are excellent), I started to implement querying metrics through the Zabbix API and injecting them into VM, using Grafana to watch metrics (not to mention some algorithmic alerts through Alertmanager, which shall be fed back to Zabbix, eventually), and telling Zabbix to remove metric data older than some ridiculously short time, say, 7 days. It would be nice if Zabbix could automagically push metric data to anything, using a plugin-based system (Prometheus compatible, Influx compatible, TSDB compatible, just to name a few), and maybe be able to include Grafana dashboards (even in an iframe), if it's hard to implement actual metric queries against the said protocols and display them (but, honestly, Grafana displays graphs a million times better than a hacked-together local graph, why reimplement it again?). ClickHouse is nice but I wouldn't store metrics in a non-metric-oriented backend; however, even if it's not perfect, it could handle storage way better than the current solutions, especially any historical non-metric data. (And Elasticsearch is not resource friendly in any way, mind you.) I was suggesting a generic metric API a loong time ago, following the discussions, but I have seen marginal demand and will, and it's probably easier for me to syphon out the data than to wait for some miracle to happen. Still I carry hope. |
| Comment by Glebs Ivanovskis [ 2023 Jun 26 ] |
|
Dear grin,
Zabbix already has such "plugin-based system" (see
Totally agree. That's why my loadable module for InfluxDB has integration with Grafana Zabbix plugin. I don't use this solution myself, but since there are virtually no open issues it either works to everyone's satisfaction or nobody uses it. |
| Comment by Stefan [ 2023 Jun 26 ] |
|
Since Zabbix supports TimescaleDB, it has a good scalable alternative for history and items.. also Zabbix has the possibility to export metrics. Everything is fine for me, so this can be closed |
| Comment by Peter Gervai [ 2023 Jun 26 ] |
|
@Glebs While in theory Influx can be used like VM, in practice, as you mention, with no label information (group, item name, interface, ...) it has little use, possibly not unrelated to the lack of feedback. I am not sure it would be simple (or possible) to write a module which could generate "whole" data (ts+data+labels), possibly doing a large amount of lookups in the process. For the general case (or for really anything which works on the data, from alerting to various external plugs) the Grafana "hack" does not seem to be useful. |
| Comment by Alexei Vladishev [ 2023 Jul 03 ] |
|
We are aiming to introduce a generic HTTP based API for connecting any time series DB engines in 7.0 LTS. It is currently in design phase. |
| Comment by Andrea Rafreider [ 2023 Jul 27 ] |
|
Use of time series databases for history would be easier if Zabbix item keys supported named parameters. Example: this could easily be exported in the Zabbix history file and imported into a Prometheus-compatible time series database (we use VictoriaMetrics for this), for example as a time series with name vfs.fs.size and labels {filesystem="/usr", mode="used"}. Backward compatibility, for those who need it, could be achieved by supporting both positional and the new named parameters in Zabbix item keys. I'm looking forward to a time series backend that can work OOTB with Zabbix! Keep up the good work |
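A minimal sketch of the named-parameter idea above – parsing a hypothetical key like vfs.fs.size[filesystem=/usr,mode=used] into a metric name plus Prometheus-style labels. The bracketed named-parameter syntax is an assumption drawn from this comment, not current Zabbix item-key syntax:

```python
def parse_named_key(key: str):
    """Split a hypothetical named-parameter item key into (metric, labels).

    'vfs.fs.size[filesystem=/usr,mode=used]'
      -> ('vfs.fs.size', {'filesystem': '/usr', 'mode': 'used'})
    Keys without parameters yield an empty label dict.
    """
    name, _, params = key.partition("[")
    labels = {}
    if params.endswith("]"):
        for pair in params[:-1].split(","):
            k, _, v = pair.partition("=")
            labels[k.strip()] = v.strip()
    return name, labels
```

With labels recovered this way, each history record could be written as one sample on a labelled series rather than a flat positional key.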
| Comment by Ruslan Aznabaev [ 2024 Mar 28 ] |
|
Any update on this? As I see, it has the label "Zabbix7.2", but in the roadmap this request is planned for 7.0. |
| Comment by Alexei Vladishev [ 2024 Apr 02 ] |
|
NexonSU It was initially planned for 7.0, now postponed to later releases. |
| Comment by LivreAcesso.Pro [ 2024 Oct 10 ] |
|
+1 for victoriametrics |
| Comment by Christos Diamantis [ 2024 Oct 16 ] |
|
What about ClickHouse? A really fast OLAP DB that supports SQL, has a MySQL wire protocol for connecting to it, and can scale to a tremendous size. Its only drawback compared to OLTP databases like MySQL and Postgres is that it does not support triggers, so you have to rely on an external mechanism. Nevertheless, it supports aggregate functions like TimescaleDB's continuous aggregates and materialized views. |
| Comment by LivreAcesso.Pro [ 2024 Oct 22 ] |
|
christos.diamantis
|
| Comment by Christos Diamantis [ 2024 Oct 31 ] |
|
ClickHouse is a good choice for big setups indeed. Low-end setups would require a lot of tuning, and the benefits would not be visible, in contrast to Timescale for example. |
| Comment by Peter Gervai [ 2024 Oct 31 ] |
|
Mentioning again that, if possible, a generic interface and external plugins/plumbing would be more efficient than coding half a dozen different backend interfaces with wildly different semantics into the Zabbix core. VictoriaMetrics is great, ClickHouse is great, others like other TSDBs (like Influx, Timescale), and some thought Elastic/OpenSearch is a good idea – all of these should be usable through a unified API/interface. Otherwise it requires familiarity with both deep Zabbix internals and the internals of the backend system; instead, just writing the matching API endpoints for the specific backend would require minimal Zabbixism from the plugin writer. And I apologise for stating the obvious, repeated every 5 years. |
| Comment by Mickael Martin [ 2025 Jan 16 ] |
|
On roadmap 7.4 :
|
| Comment by François-Hugues MAILLIET [ 2025 Mar 10 ] |
|
With the evolution of our information systems, and in particular containerization, horizontal scalability is often what is sought. When using a database such as Postgres, a plugin called Citus makes this horizontal extensibility possible. Perhaps we need to create a specific upgrade request? |
| Comment by Alexei Vladishev [ 2025 Mar 14 ] |
|
francois.hugues.mailliet , Citus might be considered as one of the solutions, however right now we are more focused on engines designed specifically for time-series and JSON data storage. |
| Comment by François-Hugues MAILLIET [ 2025 Mar 14 ] |
|
alexei : Thanks for this update. TSDB is a very interesting way to go, and for three years I have only installed Zabbix instances with TimescaleDB. But to work on a fully containerized architecture, including the storage part, I understand that working on the new JSON data storage possibilities is on the rails. I understand and I agree. I agree because, if I have understood everything correctly, it's a native Postgres functionality. But for the moment, this doesn't necessarily address the horizontal scalability of Postgres database storage. JSON data storage is useful for manipulating semi-structured documents without defining a strict schema, isn't it? Is this Jira issue the one to follow for this evolution? |
| Comment by Alexei Vladishev [ 2025 Mar 17 ] |
|
francois.hugues.mailliet, you may follow Zabbix roadmap at https://zabbix.com/roadmap in order to get up-to-date information that also includes actual references to ZBXNEXT tickets. |
| Comment by Martin Alstrup [ 2025 May 23 ] |
|
Just chiming in here to say that TimescaleDB works great for us. A single Postgres VM (10 cores, 32GB RAM on a Xeon 6230 VMware hypervisor) carries about 10k NVPS with no issues and no manual partitioning. We also have a similar streaming-replica server (for failover), and our Grafana connects to that read-only replica for dashboards. I'm curious what numbers others are seeing that cause performance issues on the Postgres side? |
| Comment by Ruslan Aznabaev [ 2025 May 23 ] |
|
Well, our PostgreSQL can handle 100k NVPS... on a new Dell server with 128 cores and 256GB of RAM. But right now we have problems even with the events table, because it's too big... |
| Comment by Klimenko Andrey [ 2025 May 23 ] |
|
It is extremely necessary to move the timeseries to a separate specialized TSDB. Only then will zabbix become a real monitoring system. Until then, it is a complicated toy from the 90s |
| Comment by Andrea Rafreider [ 2025 May 23 ] |
|
I agree a TSDB is needed, but the naming convention of Zabbix items also needs to change first, as information about monitored instances cannot be contained in the Zabbix item name |
| Comment by Ruslan Aznabaev [ 2025 May 23 ] |
|
It can; some of our SRE engineers store a ****ing access.log in VictoriaMetrics labels and... it works. This is wrong on so many levels, but it works. |
| Comment by Alexei Vladishev [ 2025 May 30 ] |
|
Support for alternative scalable storage engines is on the way in Zabbix 8.0. Stay tuned for updates! |
| Comment by Stefan [ 2025 Jun 02 ] |
|
Everyone who hits the limit of the current implementation is likely running a bigger instance (used in a bigger company). So if you really need this feature, why don't you pay for it and ask Zabbix how much it would cost to implement? |
| Comment by Alex Kalimulin [ 2025 Jun 02 ] |
|
To keep the discussion focused, the last few comments have been removed. klimenko.andrei, we kindly ask you to avoid further personal remarks and maintain a respectful tone moving forward. |
| Comment by RobertG [ 2025 Jun 11 ] |
|
We use PostgreSQL with TimescaleDB with compression and love it. Schema changes, if and when necessary for new Zabbix releases, can be tedious to apply while minimizing downtime. Our current DB is 700GB with compression enabled. |
| Comment by LivreAcesso.Pro [ 2025 Jun 23 ] |
|
TimescaleDB over Citus Data is a good compromise for HA scenarios, alexei |
| Comment by Mathew [ 2025 Jul 28 ] |
|
After being part of the roadmap for 3 |
| Comment by François-Hugues MAILLIET [ 2025 Jul 28 ] |
|
splitice : It is included in the roadmap, but not directly explicit and more associated with telemetry data: |
| Comment by Alexei Vladishev [ 2025 Jul 29 ] |
|
splitice, it is in the roadmap. The functionality for all history tables is coming in Zabbix 8.0 LTS. |
| Comment by Stefan [ 2025 Jul 31 ] |
|
alexei that sounds great.. It would be nice if you could tell us what status you are at: are you evaluating/testing a technology/product (if so, which one), or have you already made a decision, and if so, which product/technology will Zabbix support? (This should NOT be a discussion of why you use X and not Y, or, if you are still evaluating, "please also take a look at Z".) |
| Comment by Alexei Vladishev [ 2025 Aug 01 ] |
|
shad0w , we are about to choose timeseries DB engine, all R&D is already done. I will keep everyone updated. |
| Comment by Stefan [ 2025 Aug 22 ] |
|
alexei that's what I thought |
| Comment by Alexei Vladishev [ 2025 Aug 22 ] |
|
shad0w , we are close, but no decision is made yet. |
| Comment by Mickael Martin [ 2025 Nov 03 ] |
|
So, ClickHouse seems to be one of your choices: https://git.zabbix.com/projects/ZBX/repos/zabbix/compare/diff?sourceBranch=refs%2Fheads%2Ffeature%2FDEV-4427-7.5&targetRepoId=152 |