[ZBX-17674] No way for history export module to gracefully handle backend downtime Created: 2020 May 05  Updated: 2024 Apr 10  Resolved: 2022 Jul 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 5.0.0rc1
Fix Version/s: 6.4 (plan)

Type: Documentation task Priority: Minor
Reporter: Glebs Ivanovskis Assignee: Marina Generalova
Resolution: Fixed Votes: 2
Labels: database, loadablemodule, watchdog
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Team: Team D
Sprint: Sprint 90 (Jul 2022)
Story Points: 1

 Description   

Thanks to ZBXNEXT-3353 loadable modules can be used for history export.

Let's imagine a setup where a loadable module is used to export history data from Zabbix to some sort of external storage backend in real time:
Zabbix → module → storage

Now let's imagine a situation where the external storage is unavailable or has severe performance issues:
Zabbix → module ↛ storage

The issue is that a loadable module has no way to handle this situation gracefully. History export modules are not first-class citizens in Zabbix. There is no caching/buffering and no retry mechanism on the Zabbix side that would allow a module to pause sending data to an alternative backend and catch up later. The module has a single chance to send data to the storage backend. If the backend is having performance issues, the module is basically between a hammer and an anvil. The following options are available (and none of them is particularly good):

  1. Drop the data. A very simple solution with no performance impact. But it causes data loss, which would undermine the trust of module users, and eventually no one would use the module.
  2. Buffering. This is a viable solution for temporary backend performance problems, but it would significantly increase the complexity of the module. Data loss could still happen if Zabbix were stopped while the module had a lot of buffered data. To prevent that, the module would need to dump its buffers to disk, which would increase complexity even further.
  3. Block and wait. Another simple option, and one that ensures consistency between the Zabbix database and the external backend. Timing out makes no sense in this strategy: the module has no choice but to be fully committed. The downside is that when the external storage performs poorly, Zabbix performs poorly as well.
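
To make the trade-offs between the three options concrete, here is a minimal sketch that models them. It is a simulation in Python, not Zabbix module code (real loadable modules are C shared libraries), and every name in it (`FlakyBackend`, `BackendDown`, `export_drop`, `BufferingExporter`, `export_block`) is hypothetical and introduced only for illustration.

```python
import time
from collections import deque


class BackendDown(Exception):
    """Raised by the hypothetical storage client when a write fails."""


class FlakyBackend:
    """Hypothetical storage backend that rejects the first `failures` writes."""

    def __init__(self, failures):
        self.failures = failures
        self.stored = []

    def write(self, values):
        if self.failures > 0:
            self.failures -= 1
            raise BackendDown("storage unavailable")
        self.stored.extend(values)


def export_drop(backend, values):
    """Option 1: try once, discard the batch on failure.

    Returns the number of values lost.
    """
    try:
        backend.write(values)
        return 0
    except BackendDown:
        return len(values)


class BufferingExporter:
    """Option 2: keep unsent values in a bounded in-memory queue and retry
    on the next call. Once the queue is full the oldest values are dropped,
    and everything still queued is lost if the process stops."""

    def __init__(self, backend, max_values):
        self.backend = backend
        self.max_values = max_values
        self.pending = deque()
        self.dropped = 0

    def export(self, values):
        self.pending.extend(values)
        while len(self.pending) > self.max_values:
            self.pending.popleft()
            self.dropped += 1
        try:
            self.backend.write(list(self.pending))
            self.pending.clear()
        except BackendDown:
            pass  # keep the values queued for the next call


def export_block(backend, values, retry_delay=0.0):
    """Option 3: retry until the write succeeds, blocking the caller.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            backend.write(values)
            return attempts
        except BackendDown:
            time.sleep(retry_delay)
```

In this model, option 3 never loses data, but `export_block` does not return while the backend is down, which corresponds to Zabbix stalling together with the external storage.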

I have personally chosen the last option. In a way, the module's behaviour mirrors the relationship between the Zabbix database and Zabbix server: when the database performs poorly, Zabbix may have trouble too. That's why it is very important to monitor the health of the Zabbix database, and someone using a module that behaves like this should monitor the availability of the external storage as well. Preferably, with an independent monitoring setup.

This is not ideal. I'm not saying that Zabbix should implement buffering and retries for history export; I understand that this simply moves complexity from modules to Zabbix. But since Zabbix and modules are running into the same problem, maybe the same solution can be applied. I mean the "database watchdog" functionality, which in a modern Zabbix setup is a duty of the alert manager process (documented here, scroll down to "Other parameters" or search for "database down"). When the watchdog detects that the database is down, an alert is sent bypassing the usual Zabbix pipeline (triggers → problems → actions → operations). A module could provide a similar "is external storage down" check, and Zabbix could send an alert based on it.
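
The proposed check could look roughly like the sketch below. It is only a model of the idea, written in Python for brevity. Zabbix does not provide such a hook, and `StorageWatchdog`, `is_storage_up` and `send_alert` are hypothetical names. The choice to alert only on state changes is also an assumption, not a description of how the existing database watchdog behaves.

```python
class StorageWatchdog:
    """Polls a module-provided health check and alerts on state changes.

    `is_storage_up` is a hypothetical callback supplied by the module;
    `send_alert` stands in for whatever channel bypasses the usual
    triggers -> problems -> actions -> operations pipeline.
    """

    def __init__(self, is_storage_up, send_alert):
        self.is_storage_up = is_storage_up
        self.send_alert = send_alert
        self.was_up = True  # assume the storage is healthy at start

    def poll(self):
        up = self.is_storage_up()
        if self.was_up and not up:
            self.send_alert("External history storage is down")
        elif not self.was_up and up:
            self.send_alert("External history storage is back up")
        self.was_up = up
        return up
```

For example, polling a check that reports up, down, down, up produces exactly two alerts: one when the storage goes down and one when it recovers.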

P.S. Module users see this as a bug, and since I cannot resolve this issue on my own, I am reporting it as an incident and not as a feature request. Who knows, maybe you guys will have a bright idea about how to resolve the issue, and then it will be just a documentation task.



 Comments   
Comment by Anthony Somerset [ 2020 May 05 ]

This would certainly help solve the chicken and egg scenario at least when using custom history export modules

Comment by Vladislavs Sokurenko [ 2020 Jul 03 ]

Sending watchdog alerts when module returns an error would be a nice feature.

Comment by Glebs Ivanovskis [ 2021 Aug 07 ]

More than a year has passed since the issue became Confirmed. Any updates?

Comment by Vladislavs Sokurenko [ 2021 Aug 07 ]

There was a related issue some time ago, and behaviour number one was chosen. The rationale is to keep Zabbix server running even if there are problems with export. Please see ZBX-16779.

Comment by Glebs Ivanovskis [ 2021 Aug 08 ]

I don't understand. Is it a Won't fix, Workaround proposed?

If the Zabbix team has an understanding of how loadable modules should/must act in this situation, then the desired behaviour needs to be documented.

Comment by Marina Generalova [ 2022 Jul 18 ]

Loadable module documentation has been updated in 5.0, 6.0, 6.2.

Generated at Wed Apr 02 15:48:06 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.