[#ZBX-10751] Modules should load after agent processes fork

[ZBX-10751] Modules should load after agent processes fork Created: 2016 May 05 Updated: 2017 May 30
Status:	Open
Project:	ZABBIX BUGS AND ISSUES
Component/s:	Agent (G)
Affects Version/s:	3.0.2
Fix Version/s:	None

Type:	Incident report
Priority:	Trivial
Reporter:	Ryan Armstrong
Assignee:	Unassigned
Resolution:	Unresolved
Votes:
Labels:	loadablemodule
Remaining Estimate:	Not Specified
Time Spent:	Not Specified
Original Estimate:	Not Specified

Description

Currently, the Zabbix agent calls dlopen() to load configured modules before forking its worker processes. This works fine for single-threaded modules but breaks any attempt to use multiple threads within a module, because threads started before fork() do not exist in the child processes.

Instead, modules should be loaded after the fork, in each worker process that requires the module.

Some work can be done in the module to overcome this (e.g. using IPC to talk to threads started in the parent PID by zbx_module_init), but some third-party libraries and runtimes are difficult to control (like Go): they spawn threads at dlopen() that are required for normal operation in the child processes.

This issue has been discussed by the Go team in the attached link.
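
The behavior described above can be reproduced outside Zabbix. The following is a minimal sketch, not agent code: a background thread stands in for a runtime thread started at dlopen(), and the function name ticker_survives_fork and the timings are invented for illustration. It assumes a POSIX system with pthreads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static atomic_int ticks; /* bumped by the background thread */
static atomic_int stop;  /* tells the background thread to exit */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Stand-in for a thread that a runtime starts when its library is loaded. */
static void *ticker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop)) {
        atomic_fetch_add(&ticks, 1);
        sleep_ms(1);
    }
    return NULL;
}

/* Returns 1 if the counter kept advancing inside a forked child,
 * 0 if it stalled there, -1 on error. */
int ticker_survives_fork(void)
{
    pthread_t tid;
    int status = 0;

    atomic_store(&stop, 0);
    atomic_store(&ticks, 0);
    if (pthread_create(&tid, NULL, ticker, NULL) != 0)
        return -1;
    while (atomic_load(&ticks) == 0) /* wait until the thread is running */
        sleep_ms(1);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here. */
        int before = atomic_load(&ticks);
        sleep_ms(50);
        _exit(atomic_load(&ticks) != before);
    }
    int waited = (pid > 0 && waitpid(pid, &status, 0) == pid);

    atomic_store(&stop, 1);
    pthread_join(tid, NULL);
    if (!waited || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On a POSIX system the function is expected to return 0: the child gets a copy of the counter, but not the thread that was incrementing it.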

Comments

Comment by Glebs Ivanovskis (Inactive) [ 2016 May 05 ]

Any other examples where it might be needed apart from the case that "Go can't run single-threaded"?

I am almost sure that dlopen() and zbx_module_init() won't be moved out of the Zabbix main process because:

it's much easier to stop one process if module loading failed;
this gives a module the opportunity to allocate a shared memory segment and/or fork its own collector-like process to facilitate information exchange and synchronization between module instances used by different pollers/listeners.

Could there be a workaround in the form of a small C library that is dlopen()'ed by Zabbix and that itself dlopen()s the Go library on the first access to module items?
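
A minimal sketch of such a shim might look like the following. All names (shim_path, shim_resolve, shim_is_loaded) are hypothetical, and the path is left as NULL, which makes dlopen() return a handle for the running program, so the example can run without a real Go library. A real shim would put the path of the Go shared object there and call shim_resolve() only from item handlers, which run after the fork.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical: a real shim would name the Go shared object here.
 * NULL makes dlopen() return a handle for the running program. */
static const char *shim_path = NULL;

static void *shim_handle;
static pthread_once_t shim_once = PTHREAD_ONCE_INIT;

static void shim_load(void)
{
    shim_handle = dlopen(shim_path, RTLD_NOW);
}

/* Nonzero once the wrapped library has been opened in this process. */
int shim_is_loaded(void)
{
    return shim_handle != NULL;
}

/* Resolves `symbol` in the wrapped library, opening the library on the
 * first call only. Returns NULL if loading or lookup fails. */
void *shim_resolve(const char *symbol)
{
    pthread_once(&shim_once, shim_load);
    return shim_handle != NULL ? dlsym(shim_handle, symbol) : NULL;
}
```

Because nothing is opened until the first lookup, any threads the wrapped runtime spawns at load time would be created in the process that actually serves the item. Older glibc versions need -ldl at link time.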

Comment by Ryan Armstrong [ 2016 May 05 ]

Unfortunately I can't think of other definitive examples which are highly concurrent and rely on pthreads in the runtime. However, excluding an entire developer community and any library or runtime with similar design would be unfortunate I feel. I get a fair bit of traffic to my g2z project so there seems to be community interest if we can get it working.

I agree with your points though. I'll think on this a little and experiment with your workaround.

Comment by mbarthelemy [ 2016 May 05 ]

I've also been investigating this project, which allows writing Zabbix loadable modules in Rust.
I have no idea whether the runtime needs any threading by default, but in my case (a MongoDB monitoring module), the Rust library/driver for Mongo seems to use threads. Would that trigger the same issue?

Comment by Ryan Armstrong [ 2016 May 10 ]

mbarthelemy, have you tried making any Rust calls that use concurrency from a loaded Zabbix module? I'm not familiar enough with Rust to make a suggestion, but network sockets typically make use of threads in concurrent runtimes.

Comment by Patrick Hemmer [ 2016 May 20 ]

What about executing an external process instead of loading libraries into the agent? I don't mean one-shot execution like we already have, but launch a long-lived process, and make RPC calls to it. Then the agent & module could do whatever they want (forking/threading) without affecting each other.
It seems like it would be pretty simple to create a basic system where zabbix-agent launches the process, passes an open listener socket on FD#3, and each poller would open a connection to this listener.

Only one process would be spawned, allowing it to fork/thread as much as it wants instead of forcing it to fork. zabbix-agent could provide a hint as to how many pollers might be accessing it simultaneously.
This would also add resiliency: if the module crashes, it won't take the agent/poller with it. The agent could even restart the module (as long as pollers reconnect on socket error).
It would also make it easier to write modules in other languages, especially those that can't be made into loadable libraries.
This should also make it possible to write modules for windows.
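
A much-simplified sketch of this idea is shown below. It is an illustration under stated assumptions, not a proposal for the actual agent: it uses a socketpair() and a forked child instead of an exec'ed program with a listener on FD 3, and the line-based protocol, the key "demo.ping" and all function names are invented for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reads one '\n'-terminated line into buf. Returns 0, or -1 on EOF/error. */
static int read_line(int fd, char *buf, size_t len)
{
    size_t i = 0;
    char c;
    while (i + 1 < len) {
        if (read(fd, &c, 1) <= 0)
            return -1;
        if (c == '\n') {
            buf[i] = '\0';
            return 0;
        }
        buf[i++] = c;
    }
    return -1; /* line too long */
}

static int write_line(int fd, const char *s)
{
    char buf[256];
    int n = snprintf(buf, sizeof buf, "%s\n", s);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;
    return write(fd, buf, (size_t)n) == n ? 0 : -1;
}

/* Module side: answer one key per line until the agent closes the socket. */
static void helper_loop(int fd)
{
    char key[128];
    while (read_line(fd, key, sizeof key) == 0)
        write_line(fd, strcmp(key, "demo.ping") == 0 ? "1" : "unsupported");
}

/* Agent side: start the long-lived helper and return the socket to it. */
int helper_start(pid_t *pid_out)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        helper_loop(sv[1]); /* free to thread or fork as it likes */
        _exit(0);
    }
    close(sv[1]);
    *pid_out = pid;
    return sv[0];
}

/* Poller side: one request, one reply. Returns 0 on success. */
int helper_query(int fd, const char *key, char *out, size_t out_len)
{
    if (write_line(fd, key) != 0)
        return -1;
    return read_line(fd, out, out_len);
}

/* Closing the socket ends the helper loop; then reap the child. */
int helper_stop(int fd, pid_t pid)
{
    close(fd);
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}
```

A failed helper_query() is where an agent could decide to restart the helper, which is the resiliency argument made above.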

Comment by Glebs Ivanovskis (Inactive) [ 2016 May 20 ]

Patrick, have a glance at this page. Zabbix protocols are well documented and all agent protocols are still supported by modern Zabbix server and proxy. Passive checks are dead simple, you can easily write your own agent in your preferred language for your preferred platform.

Comment by Patrick Hemmer [ 2016 May 20 ]

Rewriting the entire agent would be a huge undertaking, and very likely to result in errors. The official zabbix agent has been around for years, and has been thoroughly tested.

Comment by Glebs Ivanovskis (Inactive) [ 2016 May 20 ]

No need to rewrite all Zabbix agent functionality! Zabbix agent can still run alongside your custom agent.

Comment by Ryan Armstrong [ 2016 May 21 ]

Glebs, I have considered an additional agent, but how would you implement it alongside the Zabbix agent? Only one process can take the TCP port and only one agent port can be defined for each Host. This scenario still means that a module author would have to overcome the challenges of writing stable service registration, an agent protocol interface, ACLs, logging, etc. Active checks would also be precluded.

I really like Patrick's idea here. He described the same architecture I've experienced writing plugins for projects such as Squid, Vagrant and Packer.

To be honest, I'm hitting constraints with the Zabbix module architecture all the time and I feel this is a large barrier to entry for the community. Consider the number of publicly available templates and user parameter scripts compared to community-developed modules.

So far it presents the following challenges:

Loaded modules directly affect the stability of the agent
Modules can only be written in C, precluding languages common amongst monitoring gurus
Deep knowledge of the Zabbix source and build configuration is required
The ABI is not solidified (~~ZBX-10428~~) and does not provide a contract for critical features such as identifying the agent version, writing to the agent log file or outputting valid JSON discovery data
SELinux is more difficult to manage (ZBX-10610)
No development packages are published (~~ZBXNEXT-3157~~) so headers must be deciphered and manually copied or some difficult m4 macro magic is required
No support for Windows (ZBXNEXT-2201)

Patrick's idea presents the following advantages:

Agent stability is unaffected. Runaway modules can be recycled
Modules can be written in any language and packages could be written to simplify and abstract the interface to the agent
No knowledge of Zabbix source or build configuration is required
The agent ABI can be decommissioned and will remove constraints on Zabbix developers
SELinux policies can be more discrete and targeted at the module process instead of the Zabbix agent
No development packages are needed
Windows/other OS support is drastically simplified

The agent could communicate with the loaded modules simply using a pipe for:

advertising agent version and configuration to the module
requesting a list of supported keys
accepting log messages
advertising any change to active check configuration (so collector/aggregator threads in the module can be adjusted)
requesting key values from the module for passive/active checks
accepting 'sender' like values from the module (maybe for log monitoring, etc.)
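
To make the list above more concrete, here is one way such messages could be framed on a pipe. This is purely a sketch: the message type names, the 3-byte header (1-byte type, 16-bit big-endian length) and the function names are all assumptions made for illustration, and nothing like this exists in the Zabbix agent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message types, one per item in the list above. */
enum msg_type {
    MSG_AGENT_INFO = 1, /* agent -> module: version and configuration */
    MSG_LIST_KEYS,      /* agent -> module: request supported keys */
    MSG_LOG,            /* module -> agent: log message */
    MSG_ACTIVE_CONFIG,  /* agent -> module: active check config changed */
    MSG_GET_VALUE,      /* agent -> module: request a key value */
    MSG_PUSH_VALUE      /* module -> agent: 'sender'-like value */
};

#define FRAME_HEADER_LEN 3 /* 1 byte type + 2 bytes big-endian length */

/* Writes one frame into out. Returns the frame size, or 0 if the payload
 * is too long or the buffer is too small. */
size_t frame_encode(uint8_t *out, size_t out_len, enum msg_type type,
                    const void *payload, size_t len)
{
    if (len > UINT16_MAX || out_len < FRAME_HEADER_LEN + len)
        return 0;
    out[0] = (uint8_t)type;
    out[1] = (uint8_t)(len >> 8);
    out[2] = (uint8_t)(len & 0xff);
    if (len > 0)
        memcpy(out + FRAME_HEADER_LEN, payload, len);
    return FRAME_HEADER_LEN + len;
}

/* Parses one frame from in. Returns the number of bytes consumed, or 0 if
 * the buffer does not yet hold a complete frame. *payload points into in. */
size_t frame_decode(const uint8_t *in, size_t in_len, enum msg_type *type,
                    const uint8_t **payload, size_t *len)
{
    if (in_len < FRAME_HEADER_LEN)
        return 0;
    size_t n = ((size_t)in[1] << 8) | in[2];
    if (in_len < FRAME_HEADER_LEN + n)
        return 0;
    *type = (enum msg_type)in[0];
    *payload = in + FRAME_HEADER_LEN;
    *len = n;
    return FRAME_HEADER_LEN + n;
}
```

A length prefix lets the reader tell a partial read from a complete message, which matters on a pipe because a single read() may return less than one frame.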

Comment by Marc [ 2016 May 21 ]

Just in case somebody considers configuring multiple Zabbix agents for one Zabbix host: be aware of ~~ZBX-8623~~.

Comment by Glebs Ivanovskis (Inactive) [ 2016 May 23 ]

Dear Ryan and Patrick,

this is getting slightly off-topic. You can find me in our IRC channel if you want to discuss loadable module issues in detail. I've written a couple of more or less modules recently and I agree with half of your points, but I strongly disagree with the other half.

"Loaded modules directly affect the stability of the agent"

Who would want to use unstable modules? If one of the Zabbix daemon's children dies, the Zabbix daemon stops. The same goes for modules.

"Modules can only be written in C"

AFAIK one still needs some glue code to bind an application and a library written in different languages. It is too much to ask Zabbix to provide bindings for all possible languages.

"Deep knowledge of the Zabbix source and build configuration is required"

No! I agree that the loadable module docs lack information in some respects, but one can write modules without looking into the Zabbix code.

"The ABI is not solidified (~~ZBX-10428~~) and does not provide a contract for critical features such as identifying the agent version, writing to the agent log file or outputting valid JSON discovery data"

Even worse, there is no ABI, unfortunately... However, the docs say: "Another useful header is include/log.h, which defines zabbix_log() function, which can be used for logging and debugging purposes." I will not comment on how this works, but this gives a hint about JSON.

"No development packages are published"

For a basic module you need just module.h, not even sysinc.h, because you can include the standard library headers yourself.

"The agent could communicate with the loaded modules simply using a pipe for:"

Replace "pipe" with "socket" and the Zabbix agent you are dreaming of turns into the existing Zabbix proxy. But I admit that managing multi-interface hosts is not convenient in the current implementation, as Marc rightly pointed out.

Generated at Wed Apr 02 23:37:04 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.
