[ZBX-26650] Zabbix Agent 2 Corrupts RPM Database via system.sw.packages.get Created: 2025 Jul 03  Updated: 2025 Sep 10

Status: Confirmed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent2 (G)
Affects Version/s: 7.0.15
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Jan Prusinowski (Inactive) Assignee: Zabbix Development Team
Resolution: Unresolved Votes: 1
Labels: rhel, rpm
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2025-07-09-17-20-37-419.png     PNG File image-2025-07-09-17-22-06-572.png     PNG File image-2025-07-09-17-23-14-994.png    
Issue Links:
Duplicate
Sprint: Support backlog

 Description   

The client is facing a critical issue on multiple RHEL 8 systems: the RPM database becomes corrupted after the zabbix_agent2 process executes the system.sw.packages.get item.

To investigate the root cause, we followed this Red Hat knowledge base article:
🔗 https://access.redhat.com/solutions/3330211

The kernel audit logs confirmed that zabbix_agent2 (running as root) sends SIGKILL signals to the rpm processes executing rpm -qa --queryformat ..., which leads to corruption of the RPM database (/var/lib/rpm).
Here’s a sample from the trace:

{{sys.kill: zabbix_agent2(pid:1070317) called kill(2710675, SIGKILL)
sig.send: SIGKILL was sent to rpm (pid:2710675) by uid:0
kprocess.exit: rpm(pid:2710675) - Code 9 - "rpm -qa --queryformat ..."}}

Once the agent terminated the rpm processes, the client observed immediate database corruption:

{{error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages database in /var/lib/rpm}}

Simultaneously, Zabbix Agent 2 logs showed:

{{check 'system.sw.packages.get' is not supported: Timeout occurred while gathering data.
[Sw] Failed to execute command 'rpm -qa', err: Command execution failed: context deadline exceeded.}}
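
The "Code 9" in the trace and the "context deadline exceeded" message in the agent log describe the same event from two sides: a child process that outlives its deadline is killed with signal 9, which a shell reports as exit status 137 (128 + 9). The sketch below mimics that sequence in plain shell. It is only an illustration of the observed behaviour, not the agent's actual Go code, and the helper name run_with_deadline is hypothetical.

```shell
# Hypothetical helper: run a command and SIGKILL it if it outlives the deadline.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    child=$!
    # Watchdog: once the deadline passes, kill the child unconditionally.
    ( sleep "$deadline"; kill -KILL "$child" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    status=0
    wait "$child" || status=$?
    # The child is done (or dead); the watchdog is no longer needed.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$status"
}
```

For example, `run_with_deadline 3 sleep 10` returns 137 after about three seconds, while a command that finishes in time returns its own exit status.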

Important context: The client has decided to run zabbix_agent2 as root in their environment to ensure access to all critical server components, including hardened or restricted filesystems. This was necessary for complete visibility. Agent command line:

root 1070317 0.5 0.1 2229476 47392 ? Ssl Jun23 77:40 /usr/sbin/zabbix_agent2 -c /etc/zabbix/zabbix_agent2.conf

Summary:

  • Agent version: Zabbix Agent 2 7.0.14
  • OS: RHEL 8
  • Affected item: system.sw.packages.get
  • Impact: RPM DB corruption due to SIGKILLs issued by Zabbix Agent
  • Reproducible: Yes, happens on multiple hosts
  • Workaround: We have temporarily disabled system.sw.packages.get

Other possibly helpful info:
 

[root@sldjde1221 zabbix]# zabbix_agent2 -V
zabbix_agent2 (Zabbix) 7.0.14
Revision ae76e5efee9 18 June 2025, compilation time: Jun 18 2025 12:12:38, built with: go1.24.1
Plugin communication protocol version is 6.4.0
Copyright (C) 2025 Zabbix SIA
License AGPLv3: GNU Affero General Public License version 3 <https://www.gnu.org/licenses/.>
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).
Compiled with OpenSSL 1.1.1k FIPS 25 Mar 2021
Running with OpenSSL 1.1.1k FIPS 25 Mar 2021

 

We use the library Eclipse Paho (eclipse/paho.mqtt.golang), which is
distributed under the terms of the Eclipse Distribution License 1.0 (The 3-Clause BSD License)
available at https://www.eclipse.org/org/documents/edl-v10.php

We use the library go-modbus (goburrow/modbus), which is
distributed under the terms of the 3-Clause BSD License
available at https://github.com/goburrow/modbus/blob/master/LICENSE

 

[root@sldjde1221 zabbix]# time rpm -qa --queryformat '%{NAME},%{VERSION}%{RELEASE},%{ARCH},%{SIZE},%{BUILDTIME},%{INSTALLTIME}\n' > /dev/null
real 0m1.479s
user 0m1.369s
sys 0m0.105s

 

# Ansible managed: agent.conf.j2 modified on 2025-04-30 12:04:13 by root on automation-job-146954-ggqbb
#
# This is a configuration file for Zabbix Agent 2
# To get more information about Zabbix, visit http://www.zabbix.com

# This configuration file is "minimalized", which means all the original comments
# are removed. The full documentation for your Zabbix Agent 2 can be found here:
# https://www.zabbix.com/documentation/7.0/en/manual/appendix/config/zabbix_agent2

# Alias=
# AllowKey=
BufferSend=5
BufferSize=100
ControlSocket=/tmp/agent.sock
DebugLevel=3
# DenyKey=
EnablePersistentBuffer=0
ForceActiveChecksOnStart=0
HeartbeatFrequency=60
# HostInterface=
# HostInterfaceItem=
# HostMetadata=
# HostMetadataItem=
Hostname=sldjde1221
# HostnameItem=
Include=/etc/zabbix/zabbix_agent2.d/*.conf
Include=/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf
ListenIP=172.21.14.57
ListenPort=10050
LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=100
LogType=file
# PersistentBufferFile=
PersistentBufferPeriod=1h
PidFile=/var/run/zabbix/zabbix_agent2.pid
PluginSocket=/tmp/agent.plugin.sock
PluginTimeout=3
RefreshActiveChecks=120
Server=172.16.238.0/24,172.16.19.0/24
ServerActive=zabbix-np.REDACTED,slqzbx0366.REDACTED,slqzbx0367.REDACTED,slqzbx0368.REDACTED
# SourceIP=
# StatusPort=
Timeout=3
TLSAccept=psk
# TLSCAFile=
# TLSCertFile=
TLSConnect=psk
# TLSCRLFile=
# TLSKeyFile=
TLSPSKFile=/etc/zabbix/zabbix_agent.psk
TLSPSKIdentity=saqzabbixagent
# TLSServerCertIssuer=
# TLSServerCertSubject=
UnsafeUserParameters=0
# UserParameter=
# UserParameterDir=0

The zabbix-agent2 was installed using Ansible automation and the official Zabbix repository via dnf.
The package in use is:

zabbix-agent2-7.0.14-release1.el8.x86_64

The host is running:

Red Hat Enterprise Linux release 8.10 (Ootpa)
Kernel: 4.18.0-553.56.1.el8_10.x86_64

The client is not exactly sure when the issue began, as the agent had been operating normally for some time. However, they recently started rolling out OS updates across several systems and noticed that the RPM database was corrupted on multiple hosts.

After further investigation, including analysis with Red Hat Support, the client identified zabbix-agent2 executing the system.sw.packages.get item as the root cause. The agent times out (context deadline exceeded) and sends SIGKILL to the rpm process, which results in RPM database corruption (DB_RUNRECOVERY).
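
Why SIGKILL in particular is risky can be shown with a small shell experiment: a process can trap SIGTERM and run its cleanup, but it never gets a chance to react to SIGKILL. In the sketch below, the child shell stands in for any program that must release state before exiting; whether rpm itself cleans up its database environment on SIGTERM is an assumption not verified here, and cleanup_ran_after is a hypothetical helper name.

```shell
# Hypothetical demo: does a child's cleanup handler run for a given signal?
# Usage: cleanup_ran_after TERM|KILL   -> prints "yes" or "no"
cleanup_ran_after() {
    sig=$1
    marker=$(mktemp)
    rm -f "$marker"
    # The child creates the marker file from its SIGTERM handler ("cleanup").
    sh -c 'trap "touch \"$1\"; exit 0" TERM; while :; do sleep 1; done' \
        sh "$marker" >/dev/null 2>&1 &
    child=$!
    sleep 1    # give the child time to install its trap
    kill -s "$sig" "$child"
    wait "$child" 2>/dev/null || true
    if [ -e "$marker" ]; then
        rm -f "$marker"
        echo yes
    else
        echo no
    fi
}
```

Running `cleanup_ran_after TERM` prints "yes" because the handler runs, while `cleanup_ran_after KILL` prints "no" because the kernel terminates the child without giving it control.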



 Comments   
Comment by Jan Prusinowski (Inactive) [ 2025 Jul 09 ]

Steps to reproduce:

Downloaded RHEL 8.0 at: https://access.redhat.com/downloads/content/479/ver=/rhel---8/8.0/x86_64/product-software
Installed on a VM
Login via SSH as root
Installed Zabbix_agent2 v7.0.16
Edited zabbix_agent2.conf:

Server=192.168.50.0/24
Hostname=Zabbix agent 2 RHEL

Allowed connection in firewall:

sudo firewall-cmd --add-port=10050/tcp --permanent
sudo firewall-cmd --reload

Zabbix server 7.0.16 running on an Ubuntu VM, PostgreSQL with Timescale
Added host and created an item on it:

After testing the item everything seems to be ok:

To speed things up, I ran:

mv /usr/bin/rpm /usr/bin/rpm.real
echo -e '#!/bin/bash\nsleep 10\nexec /usr/bin/rpm.real "$@"' > /usr/bin/rpm
chmod +x /usr/bin/rpm

And on the Zabbix server, I ran this over and over:

watch -n 4 "zabbix_get -s 192.168.50.232 -k system.sw.packages.get"

On the Host (over and over):

watch -n 5 "rpm -qa"

I'm leaving this running overnight to check whether the corruption occurs.

Comment by Jan Prusinowski (Inactive) [ 2025 Jul 10 ]

After about 13,000 iterations, the corruption did not occur. I will try to force timeouts to see if this makes any difference.
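
When forcing timeouts, it helps to confirm how many requests actually hit the deadline. The sketch below is a rough counter, assuming the timeout appears in the probe output with the same text as in the agent log ("Timeout occurred while gathering data"); the helper name count_timeouts is hypothetical.

```shell
# Hypothetical helper: run a probe command N times and count timed-out runs.
# Usage: count_timeouts ITERATIONS COMMAND [ARGS...]
count_timeouts() {
    iterations=$1
    shift
    timeouts=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Capture stdout and stderr; a failing probe must not abort the loop.
        out=$("$@" 2>&1) || true
        case $out in
            *"Timeout occurred while gathering data"*)
                timeouts=$((timeouts + 1))
                ;;
        esac
        i=$((i + 1))
    done
    echo "$timeouts"
}
```

With the reproduction setup above, this could be called as `count_timeouts 100 zabbix_get -s 192.168.50.232 -k system.sw.packages.get`.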

Comment by Edgar Akhmetshin [ 2025 Jul 17 ]

Why not send SIGTERM first for a graceful stop, and only send SIGKILL afterwards if the process does not respond? Why is SIGKILL the first option?
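
The escalation asked about here can be sketched with the GNU coreutils timeout utility, which sends a first signal at the deadline and SIGKILL only after an additional grace period. This illustrates the pattern only; it is not how the agent is implemented, and an actual fix would have to be made in the agent itself. The wrapper name run_gracefully is hypothetical.

```shell
# Hypothetical wrapper: SIGTERM at the soft deadline, SIGKILL only if the
# command is still alive after the grace period.
# Usage: run_gracefully SOFT_SECONDS GRACE_SECONDS COMMAND [ARGS...]
run_gracefully() {
    soft=$1
    grace=$2
    shift 2
    # -s TERM: signal sent when the soft deadline expires
    # -k GRACE: send SIGKILL if the command survives that long after SIGTERM
    timeout -s TERM -k "$grace" "$soft" "$@"
}
```

For example, `run_gracefully 3 5 rpm -qa` would give rpm three seconds to finish and five more seconds to exit after SIGTERM. timeout exits with status 124 when the command was stopped at the deadline by SIGTERM.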

Generated at Wed Nov 12 17:56:15 EET 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.