[ZBX-15602] SystemD "TimeoutSec=infinity" is bad without units dependency order Created: 2019 Feb 06  Updated: 2024 Apr 10  Resolved: 2021 Jul 21

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Packages (C)
Affects Version/s: 4.0.4
Fix Version/s: 4.0.17rc1, 4.4.5rc1, 4.4 (plan), 5.0.0alpha1, 5.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Tim White Assignee: Jurijs Klopovskis
Resolution: Fixed Votes: 11
Labels: reboot, systemd, timeout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 18.04.1


Attachments: PNG File A stop job is running for Zabbix Server.png     PNG File multi-user.target.wants.png     HTML File zabbixlog
Issue Links:
Duplicate
is duplicated by ZBX-16609 zabbix-server.service missing postgre... Closed
is duplicated by ZBX-16587 zabbix-server.service missing mysql/m... Closed
is duplicated by ZBX-19316 stoping zabbix-server hang Closed
is duplicated by ZBX-19870 Add "After=pgbouncer.service" in syst... Confirmed
Team: Team I
Sprint: Sprint 56 (Sep 2019), Sprint 55 (Aug 2019), Sprint 54 (Jul 2019), Sprint 57 (Oct 2019), Sprint 58 (Nov 2019), Sprint 59 (Dec 2019), Sprint 60 (Jan 2020)
Story Points: 0

 Description   

Steps to reproduce:

  1. On occasion, when rebooting the server, it hangs for 30 minutes waiting for zabbix-server to stop. 30 minutes is the system-level timeout that forces a reboot.

 

Changing TimeoutSec in /lib/systemd/system/zabbix-server.service to something more sane than infinity would ensure that if the shutdown of zabbix-server does hang, it can be killed by systemd after a reasonable length of time, say 5 minutes.



 Comments   
Comment by Edgar Akhmetshin [ 2019 Feb 06 ]

Hello Tim,

Could you attach log file with Zabbix server shutdown procedure in progress?

Regards,
Edgar

Comment by Tim White [ 2019 Feb 06 ]

I've attached the logs from where I believe the issue was (snipped the redundant 30 minutes). It seems that MySQL (MariaDB) was shut down before zabbix-server, so zabbix-server keeps trying to reconnect for a while.

That kind of makes this two issues. Firstly, the dependencies in the systemd file are not correct (we should depend on MySQL/MariaDB); they need to be, so that systemd knows how to shut down and start up the service. Secondly, we should set a suitable timeout for when things do go wrong, as infinity is not a good default for any service.

Also, I noticed at boot time, that zabbix-server started before the database was ready. This really highlights needing the dependencies in systemd to be correct.

 

Logs from startup showing we are just too quick trying to connect to the database

  1336:20190206:125535.289 using configuration file: /etc/zabbix/zabbix_server.conf
  1336:20190206:125535.344 [Z3001] connection to database 'zabbix' failed: [2002] Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
  1336:20190206:125535.344 database is down: reconnecting in 10 seconds
  1336:20190206:125545.349 database connection re-established
  1336:20190206:125545.352 current database version (mandatory/optional): 04000000/04000003
  1336:20190206:125545.352 required mandatory version: 04000000

 

Proposed SystemD after changing dependencies and Timeout

[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target
After=mysql.service

[Service]
Environment="CONFFILE=/etc/zabbix/zabbix_server.conf"
EnvironmentFile=-/etc/default/zabbix-server
Type=forking
Restart=on-failure
PIDFile=/run/zabbix/zabbix_server.pid
KillMode=control-group
ExecStart=/usr/sbin/zabbix_server -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
TimeoutSec=300s

[Install]
WantedBy=multi-user.target

 

 

Comment by Edgar Akhmetshin [ 2019 Feb 06 ]

Tim,

What operating system is used? How were Zabbix and the database installed, and from which repository? Please show the output of the following command:

sudo systemctl list-unit-files --type service --state enabled,generated;

Regards,
Edgar

Comment by Tim White [ 2019 Feb 06 ]

Ubuntu 18.04, installed from Zabbix repository deb packages.

$ apt-cache policy zabbix-server-mysql
zabbix-server-mysql:
  Installed: 1:4.0.4-1+bionic
  Candidate: 1:4.0.4-1+bionic
  Version table:
 *** 1:4.0.4-1+bionic 500
        500 http://repo.zabbix.com/zabbix/4.0/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status
     1:3.0.12+dfsg-1 500
        500 http://au.archive.ubuntu.com/ubuntu bionic/universe amd64 Packages

$ sudo systemctl list-unit-files --type service --state enabled,generated;
UNIT FILE                             STATE    
accounts-daemon.service               enabled  
apache2.service                       enabled  
apparmor.service                      enabled  
apport.service                        generated
atd.service                           enabled  
autovt@.service                       enabled  
avahi-daemon.service                  enabled  
blk-availability.service              enabled  
chrony.service                        enabled  
chronyd.service                       enabled  
console-setup.service                 enabled  
cron.service                          enabled  
dbus-org.freedesktop.Avahi.service    enabled  
dbus-org.freedesktop.resolve1.service enabled  
ebtables.service                      enabled  
gammu-smsd.service                    enabled  
getty@.service                        enabled  
grub-common.service                   generated
irqbalance.service                    enabled  
iscsi.service                         enabled  
keyboard-setup.service                enabled  
lvm2-monitor.service                  enabled  
lxcfs.service                         enabled  
lxd-containers.service                enabled  
mariadb.service                       enabled  
mysql.service                         enabled  
mysqld.service                        enabled  
netfilter-persistent.service          enabled  
networkd-dispatcher.service           enabled  
ondemand.service                      enabled  
open-iscsi.service                    enabled  
open-vm-tools.service                 enabled  
pollinate.service                     enabled  
postfix.service                       enabled  
rsync.service                         enabled  
rsyslog.service                       enabled  
salt-minion.service                   enabled  
setvtrgb.service                      enabled  
ssh.service                           enabled  
sshd.service                          enabled  
syslog.service                        enabled  
systemd-resolved.service              enabled  
systemd-timesyncd.service             enabled  
ufw.service                           enabled  
unattended-upgrades.service           enabled  
ureadahead.service                    enabled  
veeamservice.service                  generated
vgauth.service                        enabled  
vnstat.service                        enabled  
vnstatd.service                       enabled  
zabbix-agent.service                  enabled  
zabbix-server.service                 enabled  

52 unit files listed.

Comment by dimir [ 2019 Feb 06 ]

This has already been discussed. Let me share some quotes from IRC:

richlv:

<Richlv> dimir, what would be the desired thing to do when reaching the timeout ?
<Richlv> it seems like in case of zabbix just killing it wouldn't be a good idea anyway
<Richlv> think db upgrade
<Richlv> so you have to think about the longest db upgrade expected
<Richlv> which can easily be hours. so is it worth setting a timeout of, let's say 10 hours ? in old versions, people sometimes had to wait for days, so... maybe even a week ?
<Richlv> so at that point the timeout value becomes quite arbitrary
<Richlv> easier to set it to infinity and document that :)

volter:

<volter> dimir: The "starting in upgrade situations" is probably only relevant, if you don't run the server in the foreground, which I do in Fedora.
<volter> I wonder what the implicit defaults are anyway!
<volter> Defaults to DefaultTimeoutStartSec= from the manager configuration file
<volter> That's 90 seconds in my case.
<volter> Let's think about how bad it could be if you killed Zabbix on an upgrade: Probably not very bad.
<volter> The worst thing that can happen with the history syncer (which creates trends, if I'm not wrong), is: Not much
<volter> And if you need to shut it down, you'll lose some data anyway, unless it's buffered elsewhere.
<volter> I see no compelling reason.
<volter> Why might want to compare this to PG, for instance.
<volter> Where transactions could remain open for hours.
<volter> What is more: https://bugzilla.redhat.com/show_bug.cgi?id=1446015
<volter> "Configures the time to wait for stop. If a service is asked to stop, but does not terminate in the specified time, it will be terminated forcibly via SIGTERM, and after another timeout of equal duration with SIGKILL" even
<volter> (For the stopping part, of course)
<volter> I suggest to don't touch this at all.
<volter> I can't see a big problem.
<volter> I guess the only feasible problem is dataloss, when it comes to the "Stop" part.
<volter> And 90 seconds is a lot, plus, there are 90 more seconds.
<volter> And as far as the startup goes: I saw foreground as the solution, which is also easier for systemd to track and you are getting rid of that "duplicate" pidfile specification.
<volter> However, this has consequences on the logging.
<volter> The systemd journal will capture anything that's emitted through syslog, stdout and stderr.
<volter> See bugzilla ticket!
<volter> Experience tells me, if there is no good reason to change something: Don't.
<dimir> So what do you suggest for those that find 90 seconds not enough?
<volter> I would try to figure out why it's taking so long and if shot down, whether anything critical is happening.
<volter> Furthermore, everybody can easily put their own unit file in /etc/systemd/system to override what the vendor ships.

kodai:

but I remember that some earlier RHEL7 does not support TimeoutSec=infinity
it depends on systemd version, process does not start with TimeoutSec=infinity
then, someone gave me an advice  TimeoutSec=0 is same as infinity
so, I think TimeoutSec=infinity on debian/ubuntu, TimeoutSec=0 on RHEL
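A minimal shell sketch of the rule kodai describes, for anyone writing a unit file by hand. The function name is made up for illustration, and the 229 cutoff is the figure quoted in a later comment on this ticket; it has not been verified separately here.

```shell
#!/bin/sh
# Sketch: pick the TimeoutSec value that means "no timeout" for a given systemd version.
# timeout_sec_for is a hypothetical helper; the 229 cutoff comes from this ticket.
timeout_sec_for() {
    ver="$1"    # numeric systemd version, e.g. 237
    if [ "$ver" -ge 229 ]; then
        echo infinity    # newer systemd accepts the keyword
    else
        echo 0           # older systemd: 0 disables the timeout
    fi
}
```

It could be fed from the first line of `systemctl --version`, for example `timeout_sec_for "$(systemctl --version | awk 'NR==1 {print $2}')"`.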

 

Comment by dimir [ 2019 Feb 06 ]

There is a reason why there is TimeoutSec=Infinity, but we should document why it is so and how to overcome it.

Comment by Tim White [ 2019 Feb 08 ]

I can see from the IRC logs that TimeoutSec=infinity is intentional, but possibly still misunderstood. SystemD won't wait forever: in the case of a shutdown on Ubuntu, it will force-kill the service at 30 minutes even with TimeoutSec=infinity.

However, regardless of that, we can actually fix this issue of a long shutdown by fixing the dependencies.

[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target
After=mysql.service

The problem with the long shutdown was that MySQL is shut down before Zabbix, which is why Zabbix didn't exit. Besides documenting why TimeoutSec=infinity is used (ideally as a comment in the SystemD file too), we should fix the dependencies in the SystemD file.

Comment by Denis Pantsyrev [ 2019 Jun 27 ]

Still unresolved

Update from 4.0.9 to 4.0.10 takes about 20 minutes. I fix it manually after each update, which is tedious. Please add the fixes from @Tim White's comment. It's easy to fix, and it improves product performance!

Regards,
Denis

Comment by Benoît Locher [ 2019 Jul 02 ]

I had the same problem when rebooting my Debian server (Stretch 9.9): the Zabbix service v4.0.10 would hang forever.

Adding the following line (following advice from Tim):

After=postgresql.service

in the [Unit] section solved the problem.

Comment by Tim White [ 2019 Jul 05 ]

I don't think changing this to "Status: Needs documenting" is the right fix. As explained in my earlier comments, the fix is to ensure the dependencies are correct. Yes, documenting why we have a long timeout (even infinity if you guys still want it) is needed, but we really need to fix the dependency order.

Comment by dimir [ 2019 Jul 15 ]

timw_suqld, we can't depend on mysql service for 2 reasons:

  1. They might be using a flavor of MySQL, e.g. MariaDB (mariadb.service in this case).
  2. The database might be running on a separate, dedicated host.

Looks like documenting is the only thing we can do.

Comment by Tim White [ 2019 Jul 16 ]

We can use After with optional dependencies. (https://unix.stackexchange.com/questions/423722/systemd-service-file-with-optional-dependency)

Also, mariadb often provides an alias, so mysql.service is enough to catch MySQL and MariaDB.

So something like the following will fix the dependencies without forcing them to use a particular SQL server, or running it on the same server:

[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target
Wants=mysql.service
After=mysql.service
Wants=postgresql.service
After=postgresql.service

I still think that TimeoutSec=infinity should be fixed (it really doesn't do what people think it does), but at least if you fix the dependencies, it's less likely to bite people trying to shut down or reboot servers.

Comment by Marek Krolikowski [ 2019 Aug 12 ]

Hey Guys!

I have the same problem with Zabbix 4.0.11 on Debian 10.

But Tim got it right on how to fix this problem.

timw_suqld Thanks!
Adding the following works properly for me. I use Debian 10 with MariaDB installed on the same machine.

Wants=mysql.service
After=mysql.service
Wants=postgresql.service
After=postgresql.service

Comment by dimir [ 2019 Aug 13 ]

This is all good and in theory we could list all available MySQL flavors in zabbix-server-mysql package:

Wants=mysql.service
Wants=mariadb.service
Wants=percona.service

But there's no way to detect which database (local or remote) the Zabbix server uses. So, even if you have MySQL running locally, there could be a situation where Zabbix server does not depend on it, simply because it uses a remote database. Sorry, there's no clear way that works for all situations to change anything in the current setup.

I guess the best way for you currently would be to use

systemctl edit zabbix-server

https://askubuntu.com/questions/659267/how-do-i-override-or-configure-systemd-services

That thing we could document.
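A minimal sketch of what such a local override could look like when scripted instead of typed into the `systemctl edit` editor. The function name and the file name `database-order.conf` are made up for illustration; the default directory is the drop-in directory systemd reads for zabbix-server.service, and the database unit name is whatever runs locally.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that orders zabbix-server after a database unit.
# write_db_ordering_dropin and database-order.conf are hypothetical names.
write_db_ordering_dropin() {
    db_unit="$1"    # assumed local database unit, e.g. mariadb.service
    # Drop-in directory for zabbix-server.service; a second argument overrides it.
    dropin_dir="${2:-/etc/systemd/system/zabbix-server.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # After= adds ordering only: start after the database, stop before it.
    printf '[Unit]\nAfter=%s\n' "$db_unit" > "$dropin_dir/database-order.conf"
}
```

Run as root, for example `write_db_ordering_dropin mariadb.service`, followed by `systemctl daemon-reload` so systemd picks up the new file. Because only After= is written, a disabled or missing database unit is simply ignored.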

Comment by richlv [ 2019 Aug 13 ]

Having a local MySQL but using a remote one is an edge case; document it.
Adding those entries might help the majority of users.

Comment by Vladislavs Sokurenko [ 2019 Aug 13 ]

related issue ZBX-16078

Comment by Glebs Ivanovskis [ 2019 Aug 13 ]

I totally agree with richlv. In my understanding, the majority of Zabbix installations will have Zabbix server/proxy and DB server on the same box. Among the rest who use a dedicated DB server, Zabbix will likely run on a dedicated box as well. And the proposed change should not affect the very unlikely use case mentioned by dimir in any detrimental way.

Comment by dimir [ 2019 Aug 13 ]

Imagine you have some broken local installation of MySQL that you sometimes used for testing and that is not working anymore. Adding the proposed changes becomes a regression for such a setup.

Comment by Glebs Ivanovskis [ 2019 Aug 13 ]

Sorry, I can't push my imagination that far. This sounds like even more of an edge case, almost like "what if a dinosaur comments on this ticket".

Comment by richlv [ 2019 Aug 13 ]

The said dinosaur could have hacked into the box and deliberately installed MySQL there to mess with the user.
I know, I know - an edge case - but who can disprove it?

Comment by dimir [ 2019 Aug 13 ]

Not many of us have much experience in packaging, a very complex and interesting area. Not many can imagine all the aspects of it: how a tiny change can break things for some users far away (yes, with OS versions from dinosaur times), how different the installations are... Not many of us know and not many of us care.

In my opinion the worst thing in packaging is regression. And I'm not interested in any details: if I have everything working for years and this upgrade breaks my installation - I become very desperate and I will not think of the software as stable any more.

Comment by dimir [ 2019 Aug 13 ]

Additional things to check the behavior if we are to modify something (thanks, kodai!):

  • how does zabbix-server act if we list After and Wants = mysql/mariadb/percona, but the mysql/mariadb/percona is disabled (do we wait for it forever)?
  • make sure the behavior does not depend on a specific systemd version

Comment by Jackie Hunt [ 2019 Aug 22 ]

I ran into this issue with postgresql.  Please include it in any fix and/or documentation.

Comment by Tim White [ 2019 Aug 23 ]

Wants=

A weaker version of Requires=. Units listed in this option will be started if the configuring unit is. However, if the listed units fail to start or cannot be added to the transaction, this has no impact on the validity of the transaction as a whole. This is the recommended way to hook start-up of one unit to the start-up of another unit.

This should prevent regressions. The main issue is startup/shutdown order. Currently, if the database engine shuts down before Zabbix, we end up with Zabbix unable to shut down correctly, and so the timeout is an issue. With Wants, if the service fails to start/stop, we get the same behavior as currently: we still try to start/stop Zabbix. This is the advantage of Wants over Requires in this situation.

And with the After= tags, the order is defined.

Regarding systemd version, I can't find a changelog entry for when it was added, but I see references > 3 years old about using it, so I get a feeling it's been around a long time.

Comment by Vladislavs Sokurenko [ 2019 Sep 30 ]

Zabbix server requires a running database. If the database is not available, then it cannot be shut down properly without losing collected history. That is why Zabbix server waits for the database to be up again.

Comment by Tim White [ 2019 Oct 01 ]

@Vladislavs, this is why the dependencies need to be fixed. If the dependencies are fixed, it'll ensure at shutdown that Zabbix Server shuts down BEFORE the database shuts down. When the server is shutting down, the database isn't going to come back up to allow Zabbix to shut down.

Also, at some point you need to declare that data as lost. If you've not had a database available in, say, 10 minutes, it's probably not going to come back, so loss of data will occur. Given that Zabbix is trying to shut down anyway, it shouldn't be collecting new data, and so some loss of data at shutdown is acceptable.

Comment by Adam Garrett [ 2019 Oct 02 ]

I just started experiencing this issue as well today. Adding the line After=mysql.service resolved this issue.

Thanks, Tim.

Comment by Krasherwares [ 2019 Oct 09 ]

When installing Zabbix using the debian-9.5.0-i386-xfce-CD-1.iso image, I got an error:

The devil was in the details. As timw_suqld said, you need to add:
After=mysql.service
to the [Unit] section of the zabbix-server.service file in:
/etc/systemd/system/multi-user.target.wants/

The hang on reboot is gone.

Comment by Marcel Wiechmann [ 2019 Oct 09 ]

I'm also suffering from the same issue here, and editing the zabbix-server.service file fixed the problem. I only want to mention that the documentation should mention the different options for the After and Wants values depending on the MySQL installation (mysql.service or mariadb.service).

Comment by dimir [ 2019 Oct 14 ]

Krasherwares, this issue tracker is international, please use only English language.

Comment by Krasherwares [ 2019 Oct 16 ]

dimir, no problem (the same in English):
When installing Zabbix using the debian-9.5.0-i386-xfce-CD-1.iso image, I got an error:

The devil is in the details. Just like Tim White said, you need to add:

After=mysql.service

In the [Unit] section of the file zabbix-server.service:
/etc/systemd/system/multi-user.target.wants/

The hang problem on reboot is gone.

Comment by tbsky [ 2019 Oct 22 ]

The current way of packaging is broken for 99% of users. If you want to document something, then document it for the 1% edge case, but please make the package work by default for the other 99% of users.

Comment by Oleksii Zagorskyi [ 2019 Dec 08 ]

I came here after my own frustration ...
I updated my OS/Zabbix on a Raspberry Pi and switched to systemd.

It was sad to discover that we don't have dependencies, and on OS reboot I had to wait ~30 minutes until systemd finally killed zabbix_server, which had lost its connection to mariadb and kept retrying to connect while ignoring the received SIGTERM entirely.

As we see, while the issue summary is about TimeoutSec=infinity (or 0, for systemd version < 229), looking at the discussion it's clear that 99% of complaints would be resolved by just defining the order (not a dependency) of service start and, most important, termination!

I did test, as there were concerns, and can say for sure that adding just one line, "After=mysql.service", resolves the issue.
It's safe to list many possible engines (mysql* backend) and it will not cause any issue. Tested by adding "fakee.service".
Tested also when mariadb.service is disabled. During reboot it did not cause any issue.
Tested also when mariadb fails to start (an intentional config syntax error). Again, during reboot it did not cause any issue.
In all cases the zabbix_server daemon is started and the main process is trying to connect.

Interesting that when I add "After=mysql.service", it actually controls "mariadb.service":

# systemctl show zabbix-server  | grep "^After"
After=systemd-journald.socket mariadb.service fakee.service sysinit.target network.target syslog.target basic.target system.slice

that's because of symlinks added. If "After=mysqld.service" is added too, then the command still shows only "mariadb.service" actually added.
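The same check can be scripted. This is only a sketch: the function name is made up, and it simply looks for an exact unit name in the After= line that `systemctl show` prints, as in the output above.

```shell
#!/bin/sh
# Sketch: succeed if the unit named in $1 appears in an "After=" line read from stdin.
# ordered_after is a hypothetical helper; feed it the output of "systemctl show".
ordered_after() {
    wanted="$1"
    # Drop the "After=" key, put one unit per line, then match the exact name.
    sed -n 's/^After=//p' | tr ' ' '\n' | grep -qxF -- "$wanted"
}
```

For example, `systemctl show zabbix-server | ordered_after mariadb.service && echo "ordered after mariadb"`. With the output shown above, asking for mysql.service would fail even though that is the name in the unit file, because systemd reports the resolved unit name.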

On Debian/Ubuntu, MariaDB creates these symlinks, so any unit file name may be used (mysql/mysqld/mariadb):

# systemctl enable mariadb
Created symlink /etc/systemd/system/mysql.service → /lib/systemd/system/mariadb.service.
Created symlink /etc/systemd/system/mysqld.service → /lib/systemd/system/mariadb.service.
Created symlink /etc/systemd/system/multi-user.target.wants/mariadb.service → /lib/systemd/system/mariadb.service.

Because the mariadb unit has this section:

[Install]
WantedBy=multi-user.target
Alias=mysql.service
Alias=mysqld.service

I've also checked Percona packages for different OS:
For RHEL systems unit name is "mysqld" and it adds alias "mysql".
For Debian/Ubuntu unit name is "mysql", no aliases.

MySQL (not MariaDB):
Debian/Ubuntu v5.7 and v8.0 - "mysql", no aliases.
RHEL8 v8.0 - "mysqld", no aliases.

Looks like "mysql" is compatible with many cases, except for RHEL8/MySQL v8.0, which uses "mysqld" only.
So, we have to add both "mysql" and "mysqld".

For PostgreSQL it's much simpler: everywhere it's "postgresql.service" without aliases.
I think it would be pretty ok to add it as a 3rd line to be unified for all packages:

[Unit]
After=mysql.service
After=mysqld.service
After=postgresql.service

So, dimir, I ask to do that.
This is a small and safe solution and it would resolve the current discussion for 99.9%.

I'll update this issue's properties, as I'm pretty sure I'm doing the right thing.

Comment by dimir [ 2019 Dec 09 ]

It was decided to add After.

Comment by Jurijs Klopovskis [ 2019 Dec 20 ]

Fixed in 3.0.29, 4.0.16 & 4.4.4 releases.

Comment by Oleksii Zagorskyi [ 2019 Dec 21 ]

Just for the record. For %mysql% packages, all 3 services were added for "After" - mysql, mysqld, mariadb.
Ok, let it be.

Anyway, THANK YOU !

Comment by Glebs Ivanovskis [ 2020 Mar 26 ]

Similar issue: ZBX-17492.

Comment by Glebs Ivanovskis [ 2020 Jul 12 ]

ZBX-16609 was closed as a Duplicate of this ticket, but not addressed, as far as I can tell.

Comment by dimir [ 2020 Jul 13 ]

yurii, could you confirm the same logic was applied to PostgreSQL in 4.0.16, 4.4.4 and 5.0.0? As far as I can tell this issue was fixed for both MySQL*/PostgreSQL in packages.

Comment by Jurijs Klopovskis [ 2020 Jul 13 ]

We have

After=syslog.target
After=network.target
After=mysql.service
After=mysqld.service
After=mariadb.service
After=postgresql.service

in the service file.

Debian-based distros typically include versions in systemd service names, thus presumably a simple  After=postgresql.service directive will not cut it.

Must investigate.

Comment by Glebs Ivanovskis [ 2020 Jul 13 ]

dimir, yurii, thank you for looking into it!

Reporter of ZBX-17492 suggests an array of After=postgresql-<version>.service and also suggests to add pgbouncer to the party.

Comment by Александр Иванович Шабуров [ 2021 May 04 ]

Hi!

Here are the logs from running "systemctl stop zabbix-server" while rebooting the computer. Does zabbix-server ask postgres for the SIGHUP and smart shutdown? If so, why do the errors at 15:11:20.867 occur? This hangs zabbix-server when the computer shuts down. If zabbix-server is not the source of the SIGHUP and smart shutdown, how can I find out where this signal comes from?

Thanks

"journalctl -u zabbix-server"

May 04 15:11:20 db-mon-wtc.microsoft.platina.ru systemd[1]: Stopping Zabbix Server...
May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: zabbix-server.service: State 'stop-sigterm' timed out. Killing.
May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: zabbix-server.service: Killing process 2042 (zabbix_server) with signal SIGKILL.
May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: zabbix-server.service: Killing process 2058 (zabbix_server) with signal SIGKILL.

.................................................

May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: zabbix-server.service: Killing process 2097 (zabbix_server) with signal SIGKILL.
May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: zabbix-server.service: Main process exited, code=killed, status=9/KILL
May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: zabbix-server.service: Failed with result 'timeout'.
May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: Stopped Zabbix Server.

"journalctl -u pgpro"

May 04 15:11:50 db-mon-wtc.microsoft.platina.ru systemd[1]: Stopping PostgreSQL database server...
May 04 15:11:50 db-mon-wtc.microsoft.platina.ru pgpro.systemd[2884]: stop

postgres log

2021-05-04 15:11:20.474 MSK [1611] LOG: received SIGHUP, reloading configuration files
2021-05-04 15:11:20.476 MSK [1611] LOG: received smart shutdown request
2021-05-04 15:11:20.479 MSK [1781] LOG: terminating TimescaleDB job scheduler due to administrator command
2021-05-04 15:11:20.479 MSK [1781] FATAL: terminating connection due to administrator command
2021-05-04 15:11:20.481 MSK [2059] FATAL: terminating connection due to administrator command
2021-05-04 15:11:20.482 MSK [1775] LOG: terminating TimescaleDB background worker launcher due to administrator command
2021-05-04 15:11:20.482 MSK [2099] FATAL: terminating connection due to administrator command
2021-05-04 15:11:20.483 MSK [1775] FATAL: terminating connection due to administrator command
2021-05-04 15:11:20.484 MSK [2100] FATAL: terminating connection due to administrator command
2021-05-04 15:11:20.484 MSK [1780] LOG: terminating TimescaleDB job scheduler due to administrator command
2021-05-04 15:11:20.484 MSK [1780] FATAL: terminating connection due to administrator command

..............................

2021-05-04 15:11:20.499 MSK [2121] FATAL: terminating connection due to administrator command
2021-05-04 15:11:20.549 MSK [1611] LOG: background worker "TimescaleDB Background Worker Launcher" (PID 1775) exited with exit code 1
2021-05-04 15:11:20.549 MSK [1611] LOG: background worker "logical replication launcher" (PID 1776) exited with exit code 1
2021-05-04 15:11:20.549 MSK [1611] LOG: background worker "TimescaleDB Background Worker Scheduler" (PID 1780) exited with exit code 1
2021-05-04 15:11:20.549 MSK [1611] LOG: background worker "TimescaleDB Background Worker Scheduler" (PID 1781) exited with exit code 1
2021-05-04 15:11:20.550 MSK [1769] LOG: shutting down
2021-05-04 15:11:20.867 MSK [2797] FATAL: the database system is shutting down
2021-05-04 15:11:20.868 MSK [2796] FATAL: the database system is shutting down
2021-05-04 15:11:20.876 MSK [2798] FATAL: the database system is shutting down
2021-05-04 15:11:25.935 MSK [1611] LOG: database system is shut down

zabbix-server log at the same time

2042:20210504:151120.541 Got signal [signal:15(SIGTERM),sender_pid:2771,sender_uid:0,reason:0]. Exiting ...
2066:20210504:151120.542 syncing history data in progress...
2066:20210504:151120.542 [Z3005] query failed: [0] PGRES_FATAL_ERROR:FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[begin;]
2072:20210504:151120.546 [Z3005] query failed: [0] PGRES_FATAL_ERROR:FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[select taskid,type,clock,ttl from task where status in (1,2) order by taskid]
2097:20210504:151120.548 [Z3005] query failed: [0] PGRES_FATAL_ERROR:FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[begin;]
2072:20210504:151120.868 [Z3001] connection to database 'zabbix' failed: [0] FATAL: the database system is shutting down

2072:20210504:151120.868 database is down: reconnecting in 10 seconds
2066:20210504:151120.869 [Z3001] connection to database 'zabbix' failed: [0] FATAL: the database system is shutting down

2066:20210504:151120.869 database is down: reconnecting in 10 seconds
2097:20210504:151120.877 [Z3001] connection to database 'zabbix' failed: [0] FATAL: the database system is shutting down

2097:20210504:151120.878 database is down: reconnecting in 10 seconds
2072:20210504:151130.869 [Z3001] connection to database 'zabbix' failed: [0] could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5456?

2072:20210504:151130.869 database is down: reconnecting in 10 seconds
2066:20210504:151130.869 [Z3001] connection to database 'zabbix' failed: [0] could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5456?

2066:20210504:151130.869 database is down: reconnecting in 10 seconds
2097:20210504:151130.878 [Z3001] connection to database 'zabbix' failed: [0] could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5456?

2097:20210504:151130.878 database is down: reconnecting in 10 seconds
2072:20210504:151140.869 [Z3001] connection to database 'zabbix' failed: [0] could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5456?

2072:20210504:151140.869 database is down: reconnecting in 10 seconds
2066:20210504:151140.870 [Z3001] connection to database 'zabbix' failed: [0] could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5456?

2066:20210504:151140.870 database is down: reconnecting in 10 seconds
2097:20210504:151140.878 [Z3001] connection to database 'zabbix' failed: [0] could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5456?

2097:20210504:151140.878 database is down: reconnecting in 10 seconds

 

 

Comment by Jurijs Klopovskis [ 2021 May 05 ]

Hi shab2,

The issue is with the database being shut down before Zabbix server had time to sync data.
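A rough way to spot this pattern in a server log is sketched below. The function name is made up, and the match strings are taken from the log lines quoted in this ticket, so other Zabbix versions may word them differently.

```shell
#!/bin/sh
# Sketch: succeed if the log on stdin shows the database going away after SIGTERM.
# db_lost_during_shutdown is a hypothetical helper; patterns come from logs in this ticket.
db_lost_during_shutdown() {
    awk '
        /using configuration file/              { stopping = 0 }  # a fresh start resets state
        /Got signal \[signal:15\(SIGTERM\)/      { stopping = 1 }  # shutdown was requested
        stopping && /database is down: reconnecting/ { found = 1 } # DB vanished while stopping
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `db_lost_during_shutdown < /var/log/zabbix/zabbix_server.log && echo "database stopped before zabbix-server"`, assuming the log lives at that path. A reconnect message during startup alone does not count, since it appears before any SIGTERM.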

To mitigate this problem we have added several After statements to the server and proxy systemd unit files.

[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target
After=mysql.service
After=mysqld.service
After=mariadb.service
After=postgresql.service
After=pgbouncer.service
After=postgresql-9.4.service
After=postgresql-9.5.service
After=postgresql-9.6.service
After=postgresql-10.service
After=postgresql-11.service
After=postgresql-12.service
After=postgresql-13.service

This should cover most people.

If this does not work for you,  it is always possible to add a similar directive for the database server unit on your own using systemctl edit command.

Generated at Fri Apr 26 23:09:18 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.