[ZBX-11203] database upgrade from Zabbix 3.0 to Zabbix 3.2 fails Created: 2016 Sep 14 Updated: 2024 Apr 10 Resolved: 2019 Apr 03 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 3.2.0 |
Fix Version/s: | 4.0.7rc1, 4.2.1rc1, 4.4.0alpha1, 4.4 (plan) |
Type: | Problem report | Priority: | Blocker |
Reporter: | sles | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 5 |
Labels: | dbpatches, packaging, upgrade | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
centos 7 x86-64 |
Issue Links: |
|
||||||||||||||||||||||||||||
Team: | |||||||||||||||||||||||||||||
Sprint: | Sprint 50 (Mar 2019), Sprint 51 (Apr 2019) | ||||||||||||||||||||||||||||
Story Points: | 1 |
Description |
During upgrade from 3.0.4 to 3.2.0 I get: 814:20160914:164040.194 current database version (mandatory/optional): 03010002/03010002 814:20160914:164040.194 required mandatory version: 03020000 814:20160914:164040.194 starting automatic database upgrade 814:20160914:164040.195 [Z3005] query failed: [1091] Can't DROP 'id'; check that column/key exists [alter table history_text drop column id] 814:20160914:164040.195 database upgrade failed |
Comments |
Comment by Aleksandrs Saveljevs [ 2016 Sep 14 ] | ||||||||||||||||||||||||
Could you please post the structure of table "history_text" before the upgrade? What database engine do you use? | ||||||||||||||||||||||||
Comment by sles [ 2016 Sep 14 ] | ||||||||||||||||||||||||
I looked into logs more deeply while restoring database from backup and found that for some reason zabbix server was stopped during upgrade: 38158:20160914:163545.748 current database version (mandatory/optional): 03000000/03000000 38158:20160914:163545.748 required mandatory version: 03020000 38158:20160914:163545.748 starting automatic database upgrade 38158:20160914:163545.917 completed 1% of database upgrade 38158:20160914:163546.805 completed 2% of database upgrade 38158:20160914:163546.903 completed 3% of database upgrade 38158:20160914:163632.375 Got signal [signal:15(SIGTERM),sender_pid:38837,sender_uid:0,reason:0]. Exiting ... This is why I got error message then. Could you , please, close this bug report? | ||||||||||||||||||||||||
Comment by Aleksandrs Saveljevs [ 2016 Sep 14 ] | ||||||||||||||||||||||||
Thank you! Closing as "Won't fix" then. | ||||||||||||||||||||||||
Comment by Alexander Yesipov [ 2016 Sep 14 ] | ||||||||||||||||||||||||
Same for me: 24288:20160914:154628.225 current database version (mandatory/optional): 03000000/03000000 24288:20160914:154628.225 required mandatory version: 03020000 24288:20160914:154628.225 starting automatic database upgrade 24288:20160914:154628.530 completed 1% of database upgrade 24288:20160914:154628.741 Got signal [signal:15(SIGTERM),sender_pid:24323,sender_uid:0,reason:0]. Exiting ... Linux 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 04:13:05 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux update via yum without manual server stopping | ||||||||||||||||||||||||
Comment by richlv [ 2016 Sep 14 ] | ||||||||||||||||||||||||
if those were the zabbix-provided packages and the server was not accidentally stopped manually (by attempting to restart it, for example), there might indeed be something to fix on the zabbix side - but we can also wait and see whether anybody else faces such a problem | ||||||||||||||||||||||||
Comment by Alexander Yesipov [ 2016 Sep 14 ] | ||||||||||||||||||||||||
Yep, packages were from zabbix repo. Overall process: | ||||||||||||||||||||||||
Comment by Alexander Yesipov [ 2016 Sep 14 ] | ||||||||||||||||||||||||
I've got second zabbix3.0 instance which I gonna update tonight. Current history_log structure is: mysql> describe history_log; +------------+---------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +------------+---------------------+------+-----+---------+-------+ | id | bigint(20) unsigned | NO | PRI | NULL | | | itemid | bigint(20) unsigned | NO | MUL | NULL | | | clock | int(11) | NO | | 0 | | | timestamp | int(11) | NO | | 0 | | | source | varchar(64) | NO | | | | | severity | int(11) | NO | | 0 | | | value | text | NO | | NULL | | | logeventid | int(11) | NO | | 0 | | | ns | int(11) | NO | | 0 | | +------------+---------------------+------+-----+---------+-------+ I will post feedback here. | ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Sep 14 ] | ||||||||||||||||||||||||
Just a few question appeared in my mind: Note - in both cases posted above (by different users), sender_pid:NNNNN is higher that PID of running main process, so the SIGTERM signal sender has been started later after zabbix server daemon started. We need kodai.terashima attention here too. | ||||||||||||||||||||||||
Comment by Alexander Yesipov [ 2016 Sep 14 ] | ||||||||||||||||||||||||
I did upgrade and results are the same. Here some logs: 1786:20160914:220927.619 Got signal [signal:15(SIGTERM),sender_pid:25289,sender_uid:0,reason:0]. Exiting ... 1786:20160914:220929.631 syncing history data... 1786:20160914:220929.632 syncing history data done 1786:20160914:220929.632 syncing trends data... 1786:20160914:220929.849 syncing trends data done 1786:20160914:220929.850 Zabbix Server stopped. Zabbix 3.0.4 (revision 61185). 25294:20160914:220929.912 Starting Zabbix Server. Zabbix 3.2.0 (revision 62485). 25294:20160914:220929.912 ****** Enabled features ****** 25294:20160914:220929.912 SNMP monitoring: YES 25294:20160914:220929.912 IPMI monitoring: YES 25294:20160914:220929.912 Web monitoring: YES 25294:20160914:220929.912 VMware monitoring: YES 25294:20160914:220929.912 SMTP authentication: YES 25294:20160914:220929.912 Jabber notifications: YES 25294:20160914:220929.912 Ez Texting notifications: YES 25294:20160914:220929.912 ODBC: YES 25294:20160914:220929.912 SSH2 support: YES 25294:20160914:220929.912 IPv6 support: YES 25294:20160914:220929.912 TLS support: YES 25294:20160914:220929.912 ****************************** 25294:20160914:220929.912 using configuration file: /etc/zabbix/zabbix_server.conf 25294:20160914:220929.922 current database version (mandatory/optional): 03000000/03000000 25294:20160914:220929.922 required mandatory version: 03020000 25294:20160914:220929.922 starting automatic database upgrade 25294:20160914:220929.956 completed 1% of database upgrade 25294:20160914:220930.130 Got signal [signal:15(SIGTERM),sender_pid:25329,sender_uid:0,reason:0]. Exiting ... 25294:20160914:220932.225 syncing history data... 25294:20160914:220932.225 syncing history data done 25294:20160914:220932.225 syncing trend data... 25294:20160914:220932.225 syncing trend data done 25294:20160914:220932.225 Zabbix Server stopped. Zabbix 3.2.0 (revision 62485). 25334:20160914:220932.242 Starting Zabbix Server. Zabbix 3.2.0 (revision 62485). 25334:20160914:220932.242 ****** Enabled features ****** 25334:20160914:220932.242 SNMP monitoring: YES 25334:20160914:220932.242 IPMI monitoring: YES 25334:20160914:220932.242 Web monitoring: YES 25334:20160914:220932.242 VMware monitoring: YES 25334:20160914:220932.242 SMTP authentication: YES 25334:20160914:220932.242 Jabber notifications: YES 25334:20160914:220932.242 Ez Texting notifications: YES 25334:20160914:220932.242 ODBC: YES 25334:20160914:220932.242 SSH2 support: YES 25334:20160914:220932.242 IPv6 support: YES 25334:20160914:220932.242 TLS support: YES 25334:20160914:220932.242 ****************************** 25334:20160914:220932.242 using configuration file: /etc/zabbix/zabbix_server.conf 25334:20160914:220932.247 current database version (mandatory/optional): 03010000/03010000 25334:20160914:220932.247 required mandatory version: 03020000 25334:20160914:220932.247 starting automatic database upgrade 25334:20160914:220932.248 [Z3005] query failed: [1091] Can't DROP 'id'; check that column/key exists [alter table history_log drop column id] 25334:20160914:220932.248 database upgrade failed systemd logs Sep 14 22:09:27 hostname systemd[1]: Stopping Zabbix Server... Sep 14 22:09:29 hostname systemd[1]: Starting Zabbix Server... Sep 14 22:09:29 hostname systemd[1]: PID file /run/zabbix/zabbix_server.pid not readable (yet?) after start. Sep 14 22:09:29 hostname systemd[1]: Started Zabbix Server. Sep 14 22:09:30 hostname systemd[1]: Stopping Zabbix Server... Sep 14 22:09:32 hostname systemd[1]: Starting Zabbix Server... Sep 14 22:09:32 hostname systemd[1]: Started Zabbix Server. Sep 14 22:09:32 hostname systemd[1]: zabbix-server.service: main process exited, code=exited, status=1/FAILURE Sep 14 22:09:32 hostname kill[25336]: Usage: Sep 14 22:09:32 hostname kill[25336]: kill [options] <pid|name> [...] Sep 14 22:09:32 hostname kill[25336]: Options: Sep 14 22:09:32 hostname kill[25336]: -a, --all do not restrict the name-to-pid conversion to processes Sep 14 22:09:32 hostname kill[25336]: with the same uid as the present process Sep 14 22:09:32 hostname kill[25336]: -s, --signal <sig> send specified signal Sep 14 22:09:32 hostname kill[25336]: -q, --queue <sig> use sigqueue(2) rather than kill(2) Sep 14 22:09:32 hostname kill[25336]: -p, --pid print pids without signaling them Sep 14 22:09:32 hostname kill[25336]: -l, --list [=<signal>] list signal names, or convert one to a name Sep 14 22:09:32 hostname kill[25336]: -L, --table list signal names and numbers Sep 14 22:09:32 hostname kill[25336]: -h, --help display this help and exit Sep 14 22:09:32 hostname kill[25336]: -V, --version output version information and exit Sep 14 22:09:32 hostname kill[25336]: For more details see kill(1). Sep 14 22:09:32 hostname systemd[1]: zabbix-server.service: control process exited, code=exited status=1 Sep 14 22:09:32 hostname systemd[1]: Unit zabbix-server.service entered failed state. Sep 14 22:09:32 hostname systemd[1]: zabbix-server.service failed. Sep 14 22:09:42 hostname systemd[1]: zabbix-server.service holdoff time over, scheduling restart. | ||||||||||||||||||||||||
Comment by Horigome Yoshihito [ 2016 Sep 15 ] | ||||||||||||||||||||||||
I also have the same symptoms has occurred. zabbix log
32673:20160915:102757.191 Starting Zabbix Server. Zabbix 3.2.0 (revision 62485).
32673:20160915:102757.191 ****** Enabled features ******
32673:20160915:102757.191 SNMP monitoring: YES
32673:20160915:102757.191 IPMI monitoring: YES
32673:20160915:102757.191 Web monitoring: YES
32673:20160915:102757.191 VMware monitoring: YES
32673:20160915:102757.191 SMTP authentication: YES
32673:20160915:102757.191 Jabber notifications: YES
32673:20160915:102757.191 Ez Texting notifications: YES
32673:20160915:102757.191 ODBC: YES
32673:20160915:102757.191 SSH2 support: YES
32673:20160915:102757.191 IPv6 support: YES
32673:20160915:102757.191 TLS support: YES
32673:20160915:102757.191 ******************************
32673:20160915:102757.191 using configuration file: /etc/zabbix/zabbix_server.conf
32673:20160915:102757.201 current database version (mandatory/optional): 03010000/03010000
32673:20160915:102757.201 required mandatory version: 03020000
32673:20160915:102757.201 starting automatic database upgrade
32673:20160915:102757.201 [Z3005] query failed: [1091] Can't DROP 'id'; check that column/key exists [alter table history_log drop column id]
32673:20160915:102757.202 database upgrade failed
system log Sep 15 10:28:27 zabbix-kometch systemd[1]: zabbix-server.service holdoff time over, scheduling re Sep 15 10:28:27 zabbix-kometch systemd[1]: Starting Zabbix Server... Sep 15 10:28:27 zabbix-kometch systemd[1]: PID file /run/zabbix/zabbix_server.pid not readable (y Sep 15 10:28:27 zabbix-kometch systemd[1]: Started Zabbix Server. Sep 15 10:28:27 zabbix-kometch systemd[1]: zabbix-server.service: main process exited, code=exite Sep 15 10:28:27 zabbix-kometch kill[32692]: Usage: Sep 15 10:28:27 zabbix-kometch kill[32692]: kill [options] <pid|name> [...] Sep 15 10:28:27 zabbix-kometch kill[32692]: Options: Sep 15 10:28:27 zabbix-kometch kill[32692]: -a, --all do not restrict the name-to-pi Sep 15 10:28:27 zabbix-kometch kill[32692]: with the same uid as the present process Sep 15 10:28:27 zabbix-kometch kill[32692]: -s, --signal <sig> send specified signal Sep 15 10:28:27 zabbix-kometch kill[32692]: -q, --queue <sig> use sigqueue(2) rather than ki Sep 15 10:28:27 zabbix-kometch kill[32692]: -p, --pid print pids without signaling t Sep 15 10:28:27 zabbix-kometch kill[32692]: -l, --list [=<signal>] list signal names, or convert Sep 15 10:28:27 zabbix-kometch kill[32692]: -L, --table list signal names and numbers Sep 15 10:28:27 zabbix-kometch kill[32692]: -h, --help display this help and exit Sep 15 10:28:27 zabbix-kometch kill[32692]: -V, --version output version information and exit Sep 15 10:28:27 zabbix-kometch kill[32692]: For more details see kill(1). Sep 15 10:28:27 zabbix-kometch systemd[1]: zabbix-server.service: control process exited, code=ex Sep 15 10:28:27 zabbix-kometch systemd[1]: Unit zabbix-server.service entered failed state. Sep 15 10:28:27 zabbix-kometch systemd[1]: zabbix-server.service failed. It has been confirmed that there is no id field. DB scheme MariaDB [zabbix]> describe history_log; +------------+---------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +------------+---------------------+------+-----+---------+-------+ | itemid | bigint(20) unsigned | NO | MUL | NULL | | | clock | int(11) | NO | | 0 | | | timestamp | int(11) | NO | | 0 | | | source | varchar(64) | NO | | | | | severity | int(11) | NO | | 0 | | | value | text | NO | | NULL | | | logeventid | int(11) | NO | | 0 | | | ns | int(11) | NO | | 0 | | +------------+---------------------+------+-----+---------+-------+ 8 rows in set (0.00 sec) | ||||||||||||||||||||||||
Comment by sles [ 2016 Sep 15 ] | ||||||||||||||||||||||||
Hello! Yesterday I completed upgrade by running zabbix server 3.2.0 manually for first run. | ||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2016 Sep 15 ] | ||||||||||||||||||||||||
We should probably protect critical DB upgrade sections by blocking signals like was requested in ZBX-10630 for cache operations.
| ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Sep 15 ] | ||||||||||||||||||||||||
systemd may send SIGKILL as well, after an app does not react on sent SIGTERM, so protecting zabbix_server by signals will not help, IMO. | ||||||||||||||||||||||||
Comment by richlv [ 2016 Sep 15 ] | ||||||||||||||||||||||||
i think systemd unit file says that there must be more than one zabbix server process. if only one is there, something's not good & it must be shot. | ||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2016 Sep 15 ] | ||||||||||||||||||||||||
Dear zalex_ua, signals are not sent uniquely by systemd. | ||||||||||||||||||||||||
Comment by sles [ 2016 Sep 15 ] | ||||||||||||||||||||||||
> User can sabotage upgrade process this way too. No , really, user can't . | ||||||||||||||||||||||||
Comment by Blake Burgess [ 2016 Sep 16 ] | ||||||||||||||||||||||||
Ok so got hit with this today. | ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Sep 16 ] | ||||||||||||||||||||||||
Blake, disable/enable zabbix-server service was not required. Stop/start it would be enough. | ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Sep 16 ] | ||||||||||||||||||||||||
Richlv, in unit file I don't see obvious setting which would require more than one process:
[Unit]
Description=Zabbix Server
After=syslog.target
After=network.target
[Service]
Environment="CONFFILE=/etc/zabbix/zabbix_server.conf"
EnvironmentFile=-/etc/sysconfig/zabbix-server
Type=forking
Restart=on-failure
PIDFile=/run/zabbix/zabbix_server.pid
KillMode=control-group
ExecStart=/usr/sbin/zabbix_server -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
[Install]
WantedBy=multi-user.target
But closests thing is Type=forking might have have some affect, should be checked | ||||||||||||||||||||||||
Comment by Andris Zeila [ 2016 Sep 19 ] | ||||||||||||||||||||||||
I'm not sure if the server was restarted because it appeared as failed to systemd or it was just normal upgrade process. From the posted systemd log it looks like server was simply restarted again: Sep 14 22:09:27 hostname systemd[1]: Stopping Zabbix Server... Sep 14 22:09:29 hostname systemd[1]: Starting Zabbix Server... Sep 14 22:09:29 hostname systemd[1]: PID file /run/zabbix/zabbix_server.pid not readable (yet?) after start. Sep 14 22:09:29 hostname systemd[1]: Started Zabbix Server. Sep 14 22:09:30 hostname systemd[1]: Stopping Zabbix Server... Sep 14 22:09:32 hostname systemd[1]: Starting Zabbix Server... Need to check if zabbix is really restarted during normal upgrade process. But the real problem here is killing zabbix during implicit commit statement. Maybe putting such upgrade steps into temporary functions would work, not sure. | ||||||||||||||||||||||||
Comment by Andris Zeila [ 2016 Sep 19 ] | ||||||||||||||||||||||||
I tested upgrade with basic configuration and it did not restart zabbix server twice. On the other hand sleeping server for 10 minutes before database upgrade also did not cause systemd to restart it. wiper Tried with larger history_text table (1m records). During yum update the server was restarted and database successfully upgraded. 4619:20160920:040555.573 Starting Zabbix Server. Zabbix 3.0.4 (revision 61185). 4619:20160920:040555.574 ****** Enabled features ****** 4619:20160920:040555.574 SNMP monitoring: YES 4619:20160920:040555.574 IPMI monitoring: YES 4619:20160920:040555.574 Web monitoring: YES 4619:20160920:040555.574 VMware monitoring: YES 4619:20160920:040555.574 SMTP authentication: YES 4619:20160920:040555.574 Jabber notifications: YES 4619:20160920:040555.574 Ez Texting notifications: YES 4619:20160920:040555.574 ODBC: YES 4619:20160920:040555.574 SSH2 support: YES 4619:20160920:040555.574 IPv6 support: YES 4619:20160920:040555.574 TLS support: YES 4619:20160920:040555.574 ****************************** 4619:20160920:040555.574 using configuration file: /etc/zabbix/zabbix_server.conf 4619:20160920:040555.596 current database version (mandatory/optional): 03000000/03000000 4619:20160920:040555.596 required mandatory version: 03000000 4619:20160920:040555.638 server #0 started [main process] 4661:20160920:040555.646 server #24 started [escalator #1] 4663:20160920:040555.647 server #26 started [self-monitoring #1] 4655:20160920:040555.648 server #18 started [http poller #1] 4647:20160920:040555.648 server #10 started [trapper #2] 4649:20160920:040555.649 server #12 started [trapper #4] 4638:20160920:040555.651 server #2 started [db watchdog #1] 4640:20160920:040555.651 server #3 started [poller #1] 4641:20160920:040555.653 server #4 started [poller #2] 4644:20160920:040555.653 server #7 started [poller #5] 4651:20160920:040555.655 server #14 started [icmp pinger #1] 4652:20160920:040555.655 server #15 started [alerter #1] 4654:20160920:040555.655 server #17 started [timer #1] 4642:20160920:040555.655 server #5 started [poller #3] 4643:20160920:040555.657 server #6 started [poller #4] 4659:20160920:040555.659 server #22 started [history syncer #3] 4657:20160920:040555.659 server #20 started [history syncer #1] 4637:20160920:040555.659 server #1 started [configuration syncer #1] 4660:20160920:040555.662 server #23 started [history syncer #4] 4662:20160920:040555.662 server #25 started [proxy poller #1] 4645:20160920:040555.664 server #8 started [unreachable poller #1] 4646:20160920:040555.666 server #9 started [trapper #1] 4648:20160920:040555.668 server #11 started [trapper #3] 4650:20160920:040555.670 server #13 started [trapper #5] 4653:20160920:040555.672 server #16 started [housekeeper #1] 4658:20160920:040555.672 server #21 started [history syncer #2] 4656:20160920:040555.767 server #19 started [discoverer #1] 4619:20160920:040817.723 Got signal [signal:15(SIGTERM),sender_pid:4819,sender_uid:0,reason:0]. Exiting ... 4619:20160920:040819.746 syncing history data... 4619:20160920:040819.746 syncing history data done 4619:20160920:040819.746 syncing trends data... 4619:20160920:040819.746 syncing trends data done 4619:20160920:040819.747 Zabbix Server stopped. Zabbix 3.0.4 (revision 61185). 4825:20160920:040819.780 Starting Zabbix Server. Zabbix 3.2.0 (revision 62485). 4825:20160920:040819.780 ****** Enabled features ****** 4825:20160920:040819.780 SNMP monitoring: YES 4825:20160920:040819.780 IPMI monitoring: YES 4825:20160920:040819.780 Web monitoring: YES 4825:20160920:040819.780 VMware monitoring: YES 4825:20160920:040819.780 SMTP authentication: YES 4825:20160920:040819.780 Jabber notifications: YES 4825:20160920:040819.780 Ez Texting notifications: YES 4825:20160920:040819.780 ODBC: YES 4825:20160920:040819.780 SSH2 support: YES 4825:20160920:040819.780 IPv6 support: YES 4825:20160920:040819.780 TLS support: YES 4825:20160920:040819.780 ****************************** 4825:20160920:040819.780 using configuration file: /etc/zabbix/zabbix_server.conf 4825:20160920:040819.793 current database version (mandatory/optional): 03000000/03000000 4825:20160920:040819.793 required mandatory version: 03020000 4825:20160920:040819.793 starting automatic database upgrade 4825:20160920:040819.800 completed 1% of database upgrade 4825:20160920:040819.807 completed 2% of database upgrade 4825:20160920:040819.832 completed 3% of database upgrade 4825:20160920:040827.014 slow query: 7.181938 sec, "alter table history_text drop column id" 4825:20160920:040827.048 completed 5% of database upgrade 4825:20160920:040827.097 completed 6% of database upgrade 4825:20160920:040827.120 completed 7% of database upgrade 4825:20160920:040827.128 completed 8% of database upgrade 4825:20160920:040827.143 completed 10% of database upgrade 4825:20160920:040827.151 completed 11% of database upgrade 4825:20160920:040827.155 completed 12% of database upgrade 4825:20160920:040827.174 completed 13% of database upgrade 4825:20160920:040827.183 completed 15% of database upgrade 4825:20160920:040827.191 completed 16% of database upgrade 4825:20160920:040827.199 completed 17% of database upgrade 4825:20160920:040827.201 completed 18% of database upgrade 4825:20160920:040827.218 completed 20% of database upgrade 4825:20160920:040827.227 completed 21% of database upgrade 4825:20160920:040827.230 completed 22% of database upgrade 4825:20160920:040827.259 completed 24% of database upgrade 4825:20160920:040827.302 completed 25% of database upgrade 4825:20160920:040827.358 completed 26% of database upgrade 4825:20160920:040827.369 completed 27% of database upgrade 4825:20160920:040827.377 completed 29% of database upgrade 4825:20160920:040827.383 completed 30% of database upgrade 4825:20160920:040827.384 completed 31% of database upgrade 4825:20160920:040827.389 completed 32% of database upgrade 4825:20160920:040827.389 Action "Report problems to Zabbix administrators" condition "Trigger value = PROBLEM" will be removed during database upgrade: this type of condition is not supported anymore 4825:20160920:040827.390 completed 34% of database upgrade 4825:20160920:040827.402 completed 35% of database upgrade 4825:20160920:040827.416 completed 36% of database upgrade 4825:20160920:040827.422 completed 37% of database upgrade 4825:20160920:040827.429 completed 39% of database upgrade 4825:20160920:040827.434 completed 40% of database upgrade 4825:20160920:040827.438 completed 41% of database upgrade 4825:20160920:040827.446 completed 43% of database upgrade 4825:20160920:040827.463 completed 44% of database upgrade 4825:20160920:040827.475 completed 45% of database upgrade 4825:20160920:040827.480 completed 46% of database upgrade 4825:20160920:040827.498 completed 48% of database upgrade 4825:20160920:040827.516 completed 49% of database upgrade 4825:20160920:040827.528 completed 50% of database upgrade 4825:20160920:040827.529 completed 51% of database upgrade 4825:20160920:040827.530 completed 53% of database upgrade 4825:20160920:040827.536 completed 54% of database upgrade 4825:20160920:040827.543 completed 55% of database upgrade 4825:20160920:040827.548 completed 56% of database upgrade 4825:20160920:040827.565 completed 58% of database upgrade 4825:20160920:040827.572 completed 59% of database upgrade 4825:20160920:040827.576 completed 60% of database upgrade 4825:20160920:040827.593 completed 62% of database upgrade 4825:20160920:040827.610 completed 63% of database upgrade 4825:20160920:040827.615 completed 64% of database upgrade 4825:20160920:040827.631 completed 65% of database upgrade 4825:20160920:040827.637 completed 67% of database upgrade 4825:20160920:040827.641 completed 68% of database upgrade 4825:20160920:040827.646 completed 69% of database upgrade 4825:20160920:040827.649 completed 70% of database upgrade 4825:20160920:040827.666 completed 72% of database upgrade 4825:20160920:040827.675 completed 73% of database upgrade 4825:20160920:040827.683 completed 74% of database upgrade 4825:20160920:040827.687 completed 75% of database upgrade 4825:20160920:040827.693 completed 77% of database upgrade 4825:20160920:040827.697 completed 78% of database upgrade 4825:20160920:040827.701 completed 79% of database upgrade 4825:20160920:040827.703 completed 81% of database upgrade 4825:20160920:040827.718 completed 82% of database upgrade 4825:20160920:040827.725 completed 83% of database upgrade 4825:20160920:040827.726 completed 84% of database upgrade 4825:20160920:040827.728 completed 86% of database upgrade 4825:20160920:040827.732 completed 87% of database upgrade 4825:20160920:040827.739 completed 88% of database upgrade 4825:20160920:040827.750 completed 89% of database upgrade 4825:20160920:040827.762 completed 91% of database upgrade 4825:20160920:040827.768 completed 92% of database upgrade 4825:20160920:040827.776 completed 93% of database upgrade 4825:20160920:040827.781 completed 94% of database upgrade 4825:20160920:040827.786 completed 96% of database upgrade 4825:20160920:040827.796 completed 97% of database upgrade 4825:20160920:040827.797 completed 98% of database upgrade 4825:20160920:040827.797 completed 100% of database upgrade 4825:20160920:040827.797 database upgrade fully completed I don't think that server was killed because of slow database update. | ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Sep 19 ] | ||||||||||||||||||||||||
Andris, did you do complete 3.0 -> 3.2 upgrade using zabbix packages (yum update..., like people above) ? wiper Yes, as closely I could. I did have empty database (will try with some data later), but I can't find why systemd whould want to kill a running process without enabled watchdog. | ||||||||||||||||||||||||
Comment by Alexander Yesipov [ 2016 Sep 19 ] | ||||||||||||||||||||||||
I've tried to stop and disable zabbix-server unit before upgrading, everything went without process killing. | ||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2016 Sep 20 ] | ||||||||||||||||||||||||
Hello Could you please be so kind and try this manual below to determine why process was killed ? You can start with Do you think that your system might be running out of memory ? | ||||||||||||||||||||||||
Comment by Kodai Terashima [ 2016 Sep 20 ] | ||||||||||||||||||||||||
It seems that the problem is related to systemd, however upgrade succeeded with bit large DB on Ubuntu Xenial. Upgrading process took around 30 seconds, but no problem. I will check it on CentOS as well. Which minor version of CentOS and systemd are you using? The information below are useful to reproduce this problem. cat /etc/redhat-release rpm -qa | grep systemd | ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Sep 20 ] | ||||||||||||||||||||||||
Vladislavs, it's less likely that OOM killer activity could appear at db upgrade stage (especially before all zabbix processes are started), on systems where zabbix server was previously working successfully. | ||||||||||||||||||||||||
Comment by sles [ 2016 Sep 21 ] | ||||||||||||||||||||||||
[root@zabbix ~]# cat /etc/redhat-release and this is not out of memory , just because | ||||||||||||||||||||||||
Comment by sles [ 2016 Sep 21 ] | ||||||||||||||||||||||||
>Upgrading process took around 30 seconds you have small database, right? | ||||||||||||||||||||||||
Comment by Andris Zeila [ 2016 Sep 21 ] | ||||||||||||||||||||||||
Actually populating problems table from events during 3.0->3.2 database upgrade could easily use a lot of memory on large installations. But this would happen after history table changes and if Zabbix would be killed at this point the dbversion table would correctly reflect the real database state. In the logs posted by Alexander the server was restarted right after starting it first time after upgrade. That's why I'm wondering if it's simply being restarted twice during upgrade process rather than systemd forcing restart because of some startup/running error detection. | ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Sep 21 ] | ||||||||||||||||||||||||
sles, you missed result of rpm -qa | grep systemd, requested by Kodai | ||||||||||||||||||||||||
Comment by sles [ 2016 Sep 21 ] | ||||||||||||||||||||||||
[root@zabbix ~]# rpm -qa | grep systemd imho, it is impossible to have different versions if one install updates | ||||||||||||||||||||||||
Comment by richlv [ 2016 Sep 21 ] | ||||||||||||||||||||||||
it's definitely not oom-killer. oom-killer won't send you a nice sigterm, you would get sigkilled | ||||||||||||||||||||||||
Comment by Anderson Ayres [ 2016 Nov 22 ] | ||||||||||||||||||||||||
Any news about that?
18671:20161122:124324.208 Starting Zabbix Server. Zabbix 3.2.1 (revision 62890).
18671:20161122:124324.208 ****** Enabled features ******
18671:20161122:124324.208 SNMP monitoring: YES
18671:20161122:124324.208 IPMI monitoring: YES
18671:20161122:124324.208 Web monitoring: YES
18671:20161122:124324.209 VMware monitoring: YES
18671:20161122:124324.209 SMTP authentication: YES
18671:20161122:124324.209 Jabber notifications: YES
18671:20161122:124324.209 Ez Texting notifications: YES
18671:20161122:124324.209 ODBC: YES
18671:20161122:124324.209 SSH2 support: YES
18671:20161122:124324.209 IPv6 support: YES
18671:20161122:124324.209 TLS support: YES
18671:20161122:124324.209 ******************************
18671:20161122:124324.210 using configuration file: /etc/zabbix/zabbix_server.conf
18671:20161122:124324.222 current database version (mandatory/optional): 03010002/03010002
18671:20161122:124324.222 required mandatory version: 03020000
18671:20161122:124324.222 starting automatic database upgrade
18671:20161122:124324.223 [Z3005] query failed: [1091] Can't DROP 'id'; check that column/key exists [alter table history_text drop column id]
18671:20161122:124324.223 database upgrade failed
| ||||||||||||||||||||||||
Comment by Anderson Ayres [ 2016 Nov 22 ] | ||||||||||||||||||||||||
Well, for those who got this error: # systemctl stop zabbix-server # mysql -u user -pPassword mysql> use zabbix; mysql> alter table history_text add column id BIGINT(20); mysql> alter table history_log add column id BIGINT(20); # systemctl start zabbix-server - Wait for upgrade to complete: # tailf /var/log/zabbix/zabbix_server.log Done. | ||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2016 Nov 24 ] | ||||||||||||||||||||||||
What I can observe in attached logs:
Based on these observations I have a suspicion that proper stopping of Zabbix server by systemd is a problem. My hypothesis of what's happening there:
Non-atomicity of steps 4. and 6. leads to a race condition. Probably there should be a sleep interval between stopping and starting service to let systemd killing instinct calm down. | ||||||||||||||||||||||||
Comment by Oleksii Zagorskyi [ 2016 Nov 24 ] | ||||||||||||||||||||||||
It's indeed a real scenario, thank you Gleb ! Maybe for upgrade procedure we could suggest or even require to stop zabbix-server daemon before trying to upgrade to major version? wiper At least for major version upgrade I'd strongly recommend making database backup first (it's also mentioned in server upgrade steps). So that would require manual upgrade procedure. glebs.ivanovskis Quick and dirty fix - sleep() before starting actual upgrade? kodai I think it's not possible to sleep between stop and start for systemd. I guess it's better not to use pidfile, using Type=simple and starting zabbix process in foreground may be a solution for this problem glebs.ivanovskis Some say that service restart is done not by yum/apt-get but by packages' postinst scripts. Kodai, could you please comment on that? kodai Yes, package try restart processes during upgrade if it is running. This behavior is general, same as other packages. glebs.ivanovskis Thank you for response! Is it possible to use stop-sleep-start instead of restart in postinst script if timeout between stop and start is not possible in systemd's restart? By the way, I can provide a "retarder" loadable module if you need to test a slowly stopping server without creating a huge configuration. | ||||||||||||||||||||||||
Comment by sles [ 2016 Nov 25 ] | ||||||||||||||||||||||||
> Maybe for upgrade procedure we could suggest or even require to stop zabbix-server daemon before trying to upgrade to major version? I guess only real way is to disable automatic upgrades in this case... | ||||||||||||||||||||||||
Comment by Alex [ 2016 Dec 01 ] | ||||||||||||||||||||||||
I have the same problrem: 32494:20161201:163201.452 [Z3005] query failed: [1206] The total number of locks exceeds the lock table size [insert into event_recovery (eventid,r_eventid) values | ||||||||||||||||||||||||
Comment by Sinisa [ 2017 Jan 13 ] | ||||||||||||||||||||||||
I had the problem when upgrading from 3.0.7 to 3.2.3. My database is on CentOs 7 with Postgresql 9.2. 17366:20170113:122843.317 Got signal [signal:15(SIGTERM),sender_pid:16561,sender_uid:0,reason:0]. Exiting ... 17365:20170113:122843.318 Got signal [signal:15(SIGTERM),sender_pid:16561,sender_uid:0,reason:0]. Exiting ... 17363:20170113:122844.950 syncing history data... 17363:20170113:122844.994 syncing history data done 17363:20170113:122844.994 syncing trends data... 16593:20170113:122908.208 Starting Zabbix Server. Zabbix 3.2.3 (revision 64610). 16593:20170113:122908.208 ****** Enabled features ****** 16593:20170113:122908.209 SNMP monitoring: YES 16593:20170113:122908.209 IPMI monitoring: YES 16593:20170113:122908.209 Web monitoring: YES 16593:20170113:122908.209 VMware monitoring: YES 16593:20170113:122908.209 SMTP authentication: NO 16593:20170113:122908.209 Jabber notifications: YES 16593:20170113:122908.209 Ez Texting notifications: YES 16593:20170113:122908.209 ODBC: YES 16593:20170113:122908.209 SSH2 support: YES 16593:20170113:122908.209 IPv6 support: NO 16593:20170113:122908.209 TLS support: NO 16593:20170113:122908.209 ****************************** 16593:20170113:122908.209 using configuration file: /opt/zabbix-3.2.3/etc/zabbix_server.conf 16593:20170113:122908.226 current database version (mandatory/optional): 03000000/03000000 16593:20170113:122908.227 required mandatory version: 03020000 16593:20170113:122908.227 starting automatic database upgrade 16593:20170113:122909.043 completed 1% of database upgrade 16593:20170113:122909.477 completed 2% of database upgrade 16593:20170113:122909.477 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: index "history_text_2" does not exist [drop index history_text_2] 16593:20170113:122909.478 database upgrade failed After add index history_text_2 on history_log (create index history_text_2 on history_log (itemid);) upgrade successfully finish! 16708:20170113:130258.051 Starting Zabbix Server. Zabbix 3.2.3 (revision 64610). 16708:20170113:130258.051 ****** Enabled features ****** 16708:20170113:130258.051 SNMP monitoring: YES 16708:20170113:130258.051 IPMI monitoring: YES 16708:20170113:130258.051 Web monitoring: YES 16708:20170113:130258.051 VMware monitoring: YES 16708:20170113:130258.051 SMTP authentication: NO 16708:20170113:130258.051 Jabber notifications: YES 16708:20170113:130258.051 Ez Texting notifications: YES 16708:20170113:130258.051 ODBC: YES 16708:20170113:130258.051 SSH2 support: YES 16708:20170113:130258.052 IPv6 support: NO 16708:20170113:130258.052 TLS support: NO 16708:20170113:130258.052 ****************************** 16708:20170113:130258.052 using configuration file: /opt/zabbix-3.2.3/etc/zabbix_server.conf 16708:20170113:130258.092 current database version (mandatory/optional): 03010001/03010001 16708:20170113:130258.092 required mandatory version: 03020000 16708:20170113:130258.092 starting automatic database upgrade 16708:20170113:130258.340 completed 1% of database upgrade 16708:20170113:130258.391 completed 2% of database upgrade 16708:20170113:130258.710 completed 3% of database upgrade 16708:20170113:130259.137 completed 5% of database upgrade NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "trigger_tag_pkey" for table "trigger_tag" 16708:20170113:130259.145 completed 6% of database upgrade 16708:20170113:130259.147 completed 7% of database upgrade 16708:20170113:130259.165 completed 9% of database upgrade NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "event_tag_pkey" for table "event_tag" 16708:20170113:130259.175 completed 10% of database upgrade 16708:20170113:130259.177 completed 11% of database upgrade 16708:20170113:130259.186 completed 12% of database upgrade 16708:20170113:130259.221 completed 14% of database upgrade 16708:20170113:130259.309 completed 15% of database upgrade NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "problem_pkey" for table "problem" 16708:20170113:130259.313 completed 16% of database upgrade Maybe someone help! | ||||||||||||||||||||||||
Comment by max kub [ 2017 Jun 20 ] | ||||||||||||||||||||||||
Hello, I have a similar issue. Upgrading from 3.0.9 to 3.2.x running on debian jessie. I stopped the server befor "apt-get install --only-upgrade zabbix-server-mysql zabbix-frontend-php zabbix-agent". Btw. first I had this error too. But I could avoid it, by not restart the server while upgrading database: 18671:20161122:124324.223 [Z3005] query failed: [1091] Can't DROP 'id'; check that column/key exists [alter table history_text drop column id]
18671:20161122:124324.223 database upgrade failed
I also changed to recommended collation 'utf8_bin' with no luck. What is the solution? How to upgrade DB manually? Thanks. zabbix-server.log
3713:20170620:082346.484 IPv6 support: YES
3713:20170620:082346.484 TLS support: YES
3713:20170620:082346.484 ******************************
3713:20170620:082346.484 using configuration file: /etc/zabbix/zabbix_server.conf
3713:20170620:082346.489 current database version (mandatory/optional): 03000000/03000000
3713:20170620:082346.490 required mandatory version: 03020000
3713:20170620:082346.490 starting automatic database upgrade
3713:20170620:082346.861 completed 1% of database upgrade
3713:20170620:082347.473 completed 2% of database upgrade
3713:20170620:082347.724 completed 3% of database upgrade
3713:20170620:082939.847 slow query: 352.122721 sec, "alter table history_text drop column id"
3713:20170620:082939.882 completed 5% of database upgrade
3713:20170620:082940.747 completed 6% of database upgrade
3713:20170620:082941.299 completed 7% of database upgrade
3713:20170620:082941.913 completed 8% of database upgrade
3713:20170620:082942.529 completed 10% of database upgrade
3713:20170620:082942.800 completed 11% of database upgrade
3713:20170620:082942.966 completed 12% of database upgrade
3713:20170620:082943.195 completed 13% of database upgrade
3713:20170620:082943.620 completed 15% of database upgrade
3713:20170620:082943.893 completed 16% of database upgrade
3713:20170620:082944.218 completed 17% of database upgrade
3713:20170620:082944.465 completed 18% of database upgrade
3713:20170620:082944.933 completed 20% of database upgrade
3713:20170620:082945.168 completed 21% of database upgrade
3713:20170620:082945.283 completed 22% of database upgrade
3713:20170620:082945.534 completed 24% of database upgrade
3713:20170620:082945.803 completed 25% of database upgrade
3713:20170620:082946.067 completed 26% of database upgrade
3713:20170620:083153.385 slow query: 3.095132 sec, "insert into problem (eventid,source,object,objectid) values (16685497,3,4,4780613),(14294956,3,4,2632664),(15805118,3,4,3993602),(15867477,3,4,4049926),(15038907,3,4,3299055),(16946137,3,4,5024993),(14922493,3,4,3194783),(15390572,3,4,3620347),(16958406,3,4,5036151),(14171859,3,4,2516020),(14227407,3,4,2569501),(15733479,3,4,3930744),(16394568,3,4,4504454),(16092718,3,4,4248719),(16759622,3,4,4850282),(15071737,3,4,3328635),(14210347,3,4,2553331),(14016084,3,4,2370426),(16551244,3,4,4651256),(16921719,3,4,5002650),(16847391,3,4,4933320),(15177345,3,4,3425227),(15357457,3,4,3589815),(14045158,3,4,2397444),(15538086,3,4,3756376),(14755681,3,4,3043446),(16967651,3,4,5044542),(16430661,3,4,4538659),(16934494,3,4,5014289),(14452590,3,4,2775284),(15618484,3,4,3830922),(15223677,3,4,3468255),(16716887,3,4,4810222),(13938165,3,4,2297530),(
| ||||||||||||||||||||||||
Comment by max kub [ 2017 Jun 20 ] | ||||||||||||||||||||||||
Nevermind. It took 2 hours... "patience.. patience, you must have, young padawan..." | ||||||||||||||||||||||||
Comment by James Murray [ 2017 Jul 17 ] | ||||||||||||||||||||||||
I'm in the middle of a database upgrade. I don't want to stop it, but I am concerned it isn't actually doing anything. According to the logs it has been at 3% coming up on an hour. It is a VM running CentOS7. I found the size earlier, it is around 300GB stored on SAS. Top shows that mysql is chugging around 50% of 12 assigned cores of a UCS blade. Any recommendations? I'm going to give it a few hours, but I'm really hoping to see 4% soon. 1-3 happened very quickly. [root@mci2-zabbix-master zabbix]# date Mon Jul 17 10:53:33 CDT 2017 [root@mci2-zabbix-master zabbix]# tailf /var/log/zabbix/zabbix_server.log 3377:20170717:100433.659 TLS support: YES 3377:20170717:100433.659 ****************************** 3377:20170717:100433.659 using configuration file: /etc/zabbix/zabbix_server.conf 3377:20170717:100433.698 current database version (mandatory/optional): 03000000/03000000 3377:20170717:100433.698 required mandatory version: 03020000 3377:20170717:100433.698 starting automatic database upgrade 3377:20170717:100433.786 completed 1% of database upgrade 3377:20170717:100433.839 completed 2% of database upgrade 3377:20170717:100438.502 slow query: 4.662223 sec, "drop index history_text_2 on history_text" 3377:20170717:100438.504 completed 3% of database upgrade | ||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2017 Jul 17 ] | ||||||||||||||||||||||||
FYI: It is dropping id field of history_text table. | ||||||||||||||||||||||||
Comment by James Murray [ 2017 Jul 17 ] | ||||||||||||||||||||||||
Thanks man. I'm not SQL man, but I just took a look at the process and it is moving. MariaDB [(none)]> SHOW PROCESSLIST; +------+--------+-----------+----------+---------+------+-------------------+-----------------------------------------+----------+ | Id | User | Host | db | Command | Time | State | Info | Progress | +------+--------+-----------+----------+---------+------+-------------------+-----------------------------------------+----------+ | 1348 | zabbix | localhost | zabbixdb | Query | 8192 | copy to tmp table | alter table history_text drop column id | 5.110 | | 7995 | root | localhost | NULL | Query | 0 | NULL | SHOW PROCESSLIST | 0.000 | +------+--------+-----------+----------+---------+------+-------------------+-----------------------------------------+----------+ 2 rows in set (0.00 sec) | ||||||||||||||||||||||||
Comment by Glebs Ivanovskis (Inactive) [ 2017 Jul 17 ] | ||||||||||||||||||||||||
I am no DB expert either, but as far as I know this operation shouldn't take that much time. Do you have custom indexes in Zabbix DB schema? | ||||||||||||||||||||||||
Comment by James Murray [ 2017 Jul 17 ] | ||||||||||||||||||||||||
I don't believe we have anything custom. We are an MSP and we've dumped a lot of proxies into this. I'd imagine we're on the large side of Zabbix instances and it has been collecting for a at least a couple months. | ||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2019 Apr 01 ] | ||||||||||||||||||||||||
Fixed in development branch: | ||||||||||||||||||||||||
Comment by Vladislavs Sokurenko [ 2019 Apr 03 ] | ||||||||||||||||||||||||
Fixed in:
|