[ZBXNEXT-3234] PGRES_FATAL_ERROR in zabbix log after restarting Postgresql service Created: 2016 Mar 01  Updated: 2017 May 31

Status: Reopened
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Major
Reporter: Sergey Lisikh Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: database, postgresql
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 6.5/7.1 x64, PostgreSQL 9.5, Zabbix 2.4.7


Attachments: File zabbix_server_with_localhost_PSQL_DB.log     File zabbix_server_with_remote_PSQL_DB.log    

 Description   

After restarting postgresql I get "[Z3005] query failed: [0] PGRES_FATAL_ERROR:FATAL:" errors in zabbix log. They stabely appear from time to time, until I restart zabbix-server.
Errors differ slightly on the basis of "DBHost" configuration (remote or localhost). See logs for details.
Don't know, is this critical or not?

To reproduce:
1) set DEBUG to 3 in zabbix_server.conf
2) run DB service, run zabbix server, check that they work
3) stop DB service, wait 10-20 sec, start it again
4) see zabbix logs, PGRES_FATAL_ERROR will appear periodically (think, it depends on the count of hosts and items)



 Comments   
Comment by Sergey Lisikh [ 2016 Mar 02 ]

This error also appear on debug level=2 with interval from 1 to 10-15 minutes.

Comment by Aleksandrs Saveljevs [ 2016 Mar 02 ]

Here is a longer part of the log on my system:

 14399:20160302:111456.497 Starting Zabbix Server. Zabbix 2.4.8rc1 (revision {ZABBIX_REVISION}).
 14399:20160302:111456.497 ****** Enabled features ******
 14399:20160302:111456.497 SNMP monitoring:           YES
 14399:20160302:111456.497 IPMI monitoring:           YES
 14399:20160302:111456.497 WEB monitoring:            YES
 14399:20160302:111456.497 VMware monitoring:         YES
 14399:20160302:111456.497 Jabber notifications:      YES
 14399:20160302:111456.497 Ez Texting notifications:  YES
 14399:20160302:111456.497 ODBC:                      YES
 14399:20160302:111456.497 SSH2 support:              YES
 14399:20160302:111456.497 IPv6 support:              YES
 14399:20160302:111456.497 ******************************
 14399:20160302:111456.497 using configuration file: /home/zabbix/zabbix-bin/etc/zabbix_server.conf
 14399:20160302:111456.508 current database version (mandatory/optional): 02040000/02040000
 14399:20160302:111456.508 required mandatory version: 02040000
 14399:20160302:111456.517 server #0 started [main process]
 14403:20160302:111456.517 server #1 started [configuration syncer #1]
 14404:20160302:111456.517 server #2 started [db watchdog #1]
 14405:20160302:111456.517 server #3 started [poller #1]
 14406:20160302:111456.517 server #4 started [unreachable poller #1]
 14408:20160302:111456.518 server #5 started [trapper #1]
 14409:20160302:111456.518 server #6 started [trapper #2]
 14411:20160302:111456.518 server #7 started [trapper #3]
 14413:20160302:111456.518 server #8 started [trapper #4]
 14415:20160302:111456.519 server #9 started [trapper #5]
 14421:20160302:111456.520 server #13 started [timer #1]
 14422:20160302:111456.520 server #14 started [http poller #1]
 14420:20160302:111456.520 server #12 started [housekeeper #1]
 14418:20160302:111456.520 server #11 started [alerter #1]
 14426:20160302:111456.521 server #16 started [history syncer #1]
 14430:20160302:111456.522 server #18 started [ipmi poller #1]
 14433:20160302:111456.522 server #19 started [java poller #1]
 14417:20160302:111456.522 server #10 started [icmp pinger #1]
 14434:20160302:111456.522 server #20 started [proxy poller #1]
 14429:20160302:111456.523 server #17 started [escalator #1]
 14435:20160302:111456.524 server #21 started [self-monitoring #1]
 14423:20160302:111456.539 server #15 started [discoverer #1]
 14429:20160302:111520.531 [Z3005] query failed: [0] result is NULL [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid from escalations order by actionid,triggerid,itemid,escalationid]
 14429:20160302:111520.531 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid from escalations order by actionid,triggerid,itemid,escalationid]
 14429:20160302:111520.531 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14429:20160302:111520.531 database is down: reconnecting in 10 seconds
 14422:20160302:111521.533 [Z3005] query failed: [0] result is NULL [select h.hostid,h.host,h.name,t.httptestid,t.name,t.variables,t.headers,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1456910121 and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)]
 14422:20160302:111521.533 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select h.hostid,h.host,h.name,t.httptestid,t.name,t.variables,t.headers,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1456910121 and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)]
 14422:20160302:111521.533 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14422:20160302:111521.533 database is down: reconnecting in 10 seconds
 14426:20160302:111521.556 [Z3005] query failed: [0] result is NULL [begin;]
 14426:20160302:111521.556 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14426:20160302:111521.556 database is down: reconnecting in 10 seconds
 14403:20160302:111526.520 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14403:20160302:111526.520 database is down: reconnecting in 10 seconds
 14418:20160302:111526.528 [Z3005] query failed: [0] result is NULL [select a.alertid,a.mediatypeid,a.sendto,a.subject,a.message,a.status,mt.mediatypeid,mt.type,mt.description,mt.smtp_server,mt.smtp_helo,mt.smtp_email,mt.exec_path,mt.gsm_modem,mt.username,mt.passwd,a.retries from alerts a,media_type mt where a.mediatypeid=mt.mediatypeid and a.status=0 and a.alerttype=0 order by a.alertid]
 14418:20160302:111526.528 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select a.alertid,a.mediatypeid,a.sendto,a.subject,a.message,a.status,mt.mediatypeid,mt.type,mt.description,mt.smtp_server,mt.smtp_helo,mt.smtp_email,mt.exec_path,mt.gsm_modem,mt.username,mt.passwd,a.retries from alerts a,media_type mt where a.mediatypeid=mt.mediatypeid and a.status=0 and a.alerttype=0 order by a.alertid]
 14418:20160302:111526.528 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14418:20160302:111526.528 database is down: reconnecting in 10 seconds
 14429:20160302:111530.531 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14429:20160302:111530.531 database is down: reconnecting in 10 seconds
 14422:20160302:111531.533 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14422:20160302:111531.534 database is down: reconnecting in 10 seconds
 14426:20160302:111531.556 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14426:20160302:111531.556 database is down: reconnecting in 10 seconds
 14403:20160302:111536.520 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14403:20160302:111536.520 database is down: reconnecting in 10 seconds
 14418:20160302:111536.528 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14418:20160302:111536.529 database is down: reconnecting in 10 seconds
 14429:20160302:111540.532 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14429:20160302:111540.532 database is down: reconnecting in 10 seconds
 14422:20160302:111541.534 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14422:20160302:111541.534 database is down: reconnecting in 10 seconds
 14426:20160302:111541.556 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

 14426:20160302:111541.556 database is down: reconnecting in 10 seconds
 14403:20160302:111546.527 database connection re-established
 14418:20160302:111546.532 database connection re-established
 14429:20160302:111550.535 database connection re-established
 14422:20160302:111551.538 database connection re-established
 14426:20160302:111551.560 database connection re-established
 14423:20160302:111556.546 [Z3005] query failed: [0] result is NULL [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and (r.nextcheck<=1456910156 or r.nextcheck>1456910156+r.delay) and mod(r.druleid,1)=0]
 14423:20160302:111556.546 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and (r.nextcheck<=1456910156 or r.nextcheck>1456910156+r.delay) and mod(r.druleid,1)=0]
 14411:20160302:111558.141 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14411:20160302:111558.141 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14405:20160302:111558.780 [Z3005] query failed: [0] result is NULL [select hostid,key_,state,evaltype,formula,error,lifetime from items where itemid=23278]
 14405:20160302:111558.780 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,key_,state,evaltype,formula,error,lifetime from items where itemid=23278]
 14421:20160302:111600.534 [Z3005] query failed: [0] result is NULL [begin;]
 14413:20160302:111658.151 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14413:20160302:111658.151 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14415:20160302:111758.162 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14415:20160302:111758.162 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14408:20160302:111858.174 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14408:20160302:111858.174 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14409:20160302:112358.213 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14409:20160302:112358.213 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null]
 14399:20160302:114444.374 Got signal [signal:15(SIGTERM),sender_pid:15844,sender_uid:1000,reason:0]. Exiting ...
 14399:20160302:114446.378 syncing history data...
 14399:20160302:114446.380 syncing history data done
 14399:20160302:114446.380 syncing trends data...
 14399:20160302:114446.392 syncing trends data done
 14399:20160302:114446.392 Zabbix Server stopped. Zabbix 2.4.8rc1 (revision {ZABBIX_REVISION}).

It can be seen that there are failed queries after database connection was re-established. However, these failed queries probably come from processes which did not notice that the database was down before and were using the old connection descriptor for their query. This looks scary, but does not seem to cause a problem: these failed queries will be repeated immediately after reconnection.

Comment by Sergey Lisikh [ 2016 Mar 02 ]

Thanks for reply, Aleksandrs!
I've calmed down

Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 04 ]

From the log attached by asaveljevs we can see that the fact that database was down and came back was noticed by alerter, escalator, http poller (these are processes which use direct queries instead of caches), configuration syncer and history syncer processes (which communicate with database a lot by design). Processes that do not notice such short database downtime are trappers, pollers, discoverer, timer, ... (which use mostly caches or spend a lot of time simply sleeping) Since Zabbix processes do not share information about database status this situation is more or less expected.

Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 07 ]

After some discussion we decided to leave things as they are. Some of Zabbix processes communicate to database not very frequently. Between rare queries connection handle may become invalid. Query then fails and Zabbix process tries to reconnect immediately. If the reconnection is successful on the first attempt, no messages about database down/up are logged, just the message about a failed query. Rationale:

  • We don't want to add "database is down", "database is up" messages on every reconnect, because this would litter logs and frighten users.
  • We don't want to add "successfully reconnected" message only, because it would be like ")" without "(".
  • We don't want to get rid of failed query messages, because we believe it's not good to hide such things.

As suggested by richlv moved to feature requests to track user interest in this issue.

Generated at Thu Mar 28 13:01:59 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.