[ZBXNEXT-3234] PGRES_FATAL_ERROR in zabbix log after restarting Postgresql service Created: 2016 Mar 01 Updated: 2017 May 31 |
|
Status: | Reopened |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | None |
Fix Version/s: | None |
Type: | Change Request | Priority: | Major |
Reporter: | Sergey Lisikh | Assignee: | Unassigned |
Resolution: | Unresolved | Votes: | 0 |
Labels: | database, postgresql | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Centos 6.5/7.1 x64, PostgreSQL 9.5, Zabbix 2.4.7 |
Attachments: | zabbix_server_with_localhost_PSQL_DB.log zabbix_server_with_remote_PSQL_DB.log |
Description |
After restarting postgresql I get "[Z3005] query failed: [0] PGRES_FATAL_ERROR:FATAL:" errors in zabbix log. They stabely appear from time to time, until I restart zabbix-server. To reproduce: |
Comments |
Comment by Sergey Lisikh [ 2016 Mar 02 ] |
This error also appear on debug level=2 with interval from 1 to 10-15 minutes. |
Comment by Aleksandrs Saveljevs [ 2016 Mar 02 ] |
Here is a longer part of the log on my system: 14399:20160302:111456.497 Starting Zabbix Server. Zabbix 2.4.8rc1 (revision {ZABBIX_REVISION}). 14399:20160302:111456.497 ****** Enabled features ****** 14399:20160302:111456.497 SNMP monitoring: YES 14399:20160302:111456.497 IPMI monitoring: YES 14399:20160302:111456.497 WEB monitoring: YES 14399:20160302:111456.497 VMware monitoring: YES 14399:20160302:111456.497 Jabber notifications: YES 14399:20160302:111456.497 Ez Texting notifications: YES 14399:20160302:111456.497 ODBC: YES 14399:20160302:111456.497 SSH2 support: YES 14399:20160302:111456.497 IPv6 support: YES 14399:20160302:111456.497 ****************************** 14399:20160302:111456.497 using configuration file: /home/zabbix/zabbix-bin/etc/zabbix_server.conf 14399:20160302:111456.508 current database version (mandatory/optional): 02040000/02040000 14399:20160302:111456.508 required mandatory version: 02040000 14399:20160302:111456.517 server #0 started [main process] 14403:20160302:111456.517 server #1 started [configuration syncer #1] 14404:20160302:111456.517 server #2 started [db watchdog #1] 14405:20160302:111456.517 server #3 started [poller #1] 14406:20160302:111456.517 server #4 started [unreachable poller #1] 14408:20160302:111456.518 server #5 started [trapper #1] 14409:20160302:111456.518 server #6 started [trapper #2] 14411:20160302:111456.518 server #7 started [trapper #3] 14413:20160302:111456.518 server #8 started [trapper #4] 14415:20160302:111456.519 server #9 started [trapper #5] 14421:20160302:111456.520 server #13 started [timer #1] 14422:20160302:111456.520 server #14 started [http poller #1] 14420:20160302:111456.520 server #12 started [housekeeper #1] 14418:20160302:111456.520 server #11 started [alerter #1] 14426:20160302:111456.521 server #16 started [history syncer #1] 14430:20160302:111456.522 server #18 started [ipmi poller #1] 14433:20160302:111456.522 server #19 started [java poller #1] 14417:20160302:111456.522 server #10 started [icmp pinger #1] 14434:20160302:111456.522 server #20 started [proxy poller #1] 14429:20160302:111456.523 server #17 started [escalator #1] 14435:20160302:111456.524 server #21 started [self-monitoring #1] 14423:20160302:111456.539 server #15 started [discoverer #1] 14429:20160302:111520.531 [Z3005] query failed: [0] result is NULL [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid from escalations order by actionid,triggerid,itemid,escalationid] 14429:20160302:111520.531 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select escalationid,actionid,triggerid,eventid,r_eventid,nextcheck,esc_step,status,itemid from escalations order by actionid,triggerid,itemid,escalationid] 14429:20160302:111520.531 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14429:20160302:111520.531 database is down: reconnecting in 10 seconds 14422:20160302:111521.533 [Z3005] query failed: [0] result is NULL [select h.hostid,h.host,h.name,t.httptestid,t.name,t.variables,t.headers,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1456910121 and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)] 14422:20160302:111521.533 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select h.hostid,h.host,h.name,t.httptestid,t.name,t.variables,t.headers,t.agent,t.authentication,t.http_user,t.http_password,t.http_proxy,t.retries,t.ssl_cert_file,t.ssl_key_file,t.ssl_key_password,t.verify_peer,t.verify_host from httptest t,hosts h where t.hostid=h.hostid and t.nextcheck<=1456910121 and mod(t.httptestid,1)=0 and t.status=0 and h.proxy_hostid is null and h.status=0 and (h.maintenance_status=0 or h.maintenance_type=0)] 14422:20160302:111521.533 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14422:20160302:111521.533 database is down: reconnecting in 10 seconds 14426:20160302:111521.556 [Z3005] query failed: [0] result is NULL [begin;] 14426:20160302:111521.556 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14426:20160302:111521.556 database is down: reconnecting in 10 seconds 14403:20160302:111526.520 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14403:20160302:111526.520 database is down: reconnecting in 10 seconds 14418:20160302:111526.528 [Z3005] query failed: [0] result is NULL [select a.alertid,a.mediatypeid,a.sendto,a.subject,a.message,a.status,mt.mediatypeid,mt.type,mt.description,mt.smtp_server,mt.smtp_helo,mt.smtp_email,mt.exec_path,mt.gsm_modem,mt.username,mt.passwd,a.retries from alerts a,media_type mt where a.mediatypeid=mt.mediatypeid and a.status=0 and a.alerttype=0 order by a.alertid] 14418:20160302:111526.528 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select a.alertid,a.mediatypeid,a.sendto,a.subject,a.message,a.status,mt.mediatypeid,mt.type,mt.description,mt.smtp_server,mt.smtp_helo,mt.smtp_email,mt.exec_path,mt.gsm_modem,mt.username,mt.passwd,a.retries from alerts a,media_type mt where a.mediatypeid=mt.mediatypeid and a.status=0 and a.alerttype=0 order by a.alertid] 14418:20160302:111526.528 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14418:20160302:111526.528 database is down: reconnecting in 10 seconds 14429:20160302:111530.531 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14429:20160302:111530.531 database is down: reconnecting in 10 seconds 14422:20160302:111531.533 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14422:20160302:111531.534 database is down: reconnecting in 10 seconds 14426:20160302:111531.556 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14426:20160302:111531.556 database is down: reconnecting in 10 seconds 14403:20160302:111536.520 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14403:20160302:111536.520 database is down: reconnecting in 10 seconds 14418:20160302:111536.528 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14418:20160302:111536.529 database is down: reconnecting in 10 seconds 14429:20160302:111540.532 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14429:20160302:111540.532 database is down: reconnecting in 10 seconds 14422:20160302:111541.534 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14422:20160302:111541.534 database is down: reconnecting in 10 seconds 14426:20160302:111541.556 [Z3001] connection to database 'zabbix_2_4' failed: [0] could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"? 14426:20160302:111541.556 database is down: reconnecting in 10 seconds 14403:20160302:111546.527 database connection re-established 14418:20160302:111546.532 database connection re-established 14429:20160302:111550.535 database connection re-established 14422:20160302:111551.538 database connection re-established 14426:20160302:111551.560 database connection re-established 14423:20160302:111556.546 [Z3005] query failed: [0] result is NULL [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and (r.nextcheck<=1456910156 or r.nextcheck>1456910156+r.delay) and mod(r.druleid,1)=0] 14423:20160302:111556.546 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select distinct r.druleid,r.iprange,r.name,c.dcheckid,r.proxy_hostid from drules r left join dchecks c on c.druleid=r.druleid and c.uniq=1 where r.status=0 and (r.nextcheck<=1456910156 or r.nextcheck>1456910156+r.delay) and mod(r.druleid,1)=0] 14411:20160302:111558.141 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14411:20160302:111558.141 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14405:20160302:111558.780 [Z3005] query failed: [0] result is NULL [select hostid,key_,state,evaltype,formula,error,lifetime from items where itemid=23278] 14405:20160302:111558.780 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,key_,state,evaltype,formula,error,lifetime from items where itemid=23278] 14421:20160302:111600.534 [Z3005] query failed: [0] result is NULL [begin;] 14413:20160302:111658.151 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14413:20160302:111658.151 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14415:20160302:111758.162 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14415:20160302:111758.162 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14408:20160302:111858.174 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14408:20160302:111858.174 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14409:20160302:112358.213 [Z3005] query failed: [0] result is NULL [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14409:20160302:112358.213 [Z3005] query failed: [0] PGRES_FATAL_ERROR: [select hostid,status from hosts where host='Zabbix server' and status in (0,1) and flags<>2 and proxy_hostid is null] 14399:20160302:114444.374 Got signal [signal:15(SIGTERM),sender_pid:15844,sender_uid:1000,reason:0]. Exiting ... 14399:20160302:114446.378 syncing history data... 14399:20160302:114446.380 syncing history data done 14399:20160302:114446.380 syncing trends data... 14399:20160302:114446.392 syncing trends data done 14399:20160302:114446.392 Zabbix Server stopped. Zabbix 2.4.8rc1 (revision {ZABBIX_REVISION}). It can be seen that there are failed queries after database connection was re-established. However, these failed queries probably come from processes which did not notice that the database was down before and were using the old connection descriptor for their query. This looks scary, but does not seem to cause a problem: these failed queries will be repeated immediately after reconnection. |
Comment by Sergey Lisikh [ 2016 Mar 02 ] |
Thanks for reply, Aleksandrs! |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 04 ] |
From the log attached by asaveljevs we can see that the fact that database was down and came back was noticed by alerter, escalator, http poller (these are processes which use direct queries instead of caches), configuration syncer and history syncer processes (which communicate with database a lot by design). Processes that do not notice such short database downtime are trappers, pollers, discoverer, timer, ... (which use mostly caches or spend a lot of time simply sleeping) Since Zabbix processes do not share information about database status this situation is more or less expected. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 Apr 07 ] |
After some discussion we decided to leave things as they are. Some of Zabbix processes communicate to database not very frequently. Between rare queries connection handle may become invalid. Query then fails and Zabbix process tries to reconnect immediately. If the reconnection is successful on the first attempt, no messages about database down/up are logged, just the message about a failed query. Rationale:
As suggested by richlv moved to feature requests to track user interest in this issue. |