  ZABBIX BUGS AND ISSUES
  ZBX-19897

Bug in template PostgreSQL by agent2 dependent item


Details

    • Team INT
    • Sprint 80 (Sep 2021), Sprint 81 (Oct 2021), Sprint 82 (Nov 2021), Sprint 83 (Dec 2021)
    • 4

    Description

      Master task for PostgreSQL issues:

      (1) from https://support.zabbix.com/browse/ZBX-19873
      In database discovery there is an item "Checksum failures".
      It has two preprocessing steps, JSONPath and Change per second.
      The database can return NULL for checksum failures; in that case preprocessing fails and the item becomes unsupported.
      In addition, because other parameters in the master item keep changing, the error text changes every time, which ends up filling the log files with long error messages.

      Please add another preprocessing step to validate the data, or replace NULL with 0, to avoid this.
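
      The NULL could also be eliminated at the source. A minimal sketch, assuming the item is fed by a query against pg_stat_database on PostgreSQL 12+ (the exact query used by the template is not shown in this report):

      ```sql
      -- Sketch: replace a NULL checksum counter with 0 at query time, so the
      -- JSONPath / Change-per-second preprocessing always receives a number.
      SELECT coalesce(checksum_failures, 0) AS checksum_failures
      FROM pg_stat_database
      WHERE datname = current_database();
      ```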

      (2) from https://support.zabbix.com/browse/ZBX-19735

      Hi!
      I have recovered a pgsql server from backup (point-in-time recovery).

      postgres=# select pg_last_xact_replay_timestamp();
       pg_last_xact_replay_timestamp 
      -------------------------------
       2021-06-26 22:11:23.259069+03
      

      Item "Replication: lag in seconds":

      pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.DB}"]
      

      And it looks like the trigger is not working correctly for me.
      I got error events like

      PostgreSQL: Streaming lag with {#MASTER} is too high (over {$PG.REPL_LAG.MAX.WARN} in 5m)
      

      In my case I changed the user parameter SQL from

      CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
      

      to

      CASE WHEN pg_last_wal_receive_lsn() is NULL or pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
      

      And that corrects the issue.
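
      Putting the reporter's change into a complete query, the corrected lag expression might look roughly like this (only the CASE condition comes from the report; the surrounding SELECT and the ELSE branch are assumptions about how such a lag check is usually written):

      ```sql
      -- Sketch: treat a server that has never received WAL (receive LSN is
      -- NULL, e.g. after a point-in-time recovery), or one that is fully
      -- caught up, as having zero lag.
      SELECT CASE
               WHEN pg_last_wal_receive_lsn() IS NULL
                 OR pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
               THEN 0
               ELSE extract(epoch FROM now() - pg_last_xact_replay_timestamp())
             END AS lag_seconds;
      ```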

      Hope you can update this script.
      Thank you!

      (3) from https://support.zabbix.com/browse/ZBX-19734 

      In PostgreSQL Agent 2 template documentation:

      https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/postgresql_agent2?at=release/5.0

      In section

      1. Create PostgreSQL user for monitoring (<password> at your discretion):

      an important GRANT statement is missing:

      GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO zbx_monitor;

      Otherwise the frontend shows the error:
      Cannot fetch data: ERROR: permission denied for function pg_ls_waldir (SQLSTATE 42501).
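
      For context, the full monitoring-user setup would then look roughly like this. The CREATE USER and pg_monitor lines are assumptions based on common monitoring setups; only the GRANT EXECUTE line is the statement reported missing:

      ```sql
      -- Sketch of a monitoring user setup with the missing grant added.
      CREATE USER zbx_monitor WITH PASSWORD '<password>' INHERIT;
      GRANT pg_monitor TO zbx_monitor;
      -- The missing statement from ZBX-19734: without it, pg_ls_waldir()
      -- fails with SQLSTATE 42501 for the monitoring user.
      GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO zbx_monitor;
      ```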

      (4) from https://support.zabbix.com/browse/ZBX-19672

      When using the template PostgreSQL by Zabbix agent 2.

      There is an error in the database (slave+master) log:

      ERROR: column "received_lsn" does not exist at character 36
      HINT: Perhaps you meant to reference the column "pg_stat_wal_receiver.received_tli".
      STATEMENT: SELECT pg_catalog.pg_wal_lsn_diff (received_lsn, pg_last_wal_replay_lsn())
      FROM pg_stat_wal_receiver;
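
      On PostgreSQL 13 and later the column was renamed, so a version-specific variant of the query (a sketch, not necessarily the template's actual fix) would be:

      ```sql
      -- PostgreSQL 13+: pg_stat_wal_receiver.received_lsn was renamed to
      -- written_lsn, so the original query fails on these versions.
      SELECT pg_catalog.pg_wal_lsn_diff(written_lsn, pg_last_wal_replay_lsn())
      FROM pg_stat_wal_receiver;
      ```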

      (5) from https://support.zabbix.com/browse/ZBX-18889
      Steps to reproduce:

      1. Install PostgreSQL 13 primary and standby instances and add them with the Agent 2 template
      2. Navigate to the items of the standby host
      3. Change the filter to "Not supported"
      4. You will see that "Template DB PostgreSQL Agent 2: Replication: Lag in bytes" is not supported, because the received_lsn column was renamed to written_lsn in PostgreSQL 13.
      5. There is also a problem with the item "Template DB PostgreSQL Agent 2: PostgreSQL: Get bgwriter: Checkpoint: Checkpoint sync time": in the JSON, sync_time is called checkpoint_sync_time. It can be changed in the template; just letting you know.

      Result:
      See screenshot...

      Expected:
      Replication lag works with PostgreSQL 13

      (6)

      Bug in the template "PostgreSQL by agent 2", dependent item "WAL: Segments count": the preprocessing step JSONPath $.write has to be JSONPath $.count.

       

       

      (7)

      Archiving metrics

      Count of files in archive_status that need to be archived (pgsql.archive.count_files_to_archive)
      Size of files that need to be archived (pgsql.archive.size_files_to_archive)

      These metrics are calculated by browsing the filesystem directory directly, using functions that by default are available only to the superuser. Information from system views and functions could be used instead.
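
      One possible alternative, assuming PostgreSQL 12+, is pg_ls_archive_statusdir(), which can be granted to a non-superuser with GRANT EXECUTE (a sketch, not the template's actual query):

      ```sql
      -- Sketch: count WAL segments waiting to be archived by looking at the
      -- .ready marker files, without reading the data directory directly.
      SELECT count(*) AS files_to_archive
      FROM pg_ls_archive_statusdir()
      WHERE name LIKE '%.ready';
      ```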

      Connection metrics

      Connections sum: Total (pgsql.connections.total)
      Connections sum: Total% (pgsql.connections.total_pct)
      Connections sum: Waiting (pgsql.connections.waiting)

      The calculation does not exclude system processes.
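
      A sketch of a query that excludes system processes, assuming PostgreSQL 10+ (where pg_stat_activity has backend_type); taking wait_event as the definition of "waiting" is also an assumption:

      ```sql
      -- Sketch: count only client connections, excluding autovacuum workers,
      -- background writers and other system backends.
      SELECT count(*) AS total,
             count(*) FILTER (WHERE wait_event IS NOT NULL) AS waiting
      FROM pg_stat_activity
      WHERE backend_type = 'client backend';
      ```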

      Cache hit (pgsql.cache.hit["{$PG.URI}"])
      This calculates the metric over the entire running time of the cluster. Mamonsu works around this with a calculated item that takes deltas of other metrics (pgsql.dbstat.sum.blks_hit.rate, pgsql.dbstat.sum.blks_read.rate).
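
      Such a calculated item might look roughly like this, assuming Zabbix 5.4+ expression syntax and that both rate items exist on the host (a sketch of the Mamonsu-style workaround, not the template's current formula):

      ```
      100 * last(//pgsql.dbstat.sum.blks_hit.rate)
          / (last(//pgsql.dbstat.sum.blks_hit.rate)
             + last(//pgsql.dbstat.sum.blks_read.rate))
      ```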

      flush lag, write lag and replay lag per sender process (pgsql.replication.process[uri,username,password])
      When forming the item, the PID of the WAL sender is used, which changes constantly. As a result, a huge number of lag metrics accumulate for the host over time.
      There are also a couple of metrics that raised questions. They are not exactly erroneous, but certainly not the most convenient or understandable. Metrics:
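
      One way to avoid the churn, assuming PostgreSQL 10+, is to key the lag values by application_name instead of the walsender PID (a sketch; the template's actual query is not shown here):

      ```sql
      -- Sketch: application_name is stable across reconnects, so metrics keyed
      -- by it do not multiply every time the walsender PID changes.
      SELECT application_name,
             extract(epoch FROM write_lag)  AS write_lag,
             extract(epoch FROM flush_lag)  AS flush_lag,
             extract(epoch FROM replay_lag) AS replay_lag
      FROM pg_stat_replication;
      ```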

      Number of standby servers (pgsql.replication.count[uri,username,password])
      The meaning of the metric is not entirely correct: it is stated that standby servers are counted, but if there are multiple subscriptions to a logical replica, they are all counted as separate servers.
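
      One approximation that avoids counting several subscriptions from the same host as separate standbys is to count distinct client addresses (a sketch only; it would also collapse genuinely distinct standbys behind one address):

      ```sql
      -- Sketch: count distinct connected replication clients rather than rows.
      SELECT count(DISTINCT client_addr) AS standby_count
      FROM pg_stat_replication;
      ```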

      PostgreSQL uptime, in milliseconds (pgsql.uptime[<commonParams>])
      It would be more convenient to measure it as Unix time and display the launch date and time.
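
      A sketch of the suggested alternative, exposing the server start time itself rather than a millisecond counter:

      ```sql
      -- Sketch: report the postmaster start time and its Unix epoch, from
      -- which both uptime and launch date can be derived on the Zabbix side.
      SELECT pg_postmaster_start_time() AS started_at,
             extract(epoch FROM pg_postmaster_start_time())::bigint AS started_epoch;
      ```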

      Attachments

            People

              vkhaliev Vyacheslav Khaliev
              alarionovs Aleksandrs Larionovs (Zabbix Team)
