[ZBX-19897] Bug in template PostgreSQL by agent2 dependent item Created: 2021 Sep 01  Updated: 2024 Apr 10  Resolved: 2021 Dec 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Templates (T)
Affects Version/s: None
Fix Version/s: 5.0.19rc1, 5.4.9rc1, 6.0.0beta1, 6.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Aleksandrs Larionovs (Inactive) Assignee: Vyacheslav Khaliev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
Sub-task
depends on ZBX-20282 postgres template: permission denied Closed
part of ZBX-19672 PostgreSQL 13 - template zabbix abent... Closed
part of ZBX-19873 Postgresql by agent2 template has inc... Closed
part of ZBX-22261 Postgresql by agent2 template (still)... Closed
part of ZBX-18889 Postgresql 13 issue Closed
part of ZBX-19734 PostgreSQL Agent 2 documentation GRAN... Closed
part of ZBX-19735 pgsql.replication.lag.sql Closed
Team: Team INT
Sprint: Sprint 80 (Sep 2021), Sprint 81 (Oct 2021), Sprint 82 (Nov 2021), Sprint 83 (Dec 2021)
Story Points: 4

 Description   

Master tas for PostgreSQL issues:

(1) from https://support.zabbix.com/browse/ZBX-19873
In database discovery there is an item "Checksum failures".
It has 2 preprocessing options, JSONPATH and CHANGE PER SECOND.
It is possible that DB returns value NULL for checksum failures, in this scenario item preprocessing fails and item becomes unsupported.
In addition, because other parameters are changing in the master item, this ends up filling log files with long error messages, because the error text changes every time.

Please add another preprocessing step to validate the data or even replace NULL with 0 to avoid this.

(2) from https://support.zabbix.com/browse/ZBX-19735

Hi!
I have recovered from backup pgsql server (Point in time recovery).

postgres=# select pg_last_xact_replay_timestamp();
 pg_last_xact_replay_timestamp 
-------------------------------
 2021-06-26 22:11:23.259069+03

Item "Replication: lag in seconds":

pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.DB}"]

And looks like trigger not working correctly for me.
I got errors event like

PostgreSQL: Streaming lag with {#MASTER} is too high (over {$PG.REPL_LAG.MAX.WARN} in 5m)

In my case I change user parameter SQL from

CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

to

CASE WHEN pg_last_wal_receive_lsn() is NULL or pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()

And it corrects the issue.

Hope you can update this script.
Thank you!

(3) from https://support.zabbix.com/browse/ZBX-19734 

In PostgreSQL Agent 2 template documentation:

https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/db/postgresql_agent2?at=release/5.0

In section

1. Create PostgreSQL user for monitoring (<password> at your discretion):

Missing important GRANT section:

GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO zbx_monitor;

Otherwise in fronted error appear
Cannot fetch data: ERROR: permission denied for function pg_ls_waldir (SQLSTATE 42501).

(4) from https://support.zabbix.com/browse/ZBX-19672

When using the template PostgreSQL by Zabbix agent 2.

There is an error in the database(slave+master) log :

ERROR: column "received_lsn" does not exist at character 36
HINT: Perhaps you meant to reference the column "pg_stat_wal_receiver.received_tli".
STATEMENT: SELECT pg_catalog.pg_wal_lsn_diff (received_lsn, pg_last_wal_replay_lsn())
FROM pg_stat_wal_receiver;

(5) from https://support.zabbix.com/browse/ZBX-18889
Steps to reproduce:

  1. Install postgresql 13 primary and standby instance ,add them with agent2 template
  2. Navigate to items of standby host
  3. Change filter to not supported
  4. You will see Template DB PostgreSQL Agent 2Replication: Lag in bytes not supported because received_lsn column changed to written_lsn in postgresql 13.
  5. Also there is problem with item  Template DB PostgreSQL Agent 2: PostgreSQL: Get bgwriter: Checkpoint: Checkpoint sync time. in json sync_time called checkpoint_sync_time. But it can be channed in template, just updating you..

Result:
See screenshot...

Expected:
Replication lag work with postgres 13

(6)

Bug in template PostgreSQL by agent2 dependent item "WAL: Segments count", Preprocessing: JSONPATH: $.write has to be JSONPATH: $.count

 

 

(7)

Archiving metrics

Count of files in archive_status need to archive (pgsql.archive.count_files_to_archive)
Count of files need to archive (pgsql.archive.size_files_to_archive)

The metric is calculated by directly browsing the FS catalog. For this, functions are used that are available by default only to the superuser. You can instead use information from system views and functions.

Connection metrics

Connections sum: Total (pgsql.connections.total)
Connections sum: Total% (pgsql.connections.total_pct)
Connections sum: Waiting (pgsql.connections.waiting)

The calculation does not exclude system processes.

Cache hit (pgsql.cache.hit ["\{$ PG.URI}"])
Calculates the metric for the entire running time of the cluster. Mamonsu got around this by creating a Calculated Item that takes deltas by other metrics (pgsql.dbstat.sum.blks_hit.rate, pgsql.dbstat.sum.blks_read.rate).

flush lag, write lag and replay lag per each sender process (pgsql.replication.process [uri, username, password])
When forming an item, the PID of the WAL transmitter is used, which constantly changes. As a result, over time, a huge bunch of lag metrics will be generated for the host.
There are also a couple of metrics that have raised questions. This is not to say that they are erroneous, but certainly not the most convenient and understandable. Metrics:

Number of standby servers (pgsql.replication.count [uri, username, password])
The meaning of the metric is not entirely correct. It is stated that standby servers will be counted, however, if there are multiple subscriptions to a logical replica, they will all be counted as different servers.

PostgreSQL uptime, in milliseconds (pgsql.uptime [<commonParams>])
It is more convenient to measure in unixtime and display the launch date and time.
More about this source text



 Comments   
Comment by Vyacheslav Khaliev [ 2021 Dec 02 ]

Available in:

Generated at Mon May 19 06:44:29 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.