-
Problem report
-
Resolution: Fixed
-
Critical
-
6.0.21, 6.0.28, 6.4.13, 7.0.0beta2
-
Ubuntu 20.04, PostgreSQL 13.10
or RHEL:
Zabbix 6.0.25 (4 CPU, 64 GB RAM)
PostgreSQL 13.13 + TimescaleDB 2.12 (4 CPU, 64 GB RAM)
-
S24-W16/17, S24-W18/19
-
0.25
After upgrading from Zabbix 4.4 to Zabbix 6.0 and applying the primary key schema updates, we saw poor performance from the "Problems" page and widget: it took upwards of 30 seconds to display the problems. On the DB side, we saw multiple queries of the following form running constantly:
SELECT h.itemid, MAX(h.clock) AS clock FROM history h WHERE h.itemid=$itemId AND h.clock>$timestamp GROUP BY h.itemid
After some searching, we found that this query appears to be defined in /usr/share/zabbix/include/classes/api/managers/CHistoryManager.php and is used when running PostgreSQL with a primary key on the history table(s).
As a test, we edited the CHistoryManager file and changed the call to getLastValuesFromSqlWithPk so that it calls getLastValuesFromSql instead. The load time of the "Problems" page/widget improved significantly (less than 1 second, compared to more than 30 seconds previously).
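To compare the two query shapes on the database side, something like the following could be run in psql. This is only a minimal sketch, not taken from the Zabbix source: the item ID and timestamp are placeholder values, and the ORDER BY ... LIMIT 1 form is an assumption about the kind of query getLastValuesFromSql issues.

```sql
-- Placeholder values: replace 12345 with a real itemid and 1700000000
-- with a Unix timestamp inside the period of interest.

-- Query shape observed on the database (as quoted in this report).
EXPLAIN (ANALYZE, BUFFERS)
SELECT h.itemid, MAX(h.clock) AS clock
FROM history h
WHERE h.itemid = 12345
  AND h.clock > 1700000000
GROUP BY h.itemid;

-- Assumed alternative shape (not confirmed against CHistoryManager.php):
-- fetch only the newest row for the same item.
EXPLAIN (ANALYZE, BUFFERS)
SELECT h.itemid, h.clock
FROM history h
WHERE h.itemid = 12345
  AND h.clock > 1700000000
ORDER BY h.clock DESC
LIMIT 1;
```

Note that EXPLAIN ANALYZE actually executes the statement, so on a large history table the first query may itself take a long time. Comparing the two plans and their execution times may help show where the time is spent, for example how many partitions each query touches.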
The instance in question is doing roughly 2600 NVPS, with many items retaining 90 days of history, so there are a lot of rows in the "history" tables. Because of the large amount of history, we have implemented partitioning on the database. We are only seeing this issue on a single instance: the one with the highest NVPS.
While investigating, I came across ZBX-20644, which seems to be a similar (but still different) issue with the "Latest Data" page when using partitions.
Steps to reproduce:
- Enable Primary Key (history_pk_prepare.sql) (and partitioning?) on history tables
- Import lots of history data, or wait for lots of data to be collected
- In Zabbix web front-end click on "Monitoring -> Problems"
- Wait for page to load
Result:
"Problems" page takes 30+ seconds to load
Expected:
"Problems" page loads quickly