[ZBX-23064] Problem with restart server - syncing trend data Created: 2023 Jul 05 Updated: 2025 Mar 04 Resolved: 2023 Oct 13 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | None |
Fix Version/s: | 5.0.39rc1, 6.0.23rc1, 6.4.8rc1, 7.0.0alpha7, 7.0 (plan) |
Type: | Problem report | Priority: | Trivial |
Reporter: | Albert Przybylski | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 12 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
DB: (PostgreSQL) 14.5 (Ubuntu 14.5-2.pgdg20.04+2) + timescaledb 2.7.1 Server: HA Zabbix two nodes cluster - 6.4.0 8 CPU, 94 GB RAM, zabbix-server 6.4.0 Number of hosts (enabled) - 28072 Everything is monitored by 14 active zabbix proxies. |
Attachments: |
![]() ![]() ![]() ![]() ![]() ![]() |
||||||||||||||||||||||||||||
Issue Links: |
|
||||||||||||||||||||||||||||
Team: | |||||||||||||||||||||||||||||
Sprint: | Sprint 104 (Sep 2023), Sprint 105 (Oct 2023) | ||||||||||||||||||||||||||||
Story Points: | 0.5 |
Description |
Steps to reproduce: When we have to switch node from zabbix01 to zabbix02, we are doing restart or stop server on active node: systemctl restart (or stop) zabbix-server After that We had a lot of logs something like: 1069915:20230705:094022.360 syncing trend data... 1069915:20230705:094351.568 slow query: 202.510611 sec, "select distinct itemid from trends where clock>=1688540400 and (itemid in (1000xITEMID))" 1069915:20230705:095751.112 slow query: 785.832866 sec, "select distinct itemid from trends_uint where clock>=1688540400 and (itemid in (1000xITEMID))" 1069915:20230705:100035.095 slow query: 41.348138 sec, "select distinct itemid from trends_uint where clock>=1688540400 and (itemid in (1000xITEMID))" full queries from restart in attachment. We didn't have any other logs, especially - no more slow queries ("INSERT INTO trends" was running less that 1-2s)
Result: From another restart we checked this SELECT (shortened for readability):
496064:20230705:113740.118 slow query: 237.515113 sec, "select distinct itemid from trends where clock>=1688547600 and (itemid in (1000xITEMID))"
Timestamp: 1688547600 -> 11:00:00 (GMT +2) So we check the same SELECT on database (we know we have to check this before 12:00:00 [GMT +2]): EXPLAIN ANALIYZE select distinct itemid from trends where clock>=1688547600 and (itemid in (1000xITEMID)); Execution time: 11:38:00 (GMT +2): Unique (cost=2.95..2.95 rows=1 width=8) (actual time=28570.089..28570.096 rows=0 loops=1) -> Custom Scan (SkipScan) on _hyper_6_1328_chunk (cost=2.95..2.95 rows=1 width=8) (actual time=28570.088..28570.094 rows=0 loops=1) -> Index Only Scan using "1328_989_trends_pkey" on _hyper_6_1328_chunk (cost=2.95..4737681.97 rows=1 width=8) (actual time=28570.063..28570.082 rows=0 loops=1) Index Cond: ((itemid > NULL::bigint) AND (clock >= 1688547600)) Filter: ((itemid = ANY ('{1000xITEMID}'::bigint[]))) Heap Fetches: 197123 Planning Time: 4.849 ms Execution Time: 28570.172 ms (8 rows) Execution time: 11:52:00 (GMT +2): Unique (cost=2.95..2.95 rows=1 width=8) (actual time=38709.553..38709.556 rows=0 loops=1) -> Custom Scan (SkipScan) on _hyper_6_1328_chunk (cost=2.95..2.95 rows=1 width=8) (actual time=38709.551..38709.554 rows=0 loops=1) -> Index Only Scan using "1328_989_trends_pkey" on _hyper_6_1328_chunk (cost=2.95..4737681.97 rows=1 width=8) (actual time=38709.462..38709.547 rows=0 loops=1) Index Cond: ((itemid > NULL::bigint) AND (clock >= 1688547600)) Filter: ((itemid = ANY ('{1000xITEMID}'::bigint[]))) Heap Fetches: 197123 Planning Time: 14.917 ms Execution Time: 38709.745 ms (8 rows) We know that someting must be cached, but still - it takes quite a long time. Longer than expected. Expected: |
Comments |
Comment by dimir [ 2023 Jul 06 ] |
Did you check the health of your database? |
Comment by Albert Przybylski [ 2023 Jul 06 ] |
May I ask for clarification? |
Comment by Albert Przybylski [ 2023 Jul 17 ] |
After analyzing this problem on our side, we have the following conclusions:
trends and latest chunk: zabbix=# select count(*) from trends; count ------------ 2986117705 (1 row) zabbix=# select count(*) from _timescaledb_internal._hyper_6_1328_chunk; count ----------- 718816405 (1 row) trends_uint and latest chunk: zabbix=# select count(*) from trends_uint; count ------------ 9277173205 (1 row) zabbix=# select count(*) from _timescaledb_internal._hyper_7_1327_chunk; count ------------ 1633955881 (1 row) trend settings - 365 days, data from only 8 months.
EXPLAIN analyze select distinct itemid from trends where clock>=1689588000 and (itemid in (14758658,14758660,14758666,14758671,14758678,14758680) or itemid in (14758686,14758691,14758698,14758700,14758706,14758711,14758718,14758720,14758726,14758731,14758732,14758758,14758760,14758766,14758768,14758771,14758772,14758778,14758780,14758786,14758788,14758791,14758792,14758798,14758800,14758806,14758808,14758811,14758812,14758818,14758820,14758826,14758828,14758829,14758831,14758832,14759218,14759220,14759226,14759231,14759258,14759260,14759266,14759271,14759278,14759280,14759286,14759291,14759294,14759299)); Unique (cost=0.57..0.58 rows=1 width=8) (actual time=200484.526..200484.536 rows=0 loops=1) -> Custom Scan (SkipScan) on _hyper_6_1328_chunk (cost=0.57..0.57 rows=1 width=8) (actual time=200484.524..200484.532 rows=0 loops=1) -> Index Only Scan using "1328_989_trends_pkey" on _hyper_6_1328_chunk (cost=0.57..11469897.98 rows=1 width=8) (actual time=200484.480..200484.512 rows=0 loops=1) Index Cond: ((itemid > NULL::bigint) AND (clock >= 1689588000)) Filter: ((itemid = ANY ('{14758658,14758660,14758666,14758671,14758678,14758680}'::bigint[])) OR (itemid = ANY ('{14758686,14758691,14758698,14758700,14758706,14758711,14758718,14758720,14758726,14758731,14758732,14758758,14758760,14758766,14758768,14758771,14758772,14758778,14758780,14758786,14758788,14758791,14758792,14758798,14758800,14758806,14758808,14758811,14758812,14758818,14758820,14758826,14758828,14758829,14758831,14758832,14759218,14759220,14759226,14759231,14759258,14759260,14759266,14759271,14759278,14759280,14759286,14759291,14759294,14759299}'::bigint[]))) Heap Fetches: 0 Planning Time: 1.283 ms Execution Time: 200484.667 ms (8 rows) EXPLAIN analyze select itemid from trends where clock>=1689588000 and (itemid in (14758658,14758660,14758666,14758671,14758678,14758680) or itemid in (14758686,14758691,14758698,14758700,14758706,14758711,14758718,14758720,14758726,14758731,14758732,14758758,14758760,14758766,14758768,14758771,14758772,14758778,14758780,14758786,14758788,14758791,14758792,14758798,14758800,14758806,14758808,14758811,14758812,14758818,14758820,14758826,14758828,14758829,14758831,14758832,14759218,14759220,14759226,14759231,14759258,14759260,14759266,14759271,14759278,14759280,14759286,14759291,14759294,14759299)); Index Scan using _hyper_6_1328_chunk_trends_clock_idx on _hyper_6_1328_chunk (cost=0.57..1.76 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1) Index Cond: (clock >= 1689588000) Filter: ((itemid = ANY ('{14758658,14758660,14758666,14758671,14758678,14758680}'::bigint[])) OR (itemid = ANY ('{14758686,14758691,14758698,14758700,14758706,14758711,14758718,14758720,14758726,14758731,14758732,14758758,14758760,14758766,14758768,14758771,14758772,14758778,14758780,14758786,14758788,14758791,14758792,14758798,14758800,14758806,14758808,14758811,14758812,14758818,14758820,14758826,14758828,14758829,14758831,14758832,14759218,14759220,14759226,14759231,14759258,14759260,14759266,14759271,14759278,14759280,14759286,14759291,14759294,14759299}'::bigint[]))) Planning Time: 0.795 ms Execution Time: 0.015 ms (5 rows) The difference at 1000xITEMID is 30 minutes vs 2s.
Our questions: 1. would it be possible to change the query regarding the existence of trends without the "DISTINCT" filter and possibly checking duplicated records on the zabbix server side? |
Comment by Vladislavs Sokurenko [ 2023 Jul 17 ] |
Could it help if history syncers would sync trends on exit ? |
Comment by Albert Przybylski [ 2023 Jul 17 ] |
If I understand correctly, the idea is that the syncers should first synchronize the entire history and then count the trends at the end during normal operation? I don't think so because:
|
Comment by Albert Przybylski [ 2023 Jul 25 ] |
any update? |
Comment by Albert Przybylski [ 2023 Aug 08 ] |
any update? |
Comment by Michael Suoyanen [ 2023 Aug 28 ] |
I can confirm the existence of this problem. It also impactful for our PostgreSQL + timescaledb installation with required server performance of 3340 new values per second. $ sudo egrep 'trends_uint|trends ' /var/log/zabbix/zabbix_server.log | cut -c1-130 840135:20230827:005538.879 slow query: 37.561645 sec, "select distinct itemid from trends_uint where clock>=1693080000 and (itemid 840135:20230827:005601.676 slow query: 22.579087 sec, "select distinct itemid from trends where clock>=1693080000 and (itemid in ( 840144:20230827:015540.943 slow query: 39.698662 sec, "select distinct itemid from trends_uint where clock>=1693083600 and (itemid 840144:20230827:015605.205 slow query: 24.058609 sec, "select distinct itemid from trends where clock>=1693083600 and (itemid in ( 840131:20230827:025524.860 slow query: 22.945736 sec, "select distinct itemid from trends where clock>=1693087200 and (itemid in ( 840131:20230827:035534.760 slow query: 33.317425 sec, "select distinct itemid from trends_uint where clock>=1693090800 and (itemid 840131:20230827:035538.982 slow query: 3.958162 sec, "select distinct itemid from trends where clock>=1693090800 and (itemid in (4 840138:20230827:045512.787 slow query: 11.721433 sec, "select distinct itemid from trends_uint where clock>=1693094400 and (itemid 840138:20230827:045537.145 slow query: 23.897606 sec, "select distinct itemid from trends where clock>=1693094400 and (itemid in ( 840141:20230827:055536.451 slow query: 35.010828 sec, "select distinct itemid from trends_uint where clock>=1693098000 and (itemid 840141:20230827:055601.587 slow query: 24.615052 sec, "select distinct itemid from trends where clock>=1693098000 and (itemid in ( 840137:20230827:065537.109 slow query: 36.038789 sec, "select distinct itemid from trends_uint where clock>=1693101600 and (itemid 840137:20230827:065602.859 slow query: 25.538206 sec, "select distinct itemid from trends where clock>=1693101600 and (itemid in ( 840144:20230827:075542.070 slow query: 41.027687 sec, "select distinct itemid from trends_uint where clock>=1693105200 and (itemid 840144:20230827:075608.136 slow query: 25.855912 sec, "select distinct itemid from trends where clock>=1693105200 and (itemid in ( 840131:20230827:085541.298 slow query: 40.086800 sec, "select distinct itemid from trends_uint where clock>=1693108800 and (itemid 840131:20230827:085606.695 slow query: 25.221894 sec, "select distinct itemid from trends where clock>=1693108800 and (itemid in ( 840131:20230827:085611.167 slow query: 4.323703 sec, "select distinct itemid from trends_uint where clock>=1693108800 and (itemid 840144:20230827:095541.959 slow query: 40.421769 sec, "select distinct itemid from trends_uint where clock>=1693112400 and (itemid 840144:20230827:095607.966 slow query: 25.752856 sec, "select distinct itemid from trends where clock>=1693112400 and (itemid in ( 840137:20230827:105541.501 slow query: 40.177062 sec, "select distinct itemid from trends_uint where clock>=1693116000 and (itemid 840133:20230827:115541.932 slow query: 40.821343 sec, "select distinct itemid from trends_uint where clock>=1693119600 and (itemid 840133:20230827:115608.806 slow query: 26.600506 sec, "select distinct itemid from trends where clock>=1693119600 and (itemid in ( 840135:20230827:125543.009 slow query: 41.932686 sec, "select distinct itemid from trends_uint where clock>=1693123200 and (itemid 840135:20230827:125609.644 slow query: 26.390846 sec, "select distinct itemid from trends where clock>=1693123200 and (itemid in ( 840135:20230827:125614.248 slow query: 4.439916 sec, "select distinct itemid from trends_uint where clock>=1693123200 and (itemid 840132:20230827:135545.037 slow query: 44.007107 sec, "select distinct itemid from trends_uint where clock>=1693126800 and (itemid 840132:20230827:135612.996 slow query: 27.766573 sec, "select distinct itemid from trends where clock>=1693126800 and (itemid in ( 840133:20230827:145546.903 slow query: 44.912346 sec, "select distinct itemid from trends_uint where clock>=1693130400 and (itemid 840133:20230827:145615.778 slow query: 28.673617 sec, "select distinct itemid from trends where clock>=1693130400 and (itemid in ( 840133:20230827:145621.448 slow query: 5.473375 sec, "select distinct itemid from trends_uint where clock>=1693130400 and (itemid 840134:20230827:155545.418 slow query: 44.355170 sec, "select distinct itemid from trends_uint where clock>=1693134000 and (itemid 840134:20230827:155614.172 slow query: 28.498124 sec, "select distinct itemid from trends where clock>=1693134000 and (itemid in ( 840138:20230827:165529.878 slow query: 27.152335 sec, "select distinct itemid from trends where clock>=1693137600 and (itemid in ( 840137:20230827:175545.125 slow query: 44.009693 sec, "select distinct itemid from trends_uint where clock>=1693141200 and (itemid 840137:20230827:175612.547 slow query: 26.868086 sec, "select distinct itemid from trends where clock>=1693141200 and (itemid in ( 840135:20230827:185543.837 slow query: 42.567579 sec, "select distinct itemid from trends_uint where clock>=1693144800 and (itemid 840135:20230827:185610.388 slow query: 26.340447 sec, "select distinct itemid from trends where clock>=1693144800 and (itemid in ( 840129:20230827:195529.526 slow query: 26.670334 sec, "select distinct itemid from trends where clock>=1693148400 and (itemid in ( 840132:20230827:205544.984 slow query: 43.860205 sec, "select distinct itemid from trends_uint where clock>=1693152000 and (itemid 840132:20230827:205612.367 slow query: 27.225530 sec, "select distinct itemid from trends where clock>=1693152000 and (itemid in ( 840132:20230827:205616.958 slow query: 4.415102 sec, "select distinct itemid from trends_uint where clock>=1693152000 and (itemid 840144:20230827:215546.163 slow query: 45.099487 sec, "select distinct itemid from trends_uint where clock>=1693155600 and (itemid 840144:20230827:215614.039 slow query: 27.673360 sec, "select distinct itemid from trends where clock>=1693155600 and (itemid in ( 840141:20230827:225542.892 slow query: 41.506527 sec, "select distinct itemid from trends_uint where clock>=1693159200 and (itemid 840141:20230827:225612.291 slow query: 28.481733 sec, "select distinct itemid from trends where clock>=1693159200 and (itemid in ( 840133:20230827:235544.126 slow query: 43.086927 sec, "select distinct itemid from trends_uint where clock>=1693162800 and (itemid 840133:20230827:235611.330 slow query: 26.915976 sec, "select distinct itemid from trends where clock>=1693162800 and (itemid in ( 840139:20230828:005549.359 slow query: 48.293633 sec, "select distinct itemid from trends_uint where clock>=1693166400 and (itemid 840139:20230828:005618.976 slow query: 29.368438 sec, "select distinct itemid from trends where clock>=1693166400 and (itemid in ( 840137:20230828:015555.142 slow query: 54.105984 sec, "select distinct itemid from trends_uint where clock>=1693170000 and (itemid 840137:20230828:015625.525 slow query: 30.180718 sec, "select distinct itemid from trends where clock>=1693170000 and (itemid in ( 840134:20230828:025545.904 slow query: 44.847353 sec, "select distinct itemid from trends_uint where clock>=1693173600 and (itemid 840134:20230828:035506.787 slow query: 5.606605 sec, "select distinct itemid from trends_uint where clock>=1693177200 and (itemid 840134:20230828:035537.862 slow query: 30.805581 sec, "select distinct itemid from trends where clock>=1693177200 and (itemid in ( 840136:20230828:045545.394 slow query: 44.175888 sec, "select distinct itemid from trends_uint where clock>=1693180800 and (itemid 840136:20230828:045555.515 slow query: 9.704390 sec, "select distinct itemid from trends where clock>=1693180800 and (itemid in (5 840135:20230828:055542.317 slow query: 41.187523 sec, "select distinct itemid from trends_uint where clock>=1693184400 and (itemid 840135:20230828:055611.038 slow query: 28.112856 sec, "select distinct itemid from trends where clock>=1693184400 and (itemid in ( 840144:20230828:065543.768 slow query: 42.100340 sec, "select distinct itemid from trends_uint where clock>=1693188000 and (itemid 840144:20230828:065611.897 slow query: 27.928841 sec, "select distinct itemid from trends where clock>=1693188000 and (itemid in ( 840140:20230828:075546.431 slow query: 44.697599 sec, "select distinct itemid from trends_uint where clock>=1693191600 and (itemid 840140:20230828:075614.798 slow query: 28.155597 sec, "select distinct itemid from trends where clock>=1693191600 and (itemid in ( 840138:20230828:085531.014 slow query: 29.481114 sec, "select distinct itemid from trends where clock>=1693195200 and (itemid in ( 840132:20230828:095548.111 slow query: 47.057455 sec, "select distinct itemid from trends_uint where clock>=1693198800 and (itemid 840132:20230828:095617.674 slow query: 29.288304 sec, "select distinct itemid from trends where clock>=1693198800 and (itemid in ( 840130:20230828:105546.387 slow query: 45.012145 sec, "select distinct itemid from trends_uint where clock>=1693202400 and (itemid 840130:20230828:105616.985 slow query: 29.924186 sec, "select distinct itemid from trends where clock>=1693202400 and (itemid in ( 840132:20230828:115557.630 slow query: 56.440338 sec, "select distinct itemid from trends_uint where clock>=1693206000 and (itemid 840132:20230828:115629.199 slow query: 31.272121 sec, "select distinct itemid from trends where clock>=1693206000 and (itemid in ( 840136:20230828:125548.332 slow query: 47.253398 sec, "select distinct itemid from trends_uint where clock>=1693209600 and (itemid 840136:20230828:125618.384 slow query: 29.751092 sec, "select distinct itemid from trends where clock>=1693209600 and (itemid in ( 840136:20230828:125623.781 slow query: 5.230062 sec, "select distinct itemid from trends_uint where clock>=1693209600 and (itemid 840136:20230828:125630.392 slow query: 3.048426 sec, "select distinct itemid from trends where clock>=1693209600 and (itemid in (3 840136:20230828:135553.023 slow query: 51.934488 sec, "select distinct itemid from trends_uint where clock>=1693213200 and (itemid 840136:20230828:135624.011 slow query: 30.784102 sec, "select distinct itemid from trends where clock>=1693213200 and (itemid in ( 840133:20230828:145551.281 slow query: 49.348155 sec, "select distinct itemid from trends_uint where clock>=1693216800 and (itemid 840133:20230828:145622.586 slow query: 31.108760 sec, "select distinct itemid from trends where clock>=1693216800 and (itemid in ( 840133:20230828:145627.630 slow query: 4.880079 sec, "select distinct itemid from trends_uint where clock>=1693216800 and (itemid 840139:20230828:155549.383 slow query: 48.181702 sec, "select distinct itemid from trends_uint where clock>=1693220400 and (itemid 840139:20230828:155621.620 slow query: 31.969781 sec, "select distinct itemid from trends where clock>=1693220400 and (itemid in ( 840135:20230828:165549.778 slow query: 48.707330 sec, "select distinct itemid from trends_uint where clock>=1693224000 and (itemid 840135:20230828:165621.841 slow query: 30.515057 sec, "select distinct itemid from trends where clock>=1693224000 and (itemid in (
|
Comment by Vladislavs Sokurenko [ 2023 Sep 25 ] |
Is it faster if query is changed the following way: EXPLAIN ANALIYZE select itemid from trends where clock=1688547600 and (itemid in (1000xITEMID)); |
Comment by Vladislavs Sokurenko [ 2023 Sep 26 ] |
Also the root cause could be that chunk time interval for trends is too big: PERFORM create_hypertable('trends', 'clock', chunk_time_interval => 2592000, migrate_data => true); PERFORM create_hypertable('trends_uint', 'clock', chunk_time_interval => 2592000, migrate_data => true); UPDATE config SET db_extension='timescaledb',hk_history_global=1,hk_trends_global=1; |
Comment by Vladislavs Sokurenko [ 2023 Sep 26 ] |
Suggested changes (remove distinct and or to simplify query): |
Comment by Vladislavs Sokurenko [ 2023 Sep 27 ] |
Also for faster restart it would be nice to store last trend in item_rtdata but out of scope of this task. |
Comment by Vladislavs Sokurenko [ 2023 Sep 28 ] |
Implemented in pull request feature/ZBX-23064-6.0 |
Comment by Vladislavs Sokurenko [ 2023 Oct 05 ] |
Implemented in:
|
Comment by LivreAcesso.Pro [ 2023 Oct 05 ] |
This can speed-up some things: ALTER EXTENSION timescaledb UPDATE; SELECT set_chunk_time_interval('history', 3600); SELECT set_chunk_time_interval('history_uint', 3600); SELECT set_chunk_time_interval('history_str', 3600); SELECT set_chunk_time_interval('trends', 84600); SELECT set_chunk_time_interval('trends_uint', 84600); |
Comment by Ruslan Aznabaev [ 2024 Jul 01 ] |
I don't think what this issue was fixed. |