Problem report
Resolution: Unresolved
Trivial
None
None
S24-W50/51/52
2
Hi. I use Zabbix server 6.4.12 in an HA cluster (3 nodes) with PostgreSQL 13.7 and Elasticsearch 7.10.2 as the history storage.
My server config (performance options) is:
StartDBSyncers=100
StartPollers=200
StartPreprocessors=200
StartPollersUnreachable=100
StartHistoryPollers=100
StartTrappers=500
StartPingers=100
StartDiscoverers=20
StartHTTPPollers=5
StartTimers=20
StartEscalators=70
StartAlerters=30
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
StartSNMPTrapper=1
HousekeepingFrequency=1
MaxHousekeeperDelete=10000
CacheSize=4G
HistoryCacheSize=128M
HistoryIndexCacheSize=128M
TrendCacheSize=96M
TrendFunctionCacheSize=64M
ValueCacheSize=3G
The server gathers most of its data from active proxies.
Server instances HW config:
58 CPU 32G RAM
Postgresql: 48 CPU 32G RAM 1.5T NVME
Elasticsearch: 48 CPU 64G RAM 1T SSD each node in 20 nodes cluster
This setup works fine until the active HA node restarts. The active node changes and Zabbix server starts on the new node. After that, Zabbix server appears to start filling the value cache from the history storage. I check the cache with the runtime command diaginfo=valuecache and see that it is being filled with far too much data per item. While this process is running, the lag of the last execution time keeps growing for all items. This switches many triggers to the problem state because of nodata on the items. Once history loading finishes, all delays clear quickly and the triggers recover as well. The whole process takes about 40-60 minutes, which is a critically long time.
About HW resource usage: only Elasticsearch shows high usage of HW resources during this process.
For example, the value cache diagnostics show something like this:
== value cache diagnostic information ==
Items:1656883 values:29262184 mode:0 time:0.616933
Memory:
  size: free:2132556224 used:941124992
  chunks: free:337247 used:8884246 min:24 max:2021103920
  buckets: 24:1085 32:67598 40:2665 48:32936 56:329 64:54257 72:236 80:64970 88:232 96:6 112:254 120:8 128:22 144:241 152:1 160:75 168:4 176:4 192:3312 200:9 208:931 216:10 224:904 232:20 240:802 248:4 256+:106332
Top.values:
  itemid:2484077 values:14082 request.values:1
  itemid:2484074 values:14075 request.values:3
  itemid:2484067 values:14074 request.values:3
  itemid:1855764 values:14065 request.values:3
  itemid:1855767 values:14065 request.values:1
  itemid:1855757 values:14057 request.values:3
  itemid:1224231 values:11038 request.values:2
  itemid:1709840 values:11026 request.values:2
  itemid:3774916 values:10006 request.values:3
  itemid:1249227 values:9404 request.values:3
  itemid:1249220 values:9400 request.values:3
  itemid:1249230 values:9400 request.values:1
  itemid:914472 values:7388 request.values:2
  itemid:3774763 values:7358 request.values:2
  itemid:2690617 values:6737 request.values:5
  itemid:323702 values:4880 request.values:3
  itemid:3774762 values:4416 request.values:6
  itemid:1249154 values:4415 request.values:3
  itemid:1249155 values:4415 request.values:3
  itemid:2374951 values:4414 request.values:3
  itemid:3629937 values:4414 request.values:3
  itemid:2104685 values:4413 request.values:3
  itemid:2975960 values:4409 request.values:2
  itemid:1249211 values:4383 request.values:1
  itemid:2376807 values:4109 request.values:2
Top.request.values:
  itemid:2142673 values:267 request.values:241
  itemid:2142639 values:159 request.values:145
  itemid:2517005 values:103 request.values:73
  itemid:2516394 values:103 request.values:73
  itemid:2343903 values:103 request.values:73
  itemid:2517004 values:102 request.values:73
  itemid:2516393 values:102 request.values:73
  itemid:2343902 values:102 request.values:73
  itemid:318651 values:102 request.values:73
  itemid:2343262 values:100 request.values:73
  itemid:2333068 values:100 request.values:73
  itemid:318736 values:99 request.values:73
  itemid:318650 values:99 request.values:73
  itemid:354746 values:99 request.values:73
  itemid:3629366 values:98 request.values:73
  itemid:318735 values:98 request.values:73
  itemid:354745 values:98 request.values:73
  itemid:354918 values:98 request.values:73
  itemid:3629365 values:97 request.values:73
  itemid:354831 values:97 request.values:73
  itemid:2343261 values:97 request.values:73
  itemid:354917 values:97 request.values:73
  itemid:318821 values:97 request.values:73
  itemid:379123 values:97 request.values:73
  itemid:354832 values:97 request.values:73
==
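To put the header of this output in perspective, here is a rough Python sketch that derives the average number of cached values per item and the share of value cache memory in use. The function name `summarize` and the regular expressions are my own assumptions based on the text layout shown above, not part of any Zabbix tooling.

```python
import re

def summarize(diag_text):
    """Extract totals from the header of a value cache diaginfo dump.

    Assumes the 'Items:N values:N' and 'size: free:N used:N' layout
    seen in the output quoted in this report.
    """
    items, values = map(
        int, re.search(r"Items:(\d+)\s+values:(\d+)", diag_text).groups()
    )
    free, used = map(
        int, re.search(r"size:\s*free:(\d+)\s+used:(\d+)", diag_text).groups()
    )
    return {
        "items": items,
        "values": values,
        "values_per_item": values / items,       # mean cached values per item
        "used_fraction": used / (free + used),   # share of cache memory in use
    }

# Header figures copied from the diagnostic output in this report.
header = (
    "Items:1656883 values:29262184 mode:0 time:0.616933 "
    "Memory: size: free:2132556224 used:941124992"
)
stats = summarize(header)
print(f"{stats['values_per_item']:.1f} values per item on average")
print(f"{stats['used_fraction']:.1%} of value cache memory in use")
```

Comparing the average with the per-item counts under Top.values gives a quick sense of how far the largest entries sit from the typical item.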
Let's look at the items from Top.values (see screenshot 1)
and the item and trigger config (screenshots 2-4).
The trigger requires just the last 3 values, so I expected the cache to keep 3 values for each item like this, but in reality it tries to keep 14082.
All items in Top.values have the same issue.
For example, let's look at one more:
itemid:3774916 values:10006 request.values:3
See screenshots 5-8.
The situation is the same: we need just 3 values, but the cache keeps 10006 and tries to load them on server start.
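The mismatch described above can be checked across a whole diagnostic dump rather than item by item. Below is a minimal Python sketch that flags entries whose cached value count is far larger than the requested count. The helper `overcached` and the `min_ratio` threshold are hypothetical names chosen for illustration, and the parsing assumes the `itemid:N values:N request.values:N` layout shown in the output above.

```python
import re

# One entry of the Top.values / Top.request.values sections.
ENTRY = re.compile(r"itemid:(\d+)\s+values:(\d+)\s+request\.values:(\d+)")

def overcached(diag_text, min_ratio=100):
    """Return (itemid, cached, requested, ratio) tuples for entries whose
    cached value count is at least min_ratio times the requested count,
    sorted by ratio in descending order."""
    rows = {}
    for itemid, cached, requested in ENTRY.findall(diag_text):
        cached, requested = int(cached), int(requested)
        if requested > 0 and cached / requested >= min_ratio:
            # Keyed by itemid so an item listed in both sections counts once.
            rows[int(itemid)] = (int(itemid), cached, requested, cached / requested)
    return sorted(rows.values(), key=lambda r: r[3], reverse=True)

# A few entries copied from the diagnostic output in this report.
sample = """
Top.values:
itemid:2484077 values:14082 request.values:1
itemid:3774916 values:10006 request.values:3
Top.request.values:
itemid:2142673 values:267 request.values:241
"""
flagged = overcached(sample)
for itemid, cached, requested, ratio in flagged:
    print(f"itemid {itemid}: cached {cached}, requested {requested} ({ratio:.0f}x)")
```

With the sample entries, the two Top.values items are flagged while the Top.request.values item, whose cached count is close to its requested count, is not.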
depends on
- ZBX-25777 Reduce startup load due to trigger evaluation within 30 seconds (Confirmed)
is duplicated by
- ZBXNEXT-5937 Date math support for elasticsearch queries (Closed)
part of
- ZBXNEXT-6487 Elastic product support is still experimental - drop of support or further development (Open)
- ZBXNEXT-714 need scalable alternative for the history and items tables (Open)