[ZBXNEXT-3071] History cache optimization Created: 2015 Dec 14  Updated: 2016 Jun 04  Resolved: 2016 Jan 22

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: 3.0.0alpha6

Type: Change Request Priority: Major
Reporter: Andris Zeila Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: cache, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File history-cache-optimization.png    
Issue Links:
Duplicate
is duplicated by ZBX-9201 History syncers can not process a lot... Closed

 Description   

When a small set of items starts flooding history cache with values, history syncers can't process the cached data fast enough.

This happens because the lower the ratio of stored items to stored values, the slower it becomes to pick items for processing.

To improve this, the internal structure of the history cache must be redesigned.
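The problem can be illustrated with a small sketch. It is not taken from the specification and is not the Zabbix implementation: it only shows one way a cache could keep a single queue entry per item, however many values that item has pending, so that a flooding item does not inflate the number of entries a syncer has to look at. The names `idx_value_t`, `idx_item_t` and `idx_item_add_value()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One cached value; values of the same item form a singly linked FIFO. */
typedef struct idx_value
{
    int              clock;
    struct idx_value *next;
}
idx_value_t;

/* One entry per item, regardless of how many values it has pending. */
typedef struct
{
    unsigned long itemid;
    idx_value_t   *head;   /* oldest value, processed first */
    idx_value_t   *tail;   /* newest value, appended last */
    int           queued;  /* 1 if the item is already in the processing queue */
}
idx_item_t;

/* Append a value to its item. Returns 1 if the caller must add the item to  */
/* the processing queue (it was not queued yet), 0 if it is already queued.  */
static int idx_item_add_value(idx_item_t *item, idx_value_t *value)
{
    assert(NULL != item && NULL != value);

    value->next = NULL;

    if (NULL == item->head)
        item->head = value;
    else
        item->tail->next = value;

    item->tail = value;

    if (0 != item->queued)
        return 0;

    item->queued = 1;
    return 1;
}
```

With a layout like this, an item that receives thousands of values still occupies one slot in the processing queue, and its values are reached through the item rather than scanned individually.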



 Comments   
Comment by Andris Zeila [ 2015 Dec 16 ]

Specification at https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-3071

Comment by Filipe Paternot [ 2015 Dec 16 ]

Question:

On history cache you say:

...
The history cache memory allocator must handle out of memory situations without exiting with low memory error.

How will it handle them? Currently, we've seen the history cache request new data from the proxy (passive proxy) or accept newer data from it (active proxy) and drop old data within the cache. Is this the desired behavior? Should it stay that way? Within our company, we debated this and thought not, but perhaps there are cases where it makes sense (lower time to flush the cache).

The reason we think the history cache contents must not be replaced when it is full is that we can throttle how much data we store in Zabbix proxy (ProxyOfflineBuffer). If that data is still there, it should be sent to Zabbix Server and stored in the database. Otherwise we drop it at the proxy right away.

Since the specs do not make clear what should happen in such a case, we believe it's worth discussing.

Comment by Andris Zeila [ 2015 Dec 16 ]

The history cache does not drop data. When it is full, the process attempting to add new data waits for 1 second and tries again, repeating until it succeeds (that is, until history syncers have freed enough space). This is the current functionality and it will stay that way.
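The wait-and-retry behaviour can be sketched roughly as follows. This is an illustration only, not the Zabbix code: `retry_cache_t`, `cache_try_add()` and `cache_add_blocking()` are hypothetical names, and the wait is passed in as a callback so that the example can run without actually sleeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity cache; not the Zabbix history cache structure. */
typedef struct
{
    size_t used;
    size_t capacity;
}
retry_cache_t;

/* Try to reserve room for one value; return 1 on success, 0 if the cache is full. */
static int cache_try_add(retry_cache_t *cache)
{
    if (cache->used >= cache->capacity)
        return 0;

    cache->used++;
    return 1;
}

/* Add one value without ever dropping it: while the cache is full, call     */
/* wait_fn (a real process would sleep for 1 second here) and try again.     */
/* Returns the number of times the caller had to wait.                       */
static int cache_add_blocking(retry_cache_t *cache, void (*wait_fn)(void *), void *arg)
{
    int waits = 0;

    assert(NULL != cache && NULL != wait_fn);

    while (0 == cache_try_add(cache))
    {
        wait_fn(arg);
        waits++;
    }

    return waits;
}

/* Stand-in for a history syncer freeing one slot while the writer waits. */
static void free_one_slot(void *arg)
{
    retry_cache_t *cache = (retry_cache_t *)arg;

    if (0 < cache->used)
        cache->used--;
}
```

In this sketch a full cache delays the writer but never loses the value; the loop ends only once something else has freed space.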

A proxy can drop data from history tables in offline mode depending on the ProxyOfflineBuffer parameter, but this feature is not related to the history cache.

Comment by Filipe Paternot [ 2015 Dec 16 ]

Ok then, thanks for the clarification.

Comment by Andris Zeila [ 2015 Dec 16 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-3071

Comment by Alexander Vladishev [ 2015 Dec 17 ]

(1) memory description is incorrect:

zbx_mem_create(&hc_mem, hc_shm_key, ZBX_NO_MUTEX, CONFIG_HISTORY_CACHE_SIZE, "history cache size",
                "HistoryCacheSize", 1);
zbx_mem_create(&hc_index_mem, hc_index_shm_key, ZBX_NO_MUTEX, CONFIG_HISTORY_INDEX_CACHE_SIZE,
                "history index cache size", "HistoryIndexCacheSize", 0);

The descriptions must be "history cache" and "history index cache" respectively.

wiper RESOLVED in r57261

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 06 ]

(2) src/libs/zbxdbcache/dbconfig.c:5035 the call to zbx_vector_uint64_clear() must be moved outside the history cache lock

wiper RESOLVED in r57460

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 06 ]

(3) src/libs/zbxdbcache/dbcache.c:2023 It seems candidate_num is always equal to history_num, so this condition will always be FALSE.

if (ZBX_HC_SYNC_PROCESSED_PCNT > history_num * 100 / candidate_num)

wiper history_num is the number of available items (not locked by other triggers).
It will be equal to candidate_num on proxies or if there are no locked items.

sasha WON'T FIX

Comment by Alexander Vladishev [ 2016 Jan 06 ]

(4) To increase performance, ZBX_HC_SYNC_PROCESSED_PCNT can be defined as the minimum count of processed items.

wiper The idea is that if the percentage of busy (locked) items is too large, it's better to wait for other syncers to release them instead of taking them from the queue just to put them back again.

On the other hand if there are no busy items, we should continue syncing even if we processed only one item.
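The reasoning above can be illustrated with a small sketch. Only the condition itself comes from the review comment in (3); the threshold value of 10, the function name `hc_should_wait()` and the guard against an empty candidate list are assumptions made for the example.

```c
#include <assert.h>

/* Assumed threshold for illustration; the actual value is not given in this issue. */
#define ZBX_HC_SYNC_PROCESSED_PCNT 10

/* candidate_num: items taken from the history queue in this pass            */
/* history_num:   those of them that were available (not locked)             */
/* Returns 1 if too few candidates were available, so the syncer should wait */
/* for other syncers to release items; returns 0 if it should keep syncing.  */
static int hc_should_wait(int history_num, int candidate_num)
{
    assert(0 <= history_num && history_num <= candidate_num);

    if (0 == candidate_num)
        return 0;   /* nothing was picked, so there is no ratio to compare */

    return ZBX_HC_SYNC_PROCESSED_PCNT > history_num * 100 / candidate_num;
}
```

When nothing is locked, `history_num` equals `candidate_num`, the ratio is 100 and the function returns 0 even for a single item, which matches the second point above.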

sasha WON'T FIX

Comment by Alexander Vladishev [ 2016 Jan 06 ]

(5) Cosmetics

ZBX_SYNC_MAX must be renamed to ZBX_HC_SYNC_MAX and moved into src/libs/zbxdbcache/dbcache.c
ZBX_HC_SYNC_PROCESSED_PCNT must be renamed to ZBX_HC_SYNC_MIN_PROCESSED

wiper RESOLVED in r57462

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 06 ]

(6) The history cache must be locked while the function hc_update_history_queue() is being executed.

wiper RESOLVED in r57463

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 06 ]

(7) cache->history_num variable can be removed from history cache

wiper Instead of removing it we decided to add even more statistics in the future.
WON'T FIX

Comment by Alexander Vladishev [ 2016 Jan 06 ]

(8) next_sync can be uninitialized in function hc_push_processed_items()

wiper RESOLVED in r57464

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 07 ]

(9) src/libs/zbxdbcache/dbcache.c:2582 this code can be executed multiple times

(*data)->ts = item_value->ts;
DCcheck_ns(&(*data)->ts);

sasha RESOLVED in r57454

wiper CLOSED

Comment by Andris Zeila [ 2016 Jan 07 ]

(10) The hc_push_unavailable_items() function pushes items locked by triggers back into the history queue, releasing ownership of them. Those items could then be removed from the cache by other syncers, so we can't use item->status in the hc_get_item_values() and hc_push_processed_items() functions to check whether the item is busy or not.

That can be fixed by resetting busy item pointers to NULL in the hc_push_unavailable_items() function and then simply comparing the items with NULL instead of comparing their state with ZBX_HC_ITEM_STATUS_AVAILABLE.
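A minimal sketch of that approach, using simplified stand-in types rather than the real Zabbix structures: `sync_item_t`, `push_unavailable_items()` and `count_owned_items()` are hypothetical names, and the `ITEM_STATUS_*` constants are placeholders. The point shown is that once ownership of a busy item is released, the syncer's local array holds NULL for it and the item is never dereferenced again.

```c
#include <assert.h>
#include <stddef.h>

#define ITEM_STATUS_AVAILABLE 0   /* placeholder constants for this sketch */
#define ITEM_STATUS_BUSY      1

typedef struct
{
    int status;
}
sync_item_t;

/* Release ownership of busy items. After this call the syncer must not      */
/* touch them, because another syncer may process and free them at any time. */
/* Returns the number of items released.                                     */
static int push_unavailable_items(sync_item_t **items, size_t items_num)
{
    int    released = 0;
    size_t i;

    assert(NULL != items || 0 == items_num);

    for (i = 0; i < items_num; i++)
    {
        if (NULL != items[i] && ITEM_STATUS_BUSY == items[i]->status)
        {
            /* a real implementation would push the item back into the queue here */
            items[i] = NULL;
            released++;
        }
    }

    return released;
}

/* Later stages test the pointer itself instead of reading item->status. */
static int count_owned_items(sync_item_t **items, size_t items_num)
{
    int    owned = 0;
    size_t i;

    for (i = 0; i < items_num; i++)
    {
        if (NULL != items[i])
            owned++;
    }

    return owned;
}
```

The status field is read only while the syncer still owns the item; every later check relies on the local pointer, which cannot be changed by another process.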

wiper RESOLVED in r57468

sasha CLOSED

Comment by Andris Zeila [ 2016 Jan 07 ]

(11) The meta field must always be set, otherwise the server could ignore the history data.
RESOLVED in r57472

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 07 ]

(12) keep_history and keep_trends fields must always be set

RESOLVED by wiper in r57482

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 07 ]

(13) value_type can be uninitialized in these places:

src/libs/zbxdbcache/dbcache.c:755
src/libs/zbxdbcache/dbcache.c:1468
src/libs/zbxdbcache/dbcache.c:1498
src/libs/zbxdbcache/dbcache.c:1528
src/libs/zbxdbcache/dbcache.c:1561
src/libs/zbxdbcache/dbcache.c:1598

wiper RESOLVED in r57491

sasha CLOSED

Comment by Alexander Vladishev [ 2016 Jan 08 ]

Successfully tested!

Comment by Alexander Vladishev [ 2016 Jan 08 ]

(14) Documentation

wiper RESOLVED

sasha

REOPENED

wiper also updated template changes.
RESOLVED

sasha Great! But information about internal checks must be removed from Zabbix Server and Zabbix Proxy pages.

REOPENED

wiper RESOLVED

sasha Many thanks! CLOSED

Comment by Andris Zeila [ 2016 Jan 08 ]

Released in:

  • pre-3.0.0alpha6 r57505

Comment by MATSUDA Daiki [ 2016 Jan 12 ]

I think that the r57505 fix requires a change in dbupgrade_2050.c for upgrading from a previous version.

Comment by Andris Zeila [ 2016 Jan 12 ]

If you mean creating a patch to replace the history text cache monitoring items/triggers with history index cache monitoring items/triggers: they have different meanings, so we can't use the new items/triggers as drop-in replacements.

Comment by MATSUDA Daiki [ 2016 Jan 13 ]

I think that a function doing something like the following SQL statements is needed in dbupgrade_2050.c.

update items set name='Zabbix history index cache, % free' where itemid in (23341, 23275) and name='Zabbix $2 write cache, % free';
update triggers set description='Less than 25% free in the history index cache' where triggerid in (13017, 13519, 13489) and description='Less than 25% free in the text history cache';

Comment by richlv [ 2016 Jan 13 ]

that is not possible, there could be any other items with those ids (or no items at all).
even if items with the same name and ids were there, they could have been modified by the user.
and that would change the items in the template for a few users, but not for most - a terribly messy situation.

also see the explanation by wiper

Comment by Aleksandrs Saveljevs [ 2016 Jan 13 ]

Additionally, item and trigger names could have been translated into other languages.

Comment by Aleksandrs Saveljevs [ 2016 Jan 21 ]

(15) It does not say at https://www.zabbix.com/documentation/3.0/manual/config/items/itemtypes/internal that the new parameters are supported since Zabbix 3.0.0.

wiper RESOLVED

asaveljevs Changed the note to refer to "index" cache specifically, rather than item as a whole. Please take a look.

wiper CLOSED

Comment by dimir [ 2016 Jan 26 ]

With 8 history syncers processing 500000 values of 100 items (interleaved by item pairs), before and after this optimization, the improvement is huge.

Comment by Oleksii Zagorskyi [ 2016 Jan 26 ]

It would be interesting to see another test (closer to what usually happens in production): a single item is spamming a "production" Zabbix server where many other items are periodically sending values. How do the short queues (5-10 secs, 10-30 secs and 30-60 secs) look for items with a short update interval of 5 secs?

Comment by dimir [ 2016 Jan 26 ]

This scenario was made with trapper items. I understand it's not something generally used, but it still gives some picture.

wiper log monitoring could give similar results.

Comment by richlv [ 2016 Mar 12 ]

the template changes seem to be missing from the upgrade notes & the new version of the template is not available in the template download page - instead of handling that in this issue, created ZBX-10528 to handle several templates like that

Generated at Fri Apr 19 19:18:14 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.