[ZBXNEXT-3071] History cache optimization Created: 2015 Dec 14 Updated: 2016 Jun 04 Resolved: 2016 Jan 22 |
|
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | None |
Fix Version/s: | 3.0.0alpha6 |
Type: | Change Request | Priority: | Major |
Reporter: | Andris Zeila | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 1 |
Labels: | cache, performance | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Description |
When a small set of items starts flooding the history cache with values, history syncers can't process the cached data fast enough. This happens because the lower the ratio of stored items to values, the slower it becomes to pick items for processing. To improve this, the internal structure of the history cache must be redesigned. |
Comments |
Comment by Andris Zeila [ 2015 Dec 16 ] |
Specification at https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-3071 |
Comment by Filipe Paternot [ 2015 Dec 16 ] |
Question: On history cache you say: ... The history cache memory allocator must handle out of memory situations without exiting with low memory error. How will it handle them? Currently, we've seen the history cache request new data from the proxy (passive proxy) or accept newer data from it (active proxy) and drop old data within the cache. Is this the desired behavior? Should it stay that way? Within our company we debated this and thought not, but perhaps there are cases where it is wanted (lower time to flush the cache). The reason we think the history cache contents must not be replaced when the cache is full is that we can already throttle how much data we store in Zabbix proxy (ProxyOfflineBuffer). If that data is still there, it should be sent to Zabbix server and stored in the database. Otherwise we drop it at the proxy right away. Since the specs are not clear about what should happen in such a case, we believe it's worth discussing. |
Comment by Andris Zeila [ 2015 Dec 16 ] |
History cache does not drop data. When it is full, the process attempting to add new data waits for 1 second before trying again, and repeats this until it succeeds (that is, until history syncers have freed enough space). This is the current functionality and it will stay the same way. Proxy can drop data from history tables in offline mode depending on the ProxyOfflineBuffer parameter, but this feature is not related to history cache. |
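The wait-and-retry behavior described above can be sketched roughly as follows. This is only an illustration, not the Zabbix implementation: the `demo_cache_t` type, all `demo_*` functions, and the `on_wait` callback (which stands in for history syncers freeing space while the writer sleeps) are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fixed-capacity cache, used only for illustration. */
typedef struct
{
	size_t	capacity;	/* maximum number of values the cache can hold */
	size_t	used;		/* number of values currently stored */
}
demo_cache_t;

/* Try to reserve one slot; returns 1 on success, 0 when the cache is full. */
int	demo_cache_try_add(demo_cache_t *cache)
{
	assert(NULL != cache);

	if (cache->used >= cache->capacity)
		return 0;

	cache->used++;

	return 1;
}

/* Stand-in for a history syncer flushing 'count' values to the database. */
void	demo_cache_free(demo_cache_t *cache, size_t count)
{
	cache->used = (count > cache->used ? 0 : cache->used - count);
}

/* Example callback: pretend one value was synced while the writer slept. */
void	demo_sync_one(demo_cache_t *cache)
{
	demo_cache_free(cache, 1);
}

/* Add a value without ever dropping data: while the cache is full, sleep */
/* for 'wait_sec' seconds and try again. Returns the number of retries.   */
int	demo_cache_add_blocking(demo_cache_t *cache, unsigned int wait_sec,
		void (*on_wait)(demo_cache_t *))
{
	int	retries = 0;

	while (0 == demo_cache_try_add(cache))
	{
		sleep(wait_sec);

		if (NULL != on_wait)
			on_wait(cache);

		retries++;
	}

	return retries;
}
```

With `wait_sec` set to 1 this mirrors the one-second retry interval mentioned in the comment. The writer blocks instead of discarding values, so a full cache slows producers down rather than losing data.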
Comment by Filipe Paternot [ 2015 Dec 16 ] |
Ok then, thanks for the clarification. |
Comment by Andris Zeila [ 2015 Dec 16 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-3071 |
Comment by Alexander Vladishev [ 2015 Dec 17 ] |
(1) The memory descriptions are incorrect: zbx_mem_create(&hc_mem, hc_shm_key, ZBX_NO_MUTEX, CONFIG_HISTORY_CACHE_SIZE, "history cache size", "HistoryCacheSize", 1); zbx_mem_create(&hc_index_mem, hc_index_shm_key, ZBX_NO_MUTEX, CONFIG_HISTORY_INDEX_CACHE_SIZE, "history index cache size", "HistoryIndexCacheSize", 0); The descriptions must be "history cache" and "history index cache" accordingly. wiper RESOLVED in r57261 sasha CLOSED |
Comment by Alexander Vladishev [ 2016 Jan 06 ] |
(2) src/libs/zbxdbcache/dbconfig.c:5035 The call to zbx_vector_uint64_clear() must be moved outside the history cache lock. wiper RESOLVED in r57460 sasha CLOSED |
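The general pattern behind this remark, keeping only shared-state access inside the critical section and doing local cleanup after unlocking, might look like the sketch below. It uses a plain POSIX mutex and a hypothetical `demo_vector_t` in place of the real history cache lock and `zbx_vector_uint64_t`; none of the `demo_*` names come from the Zabbix source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical process-local vector of item ids (illustration only). */
typedef struct
{
	unsigned long long	values[16];
	size_t			num;
}
demo_vector_t;

/* Stand-ins for the shared cache lock and the shared state it protects. */
pthread_mutex_t		demo_cache_lock = PTHREAD_MUTEX_INITIALIZER;
unsigned long long	demo_shared_sum = 0;

void	demo_flush_ids(demo_vector_t *ids)
{
	size_t	i;

	assert(NULL != ids);

	/* Hold the lock only while touching shared state. */
	pthread_mutex_lock(&demo_cache_lock);

	for (i = 0; i < ids->num; i++)
		demo_shared_sum += ids->values[i];

	pthread_mutex_unlock(&demo_cache_lock);

	/* Clearing the local vector does not touch shared state, so it is */
	/* done after unlocking and does not delay other processes.         */
	ids->num = 0;
}
```

The shorter the critical section, the less time other syncers spend waiting for the lock.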
Comment by Alexander Vladishev [ 2016 Jan 06 ] |
(3) src/libs/zbxdbcache/dbcache.c:2023 It seems candidate_num is always equal to history_num, so this condition will always be FALSE: if (ZBX_HC_SYNC_PROCESSED_PCNT > history_num * 100 / candidate_num) wiper history_num is the number of available items (not locked by other triggers). sasha WON'T FIX |
Comment by Alexander Vladishev [ 2016 Jan 06 ] |
(4) To increase performance, ZBX_HC_SYNC_PROCESSED_PCNT can be defined as a minimum count of processed items. wiper The idea is that if the percentage of busy (locked) items is too large, it's better to wait for other syncers to release them instead of taking them from the queue just to put them back again. On the other hand, if there are no busy items, we should continue syncing even if we processed only one item. sasha WON'T FIX |
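The percentage check discussed in (3) and (4) can be illustrated with a small sketch. The condition itself is taken from the comment above. The threshold value of 10, the function name, and the assumption that a true condition means "stop and wait for other syncers" are all hypothetical and not confirmed by the source.

```c
#include <assert.h>

/* Hypothetical threshold; the real ZBX_HC_SYNC_PROCESSED_PCNT value is */
/* not stated in this issue.                                             */
#define DEMO_SYNC_PROCESSED_PCNT	10

/* history_num   - items that were available (not locked by other triggers) */
/* candidate_num - items taken from the history queue                       */
/* Returns 1 if the syncer should keep syncing, 0 if it should back off.    */
int	demo_should_continue(int history_num, int candidate_num)
{
	assert(0 <= history_num && 0 <= candidate_num);

	if (0 == candidate_num)
		return 0;	/* nothing was taken from the queue */

	/* Same comparison as in the review comment: too few of the */
	/* candidates were actually available for processing.       */
	if (DEMO_SYNC_PROCESSED_PCNT > history_num * 100 / candidate_num)
		return 0;

	return 1;
}
```

A relative threshold keeps a syncer working when it took one item and processed it (100 percent available), while a fixed minimum count would stop it in that case. That matches the reasoning given for keeping the percentage.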
Comment by Alexander Vladishev [ 2016 Jan 06 ] |
(5) Cosmetics: ZBX_SYNC_MAX must be renamed to ZBX_HC_SYNC_MAX and moved into src/libs/zbxdbcache/dbcache.c wiper RESOLVED in r57462 sasha CLOSED |
Comment by Alexander Vladishev [ 2016 Jan 06 ] |
(6) The history cache must be locked while the function hc_update_history_queue() is running. wiper RESOLVED in r57463 sasha CLOSED |
Comment by Alexander Vladishev [ 2016 Jan 06 ] |
(7) cache->history_num variable can be removed from history cache wiper Instead of removing it we decided to add even more statistics in future. |
Comment by Alexander Vladishev [ 2016 Jan 06 ] |
(8) next_sync can be uninitialized in function hc_push_processed_items() wiper RESOLVED in r57464 sasha CLOSED |
Comment by Alexander Vladishev [ 2016 Jan 07 ] |
(9) src/libs/zbxdbcache/dbcache.c:2582 this code can be executed multiple times (*data)->ts = item_value->ts; DCcheck_ns(&(*data)->ts); sasha RESOLVED in r57454 wiper CLOSED |
Comment by Andris Zeila [ 2016 Jan 07 ] |
(10) hc_push_unavailable_items() pushes items locked by triggers back into the history queue, releasing ownership of them. Those items could then be removed from the cache by other syncers, so we can't use item->status in the hc_get_item_values() and hc_push_processed_items() functions to check whether an item is busy. This can be fixed by resetting busy item pointers to NULL in the hc_push_unavailable_items() function and then simply comparing the items with NULL instead of comparing their state with ZBX_HC_ITEM_STATUS_AVAILABLE. wiper RESOLVED in r57468 sasha CLOSED |
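The fix proposed in (10) can be sketched as follows. This is a simplified illustration under stated assumptions: `demo_item_t` with its `locked` flag and both `demo_*` functions are hypothetical, and the real functions also update the history queue, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached item; 'locked' marks items busy with triggers. */
typedef struct
{
	unsigned long long	itemid;
	int			locked;
}
demo_item_t;

/* Release ownership of busy items. Once an item is handed back, another */
/* syncer may free it, so the local pointer is reset to NULL and must    */
/* not be dereferenced again. Returns the number of released items.      */
int	demo_push_unavailable_items(demo_item_t **items, int items_num)
{
	int	i, released = 0;

	assert(NULL != items);

	for (i = 0; i < items_num; i++)
	{
		if (NULL == items[i] || 0 == items[i]->locked)
			continue;

		items[i] = NULL;
		released++;
	}

	return released;
}

/* Later stages test the pointer itself instead of reading a status */
/* field from memory that this syncer may no longer own.            */
int	demo_count_owned_items(demo_item_t **items, int items_num)
{
	int	i, owned = 0;

	for (i = 0; i < items_num; i++)
	{
		if (NULL != items[i])
			owned++;
	}

	return owned;
}
```

Comparing with NULL only reads the syncer's own array, so it stays valid even after another syncer has removed the released item from the cache.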
Comment by Andris Zeila [ 2016 Jan 07 ] |
(11) The meta field must always be set, otherwise the server could ignore the history data. sasha CLOSED |
Comment by Alexander Vladishev [ 2016 Jan 07 ] |
(12) keep_history and keep_trends fields must be always set RESOLVED by wiper in r57482 sasha CLOSED |
Comment by Alexander Vladishev [ 2016 Jan 07 ] |
(13) value_type can be uninitialized in these places: src/libs/zbxdbcache/dbcache.c:755 wiper RESOLVED in r57491 sasha CLOSED |
Comment by Alexander Vladishev [ 2016 Jan 08 ] |
Successfully tested! |
Comment by Alexander Vladishev [ 2016 Jan 08 ] |
(14) Documentation
wiper RESOLVED
REOPENED wiper also updated template changes. sasha Great! But information about internal checks must be removed from Zabbix Server and Zabbix Proxy pages. REOPENED wiper RESOLVED sasha Many thanks! CLOSED |
Comment by Andris Zeila [ 2016 Jan 08 ] |
Released in:
|
Comment by MATSUDA Daiki [ 2016 Jan 12 ] |
I think the r57505 fix requires a matching change in dbupgrade_2050.c for upgrading from a previous version. |
Comment by Andris Zeila [ 2016 Jan 12 ] |
If you mean creating a patch that replaces the history text cache monitoring items/triggers with history index cache monitoring items/triggers: they have different meanings, so we can't use the new items/triggers as drop-in replacements. |
Comment by MATSUDA Daiki [ 2016 Jan 13 ] |
I think something like the following SQL statement is needed in dbupgrade_2050.c: update items set name='Zabbix history index cache, % free' where itemid in (23341, 23275) and name='Zabbix $2 write cache, % free'; |
Comment by richlv [ 2016 Jan 13 ] |
that is not possible, there could be any other items with those ids (or no items at all). also see the explanation by wiper |
Comment by Aleksandrs Saveljevs [ 2016 Jan 13 ] |
Additionally, item and trigger names could have been translated into other languages. |
Comment by Aleksandrs Saveljevs [ 2016 Jan 21 ] |
(15) It does not say at https://www.zabbix.com/documentation/3.0/manual/config/items/itemtypes/internal that the new parameters are supported since Zabbix 3.0.0. wiper RESOLVED asaveljevs Changed the note to refer to "index" cache specifically, rather than item as a whole. Please take a look. wiper CLOSED |
Comment by dimir [ 2016 Jan 26 ] |
With 8 history syncers processing 500000 values of 100 items (interleaved by item pairs), before and after this optimization, the improvement is huge. |
Comment by Oleksii Zagorskyi [ 2016 Jan 26 ] |
It would be interesting to see another test closer to what usually happens in production: a single item spamming a "production" Zabbix server (where many other items are periodically sending values), and what the short queues (5-10 secs, 10-30 secs and 30-60 secs) look like with a short update interval of 5 secs. |
Comment by dimir [ 2016 Jan 26 ] |
This scenario was made with trapper items. I understand it's not something generally used, but it still gives some idea. wiper log monitoring could give similar results. |
Comment by richlv [ 2016 Mar 12 ] |
the template changes seem to be missing from the upgrade notes & the new version of the template is not available in the template download page - instead of handling that in this issue, created |