[ZBXNEXT-3006] A cache to provide multiple metrics of a single custom function Created: 2015 Oct 09  Updated: 2024 Apr 10  Resolved: 2017 Aug 30

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Agent (G), Proxy (P), Server (S)
Affects Version/s: 2.2.10
Fix Version/s: 3.4.1rc1, 3.4 (plan), 4.0.0alpha1

Type: New Feature Request Priority: Minor
Reporter: Marc Assignee: Unassigned
Resolution: Fixed Votes: 10
Labels: bulk, cache, externalchecks, loadablemodule, performance, userparameters
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: PNG File broken-change-per-second.png     XML File zbx_export_hosts.xml     Text File zbxnext-3006.patch    
Issue Links:
Causes
causes ZBX-12661 Cannot edit item of Type 'Dependent i... Closed
causes ZBX-13335 Heavy SQL query with huge output in t... Closed
causes ZBX-14812 Hex to Decimal does not ignore whites... Closed
causes ZBX-12618 item became not supported: Item prep... Closed
causes ZBX-14396 Template full clone exhaust memory (l... Closed
causes ZBX-12635 Unsupported low level discovery rule ... Closed
causes ZBX-15539 Incorrect error message when changing... Closed
Sub-task
depends on ZBXNEXT-4109 Allow to use LLD MACRO value as a val... Closed
depends on ZBXNEXT-4200 Ability to create LLD item prototype ... Closed
part of ZBXNEXT-3508 Add embedded script support to item p... Closed
part of ZBXNEXT-3863 Extend item preprocessing with jsonpa... Closed
part of ZBXNEXT-3864 Extend item preprocessing with xpath ... Closed
part of ZBX-12251 New values aren't counted towards tri... Closed
Sub-Tasks:
Key
Summary
Type
Status
Assignee
ZBXNEXT-3883 Subtask: frontend for A cache to prov... Change Request (Sub-task) Closed  
Team: Team C
Sprint: Sprint 2, Sprint 4, Sprint 5, Sprint 6, Sprint 7, Sprint 8, Sprint 9, Sprint 10, Sprint 11, Sprint 12, Sprint 13, Sprint 14, Sprint 15
Story Points: 24

 Description   

Custom functions (items) based on External check, User parameter or loadable module often parse the output of commands or the content of files that contain information for lots of metrics.

Thanks to parameter support in item keys, a single function can be used to extract the value of one particular metric only.
When one is interested in many of the included metrics, or even all of them, this can become quite inefficient: for each dedicated metric all information is requested, just to ignore the values from the other ones (see ZBXNEXT-103).

In the best case a loadable module function is used to access a pseudo file.
A rather worse scenario is the need to run an expensive command or command chain via User parameter:

UserParameter=openvz.ubc[*],/usr/bin/sudo /usr/bin/tac /proc/user_beancounters | awk '/$1/{print $(NF-5+$2);exit}'

The previous User parameter extracts one of five possible values for one of 20 possible resources from OpenVZ's User Beancounters table.

When one is interested in every available metric, this means 100 User parameter calls, each obtaining the very same information, per OpenVZ container - and there may be hundreds of containers per physical host.

This ticket is about providing a way to improve this by caching all potential information in memory, so that (dynamically provided) custom functions can then extract data from this regularly updated cache.

What I currently think of is introducing a new cache, let's say a Bulk Parameter Cache, which may be populated by Bulk parameters.
A Bulk user parameter could look like this:

BulkUserParameter=openvz.ubc,1,sudo tac /proc/user_beancounters | awk ...

This means the agent executes the configured command every minute and memorizes the output in the designated cache.
The output must of course follow a common syntax; let's say it must result in a list of key-value pairs like this:

# sudo tac /proc/user_beancounters | awk '$1=="uid"{for(i=4;i>=0;i--)h[i]=$(7-i);next}$6~/./{r=$(NF-5);for(i=0;i<5;i++)printf("openvz.ubc[%s,%s] %d\n",r,h[i],$(NF-i))}'
openvz.ubc[kmemsize,failcnt] 0
openvz.ubc[kmemsize,limit] 1217787766
openvz.ubc[kmemsize,barrier] 1217604266
openvz.ubc[kmemsize,maxheld] 793927680
openvz.ubc[kmemsize,held] 764308562
openvz.ubc[lockedpages,failcnt] 0
openvz.ubc[lockedpages,limit] 1368
openvz.ubc[lockedpages,barrier] 1368
openvz.ubc[lockedpages,maxheld] 123
--- SNIP --- SNAP --- SNIP --- SNAP ---

On each cache update, new keys get appended, existing key values get updated and missing keys get removed from the cache.

Ideally, each key existing in the cache dynamically becomes a valid item key, available to passive as well as active agent checks, returning the current key value on demand.

This way the expensive part is significantly reduced to one call per minimum desired item update interval.

Of course this should not be limited to Bulk user parameters, but should also be supported for Bulk external checks and for use in loadable modules.

PS: OpenVZ has been chosen as example just because of being easy to parse.
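Since BulkUserParameter is only a proposed directive (it does not exist in Zabbix), the cache semantics described above - new keys appended, existing keys updated, missing keys removed - can be sketched as follows; the function name and data layout are illustrative only:

```python
# Sketch of the proposed Bulk Parameter Cache (hypothetical feature).
# One run of the bulk command produces "key value" lines; the cache is
# rebuilt from that output so keys absent from the output disappear.

def refresh_cache(cache, bulk_output):
    """Rebuild cache entries from one run of the bulk command."""
    fresh = {}
    for line in bulk_output.strip().splitlines():
        key, _, value = line.partition(" ")
        fresh[key] = value          # new keys appended, existing ones updated
    cache.clear()                   # keys missing from the output are removed
    cache.update(fresh)
    return cache

output = """\
openvz.ubc[kmemsize,failcnt] 0
openvz.ubc[kmemsize,limit] 1217787766
openvz.ubc[kmemsize,barrier] 1217604266"""

cache = refresh_cache({}, output)
print(cache["openvz.ubc[kmemsize,limit]"])  # -> 1217787766
```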



 Comments   
Comment by Marc [ 2015 Oct 09 ]

Because the item key openvz.ubc is already provided by the BulkUserParameter definition, there is of course no need to include the item key part in the Bulk User parameter's output:

# sudo tac /proc/user_beancounters | awk '$1=="uid"{for(i=4;i>=0;i--)h[i]=$(7-i);next}$6~/./{r=$(NF-5);for(i=0;i<5;i++)printf("[%s,%s] %d\n",r,h[i],$(NF-i))}'
[kmemsize,failcnt] 0
[kmemsize,limit] 1217787766
[kmemsize,barrier] 1217604266
--- SNIP --- SNAP --- SNIP --- SNAP ---

On the other hand, it could be desirable to provide different item keys via one Bulk parameter. So the key could also be kept in the output but left out of the definition:

BulkUserParameter=1,sudo tac /proc/user_beancounters | awk ...
Comment by Strahinja Kustudic [ 2015 Oct 10 ]

This is similar to what I suggested in this comment in issue ZBXNEXT-103, but you forgot to explain how these items will be defined in the Web UI.

I have thought about it since the last time I replied to the issue above, and here is an idea of how I would implement this. I wouldn't even create a special zabbix_agentd.conf parameter like BulkUserParameter, but just use the existing UserParameter and only add a new item type in the UI called Zabbix Agent Bulk. This item type would define the user parameter which will be called on the agent to return multiple items. This item type would have everything regular items have, but in it you would need to define the item keys which will be returned. The only thing I'm not sure about is whether this bulk item should also return a value for itself, e.g. if it fails to execute.

Comment by Marc [ 2015 Oct 10 ]

kustodian,

it may appear similar but it's different. This request is neither about providing additional context information alongside an item's value (status, criticality, etc.) nor about returning separate values or key=value pairs for a single item. A Zabbix item is atomic, and that's a very good design decision in my opinion - performance-wise and in terms of processability too.

The motivation of this request is to significantly improve performance and efficiency in cases where lots of valuable information can be provided by a single custom call.

From my experience, when extending Zabbix one rarely has completely distinct custom items. In most cases one accesses a single interface that provides plenty of valuable information which one likely wants to obtain via dedicated items:

Database metrics -> pgsql, sqlite3, db_stat, sqlplus, mysql,...
Hardware controller metrics -> MegaCli64, ...
BIND name server -> rndc
NTP time server -> ntpq
Device mapper -> multipath, multipathd
...

To name only some of them.

All of these commands may provide lots of information for separate Zabbix items in one run. Instead of calling them again and again for each custom item, this ticket requests a way to call them only once and provide their output in a cache for potential item calls in the future. In fact, the properly formatted output gets parsed to fill the cache.
Item calls based on dynamically supported item keys (keys from the Bulk Parameter Cache) do not need to fork, issue any command, access a file or do any other kind of processing.
They just have to "echo" what is currently in memory for a specific key.

Because this request does not affect any design principle of Zabbix, there is also no need to touch the Web frontend or the way things get handled. It's a change at the very beginning of the processing chain.

Comment by Strahinja Kustudic [ 2015 Oct 10 ]

I'm not sure I exactly understand how your suggestion works, but let me try to understand it. You would basically create an item in the UI which calls the openvz.ubc BulkUserParameter, and that item would update the cache for all those items on the agent. Then you would create items in the UI with item keys like those which the BulkUserParameter updates in the agent cache, so if the server requests values for those items, the agent would send the data from the cache, without running any queries.

If that is the case, you are right, nothing would need to be changed in the UI, but it would still be a little complicated to understand how those items are being updated, unless you know how the script works. Also, what will the openvz.ubc item return to the server? Since it has to return something, it cannot only update the cache by echoing items/values.

Comment by Marc [ 2015 Oct 10 ]

Well, I've got several scenarios for a possible implementation in mind.

This is the one I currently favor for Bulk user parameters:

--- SNIP zabbix_agentd.conf ---
# BulkUserParameter=<key>,<interval>,<command>
BulkUserParameter=openvz.ubc,1,sudo tac /proc/user_beancounters | awk ...
--- SNAP zabbix_agentd.conf ---

Configuration of BulkUserParameter is considered on Zabbix agent (re)start only. The configured <key> becomes a supported item key.
The Zabbix agent's collector process executes the configured <command> every <interval> minutes, parses its output and updates the corresponding cache entries.
The output of <command> could look like this:

[kmemsize,failcnt] 0
[kmemsize,limit] 1217787766
[kmemsize,barrier] 1217604266
--- SNIP --- SNAP --- SNIP --- SNAP ---

The corresponding cache representation could then look like this:

openvz.ubc[kmemsize,failcnt] 0
openvz.ubc[kmemsize,limit] 1217787766
openvz.ubc[kmemsize,barrier] 1217604266
--- SNIP --- SNAP --- SNIP --- SNAP ---

When the Zabbix agent gets a request for the item key openvz.ubc[kmemsize,limit], it knows that openvz.ubc[*] is a Bulk user parameter, searches the Bulk Parameter Cache for the related cache entry and returns the corresponding value - in this case 1217787766.

This way the actual custom command (Bulk user parameter) gets automatically executed once per minute and keeps current values for 100 different items (in this example) in memory.

Which item values finally get sent/returned to the Zabbix server/proxy, and at which interval, is decided in the Zabbix frontend, i.e. by the Zabbix server, depending on which items have been created. Another benefit is that one can decide/change the interval for each such item in the Zabbix frontend.

The latter is, by the way, one of the reasons why I don't like to use zabbix_sender instead. When bulk-sending item values via Zabbix sender, all metrics get sent to the Zabbix server/proxy, regardless of whether they are configured or not. Further, the update intervals are not individually changeable in the Zabbix frontend but are the same for all items.
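The lookup path sketched in this scenario - the agent recognizes that a requested key belongs to a registered Bulk user parameter and answers from memory - could look like this (hypothetical, since the feature was never implemented; names are illustrative):

```python
# Hypothetical request handling: the base of the requested item key is
# matched against the keys registered via BulkUserParameter at startup,
# and the value is served from the Bulk Parameter Cache without forking.

bulk_keys = {"openvz.ubc"}  # <key> parts of BulkUserParameter definitions

cache = {
    "openvz.ubc[kmemsize,failcnt]": "0",
    "openvz.ubc[kmemsize,limit]": "1217787766",
    "openvz.ubc[kmemsize,barrier]": "1217604266",
}

def handle_request(item_key):
    """Return the cached value if the key's base matches a bulk key."""
    base = item_key.split("[", 1)[0]
    if base in bulk_keys:
        return cache.get(item_key)  # a real agent would answer ZBX_NOTSUPPORTED on a miss
    return None                     # not a bulk key; fall through to normal handling

print(handle_request("openvz.ubc[kmemsize,limit]"))  # -> 1217787766
```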

Comment by Marc [ 2015 Nov 14 ]

Similar idea but makes use of memcached:
https://github.com/crackmonkey/zbxcache

Edit:
Yet another example but makes use of a cache file:
https://bitbucket.org/sivann/runcached/

Special thanks to steki for dropping a comment that made me find that.

Comment by Max [ 2016 Sep 20 ]

Hello everybody,

I think this issue is very important and concerns not only UserParameters, but all item types (simple checks, external checks, SSH checks, log monitoring, database monitoring, etc.). For SNMP checks, however, it is already implemented via SNMP bulk requests.
There are a lot of workarounds: temp files, a per-command or per-value cache, etc. But they all have to be done manually and kept in mind by Zabbix users, with restrictions regarding different item update intervals.

My suggestion is to split the definitions into ITEMS and CHECKS. An item is a monitoring parameter; a check is a collection method. Multiple items can be assigned to one check, by string id or by order number in the check output with delimiters (depending on the check type).
I think this method is suitable for backward compatibility and a less painful upgrade, because in the current version there is one check per item and we can easily convert all templates.
LLD checks can also be done the same way. Every check can return one line (for usual checks) or several lines (for discovery rules and item prototypes), so one check for the whole LLD. For backward compatibility there can be an additional check option - format: JSON or raw (with parsing rules).
Check output can be parsed by the agent according to the check settings and transferred in the "value" JSON sub-structure - no need to change the protocol. Only additional check fields (such as a multiline indicator, format and parsing rules) would have to be added to the server request (or response, for active checks).
The only pitfall is item keys. They should be assigned to checks (not items), but what to do with trigger expressions... I see two ways:
1) during upgrade, convert item keys to item names in trigger expressions and abandon item keys altogether,
2) copy item keys to check keys and make item keys free-format, used only in triggers.

In conclusion:
IMHO this architectural improvement must be done ASAP, because the later it is done, the more difficult it becomes.

Comment by dimir [ 2017 Mar 07 ]

Related issue: ZBXNEXT-1443

Comment by Max [ 2017 Mar 07 ]

dimir,
Sorry, but no, it is not related to ZBXNEXT-1443. They are absolutely different issues.

Comment by dimir [ 2017 Mar 07 ]

I would disagree. I understand that this issue is a bit different, but to me the idea is the same: one check results in multiple values for other items. Item preprocessing is the first step towards that, so to me these issues are related.

Comment by Rostislav Palivoda [ 2017 Jun 07 ]

Please take care of server side testing - wiper

Comment by Vjaceslavs Bogdanovs [ 2017 Jun 08 ]

Server side is ready for testing in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-3006

Comment by Andris Zeila [ 2017 Jun 12 ]

(3) [I] Monitoring of the new processes must be added to the default Zabbix templates (zabbix[process,<process>,avg,busy] items with corresponding triggers).

vjaceslavs Added monitoring to templates. RESOLVED in r69199

wiper CLOSED

Comment by Vjaceslavs Bogdanovs [ 2017 Jun 12 ]

(4) [S] preprocessor_link_delta_items can be called with uninitialized item configuration if the item is not "preprocessable".

vjaceslavs RESOLVED in r69198, r69241

wiper CLOSED

Comment by Andris Zeila [ 2017 Jun 13 ]

(5) [S] Naming improvements:

  • using zbx_preprocessor_hold() (begin()?) and zbx_preprocessor_flush() instead of zbx_preprocessor_send_command() would be easier to read.
  • it would be better to avoid using generic file names (worker, manager) where possible. Use preproc_worker, preproc_manager or something like that.
  • all structures must be prefixed with zbx_, so that means using zbx_queue_t, zbx_queue_item_t, zbx_queue_iterator_t. Actually, maybe it would be better to rename it to list (zbx_list_t, ...). We already have a circular queue zbx_queue_ptr_t, and the new queue is basically a singly linked list.

vjaceslavs RESOLVED in r69241

wiper CLOSED

Comment by Andris Zeila [ 2017 Jun 14 ]

(6) [S] The item.dependent_items vector update is done by clearing and rebuilding the dependent item vector for all items.

This will not work with the configuration sync changes in trunk. Only the changed rows will be passed to DCsync_items(). There are a few options:

  1. when adding/removing a dependent item (or when an item's type is changed to/from dependent), update its master item's dependent_items vector. This would probably require generating an itemid->master_itemid pair vector from the configuration cache and synchronizing it with the itemid->master_itemid pairs selected from the database (select itemid,master_itemid from items where master_itemid is not null).
  2. after syncing items (and before updating dependent items), iterate through config->items to reset dependent items.

Option (1) is more optimal for partial configuration cache updates, but (2) is much easier to implement.

vjaceslavs RESOLVED in r69628

wiper CLOSED

Comment by Vjaceslavs Bogdanovs [ 2017 Jun 15 ]

(7) [S] An additional check was missing for LLD items that are not discovered anymore (items without an lld_row should not be linked with their parent, as they will not be updated).
RESOLVED in r69272, r69273

wiper CLOSED

Comment by Andris Zeila [ 2017 Jun 16 ]

(10) [S] More naming improvements:
Preprocessor manager internal type names must also be prefixed with zbx_ (preprocessing_states, preprocessing_request_t, preprocessing_worker_t, delta_item_index_t, preprocessing_manager_t).

The 'dep' abbreviation is used when processing trigger dependencies. Vectors/arrays storing identifiers usually have an ids suffix, and commonly the number of something has the suffix _num. So dep_itemids in the ZBX_DC_ITEM structure, and dep_itemids_num, dep_itemids in the DC_ITEM structure, would better match the existing names.

vjaceslavs RESOLVED in r69310

wiper CLOSED

Comment by Andris Zeila [ 2017 Jun 19 ]

(11) [S] Fixed dependent item copying when linking templates. Also a minor code improvement, please review.
RESOLVED in r69343, r69349

vjaceslavs Thanks! CLOSED

Comment by Andris Zeila [ 2017 Jul 03 ]

(12) [S] Fixed possible memory leaks/corruption when setting the calculated value/error to the agent result. Also cleaned up uninitialized memory errors reported by valgrind (might have been false positives though). Please review.
RESOLVED in r69755, r69757

vjaceslavs Thanks! Reviewed and CLOSED

Comment by Andris Zeila [ 2017 Jul 03 ]

(13) [S] Value type is used for type hinting during string-to-numeric conversions in trunk. While the value type is currently not used by the preprocessor, it will be easier to merge if it is added to the manager->worker request package.

vjaceslavs RESOLVED in r69852 as a part of the merge

wiper CLOSED

Comment by Andris Zeila [ 2017 Jul 05 ]

(14) [S] Another 'todo' when merging trunk - new preprocessing options were added to trunk. The location of the item_preproc.c file was changed - you will probably need to merge the changes manually.

vjaceslavs RESOLVED in r69852 as a part of the merge

wiper CLOSED

Comment by Andris Zeila [ 2017 Jul 05 ]

(15) [F] The dependent item graphs consist of dots without lines. I understand this is because the delay period for dependent items is 0. For dependent items we will have to take the master's delay period (in the worst-case scenario we will need to traverse the dependent item chain up to the 'root' item).

Moved to frontend task ZBXNEXT-3883 as (5).

CLOSED

Comment by Andris Zeila [ 2017 Jul 05 ]

Server side is tested, however due to the differences with trunk some things will have to be retested after merge.

Comment by Andris Zeila [ 2017 Jul 06 ]

(16) [S] Fixed a few merge issues, please review r69865

vjaceslavs Thanks! Reviewed and added some minor fixes in r69869. RESOLVED

wiper CLOSED

Comment by Andris Zeila [ 2017 Jul 06 ]

(17) [S] In DCsync_items() the ids vector is used only to update dependent items in master items. My suggestion would be to use a ptr vector to which only new dependent items, or dependent items with changed master items, would be added.

vjaceslavs RESOLVED in r69868

wiper CLOSED

Comment by Andris Zeila [ 2017 Jul 07 ]

Server side (trunk merge) tested.
Frontend is under ZBXNEXT-3883

Comment by Dimitri Bellini [ 2017 Jul 12 ]

Hi Andris,
As I could see using SVN trunk, there is a new item type called "Dependent item". How can I test it? Is this feature planned for 3.4?
Thanks so much

Comment by Andris Zeila [ 2017 Jul 24 ]

Dimitri, it's marked for 3.4 in Fix Version/s (you should be able to see it above). However, it doesn't seem to be merged into trunk yet.

Comment by Dimitri Bellini [ 2017 Jul 24 ]

Hi Andris,
Thanks, I have retested the feature but I did not understand how it is working.

Comment by Andris Zeila [ 2017 Jul 24 ]

Shortly - it allows creating items with another item as the data source, using preprocessing options to extract the required value. I'd suggest waiting for the documentation.
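To illustrate the dependent-item concept with made-up keys (this sketch assumes the JSON Path preprocessing added under ZBXNEXT-3863; the actual configuration is done in the frontend, see ZBXNEXT-3883):

```
Master item (polled once per interval, e.g. a UserParameter returning JSON):
  key:   openvz.ubc.raw
  value: {"kmemsize": {"failcnt": 0, "limit": 1217787766}}

Dependent items (type "Dependent item", master item: openvz.ubc.raw;
populated whenever the master item receives a value):
  key: openvz.ubc[kmemsize,limit]     preprocessing: JSON Path $.kmemsize.limit
  key: openvz.ubc[kmemsize,failcnt]   preprocessing: JSON Path $.kmemsize.failcnt
```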

Comment by Dimitri Bellini [ 2017 Jul 24 ]

I understood the concept, but as you suggest I will wait for the documentation. Thanks very much!

Comment by Andris Zeila [ 2017 Jul 27 ]

(23) [S] As preprocessing is done before values are stored in the history cache, the ZBX_FLAG_ITEM_FIELDS_PREPROC flag can be removed when getting items from the configuration cache in the dbcache.c:DCmass_update_items() function.

vjaceslavs Changed to ZBX_FLAG_ITEM_FIELDS_DEFAULT. RESOLVED in r70675

wiper CLOSED

Comment by Andris Zeila [ 2017 Aug 01 ]

(24) [S] At some point in trunk we added the step number to the preprocessing step failure message, something like Item preprocessing step #1 failed: <message>.
With the preprocessing relocation to the preprocessing manager and multiple merges, this message seems to have been lost. Now it simply says 'cannot apply multiplier "2" to value "x" of type "string": cannot convert value to numeric type'.

vjaceslavs RESOLVED in r71044

wiper CLOSED

Comment by Vladislavs Sokurenko [ 2017 Aug 01 ]

(27) [I] Proxy does not compile
./configure --enable-proxy --with-mysql

make[3]: *** No rule to make target '../../src/zabbix_server/preprocessor/libpreprocessor.a', needed by 'zabbix_proxy'.  Stop.
make[2]: *** [Makefile:673: install-recursive] Error 1
make[1]: *** [Makefile:471: install-recursive] Error 1
make: *** [Makefile:477: install-recursive] Error 1

wiper RESOLVED in r70796
vso CLOSED

Comment by Marc [ 2017 Aug 03 ]

Gents, please, please, with sugar on top, reconsider sharing specs with the community from the very beginning.

This is just yet another example of missing the point, which could have been easily identified by acting more openly and transparently.

While the current implementation is possibly a nice feature, it has little to do with the demand behind ZBXNEXT-3006. Since the issue description/comments were obviously not clear enough, you can also take a look at the Eichhoernchen project to get a better idea of it.
The Eichhoernchen project does not yet provide a solution, but it properly addresses the original demand.

Anyway, I appreciate your commitment!

Comment by Andris Zeila [ 2017 Aug 03 ]

As I understand the Eichhoernchen project, it implements a workaround for:

And this is a pity that with current Zabbix data flow architecture only one value of only one metric can be returned at a time.

This feature, besides moving item preprocessing to separate processes, introduces dependent items, allowing an item to be used as a data source for other items - so basically returning multiple metrics from one item.

Comment by Marc [ 2017 Aug 03 ]

As I understood the current implementation, a master item updates all dependent items at the time of the master item's update.

If we keep that terminology, then this feature request was actually about a master item that may update dependent items with individual update intervals.

In the original request, the master item's update interval would give the minimum usable update interval of the dependent items. A smaller update interval for dependent items would be possible, but would of course not lead to greater detail. A larger update interval, however, leads to fewer item values and easier processing (rendering graphs, 3rd-party API use cases, etc.).

This can make a huge difference when indeed going up to 999 items for thousands of hosts.

The Eichhoernchen project was even attempting to make this smart. Smart in the sense that the master data gets updated depending on the dependent items; that's to say, update the master data only if the data does not exist yet, or if a dependent item is about to be updated and that item's update interval is smaller than the age of the master data.

The current implementation is more similar to repetitively sending data for multiple items in bulk via Zabbix sender than to a cache that allows individual items, including individual item update intervals.

While I have to admit that the issue description didn't point out that individual-update-interval aspect explicitly (to me it was an obvious consequence of using a cache as described), it has at least been mentioned in the comments.

Being able to follow the spec from the very beginning could have helped to clarify that this is not addressing the actual need, resp. that it is a cool but different ZBXNEXT.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 03 ]

I think we can improve to make everybody happy, see ZBXNEXT-4016.

Comment by Dimitri Bellini [ 2017 Aug 03 ]

Hi Glebs,
Coming back to this: I suppose, from what Marc said, that the main problem is the clarification of what kind of implementation is being developed.
From what I read, this kind of "feature" seems very good, for example, for an API request that receives a big file with a lot of metrics (using the "Master item"), extracts what is needed (I do not know how) and sends it to the "child items". From my point of view it seems to solve most of my needs, but I do not know about other cases.
So the best thing is to provide (please, please, please) better details of the implementation.
Thanks very much

Comment by Andris Zeila [ 2017 Aug 08 ]

Released in:

  • 3.4.0beta3 r70992
Generated at Thu Apr 25 06:52:30 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.