[ZBXNEXT-922] Trend computing function Created: 2011 Aug 25  Updated: 2016 Jan 06  Resolved: 2015 Oct 15

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 1.8.6
Fix Version/s: 3.0.0alpha3

Type: New Feature Request Priority: Major
Reporter: Michal Humpula Assignee: Unassigned
Resolution: Fixed Votes: 27
Labels: patch, prediction, trends
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File zbx-trend.patch    
Issue Links:
Duplicate
is duplicated by ZBXNEXT-40 forecast trigger function Closed

 Description   

In the attachment is a dirty hack that adds support for trend computation, based on the evaluate_SUM function. The function does linear regression on the data and returns the trend (slope) of the fitted function.

Right now I'm using it to predict when the space on disk will fill:

1) zabbix agent item: vfs.fs.size[/,pfree]
2) calculated item that computes time to live in hours named "vfs.fs.size[/,ttl]":
-last("vfs.fs.size[/,pfree]")/(trend("vfs.fs.size[/,pfree]",21600)*3600-0.000001)
3) trigger
{vfs.fs.size[/,ttl].last(#1)}>0&{vfs.fs.size[/,ttl].last(#1)}<48
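The arithmetic of the calculated item in step 2 can be reproduced in a few lines of C. This is a sketch that assumes trend() returns the fitted slope of pfree in percentage points per second, so that multiplying by 3600 gives points per hour. ttl_hours() is a hypothetical helper and is not part of the patch.

```c
/* Mirrors the calculated item
 *   -last("vfs.fs.size[/,pfree]")/(trend("vfs.fs.size[/,pfree]",21600)*3600-0.000001)
 * Assumption: slope_per_sec is the fitted slope of pfree in percentage points per second. */
static double ttl_hours(double pfree_last, double slope_per_sec)
{
    /* Slope in percentage points per hour; the tiny offset keeps the
     * denominator non-zero when the fitted slope is exactly 0. */
    double slope_per_hour = slope_per_sec * 3600.0 - 0.000001;

    /* A shrinking disk has a negative slope, so the leading minus yields a positive TTL. */
    return -pfree_last / slope_per_hour;
}
```

For example, 20% free space shrinking by 1 percentage point per hour gives ttl_hours(20.0, -1.0/3600.0) of roughly 20 hours, so the trigger fires. A growing series gives a negative result and a flat one a very large result, which is why the trigger checks both >0 and <48.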

Specification https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-922



 Comments   
Comment by Bart Verwilst [ 2012 Mar 06 ]

Even as a self-proclaimed dirty hack, I think trending support offers a far better way of checking, for example, disk space than simple '10% left' messages during the night (when 10% can be 500GB). Could this be looked into a bit more for 2.0?

Comment by Michal Humpula [ 2012 Mar 06 ]

In the meantime I've worked a bit on a Lua extension, which would basically put an end to this never-ending extending of the Zabbix server. I suppose it's a little too much to ask for the Lua patch to be integrated instead?

The main problem with this patch is the get_table_by_value_type() function, which is no longer there. It was replaced by DBget_history(), which unfortunately now returns only values, and from my point of view a patch extending this function to also return timestamps is much more invasive than a simple eval function patch.

Still, I'm open to suggestions on how to overcome this. I don't want to use the same hack with SQL "concat" as here:

https://github.com/mihu/zabbix-lua/blob/lua-dev/src/libs/zbxlua/lua_zbxitem_lib.c#L240

Comment by Bart Verwilst [ 2012 Mar 07 ]

Alexei, is this something that might be of interest for Zabbix to have natively ( the trending i mean )?

Comment by richlv [ 2012 Mar 11 ]

there's always interest in more native functionality, but the development time is limited

Comment by Michal Humpula [ 2012 Mar 11 ]

If you mean that 2.0 will be out soon, then goooood! If you mean developers' time, then it's probably true for in-company ones, less so for outsiders.

I would welcome suggestions on how to cleanly (read: acceptably from your point of view) get around the DBget_history() limitation, which does not return timestamps. I can then rework the patch in a more plausible way.

Comment by richlv [ 2012 Mar 13 ]

there's a hope that 2.0 will be out sooner than later, as usual
i'm not a dev, though, so can't really give any meaningful feedback

Comment by Michal Humpula [ 2012 Mar 13 ]

Guys, come on, it's pretty difficult to contribute to Zabbix with this attitude. I don't mind rewriting the complete thing, but someone needs to tell me what would be an acceptable change.

Comment by Bart Verwilst [ 2012 Jul 27 ]

Somebody tell the guy!
Please add a dev to this thread who can discuss a possible implementation. You should grab the opportunity to work with volunteers like this with both hands.

Comment by richlv [ 2012 Aug 10 ]

i'm very sorry that i can't help with this in any way... the best i can do is tell about the coding guidelines at https://zabbix.org/wiki/Docs/specs/coding_style - not that it helps in this case :/

Comment by Michal Humpula [ 2012 Aug 10 ]

That sucks :o) It would be nice to work on the Zabbix codebase from time to time, but it kind of cools down the motivation if there is no response from the code gatekeepers.

Comment by Alexei Vladishev [ 2012 Aug 10 ]

Michal,

Please give us details of the algorithm you used for trend prediction. A pointer to an external source is fine as well. Thanks.

Comment by Michal Humpula [ 2012 Aug 10 ]

Well, it's basically linear regression (http://en.wikipedia.org/wiki/Linear_regression), nothing fancy. It needs both the "time" and "value" values, which does not fit the current codebase very well (the rest of the functions are interested only in values).
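For reference, an ordinary least-squares fit v = α + βt needs both the timestamp and the value of every sample, which is why a values-only DBget_history() is not enough. A minimal sketch, assuming the samples arrive as two parallel arrays. fit_linear() is a hypothetical name and is not a Zabbix function.

```c
#include <stddef.h>

/* Least-squares fit v = alpha + beta * t over n (t, v) pairs.
 * Returns 0 on success, -1 if the slope is undefined
 * (fewer than two points, or all timestamps equal). */
static int fit_linear(const double *t, const double *v, size_t n,
        double *alpha, double *beta)
{
    double t_mean = 0.0, v_mean = 0.0, s_tv = 0.0, s_tt = 0.0;
    size_t i;

    if (n < 2)
        return -1;

    for (i = 0; i < n; i++)
    {
        t_mean += t[i];
        v_mean += v[i];
    }
    t_mean /= (double)n;
    v_mean /= (double)n;

    /* Centering on the means limits rounding error with large Unix timestamps. */
    for (i = 0; i < n; i++)
    {
        s_tv += (t[i] - t_mean) * (v[i] - v_mean);
        s_tt += (t[i] - t_mean) * (t[i] - t_mean);
    }

    if (0.0 == s_tt)
        return -1;

    *beta = s_tv / s_tt;                /* the "trend" returned by the patch */
    *alpha = v_mean - *beta * t_mean;   /* fitted value at t = 0 */
    return 0;
}
```

With one sample per hour falling from 50 to 47 (t = 0, 3600, 7200, 10800), it yields β = -1/3600 per second and α = 50.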

Comment by richlv [ 2012 Aug 13 ]

if possible, could you give a simple explanation for a non-coder ?

a) what exactly does trend() return ?
b) what exactly are all the values in your calculated item example ?
c) which value is the "threshold" (i assume there must be one)?
d) why not have a single trigger function to which you could pass time/values to consider and threshold, which would then return hours (or whatever units) ?

Comment by Michal Humpula [ 2012 Aug 13 ]

a) Well, the slope of the fitted linear function. In terms of the wiki page, this is β (beta).
b) The example determines when the disk will fill. It does so by computing the trend (i.e. how fast the space is shrinking), then using division to get the TTL in hours. So, part by part...

last("vfs.fs.size[/,pfree]") - how much space do we have left
trend("vfs.fs.size[/,pfree]",21600) - use last six hours to determine the trend
3600 - just changes the scale to hours
-0.000001 - a nasty hack to avoid division by zero

All in all, it's a nasty hack, but the least invasive one for the codebase.
c) Threshold? After computing how many hours I have left, all I need is to set up the described trigger to monitor that value.
d) Because it's less general. By doing what you suggest, Zabbix will either soon refuse to accept new functions or end up with tons of specialized functions. That's why I afterwards focused on the Lua extension: it enables an administrator to fully express what value he is interested in, or what status he considers to be a problem. The statistical functions provided by Zabbix are great, but they are by definition limited. Consider creating "soft triggers": yes, there is a solution for that, but it's a hack. The Zabbix core has nicely matured into a great monitoring engine. To enable more features, I think a scripting language (whichever one) is the only logical way for further evolution.

Sorry for the unrelated discussion here:o)

Comment by richlv [ 2012 Aug 13 ]

thanks. after some quick discussion, it seems that the functionality is real nice, but not very user friendly
i suspect that explaining this to users and then requiring them to create such complex configuration would be impossible, thus negating the "generalness" benefit.
what do you think about a much more specialised function that would accept time period/values, timeshift, threshold ?
it could return time remaining in seconds, or 0 if we are moving away from the threshold.
while much more specific, i can actually see users using it

another function could go the opposite way and accept date/time, then return "estimated" value, but that's for later
ah, and of course, lua plugin/extension could be the more long term approach.

Comment by Michal Humpula [ 2012 Aug 13 ]

In general I agree. So yes, I could see something like

time_to_depletion(time_sample, fill_value)

time_sample (interval in seconds or #count of samples to analyze)
fill_value (the max value that can be reached; default could be 0 or 100)

The return value could be in seconds, or, probably more usable, a float representing hours (or a third parameter could tell in which units to return the result).

The remaining problem is how to program this nicely. As mentioned above, unlike the other functions, this one also needs the "x" time values, so the patch eventually gets a little bit messy.
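On top of a linear fit, the proposed function reduces to solving α + βt = fill_value for t. A hedged sketch: time_to_depletion() follows the name proposed above, while the unit_sec parameter and the NEVER_REACHED sentinel are assumptions made for illustration, with t measured in seconds from the last sample.

```c
/* Hypothetical sentinel for "threshold never reached": a large positive value
 * keeps triggers such as "< 48 hours" from firing, unlike 0 or -1. */
#define NEVER_REACHED 999999999999.9999

/* Linear fit v(t) = alpha + beta * t, t in seconds relative to the last sample.
 * Returns the time until v(t) == fill_value in units of unit_sec seconds
 * (1 for seconds, 3600 for hours). */
static double time_to_depletion(double alpha, double beta, double fill_value, double unit_sec)
{
    double t;

    if (0.0 == beta)        /* flat line: never gets there unless already there */
        return alpha == fill_value ? 0.0 : NEVER_REACHED;

    t = (fill_value - alpha) / beta;

    if (t < 0.0)            /* the crossing lies in the past: moving away */
        return NEVER_REACHED;

    return t / unit_sec;
}
```

Returning 0 for "moving away", as suggested earlier, would make a "less than 48 hours" trigger fire, so this sketch returns a large positive sentinel instead; the thread later settles on a similar approach (DB_INFINITY).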

Comment by Ronald Rood [ 2015 Mar 06 ]

Any news on this subject?
The ability to trigger on a resource that will run out within X time is way more valuable than a stupid bark that says it's 80% full.

Is the patch still usable for 2.4.4?

Comment by Glebs Ivanovskis (Inactive) [ 2015 Aug 10 ]

Feature is ready for testing in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-922 (revision 54827)

Comment by Aleksandrs Saveljevs [ 2015 Aug 10 ]

(1) Specification says that timeleft() returns -1 if threshold cannot be reached. However, users will make trigger like this:

{host:item.timeleft()} < 1d

If timeleft() returns -1 for unreachable thresholds, then this trigger will trigger. Maybe timeleft() can return infinity in such cases?

glebs.ivanovskis This makes sense. Specs updated and changes were made in revision 55485. RESOLVED

sandis.neilands CLOSED. Result set to DB_INFINITY.

Comment by Glebs Ivanovskis (Inactive) [ 2015 Aug 11 ]

(2) New translatable strings:

"Fit"
"Threshold"
"Forecast for next t seconds based on period T is < N"
"Forecast for next t seconds based on period T is = N"
"Forecast for next t seconds based on period T is > N"
"Forecast for next t seconds based on period T is NOT N"
"Time to reach threshold estimated based on period T is < N"
"Time to reach threshold estimated based on period T is = N"
"Time to reach threshold estimated based on period T is > N"
"Time to reach threshold estimated based on period T is NOT N"

sasha CLOSED

Comment by Sandis Neilands (Inactive) [ 2015 Sep 08 ]

(3) Considering that the correct use of these functions relies on users properly understanding the underlying statistical model's assumptions and limitations, as well as the limitations imposed by our implementation, it is important to briefly discuss the following topics in our documentation.

Guide
0. Visualising the predictions.
1. Determining which function (fit) to use.
2. Choosing appropriate interval.
3. Reliability of the predictions (and how it's related to interval).
3.1. Full interval of "good fit" is needed for the prediction to be reliable
4. When and why to use a mode other than "value" (in case of variance in the data, spikes can reach the limit before the forecast does?).
5. When and why to use time shift?

Reference
1. The exact formulas used in the calculations (so that they can be reproduced with other tools).
2. Limitations of the implementation.
2.1. Trends data is not used.
2.2. Value of epsilon.
2.3. Min, max values of the returned floats.
2.4. Maximum number of iterations when finding polynomial roots.
2.5. Types of mathematical errors to expect and suggestions on next steps to recover from them.
2.6. Cross validating results from Zabbix (getting out the data and coefficients).

glebs.ivanovskis A short explanatory HowTo covering these questions is attached to the specification page. Here is a direct link to the file. RESOLVED

sandis.neilands CLOSED. martins-v should proof-read it before official publishing.

Comment by Sandis Neilands (Inactive) [ 2015 Sep 08 ]

(4) In case of mathematical error returning a value from the function's codomain to designate an error is not the best approach. The user cannot determine if the returned value is an error or valid prediction. In such cases the item should become "Not Supported" with a proper error message.

Note that -infinity and infinity cannot be used for the following reasons.
1. We cannot determine which one of them to use since the trends can go towards either of them.
2. There are limitations on the maximum value of the FLOAT imposed by the database engine used.

On the other hand this will be a problem when linear fit is used and the line is completely horizontal, for example y = 0x + 3. In that case time to reach y = 5 is infinity but y = 0 --> -infinity. But maybe this case is not so common in real life to warrant special attention.

glebs.ivanovskis For timeleft() error code -1 is totally ok, since timeleft() may return only 0 or positive values normally. If calculated time to reach threshold is negative (like in your example) we are going away from threshold and will never reach it, therefore timeleft()=+infinity (or the largest number representable as NUMERIC(16,4) to be more precise). Forecast is much trickier. "Not supported" will effectively switch the prediction off for 10 minutes which can be a bad thing. We can either stick with -1 or consider -infinity as error code. In the latter case error check would look like forecast() < some_reasonable_negative_number, because in most practical cases forecast below some_reasonable_negative_number does not make much sense anyway.

Actually, in the revision you tested (r54827), functions became not supported in case of any error. Changes in r55600, r55604 and r55612 made the behaviour fully compliant with the Specification:

  • in case of wrong item type, invalid parameters or value cache failure functions become not supported with corresponding error message being shown in frontend;
  • if there are no data or for some reason we cannot make a forecast based on these data (mathematical operations undefined or numerical complications) functions return special error code value (-1) to signal error state and error message is printed in log at DebugLevel=4.

In the first case user needs to revise his trigger expression. In the second situation forecasting can recover on its own as new data comes into forecasting interval. However, if user permanently gets -1 and warnings in log file, something needs to be done with data or trigger expression to make things work. This issue will be addressed in documentation.

RESOLVED

sandis.neilands For the sake of consistency it would be nice to set error message in all evaluate_* functions when zbx_vc_get_value* fails. Currently it is only done in the new forecasting functions.

glebs.ivanovskis Probably it's a better idea to make global rethinking of trigger functions in a separate issue. Created ZBX-9913 for this purpose.

sandis.neilands I agree. CLOSED.

Comment by Sandis Neilands (Inactive) [ 2015 Sep 08 ]

(5) It should be possible to get out the calculated coefficients so that they can be used in other tools. Also it would be useful for troubleshooting.

glebs.ivanovskis Done in r55634. RESOLVED

sandis.neilands CLOSED.

Comment by Sandis Neilands (Inactive) [ 2015 Sep 08 ]

(6) We have an unnecessary number of strcmp() calls. We can do the parameter parsing once and then use an enum or literals through #defines. This should improve readability and performance.

glebs.ivanovskis Problem finally solved in r55586. RESOLVED

sandis.neilands CLOSED.
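Parsing the "fit" parameter once into an enum could look roughly like this. The enum and parse_fit() are hypothetical, not the actual zbx_fit_code(); the accepted values (linear, polynomial1 to polynomial6, exponential, logarithmic, power) and treating an empty parameter as linear are assumptions based on the Zabbix 3.0 forecast() documentation.

```c
#include <string.h>

typedef enum
{
    FIT_INVALID,
    FIT_LINEAR,
    FIT_POLYNOMIAL,
    FIT_EXPONENTIAL,
    FIT_LOGARITHMIC,
    FIT_POWER
}
fit_t;

/* Converts the "fit" string once; later code switches on the enum instead of calling strcmp().
 * For polynomialN the degree N (1..6) is stored in *k; for every other fit *k is 0. */
static fit_t parse_fit(const char *fit_str, int *k)
{
    *k = 0;

    if ('\0' == *fit_str || 0 == strcmp(fit_str, "linear"))    /* empty means default */
        return FIT_LINEAR;

    if (0 == strncmp(fit_str, "polynomial", 10))
    {
        const char *n = fit_str + 10;

        if ('1' <= n[0] && n[0] <= '6' && '\0' == n[1])
        {
            *k = n[0] - '0';
            return FIT_POLYNOMIAL;
        }
        return FIT_INVALID;
    }

    if (0 == strcmp(fit_str, "exponential"))
        return FIT_EXPONENTIAL;
    if (0 == strcmp(fit_str, "logarithmic"))
        return FIT_LOGARITHMIC;
    if (0 == strcmp(fit_str, "power"))
        return FIT_POWER;

    return FIT_INVALID;
}
```

After this single conversion the evaluation code can switch on the enum, and k only carries meaning for FIT_POLYNOMIAL.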

Comment by Sandis Neilands (Inactive) [ 2015 Sep 08 ]

(7) I didn't find a place where ZBX_VALID_MATRIX() is needed in normal operation. If you keep it then add THIS_SHOULD_NEVER_HAPPEN macro in all instances when it fails.

glebs.ivanovskis Applied your suggestions to catch invalid matrices and several other possible programmer's errors in r55606. RESOLVED

sandis.neilands CLOSED with minor style change in r55648.

Comment by Sandis Neilands (Inactive) [ 2015 Sep 08 ]

(8) You don't have to #undef your macros unless they are redefined within the same C file. In C the macro scope is per translation unit.

glebs.ivanovskis To #undef or not to #undef... Both approaches coexist in current Zabbix code and there is no Guideline on this. See improvements in r55606. RESOLVED

sandis.neilands CLOSED. The convention for #undef is to use them to limit macro scope. For example, if macro is defined in function it should not be available outside of it. Other uses are mainly for various tricks.
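A minimal illustration of that convention, with invented names: SQR is defined inside the only function that uses it and #undef'd before the closing brace, so it is not visible in the rest of the translation unit.

```c
/* Sum of squares of n values; SQR is meant to exist only inside this function. */
static double sum_of_squares(const double *x, int n)
{
#define SQR(v) ((v) * (v))
    double s = 0.0;
    int i;

    for (i = 0; i < n; i++)
        s += SQR(x[i]);

    return s;
#undef SQR
}

/* From here to the end of the file, SQR is no longer defined. */
```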

Comment by Sandis Neilands (Inactive) [ 2015 Sep 08 ]

(9) Please move code '*result != *result' to a separate function (zbx_isnan()?) in order to avoid confusion. Apparently C99 isnan() is not well supported in certain legacy systems.

Search for isnan() in GNU Autoconf manual for more info.

glebs.ivanovskis Addressed this in r55585. RESOLVED

sandis.neilands CLOSED. Parenthesized all parameter names in the new macros in r55647.

Relevant CERT recommendations.
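The self-comparison works because, under IEEE 754, NaN is the only value that compares unequal to itself. A sketch of such a macro with a parenthesized parameter; ZBX_ISNAN_SKETCH and make_nan() are hypothetical names, and the committed macro may differ. Compiler options that relax IEEE semantics, such as GCC's -ffast-math, can break this test.

```c
/* True only for NaN: under IEEE 754, NaN compares unequal to every value, itself included.
 * The parameter is parenthesized so that arguments such as a + b expand correctly. */
#define ZBX_ISNAN_SKETCH(x) ((x) != (x))

/* Produces a NaN at run time without relying on the C99 NAN macro;
 * volatile keeps the compiler from folding the division. */
static double make_nan(void)
{
    volatile double zero = 0.0;

    return zero / zero;
}
```

Because x is evaluated twice, the argument should not have side effects.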

Comment by Sandis Neilands (Inactive) [ 2015 Sep 18 ]

(10) Check the case in zbx_forecast() when prediction horizon (time) is 0.0 and mode is 'value'. In this case we'll return garbage (result is not initialized). Should we return error?

glebs.ivanovskis MODE_VALUE is checked separately.

sandis.neilands As you rightly pointed out after the patch, the case with 'value' mode is handled. The only way to get an uninitialized result from the function is if mode is not from the enum range (which would be a programmer error, a compiler error or memory corruption). Consider either adding THIS_SHOULD_NEVER_HAPPEN or initializing result, ret, or both.

glebs.ivanovskis Please take a look at my changes in r55668. I placed THIS_SHOULD_NEVER_HAPPEN wherever it is possible to use new functions in the future incorrectly. I also changed the way we handle such cases for more consistency. Followed your example from r55659. RESOLVED

sandis.neilands CLOSED. Fixed passing of uninitialized 'k' in r55679.
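The defensive pattern discussed in (10), initializing the output and treating an out-of-range mode as an error instead of returning garbage, can be sketched for the simplest case: a linear fit v(t) = α + βt with t in seconds from now and a horizon [0, h]. For a straight line the extremes lie at the interval ends and the average is the midpoint value. The enum, forecast_linear() and the stand-in for THIS_SHOULD_NEVER_HAPPEN are assumptions, not the code committed in r55668.

```c
#include <stdio.h>

/* Stand-in for Zabbix's THIS_SHOULD_NEVER_HAPPEN: report where a programmer error was detected. */
#define SHOULD_NEVER_HAPPEN_SKETCH() fprintf(stderr, "should never happen: %s:%d\n", __FILE__, __LINE__)

typedef enum
{
    FMODE_VALUE,
    FMODE_MAX,
    FMODE_MIN,
    FMODE_DELTA,
    FMODE_AVG
}
forecast_mode_t;

/* Forecast over [0, h] seconds for the line v(t) = alpha + beta * t.
 * Returns 0 and sets *result on success; returns -1 (with *result == 0.0) for an unknown mode. */
static int forecast_linear(double alpha, double beta, double h, forecast_mode_t mode, double *result)
{
    double v_now = alpha, v_end = alpha + beta * h;
    double hi = v_now > v_end ? v_now : v_end;
    double lo = v_now < v_end ? v_now : v_end;

    *result = 0.0;      /* never leave the output uninitialized */

    switch (mode)
    {
        case FMODE_VALUE:
            *result = v_end;
            return 0;
        case FMODE_MAX:
            *result = hi;
            return 0;
        case FMODE_MIN:
            *result = lo;
            return 0;
        case FMODE_DELTA:
            *result = hi - lo;
            return 0;
        case FMODE_AVG:
            *result = alpha + beta * h / 2.0;   /* mean of a line over [0, h] */
            return 0;
        default:
            SHOULD_NEVER_HAPPEN_SKETCH();
            return -1;
    }
}
```

With h = 0 and FMODE_VALUE the result is simply α, the current fitted value, and an out-of-range mode yields -1 with the output still initialized.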

Comment by Sandis Neilands (Inactive) [ 2015 Sep 18 ]

(11) In both evaluate_FORECAST() and evaluate_TIMELEFT() we call prediction functions at the end within zbx_snprintf() macro. In these calls we use "zero_time" structure which is only initialized if we got at least one value from the value cache. Does it make sense to call prediction functions in this case at all? If it does then we have to initialize "zero_time" properly.

Also "k" is sometimes passed uninitialized (but we don't look at it in the prediction functions in those cases but still...).

glebs.ivanovskis Let's save one function call in case of "no data available". We need "k" in case of polynomial fit only. Let zbx_fit_code() initialize it with 0 for other "fit" values. See r55665. RESOLVED

sandis.neilands CLOSED.

Comment by Glebs Ivanovskis (Inactive) [ 2015 Sep 21 ]

Fixed in pre-3.0.0alpha3 (trunk) r55689. Improvements in r55700.

Comment by Aleksandrs Saveljevs [ 2015 Sep 22 ]

(12) In trunk there is now the following warning:

evalfunc.c:2056:44: warning: passing 'unsigned int *' to parameter of type 'int *' converts between pointers to integer types
      with different sign [-Wpointer-sign]
                                SUCCEED != zbx_fit_code(fit_str, &fit, &k, error))
                                                                       ^~
../../../include/zbxalgo.h:356:54: note: passing argument to parameter 'k' here
int     zbx_fit_code(char *fit_str, zbx_fit_t *fit, int *k, char **error);
                                                         ^
1 warning generated.

glebs.ivanovskis Warning silenced in r55693. Please review stylistic changes in r55695 as well. RESOLVED

sandis.neilands CLOSED.

Comment by richlv [ 2015 Sep 23 ]

(13) documentation

glebs.ivanovskis Updated What's new section and the list of supported trigger functions, added new section: https://www.zabbix.com/documentation/3.0/manual/config/triggers/prediction
Additional tips, tricks, explanations and reference: http://zabbix.org/mw/images/1/18/Prediction_docs.pdf
RESOLVED

martins-v Reviewed.

Some changes on my part:

Please review the changes and close this subissue if satisfied.

glebs.ivanovskis I am perfectly happy with your changes. Feel free to poke me if pdf needs corrections. CLOSED

Comment by Aleksandrs Saveljevs [ 2015 Oct 05 ]

(14) [D] Documentation for forecast() trigger function gives the following examples:

⇒ forecast(#10,,3600) → forecast of item value after one hour based on last 10 values
⇒ forecast(3600,,1800) → forecast of item value after 30 minutes based on last hour data
⇒ forecast(3600,86400,43200) → forecast of item after 12 hours based on one hour one day ago
⇒ forecast(3600,,600,exponential) → forecast of item value after 10 minutes based on last hour data and exponential function
⇒ forecast(3600,,7200,polynomial3,max) → forecast of maximum value item can reach in next two hours based on last hour data and cubic (third degree) polynomial

It would be better to give examples using unit suffixes like "forecast(1h,1d,12h)" - they are more readable and give users examples they should really follow. Writing "forecast(3600,86400,43200)" is not the best practice.

martins-v RESOLVED in https://www.zabbix.com/documentation/3.0/manual/appendix/triggers/functions

asaveljevs Wonderful! CLOSED.
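For readers translating between the two notations, here is a sketch of how a suffixed time parameter maps to seconds. It assumes the suffixes s, m, h, d and w accepted by Zabbix 3.0 time parameters; time_param_to_sec() is a hypothetical name and is not Zabbix's own parser.

```c
#include <ctype.h>
#include <stdlib.h>

/* Converts "3600", "1h", "30m", "1d" or "1w" to seconds.
 * Returns 0 on success, -1 on malformed input, including "#N" value counts,
 * which are not durations. Overflow is not checked in this sketch. */
static int time_param_to_sec(const char *param, long *sec)
{
    char *end;
    long value, mult;

    if (!isdigit((unsigned char)param[0]))
        return -1;

    value = strtol(param, &end, 10);

    switch (*end)
    {
        case '\0':
        case 's': mult = 1; break;
        case 'm': mult = 60; break;
        case 'h': mult = 3600; break;
        case 'd': mult = 86400; break;
        case 'w': mult = 604800; break;
        default: return -1;
    }

    /* At most one suffix character is allowed. */
    if ('\0' != *end && '\0' != end[1])
        return -1;

    *sec = value * mult;
    return 0;
}
```

So forecast(1h,1d,12h) and forecast(3600,86400,43200) describe the same interval, time shift and horizon.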

Comment by Aleksandrs Saveljevs [ 2015 Oct 05 ]

(15) [D] https://www.zabbix.com/documentation/3.0/manual/appendix/triggers/functions does not seem to link to https://www.zabbix.com/documentation/3.0/manual/config/triggers/prediction .

martins-v RESOLVED in https://www.zabbix.com/documentation/3.0/manual/appendix/triggers/functions

asaveljevs Very good! CLOSED.

Comment by Aleksandrs Saveljevs [ 2015 Oct 07 ]

(16) [D] The PDF at http://zabbix.org/mw/images/1/18/Prediction_docs.pdf mentions that prediction functions may be a bit expensive to compute:

We stop iterations when all roots are good enough (it takes 10–15 iterations in more or less simple situations) or we hit iteration count limit—200 iterations.

Can extensive use of prediction functions significantly affect Zabbix performance? Should we mention it in the notes?

martins-v I'll leave it to glebs.ivanovskis to deal with [16] and [17].

glebs.ivanovskis Uploaded a new version of the PDF file with a new subsection addressing this question. In the vast majority of applications (default "fit" and "mode", a reasonable number of item values in the interval) the performance impact isn't big at all. RESOLVED

asaveljevs I have fixed some wording ("due to" -> "because") in R573. Please take a look. Apart from that, a wonderful description on performance!

glebs.ivanovskis Thanks! Reflected your changes in the file on zabbix.org. CLOSED
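To make the iteration cost concrete, here is a sketch of one simultaneous root-finding method with the 200-iteration cap quoted above: the Durand-Kerner (Weierstrass) iteration. This is an assumption for illustration only; the comment does not say which method Zabbix uses, and the tolerance below is invented.

```c
#include <complex.h>

#define ROOT_MAX_ITER   200     /* iteration cap quoted in the PDF */
#define ROOT_EPS        1e-12   /* hypothetical "good enough" step length */
#define ROOT_MAX_DEG    6

/* Evaluates c[0] + c[1] x + ... + c[n] x^n at x using Horner's scheme. */
static double complex poly_eval(const double *c, int n, double complex x)
{
    double complex r = c[n];
    int i;

    for (i = n - 1; i >= 0; i--)
        r = r * x + c[i];

    return r;
}

/* Durand-Kerner: refines all n root estimates of coef[0] + ... + coef[n] x^n together.
 * Returns the number of iterations used, or -1 on invalid input or when the cap is hit. */
static int find_roots(const double *coef, int n, double complex *roots)
{
    double monic[ROOT_MAX_DEG + 1];
    const double complex seed = 0.4 + 0.9 * I;
    int i, j, iter;

    if (n < 1 || n > ROOT_MAX_DEG || 0.0 == coef[n])
        return -1;

    for (i = 0; i <= n; i++)            /* normalize: leading coefficient becomes 1 */
        monic[i] = coef[i] / coef[n];

    roots[0] = 1.0;                     /* starting points: seed^0 .. seed^(n-1) */
    for (i = 1; i < n; i++)
        roots[i] = roots[i - 1] * seed;

    for (iter = 1; iter <= ROOT_MAX_ITER; iter++)
    {
        double max_step2 = 0.0;

        for (i = 0; i < n; i++)
        {
            double complex denom = 1.0, step;
            double step2;

            for (j = 0; j < n; j++)
            {
                if (j != i)
                    denom *= roots[i] - roots[j];
            }

            step = poly_eval(monic, n, roots[i]) / denom;
            roots[i] -= step;           /* the updated estimate is used immediately */

            step2 = creal(step) * creal(step) + cimag(step) * cimag(step);
            if (step2 > max_step2)
                max_step2 = step2;
        }

        if (max_step2 < ROOT_EPS * ROOT_EPS)    /* every root moved less than ROOT_EPS */
            return iter;
    }

    return -1;
}
```

For (x - 1)(x - 2)(x - 3), passed as {-6, 11, -6, 1}, the estimates converge to 1, 2 and 3. Each iteration costs O(n²) arithmetic operations for degree n.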

Comment by Aleksandrs Saveljevs [ 2015 Oct 08 ]

(17) [D] The PDF incorrectly refers to items in trigger expressions. For instance, the following

host:item.forecast(10m,,5m,,max) > limit

should instead be

{host:item.forecast(10m,,5m,,max)} > limit

glebs.ivanovskis Thanks for pointing this out! Fixed in current version of PDF. RESOLVED

asaveljevs Looks good. CLOSED.

Comment by Aleksandrs Saveljevs [ 2015 Oct 09 ]

(18) [D] The new predictive functions are a bit different from other functions we had before and we might wish to write a little guide on how to deal with corner cases. For instance, there are at least the following scenarios to cover:

(a) The new functions return -1 when there is a calculation problem. This is different from other functions, which simply fail to calculate (in which case the trigger would fail to calculate, too, but would keep its OK or PROBLEM state). What is a good way of handling this -1?

Suppose we have the following trigger:

{host:item.timeleft()} < 1h

It will become PROBLEM if timeleft() is either 30m or -1. This approach makes it impossible to distinguish between a valid return value and an error code. Therefore, it might be tempting to change the trigger expression:

{host:item.timeleft()} < 1h and {host:item.timeleft()} <> -1

However, suppose a trigger is currently in the PROBLEM state. Suddenly, timeleft() is -1, so the trigger will become OK! So how do we handle -1 properly? For forecast() function, for which -1 is both a valid return value and an error code, is there a way to distinguish them?

(b) Suppose we have a disk space that is steadily decreasing: 500 MB, 499 MB, 498 MB, 497 MB, etc. Suppose we also have the following trigger:

{host:item.timeleft(#10,,100M)} < 1h

When disk space is 105 MB, the trigger becomes a PROBLEM. It stays a PROBLEM when the disk goes to 104 MB, 103 MB, 102 MB, 101 MB, 100 MB, and when it goes to 99 MB, the trigger suddenly becomes OK! This may be unexpected for users, because they are used to triggers going back to OK when the problem is solved, and should be mentioned. We may also mention that predictive functions are an addition to the other functions, not their replacement.

asaveljevs Should we mention that -1 is rare? Also, how about the following expression for (a):

{host:item.timeleft()} < 1h and ({TRIGGER.VALUE}=0 and {host:item.timeleft()} <> -1 or {TRIGGER.VALUE}=1)

This is meant to guard against trigger state changes when timeleft() returns -1.

glebs.ivanovskis I am kneeling in front of your trigger-expression-fu!
Please review https://www.zabbix.com/documentation/3.0/manual/config/triggers/prediction
I would not claim that -1 is rare, it can be caused by "no data available" which can be pretty common.
RESOLVED

asaveljevs Wonderful! CLOSED.
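The guarded expression from (18)(a) can be checked against the scenarios above with a small truth-table sketch. Here trigger_value stands for {TRIGGER.VALUE} (0 = OK, 1 = PROBLEM), timeleft is the function result in seconds and -1 is the error code; naive() and guarded() are hypothetical names. As in Zabbix trigger expressions, "and" binds tighter than "or", which C's && and || mirror.

```c
/* {host:item.timeleft()} < 1h */
static int naive(double timeleft)
{
    return timeleft < 3600.0;
}

/* {host:item.timeleft()} < 1h and
 *     ({TRIGGER.VALUE}=0 and {host:item.timeleft()} <> -1 or {TRIGGER.VALUE}=1) */
static int guarded(int trigger_value, double timeleft)
{
    return timeleft < 3600.0 &&
            ((0 == trigger_value && -1.0 != timeleft) || 1 == trigger_value);
}
```

With the guard, an error result (-1) never changes the state in either direction: an OK trigger stays OK and a PROBLEM trigger stays PROBLEM, while a genuine value below 1h still raises the problem and a value of 1h or more still resolves it.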

Comment by Alexander Vladishev [ 2015 Oct 24 ]

(19) "Undefined index" in trigger form when forecast() function used in the expression.

Expression constructor must be opened to reproduce the issue.

trunk@56347

    Undefined index: type [triggers.php:371 → getTriggerFormData() → analyzeExpression() → buildExpressionHtmlTree() → expressionHighLevelErrors() → get_item_function_info() in include/triggers.inc.php:2165]

sasha Moved to ZBX-9992. CLOSED

Comment by David Lang [ 2016 Jan 05 ]

I realize that this has been implemented, but given the number of possible predictive algorithms out there (specifically thinking of Holt-Winters but applicable to others) that don't just produce a single value, but a value plus confidence info so that you can do things like "is the current value within +- 1 Standard Deviation of what was expected", I'm not sure that the idea of having a single forecast function is the best way to go.

It may be that the current function should be forecast_simple() or something like that so that other forecast functions can be added that take different parameters.

I think there is also great value in seeing the forecast over time (so you can see how accurate the prediction has been historically), so there should be a way to define a data item as the result of a forecast_*() or timeleft() function. Ideally the data item would store both the computed value and the computed confidence, so that the entire thing would not need to be regenerated from scratch if you decided that you wanted +2 standard deviations instead of +1 standard deviation.

If there is a new data type (value + confidence) used, then it would make sense to have a function to use in triggers to evaluate the fit of the current data to the computed value rather than having to do multiple comparisons to check that the current data is < predicted +x and > predicted -x

Comment by Glebs Ivanovskis (Inactive) [ 2016 Jan 06 ]

It is possible to implement the Holt-Winters model in Zabbix if there is demand for it (it will probably look like forecast(...,...,...,holt-winters,...)). But please come back down to earth: the Holt-Winters algorithm is not a "magic bullet" that will bring you a trouble-free life. Neither is any other prediction algorithm.

Even a humble linear regression will give you a value plus confidence. The problem is that within Zabbix there is a rule "one trigger function - one value". We could add one more parameter to the forecast() function to make it return the value, the confidence interval or whatever, but we did not want to burden users with specifying six or more parameters for a single function.

You can define a calculated item as being the result of forecast. You can plot it against your original data and see how accurate predictions were historically. Please read the documentation more thoroughly, there are examples.

Generated at Wed Apr 24 19:12:18 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.