[#ZBXNEXT-2972] Prediction validation and forecasting improvements

[ZBXNEXT-2972] Prediction validation and forecasting improvements Created: 2015 Sep 18 Updated: 2017 Sep 29
Status:	Open
Project:	ZABBIX FEATURE REQUESTS
Component/s:	Server (S)
Affects Version/s:	None
Fix Version/s:	None

Type:	New Feature Request
Priority:	Trivial
Reporter:	Glebs Ivanovskis (Inactive)
Assignee:	Unassigned
Resolution:	Unresolved
Votes:
Labels:	prediction, server, triggerfunctions
Remaining Estimate:	Not Specified
Time Spent:	Not Specified
Original Estimate:	Not Specified

Description

~~ZBXNEXT-922~~ will introduce basic predictive capabilities to Zabbix. Unfortunately, trigger functions are supposed to return a single value, but with statistical analysis and trend prediction there is much more information that the user would want to know and Zabbix would want to provide:

How certain and reliable is the forecast?
How good was the chosen fit function?
Are there any "breaks" in the data that call for a shorter interval?
How big are the data fluctuations?
...

A forecast validation trigger function would make it possible to choose the best fit function or the optimal interval "on the fly", or it could be used to skip unreliable predictions and avoid "false positive" alerts.
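One way such a validation step could look is sketched below in Python, purely as an illustration: fit a straight line by least squares, compute the coefficient of determination (R²) as a goodness-of-fit measure, and return no forecast when the fit is poor. The function names, the choice of R² as the criterion, and the 0.8 threshold are all assumptions made for this sketch, not part of Zabbix.

```python
from typing import Optional, Sequence, Tuple


def linear_fit_r2(t: Sequence[float], x: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line x = a + b*t; returns (a, b, r_squared)."""
    n = len(t)
    if n < 2 or n != len(x):
        raise ValueError("need at least two (t, x) pairs of equal length")
    mean_t = sum(t) / n
    mean_x = sum(x) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    if s_tt == 0:
        raise ValueError("all timestamps are identical")
    s_tx = sum((ti - mean_t) * (xi - mean_x) for ti, xi in zip(t, x))
    b = s_tx / s_tt
    a = mean_x - b * mean_t
    ss_res = sum((xi - (a + b * ti)) ** 2 for ti, xi in zip(t, x))
    ss_tot = sum((xi - mean_x) ** 2 for xi in x)
    # A constant series is fitted exactly, so treat it as a perfect fit.
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return a, b, r2


def validated_forecast(t: Sequence[float], x: Sequence[float],
                       horizon: float, min_r2: float = 0.8) -> Optional[float]:
    """Forecast the value at t[-1] + horizon, or None if the fit is too poor.

    min_r2 is a hypothetical threshold chosen for illustration only.
    """
    a, b, r2 = linear_fit_r2(t, x)
    if r2 < min_r2:
        return None  # skip the unreliable prediction instead of alerting on it
    return a + b * (t[-1] + horizon)
```

A trigger built on something like this could simply not fire while the function returns no value, which is one way to avoid the "false positive" alerts mentioned above.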

Here I'd like to summarize ideas and resources on how prediction validation can be done and how the forecast() and timeleft() trigger functions may be improved.

Validation methods and criteria:

Book on forecasting:

https://www.otexts.org/fpp

Best fit selection:

"Break" detection:

https://en.wikipedia.org/wiki/Chow_test
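As a rough illustration of the Chow test for a linear fit, the Python sketch below computes the F statistic for a candidate break position: it compares the residual sum of squares of one line fitted to the whole interval with the combined residuals of two lines fitted before and after the split. The function names and the minimum segment length are assumptions of this sketch; turning the statistic into a decision still requires comparing it with a critical value of the F distribution with (k, n - 2k) degrees of freedom.

```python
from typing import Sequence


def rss_linear(t: Sequence[float], x: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares line through (t, x)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_x = sum(x) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    s_tx = sum((ti - mean_t) * (xi - mean_x) for ti, xi in zip(t, x))
    b = s_tx / s_tt
    a = mean_x - b * mean_t
    return sum((xi - (a + b * ti)) ** 2 for ti, xi in zip(t, x))


def chow_f_statistic(t: Sequence[float], x: Sequence[float], split: int) -> float:
    """Chow F statistic for a break between index split-1 and split.

    k = 2 because a straight line has two parameters (intercept and slope).
    """
    k = 2
    n = len(t)
    if split < 3 or n - split < 3:
        raise ValueError("each segment needs at least three points")
    s_pooled = rss_linear(t, x)
    s_split = rss_linear(t[:split], x[:split]) + rss_linear(t[split:], x[split:])
    denominator = s_split / (n - 2 * k)
    if denominator == 0:
        # Both segments are fitted exactly; any pooled misfit is a clear break.
        return float("inf") if s_pooled > 0 else 0.0
    return ((s_pooled - s_split) / k) / denominator
```

A large statistic suggests that the two segments follow different lines, in which case a shorter interval starting after the break may give a better forecast.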

More sophisticated forecasting algorithms:

A survey of online failure prediction methods:

http://dx.doi.org/10.1145/1670679.1670680

Predicting Resource Exhaustion with Double Exponential Smoothing

https://signalfx.com/blog/predicting-resource-exhaustion-double-exponential-smoothing/
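For orientation, a minimal Python sketch of double exponential smoothing (Holt's linear method) follows. It is not taken from the linked article: the initialisation from the first two samples, the default smoothing factors, and the function names are assumptions made here. It shows how the smoothed level and trend could back both a forecast()-style value and a timeleft()-style estimate.

```python
from typing import Optional, Sequence, Tuple


def holt_level_trend(x: Sequence[float], alpha: float = 0.5,
                     beta: float = 0.5) -> Tuple[float, float]:
    """Run double exponential smoothing over x; return the final (level, trend)."""
    if len(x) < 2:
        raise ValueError("need at least two samples")
    level = x[0]
    trend = x[1] - x[0]  # assumed initialisation: first difference
    for value in x[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def holt_forecast(x: Sequence[float], steps: float, alpha: float = 0.5,
                  beta: float = 0.5) -> float:
    """Extrapolate `steps` sampling intervals past the last sample."""
    level, trend = holt_level_trend(x, alpha, beta)
    return level + steps * trend


def holt_steps_until(x: Sequence[float], threshold: float, alpha: float = 0.5,
                     beta: float = 0.5) -> Optional[float]:
    """Sampling intervals until the threshold is reached, or None if the
    smoothed trend is flat or moving away from the threshold."""
    level, trend = holt_level_trend(x, alpha, beta)
    if trend == 0:
        return None
    steps = (threshold - level) / trend
    return steps if steps >= 0 else None
```

Because the level and trend are updated one sample at a time, the state could be kept between evaluations instead of refitting the whole interval.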

Comments

Comment by David Lang [ 2016 Jan 05 ]

rrdtool implements the Holt-Winters Time Series Forecasting Algorithm for this purpose.

It lets you define additional lines on your graph for the expected value and the expected value +- X standard deviations. It then lets you compare the actual value with these calculated expected values and take action based on the result.

This lets you do something like "Alert if the actual value is > 2SD away from the expected value", which can alert for 'traffic too high' at 3am on Sunday for the same traffic level that generates a 'traffic too low' at 9am on Monday.

It 'learns' the patterns. As I understand it, after ~10 cycles of the pattern it will be pretty close to accurate, so after a couple of weeks you can rely on its daily pattern, after a couple of months it will notice weekends with great accuracy, etc.
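The idea can be illustrated with a much simpler stand-in than Holt-Winters. The Python sketch below is not rrdtool's algorithm: it only keeps an exponentially weighted mean and variance per slot of a repeating cycle and flags values more than k standard deviations from the expected value for that slot. The class name, the smoothing factor, and the 10-cycle warm-up (borrowed from the estimate above) are assumptions of this sketch.

```python
import math
from typing import List, Optional


class SeasonalBand:
    """Per-slot exponentially weighted mean and variance over a repeating cycle."""

    def __init__(self, period: int, alpha: float = 0.1, min_cycles: int = 10):
        self.period = period          # number of slots in one cycle
        self.alpha = alpha            # smoothing factor (assumed value)
        self.min_cycles = min_cycles  # samples per slot before alerting
        self.mean: List[float] = [0.0] * period
        self.var: List[float] = [0.0] * period
        self.seen: List[int] = [0] * period

    def update(self, index: int, value: float) -> None:
        """Learn from the sample taken at position `index` in the series."""
        slot = index % self.period
        if self.seen[slot] == 0:
            self.mean[slot] = value
        else:
            diff = value - self.mean[slot]
            self.mean[slot] += self.alpha * diff
            self.var[slot] = (1 - self.alpha) * (self.var[slot] + self.alpha * diff * diff)
        self.seen[slot] += 1

    def check(self, index: int, value: float, k: float = 2.0) -> Optional[str]:
        """Return 'too high' or 'too low' if value is more than k SD from the
        expected value for its slot, else None (also None while still learning)."""
        slot = index % self.period
        if self.seen[slot] < self.min_cycles:
            return None
        sd = math.sqrt(self.var[slot])
        if value > self.mean[slot] + k * sd:
            return "too high"
        if value < self.mean[slot] - k * sd:
            return "too low"
        return None
```

With a slot per hour of the week, the same traffic level can come out as 'too high' in a quiet slot and 'too low' in a busy one, which is the behaviour described above.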

some useful links for this

the original usenix paper and slides
http://usenix.org/legacy/publications/library/proceedings/lisa2000/full_papers/brutlag/brutlag_html/index.html
http://www.hpl.hp.com/news/events/csc/2005/jake_slides.pdf

info on its implementation in rrdtool (also under GPLv2, so code can be copied directly)

http://cricket.sourceforge.net/aberrant/rrd_hw.htm
https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html

Comment by David Lang [ 2016 Jan 05 ]

I think the most important thing is to introduce the concept of the forecast/trend prediction function and tracking.

Once the concept is in Zabbix, then implementing additional forecast types is much easier.

As I note in ~~ZBXNEXT-2463~~, the simple, brute-force approach is to let the admin define the forecast function and then calculate the forecast value as the data arrives (with the option to go back over stored data) and store it just like you would any other data item.

The other option is to calculate the forecast as needed (for display, for trigger evaluation, etc). I suspect that for all but the most trivial forecast algorithms, it's going to be better to sacrifice the space to store the pre-computed data instead of computing it each time it's referenced.

I suspect that it's also probably less disruptive to the Zabbix codebase to pre-compute the data and have it available as 'just another item' than it is to do the computations on demand.

Comment by richlv [ 2016 Jan 05 ]

David Lang, just to be sure, have you looked at ~~ZBXNEXT-922~~ and its documentation?

Comment by David Lang [ 2016 Jan 06 ]

I got to that after posting the comments above. I added comments to the ~~ZBXNEXT-922~~ ticket.

short version (and this summary may be better than what I posted)

there are going to be more types of calculations, and the forecast and timeleft functions won't work if the calculation type needs different/additional parameters.

we need the ability to graph the prediction over time; as such, I think it makes sense to have a way to define a new item as being calculated from an existing one so that it can be graphed.

predictive values have both an expected value and a confidence level, so they are a compound value, not a single numeric value (for simple trends like the ones forecast() and timeleft() currently support, the confidence level isn't very meaningful, but for something like Holt-Winters the ability to plot (or trigger on) expected value +- 1SD is extremely valuable).
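To make the "compound value" point concrete, here is a small Python sketch of what such a value might look like. The type and method names are hypothetical and nothing like this exists in Zabbix; the only point is that a trigger can be evaluated against the edge of the band rather than the expected value alone.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Prediction:
    """A forecast carried together with its uncertainty (hypothetical type)."""
    expected: float
    sd: float  # standard deviation of the prediction

    def band(self, k: float = 1.0) -> Tuple[float, float]:
        """Lower and upper bounds: expected value -/+ k standard deviations."""
        return self.expected - k * self.sd, self.expected + k * self.sd

    def lower_bound_above(self, threshold: float, k: float = 1.0) -> bool:
        """True only if even the lower edge of the band exceeds the threshold."""
        lower, _ = self.band(k)
        return lower > threshold
```

For example, `Prediction(90.0, 5.0).lower_bound_above(80.0)` holds because the 1SD band is 85 to 95, while a threshold of 88 falls inside the band and would not fire.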

