[ZBXNEXT-4190] make ZBX_ERROR not change item state Created: 2017 Apr 13  Updated: 2018 Oct 25

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Trivial
Reporter: richlv Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

zabbix agents can return ZBX_UNSUPPORTED or ZBX_ERROR.

ZBX_ERROR has been underused lately, but, according to some traces from old documents, it was used earlier to signal a temporary failure that should not make the item unsupported:

The agent is also allowed to return ZBX_ERROR for recoverable errors. The Zabbix server should then poll the agent again whenever it wants to retrieve the item value.

this is currently only surviving in https://zabbix.org/wiki/Docs/protocols/zabbix_agent/1.4 , but it has come from the old wiki, which probably came from some other old document.

since then, the effect of ZBX_ERROR has changed and items are turning unsupported. it is suggested to make it match this old piece of documentation and only set the error message, but not make the item unsupported.
it would also make more sense, as now ZBX_UNSUPPORTED and ZBX_ERROR are largely identical.

this would be especially useful for script writers who sometimes would not have a value to return. they cannot return 0, as that would mess up the data. making the item go unsupported disables it for 10 minutes (by default). neither is a good choice.
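The dilemma above can be sketched as a small model. This is only an illustration under stated assumptions: `read_queue_depth` and `report` are hypothetical names, the strategy labels are invented for this sketch, and the returned strings merely stand for the outcomes discussed in this ticket. It does not show how a UserParameter script actually talks to the agent.

```python
from typing import Optional


def read_queue_depth(source: dict) -> Optional[int]:
    """Hypothetical probe: returns None when no value is available right now."""
    return source.get("depth")


def report(value: Optional[int], strategy: str) -> str:
    """Return the outcome a script writer ends up with under each strategy."""
    if value is not None:
        return str(value)
    if strategy == "fake_zero":
        # A value that was never measured ends up in history and graphs.
        return "0"
    if strategy == "unsupported":
        # Name as used in this ticket; the item is then disabled
        # for 10 minutes by default.
        return "ZBX_UNSUPPORTED"
    if strategy == "error":
        # Requested behaviour: error recorded, item stays supported.
        return "ZBX_ERROR"
    raise ValueError(f"unknown strategy: {strategy}")
```

With a value present, all three strategies behave the same; they differ only when the probe returns None, which is exactly the case this request is about.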



 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2017 Apr 17 ]

Why is it a bug report and not a feature request?

I don't feel enthusiastic about bringing back old stuff, it must have been abandoned for a reason. It seems that everyone tends to avoid items becoming not supported. Therefore this seems to be an issue to fix.

Regarding scripts, if ZBXNEXT-3006 gets implemented and does not deviate dramatically from the current specification, it will solve this problem.

Comment by richlv [ 2017 Apr 17 ]

a) it currently mimics ZBX_UNSUPPORTED, just with less functionality; it was supposed not to make items unsupported
b) "It seems that everyone tends to avoid items becoming not supported" - sure, but that isn't always possible. let's say you have a script that sometimes has a temporary failure. how do you indicate that back to the server ? sending 0 is crap data. making it unsupported results in data loss.
c) not sure how ZBXNEXT-3006 is related - how would it help ?

on the other hand, why not fix this ?
it is a low hanging fruit to solve something users have been asking for a long time. it is a simple change, and can benefit users of all supported branches - nearly instant satisfaction instead of years of deliberating and not delivering.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Apr 17 ]

a) Seems it does so for at least ten years...
b) Trapper items can be used.
c) We will see.

It is a difficult fix with many ambiguous choices to be made and many caveats which may result in crashes. What should be displayed in the frontend? How should nodata() behave? Should the user be notified about ZBX_ERROR? Should the item's lastcheck be updated?

Comment by richlv [ 2017 Apr 17 ]

a) true, but that does not make it good
b) different usecase, different problems (straight line in graphs, needs an invoker and more)
c) eh.

thanks for raising the potential issues, will have a go at them.

1. displayed in the frontend - same as with ZBX_NODATA, error msg is updated (use the user-supplied msg, fallback to a default msg)
2. nodata() - should catch this, as the item would be missing data
3. user notifications - not needed, nodata() can catch that; can be considered later for internal events, but not a concern or a blocker
4. lastcheck - yes, that would seem to be sensible to be updated
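
A minimal sketch of how the four answers above could fit together, assuming a heavily simplified in-memory item. `Item`, `handle_zbx_error`, `handle_value`, `nodata` and `DEFAULT_ERROR` are hypothetical names for illustration only; the real server is written in C, and its nodata() trigger function has more detail than this model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DEFAULT_ERROR = "Temporary error reported by agent"  # hypothetical fallback text


@dataclass
class Item:
    supported: bool = True
    error: str = ""
    lastcheck: int = 0
    history: List[Tuple[int, float]] = field(default_factory=list)


def handle_value(item: Item, now: int, value: float) -> None:
    """Normal check: store the value and clear any previous error."""
    item.history.append((now, value))
    item.error = ""
    item.lastcheck = now


def handle_zbx_error(item: Item, now: int, message: Optional[str] = None) -> None:
    """Proposed handling: record the error, keep the item supported."""
    item.error = message or DEFAULT_ERROR  # point 1: user msg, else default
    item.lastcheck = now                   # point 4: lastcheck is updated
    # No history row is written and item.supported is left untouched.


def nodata(item: Item, now: int, period: int) -> bool:
    """Point 2 (simplified): True if no value was stored in the last `period` seconds."""
    return not any(now - period < ts <= now for ts, _ in item.history)
```

In this model a ZBX_ERROR result leaves a gap in history, so a nodata() trigger with a short enough period fires, which is what point 3 relies on instead of a separate notification.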

note that these decisions, while important, aren't blockers. fixing this would solve an issue users have been having for a long time and does not require lots of effort (especially compared to other options).
this is one of the cases where i'd advocate moving quickly and getting something out for users - further polishing is very much optional.

one concern - you mentioned potential crashes. any detail on that risk ?

Comment by Glebs Ivanovskis (Inactive) [ 2017 Jun 14 ]

1. This will lead to a situation when an item has an error message and is supported at the same time, something never seen in Zabbix, I guess.
2. Some people may use nodata() to catch network errors, behaviour you suggested won't cover this purpose.

This "check with no value" will most probably be represented in Zabbix backend internals as a NULL value - something not existing in Zabbix currently. And we need to get this precious NULL from agent (through proxy) to server and into database without accidentally dereferencing it... The closest match for this development I can recall is allowing numeric data types for log items and as I recall it caused a wave of server and proxy crashes in early 3.0 versions.
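
The hazard described here can be illustrated with a small sketch in which a missing value travels through the pipeline as an explicit `None` and is filtered out before anything tries to use it. `CheckResult` and `rows_to_store` are hypothetical names; this is not how the Zabbix server, which is written in C, represents check results.

```python
from typing import List, NamedTuple, Optional, Tuple


class CheckResult(NamedTuple):
    itemid: int
    clock: int
    value: Optional[str]          # None models the "check with no value"
    error: Optional[str] = None


def rows_to_store(results: List[CheckResult]) -> List[Tuple[int, int, str]]:
    """Keep only results that carry a value, so later stages never see None."""
    return [(r.itemid, r.clock, r.value) for r in results if r.value is not None]
```

Every stage between agent and database would need an equivalent check, which is where the risk of crashes mentioned above comes from.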

Comment by richlv [ 2017 Jun 14 ]

thank you for considering this.

1. true, but that alone would not be a huge concern. at most, it might have to be propagated to other pages (like latest data) eventually, but it is not critical
2. not sure why it wouldn't work ?

thank you a lot for explaining the crash concerns - it might need a few beta versions to weed out the most obvious issues. nevertheless, i would take slight risk, especially with non-lts 3.4, as it would make custom data injection in zabbix more pleasant.

Comment by Ivan Lezhnjov IV [ 2017 Oct 19 ]

"this would be especially useful for script writers who sometimes would not have a value to return. they cannot return 0, as that would mess up the data. making the item go unsupported disables it for 10 minutes (by default). neither is a good choice."

This is the case today for a bunch of my new UserParameters. Returning zero is not technically correct, and the only correct option is to avoid sending anything to the Server at all. That would not be an error condition, just no data.

Comment by Ivan Lezhnjov IV [ 2017 Oct 19 ]

Another reason why sending zero is not an acceptable solution (aside from it being a workaround), is because zero is an actual value which has real storage requirements. In case of prototyped items and large-scale installations this is simply unacceptable.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Oct 23 ]

Dear richlv and iliv, here is how ZBXNEXT-3006 helps.

You can make your script return both data and error in a JSON, XML or some other format. You can set history storage period to 0 for this item. You can then create two dependent items with different preprocessing options: one will extract useful data from script output, the other will extract the error message. If your script returns either data or an error message, one of the dependent items will become not supported, but it won't affect master item collection in any way. No unnecessary information stored, no gaps in graphs, no "fake" zeros. Win-win-win!
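
A rough sketch of this workaround, assuming JSON output with two hypothetical keys, `value` and `error`. `master_output` stands in for the script behind the master item and `extract` mimics what a dependent item with an extraction preprocessing step would do; neither is a Zabbix API.

```python
import json
from typing import Any, Optional, Tuple


def master_output(value: Optional[float], error: Optional[str]) -> str:
    """Script output for the master item: only the keys that have content."""
    payload = {}
    if value is not None:
        payload["value"] = value
    if error is not None:
        payload["error"] = error
    return json.dumps(payload)


def extract(payload: str, key: str) -> Tuple[bool, Any]:
    """Mimic a dependent item: (supported, extracted value)."""
    data = json.loads(payload)
    if key not in data:
        # Nothing to extract: this dependent item becomes not supported,
        # while the master item keeps being collected.
        return (False, None)
    return (True, data[key])
```

For example, `extract(master_output(42, None), "value")` yields a supported item with the value, while the `error` dependent item is the one that goes not supported; on a failure the roles swap.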

Comment by richlv [ 2017 Nov 07 ]

Gleb, i like your thinking
it is a nice workaround, but a tiny bit excessive - doubles the item count. additionally, it decouples the message from the item, making it harder to look up, especially for new people.

like it a lot, thank you for the idea. the original request is still highly desirable.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Nov 09 ]

Hm, isn't this a duplicate of ZBXNEXT-152?

Comment by richlv [ 2017 Nov 12 ]

Gleb, this would most likely solve that feature request, but this one is about a specific change. there might be some usecases of ZBXNEXT-152 not covered by the change proposed here.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Nov 12 ]

Got it. Makes sense.

Generated at Wed Apr 24 09:09:32 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.