[ZBX-13309] Problem/Recovery time is wrong Created: 2018 Jan 08  Updated: 2024 Apr 10  Resolved: 2018 May 29

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Documentation (D), Frontend (F)
Affects Version/s: 3.4.5
Fix Version/s: 3.4.8rc1, 4.0.0alpha5, 4.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Giorgio Biondi Assignee: Andrejs Griščenko
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Redhat 7.4


Issue Links:
Causes
causes ZBX-13728 Frontend - Incorrect problem duration... Closed
Duplicate
is duplicated by ZBX-13729 Documentation - missing information o... Closed
Sub-task
part of ZBX-13197 Wrong SLA calculation Need info
Team: Team B
Sprint: Sprint 28, Sprint 29, Sprint 30, Sprint 31, Sprint 32, Sprint 33, Sprint 34, Sprint 35
Story Points: 1

 Description   

In the Problems view, I have noticed that the recovery time is often earlier than the time the problem was detected. For example:

Time | Severity | Recovery time | Status | Info | Host | Problem | Duration | Ack | Actions | Tags
13:01:30 | Average | 13:01:07 | RESOLVED | | x.xxx.it | Zabbix agent on x.xxx.it is unreachable for 5 minutes | 23s | No | |


 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 08 ]

Do you mean that it takes this trigger 5 minutes after the agent becomes unreachable to go into the problem state, but only 23 seconds to get back to OK? If so, what would you suggest to fix the trigger?

Comment by Giorgio Biondi [ 2018 Jan 08 ]

Hi Glebs,
from my point of view, this view must have some problem. A recovery time earlier than the detection time is surely wrong. If the problem is detected at 13:01:30, it should be marked as resolved some seconds or minutes after that. Beyond this, I don't know the Zabbix architecture, so I can't suggest how to solve the problem.
In addition, I have noticed this only with this version, 3.4.5; I never saw it before this release. I also get some alarms like "Zabbix agent is unreachable for 5 minutes", which I also never saw before this version.

All the best.

Giorgio Biondi.

Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 08 ]

Ah, got it. Can you provide steps to reproduce the issue?

Comment by Giorgio Biondi [ 2018 Jan 09 ]

Hi Glebs,
No steps are needed. I have Linux hosts with agents 3.4.2 and 3.2.0, and this behaviour occurs randomly on all hosts, but always with the error "Zabbix agent is unreachable for 5 minutes".

Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 09 ]

Managed to reproduce with a nodata() trigger based on a trapper item, by sending a value with a timestamp from the past while the trigger is in PROBLEM state.

I think what happens is that nodata() trigger gets calculated by timer process and generates an event with current time as a timestamp, then it gets recalculated when the item gets new value (by history syncer) and the timestamp of item value is used as recovery event timestamp. So the recovery will always be before problem, which looks confusing.

Also frontend miscalculates Duration, looks like it takes the absolute value.
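
To illustrate the two observations above, here is a minimal Python sketch. It is not Zabbix code: the function name and the date are hypothetical, and only the two clock values come from the original report. It assumes the problem event is stamped with the server's current time and the recovery event with the item value's timestamp, as suggested in this comment.

```python
from datetime import datetime

def signed_duration(problem_clock: datetime, recovery_clock: datetime) -> int:
    """Seconds from the problem event to the recovery event.

    The result is negative when the recovery timestamp is earlier
    than the problem timestamp.
    """
    return int((recovery_clock - problem_clock).total_seconds())

# Clock values from the report; the date itself is arbitrary.
problem_clock = datetime(2018, 1, 8, 13, 1, 30)   # assumed: current time when the trigger fired
recovery_clock = datetime(2018, 1, 8, 13, 1, 7)   # assumed: timestamp carried by the item value

signed = signed_duration(problem_clock, recovery_clock)
displayed = abs(signed)  # what the frontend appears to show, per the comment above

print(signed)     # -23
print(displayed)  # 23
```

The absolute value matches the "23s" shown in the Duration column of the report, while the signed value makes it visible that the recovery precedes the problem.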

Comment by Giorgio Biondi [ 2018 Jan 19 ]

Hi all,

I have installed the new version 3.4.6 and the problem is the same. The issue is resolved BEFORE it occurred.

06:11:30 Average 06:11:26 RESOLVED

Comment by Giorgio Biondi [ 2018 Jan 23 ]

Hi,

Please edit the affected versions and also add 3.4.6.

Best regards.

Comment by Glebs Ivanovskis (Inactive) [ 2018 Jan 23 ]

Quoting reporting guidelines:

If an issue is reported against some version, it is supposed to affect that version and all later versions, unless noted otherwise.

Comment by Glebs Ivanovskis (Inactive) [ 2018 Feb 06 ]

Most likely solution will be to document this behaviour in more detail and definitely fix duration calculation in the frontend.

Comment by Andrejs Griščenko [ 2018 Feb 27 ]

(1) No translation string changes.

sasha CLOSED

Comment by Andrejs Griščenko [ 2018 Feb 27 ]

RESOLVED in svn://svn.zabbix.com/branches/dev/ZBX-13309

Comment by Andrejs Griščenko [ 2018 Mar 14 ]

Fixed in:

  • 3.4.8rc1 r78625
  • 4.0.0alpha5 (trunk) r78626

Comment by Andrejs Griščenko [ 2018 May 28 ]

Let's consider one of the most common situations in which the problem recovery time is displayed before the problem start time.
A proxy is monitoring an item and constantly sending data to the server. At some point an unexpected network error occurs. The server no longer receives data for the item, and the item.nodata() trigger fires. The proxy keeps collecting data for the server. When the network error is fixed, the proxy resumes sending data and the server marks the problem as resolved. In this case the recovery event is created with the time at which the next item value was received by the proxy, which is earlier than the time the problem was created. This results in a negative problem duration.
Another common situation is when the item value that causes the recovery event is sent by Zabbix sender and carries a timestamp earlier than the problem creation time.
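
The proxy scenario can be sketched as a small timeline simulation in Python. This is a simplified model, not Zabbix code: the function `simulate_outage`, its parameters, and the sample numbers are hypothetical. It assumes the problem event is stamped at the moment the nodata period expires on the server, and that the recovery event takes the timestamp of the first value the proxy buffered during the outage. Timer scheduling details are ignored.

```python
def simulate_outage(collect_times, outage_start, outage_end, nodata_period):
    """Return (problem_clock, recovery_clock, duration) or None if no problem fires.

    collect_times: sorted timestamps (seconds) at which the proxy collected values.
    Values collected in [outage_start, outage_end) are buffered by the proxy
    and reach the server only after outage_end.
    """
    delivered_before = [t for t in collect_times if t < outage_start]
    buffered = [t for t in collect_times if outage_start <= t < outage_end]
    if not delivered_before or not buffered:
        return None

    # Assumed: the server raises the problem once nodata_period passes
    # without data, using its own current time as the event timestamp.
    problem_clock = delivered_before[-1] + nodata_period
    if problem_clock >= outage_end:
        return None  # the outage was too short for the trigger to fire

    # Assumed: the recovery event carries the timestamp of the first
    # buffered value, not the time at which the server received it.
    recovery_clock = buffered[0]
    return problem_clock, recovery_clock, recovery_clock - problem_clock


# Values every 60 s, outage from t=130 to t=600, nodata period of 300 s.
result = simulate_outage(list(range(0, 1200, 60)), 130, 600, 300)
print(result)  # (420, 180, -240)
```

In this run the problem is raised at t=420 but the recovery is stamped t=180, so the signed duration is -240 seconds.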

It is proposed that similar examples should be added to the documentation.

martins-v Great, thanks for these examples.

Comment by dimir [ 2019 Oct 28 ]

Documentation:

  • Examples of possible negative problem duration added to documentation for 3.4 and 4.0.
  • Also mentioned in what's new for 3.4.8 that negative values can be displayed

How does negative duration affect Reports -> Availability report and SLA calculation in services:

 

Generated at Fri Apr 26 09:51:17 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.