[ZBX-13197] Wrong SLA calculation Created: 2017 Dec 14 Updated: 2023 Oct 07 |
|
Status: | Need info |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | API (A), Frontend (F) |
Affects Version/s: | 3.4.4 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | Grzegorz Grabowski | Assignee: | Zabbix Development Team |
Resolution: | Unresolved | Votes: | 6 |
Labels: | services, sla | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Centos 7, MariaDB. Full updated. |
Attachments: |
Issue Links: |
|
Description |
I don't know how to reproduce it, but we have the situation you can see in the attachments. |
Comments |
Comment by Olegs Vasiljevs (Inactive) [ 2017 Dec 14 ] |
Hello Grzegorz! Using the query below, you can search for multiple rows with value="0": they would appear as consecutive entries with value="0", one after another. Those are the entries to look at in order to fix the issue. select from_unixtime(sa.clock), sa.* from service_alarms sa where serviceid = <problematic service> order by clock DESC limit 1000; Did you install 3.4.3 when it became available, or did you go straight to 3.4.4? When were these recent updates installed? The reason I'm asking is that this issue was addressed in Regards, |
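The check described above (consecutive entries with the same value) can also be done on the exported rows. The following is only a minimal sketch: the helper name `find_consecutive` and the sample rows are hypothetical, and it assumes each row has been reduced to a `(clock, value)` pair from `service_alarms`.

```python
from typing import List, Tuple

# (clock, value) pairs as returned by the service_alarms query.
# The sample rows below are made up for illustration only.
Alarm = Tuple[int, int]


def find_consecutive(rows: List[Alarm], value: int = 0) -> List[Tuple[Alarm, Alarm]]:
    """Return neighbouring rows (ordered by clock) that both carry `value`."""
    ordered = sorted(rows, key=lambda row: row[0])
    return [
        (first, second)
        for first, second in zip(ordered, ordered[1:])
        if first[1] == value and second[1] == value
    ]


sample = [(100, 3), (160, 0), (220, 0), (300, 3), (360, 0)]
for first, second in find_consecutive(sample):
    print("repeated value:", first, second)
```

Each reported pair is a place where the same state was written twice in a row.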
Comment by Grzegorz Grabowski [ 2017 Dec 14 ] |
There was no update from 3.4.3 to 3.4.4. The upgrade was from 3.2.9 to 3.4.4-1 and then to 3.4.4-2 (from repo). |
Comment by Olegs Vasiljevs (Inactive) [ 2017 Dec 14 ] |
When was the upgrade from 3.2.9 to 3.4.4-1 and then to 3.4.4-2 (from repo) made? In the attached screenshot, the first and second lines indicate the beginning of the issue: the event recovery and problem states were written in reverse order. This may have happened due to Regards, |
Comment by Vladislavs Sokurenko [ 2017 Dec 14 ] |
At that time there should have been an item that caused a problem and then a recovery 8 seconds later. Could you please tell us what type of item it was? Also, if possible, could you please provide the history for this item around that time (11:53:22 - 11:53:30)? select * from history_uint where itemid=<your item id> and clock=1512384810; select * from history_uint where itemid=<your item id> and clock=1512384802; It would be nice to have the events for that trigger as well. select * from events where objectid=<your triggerid>; This data should also be visible through the frontend. |
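For reference, the two `clock` values in the queries above are Unix timestamps. The sketch below converts them to wall-clock time. The UTC+1 offset is an assumption (it is the offset that makes the values line up with the 11:53:22 and 11:53:30 quoted in the comment) and `to_local` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the reporter's server runs at UTC+1; adjust if it does not.
ASSUMED_TZ = timezone(timedelta(hours=1))


def to_local(clock: int) -> str:
    """Format a Unix timestamp in the assumed local time zone."""
    return datetime.fromtimestamp(clock, tz=ASSUMED_TZ).strftime("%Y-%m-%d %H:%M:%S")


for clock in (1512384802, 1512384810):
    print(clock, to_local(clock))
```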
Comment by Grzegorz Grabowski [ 2017 Dec 14 ] |
OK, I will, but you will have to wait a little. When I saw this issue, I removed the item (service child node) and recreated it. |
Comment by Vladislavs Sokurenko [ 2017 Dec 14 ] |
Was it a passive agent check? Could it be that the first value came with timestamp 11:53:30, while the next one came with 11:53:22? |
Comment by Rostislav Palivoda [ 2018 Jan 31 ] |
Any updates? - mbsit |
Comment by Vladislavs Sokurenko [ 2018 Jan 31 ] |
This one might be related: |
Comment by Christian Anton [ 2018 Mar 05 ] |
Having the same problem here. From my point of view, it definitely has to do with Apparently, a service's SLA calculation goes through the events one by one along the timeline. Let's assume we have two problems for the trigger this service depends on, each lasting 5 minutes, one on some day at 9 and the other one day later, also at 9, where the first of the problems is one of those described in What seems to happen in such a case is that the SLA calculation "sees" the "Problem" event of the first problem and assumes the service to be "Down" until there is a Recovery event for the same trigger, which in this case would be the Recovery event of the problem that occurred one day later. That means that, instead of two short downtimes, the service will report a downtime of one day and a bit. |
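The behaviour hypothesised above can be reproduced with a toy model. This is only a sketch of the suspected effect, not the actual Zabbix SLA calculator: the `downtime` function, the state constants and the timestamps are all made up for illustration. It assumes the first problem/recovery pair was stored with its states swapped.

```python
from typing import List, Optional, Tuple

PROBLEM, OK = 1, 0
Event = Tuple[int, int]  # (clock in seconds, state); illustrative values only


def downtime(events: List[Event]) -> int:
    """Walk events in clock order; count seconds from a PROBLEM to the next OK."""
    total = 0
    down_since: Optional[int] = None
    for clock, state in sorted(events, key=lambda event: event[0]):
        if state == PROBLEM and down_since is None:
            down_since = clock
        elif state == OK and down_since is not None:
            total += clock - down_since
            down_since = None
    return total


DAY = 86400
NINE = 9 * 3600

# Two 5-minute problems, one day apart, recorded in the correct order.
good = [(NINE, PROBLEM), (NINE + 300, OK),
        (DAY + NINE, PROBLEM), (DAY + NINE + 300, OK)]
# Same timeline, but the first pair was stored with its states swapped.
bad = [(NINE, OK), (NINE + 300, PROBLEM),
       (DAY + NINE, PROBLEM), (DAY + NINE + 300, OK)]

print(downtime(good))  # two short downtimes
print(downtime(bad))   # one downtime spanning about a day
```

With the swapped pair, the first recovery is ignored because the service is not yet down, so the downtime stays open until the recovery of the second problem.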
Comment by Grzegorz Grabowski [ 2018 Apr 24 ] |
Guys, I'm tired of correcting that mess every week for 5-6 SLA services... Any chance of finding out what is going on? |
Comment by Vladislavs Sokurenko [ 2018 Apr 24 ] |
Did you have a chance to look at |
Comment by Celso Ishikawa [ 2019 Feb 20 ] |
Tested versions 4.0.4 and 4.0.5 rc1 and found the same issue here. It was OK in 4.0.2 and 4.0.3. As a provisional workaround, I replaced the "CServicesSlaCalculator.php" file in the v4.0.5 frontend, in the directory (..)/include/classes/services/, with the one from v4.0.2. I hope this issue will be fixed in the official 4.0.5 release. |
Comment by Arturs Lontons [ 2019 Feb 20 ] |
Also reported on 4.0.4 and 4.0.5rc1 in |