[ZBX-8169] IT Service SLA wrong for monthly/yearly Created: 2014 May 01  Updated: 2017 May 30  Resolved: 2014 May 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F)
Affects Version/s: 2.2.2
Fix Version/s: 2.3.0

Type: Incident report
Priority: Major
Reporter: Gustavo Michels
Assignee: Unassigned
Resolution: Fixed
Votes: 0
Labels: itservices, sla
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Service-times-DB.PNG, Service-times.PNG, daily.png, monthly.png, service_alarms.sql, weekly.png, yearly.png

Description

I have an IT service tied to a single trigger with SLA calculation. We only had one outage for the year, which lasted 35 min, and I can see that perfectly reflected in the problem time column of the daily and weekly SLA reports (see daily.png and weekly.png).

The monthly and yearly reports, however, display wildly different problem times (see monthly.png and yearly.png).

Can anyone explain this behavior?

Thank you,



Comments
Comment by Gustavo Michels [ 2014 May 01 ]

I thought I could use formatting in the description field so the attachments would appear inline. I'm sorry about that. My intent was for the daily and weekly report screenshots (daily.png, weekly.png) to follow the first paragraph of the description, and the monthly and yearly ones (monthly.png, yearly.png) to follow the second.

Comment by Gustavo Michels [ 2014 May 07 ]

I'm leaning towards the problem being related to DST starting on March 9th at 2 AM. I wrote a quick Python script to query the API over whatever intervals I want. Here's some possibly useful information:

From Mar 1 thru 9, 100% SLA:

# Assumes: from datetime import datetime; from time import mktime; now = datetime.now()
time_from = mktime(now.replace(year=2014, month=3, day=1, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1393650000, u'problemTime': 0, u'to': 1394341200, u'okTime': 607200, u'downtimeTime': 84000, u'sla': 100}]}}
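
For context, a request like the one behind these results could be assembled as follows. This is a minimal sketch, not the script used here: it assumes the `service.getsla` API method with `serviceids` and `intervals` parameters, and the helper names `local_midnight` and `build_getsla_request`, the auth token, and the request id are placeholders.

```python
import json
from datetime import datetime
from time import mktime


def local_midnight(year, month, day):
    """Epoch seconds for local midnight on the given date (uses the process time zone)."""
    return int(mktime(datetime(year, month, day).timetuple()))


def build_getsla_request(service_id, time_from, time_to, auth_token, request_id=1):
    """Build a JSON-RPC 2.0 payload for service.getsla (assumed parameter names)."""
    return {
        "jsonrpc": "2.0",
        "method": "service.getsla",
        "params": {
            "serviceids": [service_id],
            "intervals": [{"from": time_from, "to": time_to}],
        },
        "auth": auth_token,  # placeholder; a real session would obtain this by logging in
        "id": request_id,
    }


# Interval copied from the Mar 1 thru 9 result (time zone America/New_York).
payload = build_getsla_request("42", 1393650000, 1394341200, auth_token="<token>")
print(json.dumps(payload, indent=2))
```

Note that `local_midnight(2014, 3, 1)` only reproduces 1393650000 when the process runs in the America/New_York time zone, which is why the example passes the epoch values directly.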

From Mar 9 thru 10, 100% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 0, u'to': 1394424000, u'okTime': 67500, u'downtimeTime': 15300, u'sla': 100}]}}
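
A quick check of this interval, using only the values from the response, shows that Mar 9 is a 23-hour day in this time zone and that the three counters add up to exactly that length. The variable names are illustrative.

```python
# Values copied from the Mar 9 thru 10 response.
sla = {"from": 1394341200, "to": 1394424000,
       "okTime": 67500, "problemTime": 0, "downtimeTime": 15300}

length = sla["to"] - sla["from"]
accounted = sla["okTime"] + sla["problemTime"] + sla["downtimeTime"]

print(length, length / 3600)  # 82800 seconds, i.e. 23.0 hours
print(accounted == length)    # True: the counters cover the whole interval
```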

From Mar 9 thru 11, 100% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=11, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 0, u'to': 1394510400, u'okTime': 132000, u'downtimeTime': 37200, u'sla': 100}]}}

From Mar 9 thru 12, 98.59% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=12, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 2969, u'to': 1394596800, u'okTime': 207631, u'downtimeTime': 45000, u'sla': 98.590218423552}]}}

From Mar 10 thru 12, 100% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=12, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394424000, u'problemTime': 0, u'to': 1394596800, u'okTime': 146700, u'downtimeTime': 26100, u'sla': 100}]}}

I just don't understand why the problem doesn't show up in the Mar 9 thru 11 test and is only visible when the interval spans more than 2 days. Any interval starting on Mar 9 and ending on Mar 12 or later has its SLA impacted:

From Mar 9 thru 16, 97.3% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=16, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 14160, u'to': 1394942400, u'okTime': 510840, u'downtimeTime': 76200, u'sla': 97.302857142857}]}}

From Mar 9 thru 26, 96.6% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=26, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 42990, u'to': 1395806400, u'okTime': 1224810, u'downtimeTime': 197400, u'sla': 96.60908660672}]}}
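
The reported percentages are consistent with the ratio of `okTime` to `okTime + problemTime`, with `downtimeTime` left out of the denominator. This is inferred from the numbers in this comment rather than taken from the Zabbix source, so treat it as a sanity check; `sla_percent` is a hypothetical helper.

```python
import math


def sla_percent(ok_time, problem_time):
    """SLA as the percentage of non-downtime seconds spent in the OK state."""
    total = ok_time + problem_time
    return 100.0 if total == 0 else 100.0 * ok_time / total


# (okTime, problemTime, reported sla) copied from the responses so far.
samples = [
    (207631, 2969, 98.590218423552),   # Mar 9 thru 12
    (510840, 14160, 97.302857142857),  # Mar 9 thru 16
    (1224810, 42990, 96.60908660672),  # Mar 9 thru 26
]
for ok, problem, reported in samples:
    assert math.isclose(sla_percent(ok, problem), reported, rel_tol=1e-9)
```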

From Mar 10 thru 16, 100% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=16, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394424000, u'problemTime': 0, u'to': 1394942400, u'okTime': 461100, u'downtimeTime': 57300, u'sla': 100}]}}

From Mar 10 thru 26, 100% SLA:

time_from = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=26, hour=0, minute=0, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394424000, u'problemTime': 0, u'to': 1395806400, u'okTime': 1203900, u'downtimeTime': 178500, u'sla': 100}]}}

Comment by Gustavo Michels [ 2014 May 07 ]

The exact moment the problem happens:

100% SLA up to Mar 11 at 3:05 AM:

time_from = mktime(now.replace(year=2014, month=3, day=1, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=11, hour=3, minute=5, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1393650000, u'problemTime': 0, u'to': 1394521500, u'okTime': 750300, u'downtimeTime': 121200, u'sla': 100}]}}

29 seconds of problem time up to Mar 11 at 3:06 AM:

time_from = mktime(now.replace(year=2014, month=3, day=1, hour=0, minute=0, second=0, microsecond=0).timetuple())
time_to = mktime(now.replace(year=2014, month=3, day=11, hour=3, minute=6, second=0, microsecond=0).timetuple())

{u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1393650000, u'problemTime': 29, u'to': 1394521560, u'okTime': 750331, u'downtimeTime': 121200, u'sla': 99.996135188443}]}}

Comment by Krists Krigers (Inactive) [ 2014 May 10 ]

Hello, gmichels!

Could you provide the following:
1. Which timezone is used?
2. A dump of your service_alarms table (with the relevant data).

Thanks.

Comment by Gustavo Michels [ 2014 May 12 ]

Hello kristsk,

I have attached the dump from March 1st thru March 19th for the service. I can provide the service_times table if needed.

Timezone in use is America/New_York.

Please let me know if I can be of any further help.

Thank you

Comment by Krists Krigers (Inactive) [ 2014 May 14 ]

Fixed SLA period calculation to account for DST changes in r45473, branch svn://svn.zabbix.com/branches/dev/ZBX-8169.

Moved SLA calculation logic to new class CServicesSlaCalculator in r45475, branch svn://svn.zabbix.com/branches/dev/ZBX-8169.
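
To illustrate the kind of discrepancy a DST change introduces, the sketch below compares period ends computed by adding fixed 86400-second days with the actual local-midnight timestamps from the queries in this issue. It is not the code from r45473; it only shows why period boundaries cannot be derived by assuming that every day has 24 hours.

```python
DAY = 86400  # seconds in a day without a DST transition

# Local midnights in America/New_York, copied from the API responses in this issue.
mar_09 = 1394341200  # before the DST switch at 2 AM
mar_10 = 1394424000
mar_12 = 1394596800

for label, actual, days in [("Mar 10", mar_10, 1), ("Mar 12", mar_12, 3)]:
    naive = mar_09 + days * DAY
    print(label, "naive end is off by", naive - actual, "seconds")
```

Both naive ends land one hour past the real local midnight, because Mar 9 itself is only 23 hours long.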

Comment by Gustavo Michels [ 2014 May 14 ]

Backported the changes to 2.2.2 and verified SLA is now correct both for monthly and yearly. Thank you!

Comment by Pavels Jelisejevs (Inactive) [ 2014 May 19 ]

(1) I've made a few corrections and added some code comments in r45601, r45699 and r45708, please review.

kristsk CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2014 May 21 ]

TESTED.

Please review and close (1) before merging.

Comment by Krists Krigers (Inactive) [ 2014 May 22 ]

Fixed and merged to 2.3.0 (trunk) in r45732.
API changelog updated.
