[ZBX-8169] IT Service SLA wrong for monthly/yearly Created: 2014 May 01 Updated: 2017 May 30 Resolved: 2014 May 14 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Frontend (F) |
Affects Version/s: | 2.2.2 |
Fix Version/s: | 2.3.0 |
Type: | Incident report | Priority: | Major |
Reporter: | Gustavo Michels | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | itservices, sla | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: |
Description |
I have an IT service tied to a single trigger with SLA calculation. We had only one outage this year, which lasted 35 minutes, and it is reflected correctly in the problem time column of the daily and weekly SLA reports. The monthly and yearly reports, however, display wildly different problem times. Can anyone explain this behavior? Thank you, |
Comments |
Comment by Gustavo Michels [ 2014 May 01 ] |
I thought I could use formatting on the description field so attachments would appear inline. I'm sorry for that. This was my goal:
|
Comment by Gustavo Michels [ 2014 May 07 ] |
I'm leaning towards the problem being related to DST starting on March 9th at 2 AM. I wrote a quick python script to query the API based on whatever intervals I want. Here's some possibly useful information:

From Mar 1 thru 9, 100% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=1, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1393650000, u'problemTime': 0, u'to': 1394341200, u'okTime': 607200, u'downtimeTime': 84000, u'sla': 100}]}}

From Mar 9 thru 10, 100% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 0, u'to': 1394424000, u'okTime': 67500, u'downtimeTime': 15300, u'sla': 100}]}}

From Mar 9 thru 11, 100% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=11, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 0, u'to': 1394510400, u'okTime': 132000, u'downtimeTime': 37200, u'sla': 100}]}}

From Mar 9 thru 12, 98.59% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=12, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 2969, u'to': 1394596800, u'okTime': 207631, u'downtimeTime': 45000, u'sla': 98.590218423552}]}}

From Mar 10 thru 12, 100% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=12, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394424000, u'problemTime': 0, u'to': 1394596800, u'okTime': 146700, u'downtimeTime': 26100, u'sla': 100}]}}

I just don't understand why the problem doesn't show up on the Mar 9 thru 11 test and is only visible if there are more than 2 days apart. Anything starting with Mar 9 to whatever date will have the SLA impacted:

From Mar 9 thru 16, 97.3% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=16, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 14160, u'to': 1394942400, u'okTime': 510840, u'downtimeTime': 76200, u'sla': 97.302857142857}]}}

From Mar 9 thru 26, 96.6% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=9, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=26, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394341200, u'problemTime': 42990, u'to': 1395806400, u'okTime': 1224810, u'downtimeTime': 197400, u'sla': 96.60908660672}]}}

From Mar 10 thru 16, 100% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=16, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394424000, u'problemTime': 0, u'to': 1394942400, u'okTime': 461100, u'downtimeTime': 57300, u'sla': 100}]}}

From Mar 10 thru 26, 100% SLA:

    time_from = mktime(now.replace(year=2014, month=3, day=10, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=26, hour=0, minute=0, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1394424000, u'problemTime': 0, u'to': 1395806400, u'okTime': 1203900, u'downtimeTime': 178500, u'sla': 100}]}}

|
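The figures quoted above can be cross-checked with a little arithmetic. The sketch below is not Zabbix code. It only tests two relations that the reported numbers appear to satisfy: that `okTime + problemTime + downtimeTime` equals `to - from`, and that `sla` matches `100 * okTime / (okTime + problemTime)`. Both relations are inferred from the data in this ticket, not taken from the Zabbix source, and the names `RESULTS` and `sla_percent` are made up for illustration.

```python
# Rows copied from the API results quoted above:
# (from, to, okTime, problemTime, downtimeTime, reported sla)
RESULTS = [
    (1393650000, 1394341200, 607200, 0, 84000, 100),
    (1394341200, 1394424000, 67500, 0, 15300, 100),
    (1394341200, 1394510400, 132000, 0, 37200, 100),
    (1394341200, 1394596800, 207631, 2969, 45000, 98.590218423552),
    (1394424000, 1394596800, 146700, 0, 26100, 100),
    (1394341200, 1394942400, 510840, 14160, 76200, 97.302857142857),
    (1394341200, 1395806400, 1224810, 42990, 197400, 96.60908660672),
    (1394424000, 1394942400, 461100, 0, 57300, 100),
    (1394424000, 1395806400, 1203900, 0, 178500, 100),
]


def sla_percent(ok_time, problem_time):
    """Assumed relation (inferred from the data): downtime is excluded."""
    return 100.0 * ok_time / (ok_time + problem_time)


for t_from, t_to, ok, problem, downtime, reported in RESULTS:
    # The three buckets should add up to the queried interval.
    assert ok + problem + downtime == t_to - t_from
    # The reported percentage should match the assumed relation.
    assert abs(sla_percent(ok, problem) - reported) < 1e-6

# Local midnight Mar 9 to local midnight Mar 10 is one hour short of a
# full day, which is what a DST transition on Mar 9 would produce.
mar9_hours = (1394424000 - 1394341200) / 3600
print(mar9_hours)
```

Every row passes both checks, so the totals themselves are internally consistent. The anomaly is that `problemTime` becomes non-zero for intervals that start on Mar 9, even though `problems` is empty in every response.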
Comment by Gustavo Michels [ 2014 May 07 ] |
The exact moment the problem happens:

100% SLA up to Mar 11 at 3:05 AM:

    time_from = mktime(now.replace(year=2014, month=3, day=1, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=11, hour=3, minute=5, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1393650000, u'problemTime': 0, u'to': 1394521500, u'okTime': 750300, u'downtimeTime': 121200, u'sla': 100}]}}

29 seconds problem time up to Mar 11 at 3:06 AM:

    time_from = mktime(now.replace(year=2014, month=3, day=1, hour=0, minute=0, second=0, microsecond=0).timetuple())
    time_to = mktime(now.replace(year=2014, month=3, day=11, hour=3, minute=6, second=0, microsecond=0).timetuple())
    {u'42': {u'status': u'0', u'problems': [], u'sla': [{u'from': 1393650000, u'problemTime': 29, u'to': 1394521560, u'okTime': 750331, u'downtimeTime': 121200, u'sla': 99.996135188443}]}}

|
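Narrowing the onset down to a single minute, as done above, can be automated with a binary search over `time_to`. The following is a minimal sketch under stated assumptions: `find_last_clean` and `fake_problem_time` are hypothetical names, the stub stands in for a real API query, and it assumes that the reported problem time never decreases as `time_to` grows. The onset value used by the stub is simply `1394521560 - 29`, derived from the second result above on the assumption that problem time accrues one second per second.

```python
def find_last_clean(problem_time, lo, hi):
    """Return the largest time_to in [lo, hi) for which problem_time is 0.

    Requires problem_time(lo) == 0, problem_time(hi) > 0, and a
    non-decreasing problem_time in between.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if problem_time(mid) > 0:
            hi = mid  # problem already counted at mid: look earlier
        else:
            lo = mid  # still clean at mid: look later
    return lo


# Hypothetical stand-in for the real query (assumption, not measured data
# beyond the two points reported above).
ASSUMED_ONSET = 1394521560 - 29


def fake_problem_time(time_to):
    return max(0, time_to - ASSUMED_ONSET)


last_clean = find_last_clean(fake_problem_time, 1394521500, 1394521560)
print(last_clean)
```

With a real query in place of the stub, a one-minute window needs about six calls to reach one-second resolution.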
Comment by Krists Krigers (Inactive) [ 2014 May 10 ] |
Hello, gmichels! Could you provide the following: Thanks. |
Comment by Gustavo Michels [ 2014 May 12 ] |
Hello kristsk, I have attached the dump from March 1st thru March 19th for the service. I can provide the service_times table if needed. Timezone in use is America/New_York. Please let me know if I can be of any further help. Thank you |
Comment by Krists Krigers (Inactive) [ 2014 May 14 ] |
Fixed SLA period calculation to account for DST changes in r45473, branch svn://svn.zabbix.com/branches/dev/ZBX-8169. Moved SLA calculation logic to new class CServicesSlaCalculator in r45475, branch svn://svn.zabbix.com/branches/dev/ZBX-8169. |
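The changes in r45473 are not reproduced in this ticket, so the following is only a generic illustration of why period boundaries need to account for DST, not the actual fix. It compares day boundaries built by adding a fixed 86400 seconds with boundaries recomputed from the local calendar date. The POSIX rule string is an assumption that approximates America/New_York for 2014, and the function names are invented for this sketch. It relies on `time.tzset()`, which is available on Unix only.

```python
import os
import time
from datetime import datetime, timedelta

# Assumption: POSIX TZ rule approximating America/New_York in 2014
# (DST starts on the second Sunday of March at 02:00 local time).
os.environ["TZ"] = "EST5EDT,M3.2.0,M11.1.0"
time.tzset()  # Unix only


def local_midnight(day):
    """Epoch seconds for local midnight of a naive datetime."""
    return int(time.mktime(day.timetuple()))


def fixed_step_boundaries(first_day, days):
    """Naive: treat every day as exactly 86400 seconds long."""
    start = local_midnight(first_day)
    return [start + i * 86400 for i in range(days + 1)]


def calendar_boundaries(first_day, days):
    """DST-aware: recompute local midnight for each calendar day."""
    return [local_midnight(first_day + timedelta(days=i))
            for i in range(days + 1)]


first = datetime(2014, 3, 9)
naive = fixed_step_boundaries(first, 3)
aware = calendar_boundaries(first, 3)
drift = [n - a for n, a in zip(naive, aware)]
print(aware)
print(drift)
```

The calendar-based boundaries reproduce the `from` and `to` values reported earlier in the thread (1394341200, 1394424000, 1394510400, 1394596800), while the fixed-step boundaries run one hour late from Mar 10 onward.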
Comment by Gustavo Michels [ 2014 May 14 ] |
Backported the changes to 2.2.2 and verified SLA is now correct both for monthly and yearly. Thank you! |
Comment by Pavels Jelisejevs (Inactive) [ 2014 May 19 ] |
(1) I've made a few corrections and added some code comments in r45601, 45699 and 45708, please review. kristsk CLOSED. |
Comment by Pavels Jelisejevs (Inactive) [ 2014 May 21 ] |
TESTED. Please review and close (1) before merging. |
Comment by Krists Krigers (Inactive) [ 2014 May 22 ] |
Fixed and merged to 2.3.0 (trunk) in r45732. |