[ZBXCTR-10] iLert integration in Zabbix Created: 2020 Jun 29  Updated: 2024 Apr 10  Resolved: 2020 Sep 01

Status: Closed
Project: ZABBIX CONTRIBUTION
Component/s: Template
Affects Version/s: None
Fix Version/s: 5.0.4rc1, 5.2.0alpha2, 5.2 (plan)

Type: New Feature Priority: Trivial
Reporter: Aleksandrs Larionovs (Inactive) Assignee: Aleksandrs Larionovs (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 4.png     PNG File event.png     PNG File link.png     PNG File note.png     PNG File null.png     PNG File one.png     PNG File pd.png     PNG File readme.png     PNG File test.png     PNG File testmedia.png     PNG File zabbix_links1.png     PNG File zabbix_links2.png    
Team: Team INT
Sprint: Sprint 65 (Jun 2020), Sprint 67 (Aug 2020)
Story Points: 1

 Description   

iLert integration in Zabbix



 Comments   
Comment by Vjaceslavs Bogdanovs [ 2020 Jul 08 ]

(1) README is broken:

Comment by Vjaceslavs Bogdanovs [ 2020 Jul 08 ]

(2) Is having 10 retries with 30s interval justified?

Comment by Vjaceslavs Bogdanovs [ 2020 Jul 08 ]

(3) The following check:

if (typeof alertSourceKey === 'string' && alertSourceKey.trim() === '') {

Should be fixed as follows:

if (typeof alertSourceKey !== 'string' || alertSourceKey.trim() === '') {

Both non-string values and empty string values should cause an error.
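
A minimal sketch of the suggested check, wrapped in a helper so it can be tried in isolation. The helper name `requireAlertSourceKey` and the error text are assumptions made for illustration; they are not taken from the webhook source.

```javascript
// Hypothetical helper; the name and the error text are illustrative only.
function requireAlertSourceKey(alertSourceKey) {
    // Reject anything that is not a string, as well as strings that are
    // empty or contain only whitespace.
    if (typeof alertSourceKey !== 'string' || alertSourceKey.trim() === '') {
        throw new Error('alert source key must be a non-empty string');
    }
    return alertSourceKey.trim();
}
```

With the original `&&` form, values such as `undefined` or a number would skip the error branch, because the first operand is already false for them.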

<natalja.zabbix> This is still not fixed.

Comment by Vjaceslavs Bogdanovs [ 2020 Jul 08 ]

(4) First check is not required here:

if (!!resp && typeof resp === 'object' && typeof resp.message === 'string') {

It could be just:

if (typeof resp === 'object' && typeof resp.message === 'string') {

with the same result.
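
A small sketch comparing the two forms. They agree for every input except `null`: `typeof null` is `'object'` in JavaScript, so without the first check the access to `resp.message` throws. The function names are assumptions made for illustration.

```javascript
// Form with the leading truthiness check.
function hasMessageWithGuard(resp) {
    return !!resp && typeof resp === 'object' && typeof resp.message === 'string';
}

// Form without the leading check.
function hasMessageWithoutGuard(resp) {
    return typeof resp === 'object' && typeof resp.message === 'string';
}

// hasMessageWithGuard(null)    returns false
// hasMessageWithoutGuard(null) throws a TypeError, because typeof null === 'object'
// Both return false for undefined and true for { message: 'text' }.
```

This matters when the response body parses to `null`, for example `JSON.parse('null')`.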

Comment by Vjaceslavs Bogdanovs [ 2020 Jul 09 ]

(5) It is unclear why 404 and other codes are considered valid results according to the code:

    if (req.Status() != 429 && req.Status() >= 400 && req.Status() <= 499) {
        return JSON.stringify(result);
    }

 Are you sure that this is ok?

Comment by Aleksandrs Larionovs (Inactive) [ 2020 Jul 09 ]

Hello roman-ilert,
Could you please review the comments that my colleague vjaceslavs made during the code review?

Comment by Roman Rogozhnikov [ 2020 Jul 09 ]

alarionovs vjaceslavs Thank you for the code review. 

I made some changes based on your comments:

  1. README was fixed. Some invisible characters were removed; I think it was a copy-and-paste problem.
  2. Yes, we think it's justified. In any case, users can change this value if they are not satisfied with it.
  3. Fixed as suggested by vjaceslavs.
  4. Fixed as suggested by vjaceslavs.
  5. Yes, it's correct. According to our API documentation, a retry should be made only for 429 or 5xx responses. https://api.ilert.com/api-docs/#tag/Events/paths/~1events/post -> "Dealing with errors and retries"

 

Comment by Vjaceslavs Bogdanovs [ 2020 Jul 10 ]

roman-ilert, thank you for your response.

I still have a few questions/suggestions:

  1. I propose changing the retry attempts and interval to the default values, so that the webhook does not create a higher load on the "Alerter" process. Users who need a higher retry count or a longer interval between attempts can still change them, but a reasonable default will be provided for everyone else. Please correct me if I am wrong.
  2. According to the documentation provided:
    Retry a failed request for the following errors:
      any network errors
      5xx errors: this indicates an error in iLert
      429 Too Many Requests: you have reached your rate limit
    
    Do NOT retry a request for the following HTTP response codes:
      200 OK: the request was successful
      400 Bad Request: (check the error message for details)
    


    If I understand this correctly, the webhook should retry the request when the service responds with 404 or 408. Am I reading this wrong?
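
The documentation excerpt quoted above can be sketched as a small classifier. It names only 200 and 400 as "do not retry" and only 429 and 5xx as "retry", so other codes such as 404 and 408 end up unlisted. The function name and the returned labels are assumptions made for illustration; network errors carry no status code and are not modelled here.

```javascript
// Hypothetical classifier based only on the documentation excerpt quoted above.
function classifyStatus(status) {
    if (status === 200 || status === 400) {
        return 'no-retry';      // explicitly listed as "do NOT retry"
    }
    if (status === 429 || (status >= 500 && status <= 599)) {
        return 'retry';         // rate limit or an error on the iLert side
    }
    return 'unspecified';       // e.g. 404 or 408: not covered by the excerpt
}
```

By contrast, the check quoted in (5) treats every 4xx code except 429 as a final result.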

Comment by Roman Rogozhnikov [ 2020 Jul 10 ]

vjaceslavs, thanks for your suggestions. 

We have once again discussed the suggestions in the team and decided to follow your advice.

Specifically:
1. The custom attempts configuration has been removed, so the default settings will be used.
2. The custom check for 4xx cases has been removed, so only the 200 case will be marked as successful; otherwise a retry will be made.
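
A rough sketch of the behaviour described in point 2, assuming that an error thrown from the webhook script marks the attempt as failed and leaves retries to the media type's attempt settings. The helper name and its arguments are hypothetical; `status` and `body` stand in for whatever the HTTP request returned.

```javascript
// Hypothetical helper: only HTTP 200 counts as success, anything else throws.
function handleResponse(status, body) {
    if (status !== 200) {
        // Assumption: throwing makes the alert attempt fail so that it can be retried.
        throw new Error('iLert notification failed: HTTP ' + status + ' ' + body);
    }
    return 'OK';
}
```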

Comment by Natalja Romancaka [ 2020 Jul 15 ]

(6) iLert incident link is not generated on trigger event
Steps to reproduce:

  1. Setup iLert and media type to receive notifications
  2. Make trigger in problem state
  3. Go to Monitoring->Problems
  4. Click on trigger name to open the context menu

Result: there is no link to iLert
Expected result: link "Alert in iLert"

<natalja.zabbix> TESTED

Comment by Natalja Romancaka [ 2020 Jul 15 ]

(7) The Zabbix link in iLert points to the item graph: /history.php?action=showgraph&itemids[]={ITEM.ID}

It would be better to link to the event details page, which has more information about the incident: /tr_events.php?triggerid={TRIGGER.ID}&eventid={EVENT.ID}

Comment by Natalja Romancaka [ 2020 Jul 15 ]

(8) Event updates in Zabbix don't work properly with iLert. A notification is received only when the problem is acknowledged.
Preconditions:

  1. Setup operation "Recovery operations" in Configuration->Actions->Trigger Actions
  2. Set "Notify all involved"
  3. Create two users with permissions on host, one of them receives a notification from iLert
  4. Make trigger in Problem state
  5. Go to Monitoring->Problems

1. For other problem update actions (comment and change severity), the error is "iLert notification failed : TypeError: cannot read property 'incidentUrl' of null".
Steps to reproduce:

  • Update problem with other user, write message or change severity
  • Check Reports->Action log

2. For a problem that has already been acknowledged, the error is "iLert notification failed : Cannot accept an incident that is already in ACCEPTED status"
Steps to reproduce:

  • Acknowledge problem
  • Update problem with other user, write message or change severity or unacknowledge
  • Check Reports->Action log

3. When the problem has been resolved, updating the event gives the error "iLert notification failed : no open incident found with key zabbix-xxx"
Steps to reproduce:

  • Make trigger in Resolved state
  • Update problem with other user, write message or change severity
  • Check Reports->Action log
Comment by Natalja Romancaka [ 2020 Jul 15 ]

(9) The iLert webhook does not support non-trigger event types

Comment by Natalja Romancaka [ 2020 Jul 15 ]

(10) What does " = 1" next to the trigger ID in the incident details mean?

<natalja.zabbix> TESTED

Comment by Natalja Romancaka [ 2020 Jul 15 ]

(11) The timeline contains a "null" event when a problem is acknowledged in Zabbix
 

<natalja.zabbix> TESTED

Comment by Natalja Romancaka [ 2020 Jul 15 ]

(12) When the Test option is used to check the media type, it is not clear where the problem is
Steps to reproduce:

  1. Go to Administration->Media types
  2. Press the Test button next to the iLert media type
  3. Fill in the source key and press Test

Result: Error message "iLert notification failed : TypeError: cannot read property 'incidentUrl' of null"

Comment by Vjaceslavs Bogdanovs [ 2020 Jul 24 ]

roman-ilert, we are waiting for your actions after the QA session.

Comment by Roman Rogozhnikov [ 2020 Jul 28 ]

natalja.zabbix vjaceslavs thank you for the review. We have made the corresponding changes and are waiting for the second review.

  • (6) Fixed in the media type code.
  • (7) We now set multiple links:
    • if the TRIGGER.URL param is set in the event notification, the trigger URL will appear in the iLert incident view
    • if the TRIGGER.URL param is not set, but ZABBIX.URL, TRIGGER.ID and EVENT.ID are set, the trigger URL will appear in the iLert incident view
    • if one of the ITEM.IDx / ITEM.NAMEx pairs (e.g. ITEM.ID1 and ITEM.NAME1) and ZABBIX.URL are set, the corresponding item graph URL will appear in the iLert incident view
  • (8) Fixed on the server, so no error will appear in the Zabbix UI for these cases.
  • (9) We're not sure that we need to support non-trigger event types. Are there any use cases for that?
  • (10) Fixed on the server, so it no longer appears in the incident view.
  • (11) Fixed on the server, so it no longer appears in the incident view.
  • (12) Fixed on the server, so with default params and without an alert source key it will always return a success response code.
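
The link precedence described under (7) could look roughly like the sketch below. The thread does not say whether this logic lives in the webhook script or on the iLert side, so the helper name, the `params` object shape, and the trailing-slash handling are assumptions; the event details path is the one proposed in (7).

```javascript
// Hypothetical helper; keys mirror the macro names mentioned in (7).
function pickTriggerLink(params) {
    function isSet(value) {
        return typeof value === 'string' && value.trim() !== '';
    }

    // 1. An explicit TRIGGER.URL wins.
    if (isSet(params['TRIGGER.URL'])) {
        return params['TRIGGER.URL'];
    }

    // 2. Otherwise build a link to the event details page.
    if (isSet(params['ZABBIX.URL']) && isSet(params['TRIGGER.ID']) && isSet(params['EVENT.ID'])) {
        var base = params['ZABBIX.URL'].replace(/\/+$/, ''); // drop trailing slashes
        return base + '/tr_events.php?triggerid=' + encodeURIComponent(params['TRIGGER.ID']) +
            '&eventid=' + encodeURIComponent(params['EVENT.ID']);
    }

    // 3. Not enough information to build a link.
    return null;
}
```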
Comment by Vjaceslavs Bogdanovs [ 2020 Jul 30 ]

roman-ilert, as for #9, we do ask that webhooks support non-trigger event types. It is more convenient for end users to have webhooks that cover all use cases.

If you still think that non-trigger events are not applicable for your webhook I would ask to explicitly state this in the README.md file.


As for #7, the common approach is to link to the event details page. This is how it is done in other webhooks, so our recommendation is to stick to this format.
Non-trigger events contain a link to Zabbix (ZABBIX.URL).

Comment by Roman Rogozhnikov [ 2020 Jul 30 ]

vjaceslavs, I'm not sure what you're referring to in #9. What exactly are these events? Comments? Priority changes? Event relations?

Anyway, our goal was to make the first version with all basic functionality that we already have. We plan to add more features later and this requires changes only from our side.

 


As I mentioned earlier for the #7, we've already fixed the "event detail link" problem:


 

and we're also adding additional links for Zabbix charts:

 

Comment by Vjaceslavs Bogdanovs [ 2020 Jul 30 ]

roman-ilert thank you for the clarification. Non-trigger events are internal events (not supported items, autoregistration events, etc). But they could be added later.

Comment by Natalja Romancaka [ 2020 Aug 06 ]

(13) The Test option for checking the media type always returns a success message, but no incident is created in the iLert system.

The parameters must be validated when a test incident is created in iLert.
If all parameters are OK, an incident should be created in the iLert system.
If some parameters are not valid, an error message should appear and no incident should be generated in iLert.

Comment by Natalja Romancaka [ 2020 Aug 06 ]

(14) It would be great if events that iLert does not support produced an error in Reports->Action log, for example:

  • non-trigger events
  • event update actions, except acknowledging a trigger problem

These events have no effect on incidents in iLert (no incident is updated or created).

Comment by Natalja Romancaka [ 2020 Aug 06 ]

(15) Please update the README file

  • Mention that only trigger events are supported
  • The "Send to" field in the user media settings is not used by the iLert media type, but it cannot be empty. To satisfy the frontend requirement, any symbol can be entered.
  • The alert source API key generated in iLert should be in the media parameter .ILERT.ALERT.SOURCE.KEY
Comment by Roman Rogozhnikov [ 2020 Aug 10 ]

natalja.zabbix vjaceslavs 

(13) We fixed this, so you can try it out.

 

(14) We never log an incident action if an event was ignored, so the action logs will appear once we add support for non-trigger events. As I mentioned earlier, we plan to do that in the future.

 

(15) We added a note about support for trigger events only. 

However, we believe that the alert source API key generated in iLert should not be in the media parameter .ILERT.ALERT.SOURCE.KEY, but in the user's "Send to" field, and the .ILERT.ALERT.SOURCE.KEY parameter should be filled by default with {ALERT.SENDTO}, which is the value of the user's "Send to" field. We believe that this approach will allow users to use Zabbix more effectively with multiple alert sources. This means that the user has to perform the full Zabbix <=> iLert configuration (Zabbix: media type, user group, permissions, user, action; iLert: alert source) only the first time. After that, the user only has to create a new alert source in iLert and a new user in Zabbix. Otherwise, the user would have to create all resources again (Zabbix: media type, user group, permissions, user, action; iLert: alert source).

 
 

Comment by Natalja Romancaka [ 2020 Aug 12 ]

(16) There seems to be a misunderstanding about the media type "Test" option.
iLert creates its own version of a test incident, but the test incident should look like a real trigger event incident. During the test, all macros that are required to create a real incident should be replaced with valid values.
If it isn't possible to create an incident in iLert with a string or empty EVENT.ID parameter, there should be a message with additional failure details.
If the value of the EVENT.ID parameter doesn't matter, the test incident should contain the value that I entered during the test.

For example, for the PagerDuty webhook, 6 parameters need to be changed to create an incident in the system.

All other parameters can remain as macros (as strings, because macros are not resolved), and the incident is created with all these parameters as they are.

 

Comment by Natalja Romancaka [ 2020 Aug 12 ]

(17) How many item graph URLs can appear in the iLert incident view?
There are 5 pairs of ITEM.IDx / ITEM.NAMEx parameters in the webhook. I created a trigger with 6 items, but only 4 graph URLs were displayed in the incident view. Is this expected?
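
For reference, collecting graph links from the five parameter pairs might look like the sketch below, in which case at most 5 links would be expected. The helper name, the `params` object shape, and the returned structure are assumptions; the graph path is the one quoted in (7).

```javascript
// Hypothetical helper; MAX_ITEM_PAIRS matches the five ITEM.IDx / ITEM.NAMEx pairs.
var MAX_ITEM_PAIRS = 5;

function collectItemGraphLinks(params) {
    var links = [];
    var zabbixUrl = params['ZABBIX.URL'];
    if (typeof zabbixUrl !== 'string' || zabbixUrl.trim() === '') {
        return links; // no base URL, so no links can be built
    }
    var base = zabbixUrl.replace(/\/+$/, ''); // drop trailing slashes

    for (var i = 1; i <= MAX_ITEM_PAIRS; i++) {
        var id = params['ITEM.ID' + i];
        var name = params['ITEM.NAME' + i];
        // Skip pairs where either half is missing or empty.
        if (!id || !name) {
            continue;
        }
        links.push({
            name: name,
            url: base + '/history.php?action=showgraph&itemids[]=' + encodeURIComponent(id)
        });
    }
    return links;
}
```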

Comment by Roman Rogozhnikov [ 2020 Aug 17 ]

natalja.zabbix vjaceslavs 

(16) fixed

(17) fixed

Comment by Natalja Romancaka [ 2020 Aug 18 ]

(18) Parameters validation
Steps to reproduce:

  1. Press on "Test" button near iLert webhook
  2. Type valid api token in param field .ILERT.ALERT.SOURCE.KEY
  3. Remove all values from other webhook parameters
  4. Press Test

Result: no incident created (response is null)
Expected result: validation of the necessary parameters to create an incident

 

Comment by Roman Rogozhnikov [ 2020 Aug 18 ]

(18) fixed

Comment by Natalja Romancaka [ 2020 Aug 19 ]

(19) There are still cases where the response received during the webhook test is null. Please add checks and corresponding error messages for the fields: EVENT.ACK.STATUS (Yes or No), EVENT.UPDATE.STATUS (0 - webhook was called because of a problem/recovery event, 1 - update operation), EVENT.VALUE (1 for problem, 0 for recovery)

1. Steps to reproduce:

  1. Make sure there is no open test incident in the system
  2. Press on "Test" button near iLert webhook
  3. Type valid api token in param field .ILERT.ALERT.SOURCE.KEY
  4. Remove all other values
  5. Type:
    EVENT.ACK.STATUS => {EVENT.ACK.STATUS}
    EVENT.UPDATE.STATUS => {EVENT.UPDATE.STATUS}
    EVENT.VALUE => {EVENT.VALUE}
  6. Press Test
  7. Observe that test incident was created
  8. Close incident. Try to create new one
  9. Change any two of these parameters to any text. For example:
    EVENT.ACK.STATUS => aaa
    EVENT.UPDATE.STATUS => {EVENT.UPDATE.STATUS}
    EVENT.VALUE => aaa
  10. Observe that test incident created.
  11. Close incident. Try to create new one
  12. Type:
    EVENT.ACK.STATUS => aaa
    EVENT.UPDATE.STATUS => aaa
    EVENT.VALUE => aaa
  13. Press Test. Observe successful test message
  14. Observe that test incident is not created. Response in webhook log is null

Result: During the webhook test macros are unresolved, so they are just string values. But no incident is created when all three parameters contain random text.
Expected result: There are two options
1. allow a test incident to be created with any text in the parameters (if that is possible in your system)
2. validate the values and show an error message, as is done when one of these 3 required parameters is empty (we use and prefer this option):

Media type test failed.
- Sending failed: incorrect value for variable "EVENT.ACK.STATUS". The value must be Yes or No.
Media type test failed.
- Sending failed: Incorrect "EVENT.UPDATE.STATUS" parameter given: "{EVENT.UPDATE.STATUS}".
- Must be 0 or 1.

2. Steps to reproduce:

  1. Make sure there is no open test incident in the system
  2. Type:
    EVENT.ACK.STATUS => Yes
    EVENT.UPDATE.STATUS => 1
    EVENT.VALUE => 1
  3. Press Test. Observe successful test message
  4. Observe that test incident is not created. Error in log {"status":400,"message":"no open incident found with key zabbix-"

Expected result: error message instead of successful test

3. Steps to reproduce:

  1. Make sure there is no open test incident in the system
  2. Type:
    EVENT.ACK.STATUS => Yes or No
    EVENT.UPDATE.STATUS => 1 or 0
    EVENT.VALUE => 0
  3. Press Test. Observe successful test message
  4. Observe that test incident is not created. Error in log {"status":400,"message":"no open incident found with key zabbix-"

Expected result: error message instead of successful test
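
Option 2 from scenario 1 above (validate the three parameters and report an error) could be sketched as follows. The helper name and the `params` object shape are assumptions; the accepted values come from the field descriptions in (19), and the error texts are modelled on the example messages quoted there.

```javascript
// Hypothetical validator for the three parameters discussed in (19).
function validateEventParams(params) {
    var ack = params['EVENT.ACK.STATUS'];
    if (ack !== 'Yes' && ack !== 'No') {
        throw new Error('incorrect value for variable "EVENT.ACK.STATUS". The value must be Yes or No.');
    }

    var update = params['EVENT.UPDATE.STATUS'];
    if (update !== '0' && update !== '1') {
        throw new Error('Incorrect "EVENT.UPDATE.STATUS" parameter given: "' + update + '".\nMust be 0 or 1.');
    }

    var value = params['EVENT.VALUE'];
    if (value !== '0' && value !== '1') {
        throw new Error('Incorrect "EVENT.VALUE" parameter given: "' + value + '".\nMust be 0 or 1.');
    }
}
```

Under this sketch an unresolved macro such as `{EVENT.UPDATE.STATUS}` is rejected too, which matches the second example message.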

Comment by Roman Rogozhnikov [ 2020 Aug 20 ]

natalja.zabbix 

(19)

1. Fixed

2. and 3. This is expected behaviour: because an iLert incident can't be reopened by design, we avoid such API errors to prevent UI errors in the Zabbix trigger view (Action log), which might confuse the user. We already discussed and fixed this issue in (8), case 3.

Comment by Natalja Romancaka [ 2020 Aug 21 ]

(20) Please fix this line in the README file: "The text from "Action Recovery Operations" and "Action Update Operations" will be sent to "iLert Alert Notes" when a problem is resolved or updated".

No text from recovery and update operations is sent. Only a "Note" appears, saying that an accept (for acknowledgement) or resolve event has been received and has changed the incident status.

Comment by Roman Rogozhnikov [ 2020 Aug 21 ]

natalja.zabbix 

(20) README fixed

Comment by Vjaceslavs Bogdanovs [ 2020 Aug 26 ]

Available in:

Documentation updated:
