[ZBX-25409] Media type tags processing race condition Created: 2024 Oct 17  Updated: 2025 Aug 26  Resolved: 2025 Aug 26

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 6.4.18
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Aleksandr Khudushin Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File rc_test_action.PNG     File rc_test_host.yaml     File rc_test_media.yaml     File rc_test_sender.py     File zabbix_server.log.gz    

 Description   

We use a custom Zabbix media type to integrate Zabbix events with an external alerting system.
This media type stores the external alert's ID in a tag on the Zabbix event.
The media type is executed by a simple action with zero delay for the first step.

From time to time Zabbix generates many events with zero duration (event start = event end). When Zabbix processes such an event, it executes 2 actions:

  1. Create an alert in the external system (and receive its ID).
  2. Close the alert in the external system by its ID (received during the previous step).

As far as I can see, the two actions are executed about 3 seconds apart.
But for some of these events the close action fails because the event tag with the alert ID is missing at the time it runs. Retries of the action fail with the same error.
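
To make the failure mode concrete, here is a minimal Python model of the media type logic described above. It is only a sketch: real Zabbix webhooks are written in JavaScript, and the tag name `__ticket_id`, the function name, and the `T-<n>` ID format are hypothetical. Only the error text is taken from the attached log.

```python
import itertools

TAG_NAME = "__ticket_id"          # hypothetical tag name, not from the ticket
_ticket_ids = itertools.count(1)  # stand-in for IDs issued by the external system


def run_media_type(event_value, event_tags):
    """Model one webhook call.

    event_value: "1" for a problem notification, "0" for a recovery.
    event_tags:  snapshot of the event tags handed to the webhook.
    Returns the tags the webhook wants stored on the event.
    """
    if event_value == "1":
        # Step 1: create the external alert and hand its ID back as a tag.
        return {TAG_NAME: f"T-{next(_ticket_ids)}"}

    # Step 2: close the external alert; this needs the ID stored by step 1.
    ticket_id = event_tags.get(TAG_NAME)
    if not ticket_id:
        raise RuntimeError("Ticket ID is missing")
    return {}
```

If the recovery call receives a tag snapshot taken before the tag from step 1 was stored, `run_media_type("0", {})` raises the same error on every retry, because the snapshot itself never changes.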

It happens with <1% of events, but it does happen.
Setting concurrent sessions to 1 in the media type settings doesn't solve the issue.
So to me it looks like some kind of race condition between action executions.

Steps to reproduce:

  1. Use a media type that stores the external system ID in a Zabbix event tag and throws an error if this ID is missing for subsequent actions.
  2. Create a typical action for the above media type with "Notify all involved" on event resolve.
  3. Create a Zabbix trapper item.
  4. Create a last()=1 trigger for the above item with multiple event generation.
  5. Generate a batch (e.g. 200) of item values (0/1) with the same timestamp and send them with zabbix_sender.
  6. Check the action log for errors.
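
Step 5 could be sketched as follows. This is not the attached rc_test_sender.py (which is based on zabbix_utils and is not reproduced here); it is an alternative that only builds an input file for `zabbix_sender -T -i`. The host name `rc_test_host` and item key `rc.test` are hypothetical placeholders.

```python
import time


def build_sender_lines(host, key, count=200, timestamp=None):
    """Build zabbix_sender input lines: '<host> <key> <timestamp> <value>'.

    Values alternate 1, 0, 1, 0, ... and all share one timestamp, so each
    problem event is resolved within the same second it was opened.
    Assumes the host name and key contain no spaces.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [f"{host} {key} {ts} {1 if i % 2 == 0 else 0}" for i in range(count)]


if __name__ == "__main__":
    # Hypothetical host and key; replace with the real trapper item.
    lines = build_sender_lines("rc_test_host", "rc.test")
    with open("rc_values.txt", "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    # Then send with: zabbix_sender -z <server> -T -i rc_values.txt
```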

Result:
Errors in the action log about the missing external system ID
Expected:
No errors



 Comments   
Comment by Kamil Florowski (Inactive) [ 2024 Oct 25 ]

Hi akhudushin,

Have you enabled verbose logging on the Zabbix server (debug level 4 or 5)? This might give us more insight into the sequence of operations and where the breakdown occurs. Please share the captured logs.

Thanks

Comment by Aleksandr Khudushin [ 2024 Oct 28 ]

Dear Kamil,

Thank you for the response!

Prerequisites for reproducing the problem:

  • rc_test_media.yaml - simple media type unrelated to any external system.
  • rc_test_action.PNG - example action for the above media type (don't forget to add the media type in the user's settings).
  • rc_test_host.yaml - simple host with one trapper item and a related trigger.
  • rc_test_sender.py - Python script (based on zabbix_utils) to spam values into the Zabbix trapper item.

I've switched Zabbix to DebugLevel=4 and, as you requested, captured logs using the above components:
zabbix_server.log.gz

You can search the log for the phrase:

[ RACE CONDITION TEST ] ERROR: Ticket ID is missing
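
A small sketch for locating those lines without unpacking the archive first. The function name is made up for illustration; the phrase is the one quoted above, and the log is assumed to be a gzip-compressed text file.

```python
import gzip

PHRASE = "[ RACE CONDITION TEST ] ERROR: Ticket ID is missing"


def find_failures(log_path, phrase=PHRASE):
    """Return (line_number, line) pairs for log lines containing the phrase."""
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with gzip.open(log_path, "rt", encoding="utf-8", errors="replace") as log:
        for line_no, line in enumerate(log, start=1):
            if phrase in line:
                hits.append((line_no, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for line_no, line in find_failures("zabbix_server.log.gz"):
        print(f"{line_no}: {line}")
```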
Comment by Kamil Florowski (Inactive) [ 2024 Oct 28 ]

Thanks for the above.

Did you have a chance to verify whether the issue occurs also in Zabbix version 7?

Comment by Aleksandr Khudushin [ 2024 Oct 29 ]

Kamil,

No, we haven't migrated to 7 yet.

Comment by Kamil Florowski (Inactive) [ 2024 Oct 29 ]

Hi,

Seems like the potential race condition could be excluded during notification processing. What do you think?

Comment by Aleksandr Khudushin [ 2024 Oct 29 ]

Kamil,

I can't find any good way to do that.

Webhook input parameters are static and don't change during webhook processing. Moreover, they don't change during subsequent processing attempts. I've tried raising the number of attempts and the attempt interval in the media type settings, with no effect.

Requesting the same event from within the webhook script through the Zabbix API seems like a bad workaround IMO.
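
For reference, the API lookup mentioned above would look roughly like this. It is a sketch in Python rather than webhook JavaScript, and it assumes the tags stored by the webhook are visible through `event.get` with `selectTags`. The function names, API URL, and token are placeholders.

```python
import json
import urllib.request


def build_event_get_request(event_id, request_id=1):
    """Build a JSON-RPC body asking event.get for one event and its tags."""
    return {
        "jsonrpc": "2.0",
        "method": "event.get",
        "params": {
            "output": ["eventid"],
            "eventids": [str(event_id)],
            "selectTags": "extend",
        },
        "id": request_id,
    }


def fetch_event_tags(api_url, api_token, event_id):
    """Send the request and return the event's tags as a {tag: value} dict."""
    body = json.dumps(build_event_get_request(event_id)).encode("utf-8")
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {api_token}",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        result = json.load(response)["result"]
    if not result:
        return {}
    return {tag["tag"]: tag["value"] for tag in result[0]["tags"]}
```

Even if this works, it adds an API round trip and a stored credential to every notification, which is the cost behind calling it a bad workaround.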

Comment by Aleksandr Khudushin [ 2024 Dec 28 ]

Greetings Kamil!

Any update on this?

Comment by Edgars Melveris [ 2025 Jun 25 ]

It looks to me like the main problem is that your script has not yet returned the external system ID when Zabbix tries to close the problem.
But there is nothing we can do about that: Zabbix runs the script at the moment it needs to run it, with the data it has at that moment.
It would be better to rely on Zabbix's eventid in this case.
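
One way to read this suggestion, sketched as a Python model: use the Zabbix event ID as the correlation key in the external system, so the close step does not depend on a tag written by the create step. The `ExternalAlerts` class and the `zabbix-<eventid>` key format are hypothetical, and the sketch assumes the recovery notification receives the original problem event's ID.

```python
class ExternalAlerts:
    """In-memory stand-in for the external alerting system (hypothetical)."""

    def __init__(self):
        self.open_alerts = {}

    def create(self, correlation_key, summary):
        self.open_alerts[correlation_key] = summary

    def close(self, correlation_key):
        # True if an alert with this key existed and is now closed.
        return self.open_alerts.pop(correlation_key, None) is not None


def handle_notification(system, event_id, event_value, summary=""):
    """Create or close an alert keyed only by the Zabbix event ID."""
    key = f"zabbix-{event_id}"
    if event_value == "1":
        system.create(key, summary)
        return True
    return system.close(key)
```

Because the key is derived from data Zabbix already has when each notification is built, the recovery call can find the alert even if no tag was stored in between. This only helps if the external system can look alerts up by a caller-supplied key.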

Also, you should investigate why you have problems with 0 duration and probably fix those.
