[ZBX-13208] Preprocessing is not applied to first value of new item Created: 2017 Dec 16  Updated: 2024 Apr 10  Resolved: 2018 Jan 16

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.4.4
Fix Version/s: 3.4.7rc1, 4.0.0alpha3, 4.0 (plan)

Type: Problem report Priority: Major
Reporter: A B Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: cache
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File DiscoveryRule.png     PNG File DiscoveryRuleFilter.png     PNG File ItemDefinition.png     PNG File ItemDefinitionPreprocessing.png     PNG File ReportedValues.png     PNG File Screenshot_1.png     PNG File Screenshot_2.png     PNG File Screenshot_3.png    
Team: Team A
Sprint: Sprint 23, Sprint 25
Story Points: 1

 Description   

We have a LLD with several item prototypes, some of which have preprocessing steps configured. In particular, for two item prototypes we have a single step configured consisting of a custom multiplier. Now, when a new actual item is created from either of these prototypes, we noticed that the first (and only the first) value stored is the raw value without preprocessing - all subsequent values, however, have the custom multiplier correctly applied. This is causing a number of spurious triggers and alerts in our setup.

To recap, the following should be sufficient to reproduce it:

  1. Create a LLD
  2. Create an Item Prototype, of type Numeric, and configure it with a single preprocessing step consisting of a custom multiplier
  3. wait for an item to be created and a few values to be stored
  4. look at the values and notice that the first one does not have the custom multiplier applied to it, but the rest do


 Comments   
Comment by Olegs Vasiljevs (Inactive) [ 2017 Dec 18 ]

Hello!

  • OS: Centos 7.3.1611
  • Database: MariaDB 5.5.56
  • Web Server: Apache 2.4.6
  • Zabbix Server MySQL: 3.4.4
  • Zabbix Frontend MySQL: 3.4.4

Algorithm:

  1. Create vfs.fs.discovery LLD of Zabbix agent type.
  2. Create Item prototype with vfs.fs.size[{#FSNAME}] item key and Zabbix agent type. Use custom multiplier of 5 for the item prototype.
  3. Wait until LLD runs through as well as collect a couple of values for newly created item.

Results:
Created item properties:


Values in Latest Data -> Graph -> Values view:

Values in the DB:

MariaDB [zabbix]> select * from history_uint where itemid='28209';
+--------+------------+-------------+-----------+
| itemid | clock      | value       | ns        |
+--------+------------+-------------+-----------+
|  28209 | 1513586769 | 91194654720 | 250468041 |
|  28209 | 1513586829 | 91194654720 | 350026916 |
|  28209 | 1513586889 | 91194654720 | 518192051 |
+--------+------------+-------------+-----------+

roadrunner, what LLD and item prototype should I use to reproduce an issue?

Regards,
Oleg

Comment by Glebs Ivanovskis (Inactive) [ 2017 Dec 18 ]

Dear roadrunner, is there any chance that you have edited prototype, then LLD rule was executed, but server had not updated the configuration from the database before items received their first (after prototype changes) values?

Comment by A B [ 2017 Dec 19 ]

Thanks for taking a look at this. No, I did not edit anything in between: the LLD is part of a template, and the host was just linked to the template after being created.

To double check whether I missed something, I created a new host with a single discovery rule much like yours above, and was able to reproduce the issue consistently, with one difference: with a 30s update interval, it's actually the first several values that are wrong (the original items we're seeing the issues with have 1m update intervals). I'm attaching screenshots.f

Comment by A B [ 2017 Dec 19 ]

The discovery rule definition, item definition, and the resulting values.

Comment by A B [ 2017 Dec 19 ]

It just occurred to me: could this be a race-condition in the (re)loading of the config? We have ~6K hosts and ~3M items (not all enabled, but since everything gets (re)loaded I believe only the total number is relevant here), and it takes 2 - 3 minutes to load the config. If somehow the main item definition gets loaded substantially before the associated pre-processing definitions, I suppose this could explain things?

Comment by Olegs Vasiljevs (Inactive) [ 2017 Dec 19 ]

Interesting idea indeed!

You can troubleshoot this situation by altering the CacheUpdateFrequency directive in zabbix_server.conf to a value that exceeds 2-3 minute interval (for example, 10m will do).
Then create a discovery rule with item prototype with scheduling interval to be executed at specific time - specifically, after configuration cache reload by Zabbix server. Scheduling interval will ensure that LLD and item check runs through when all the configuration is loaded.

Regards,
Oleg

Comment by Glebs Ivanovskis (Inactive) [ 2017 Dec 19 ]

It should be checked that LLD items and respective preprocessing steps are created or updated in the same transaction. Otherwise if LLD processing is happening at the same time as configuration reload server may end up with new items and old/none preprocessing options in the cache. Forget it, it is not the case.

I think I know what the problem is. Most likely ZBXNEXT-3659 is to blame. For reasons configuration cache is synced in several stages each one protected with a mutex lock. Turns out items and item preprocessing are synced in different stages, leaving in memory item configuration incoherent for some time between these stages. If a new item happens to get processed in between stages it may not go to plan.

Comment by A B [ 2017 Dec 20 ]

We already have a CacheUpdateFrequency of 1 hour because of the excessive time it takes to (re)load the config (and the heavy load is puts on the db and network - we need a better solution here, but that's for another ticket), and instead "manually" trigger config reloads when necessary. Anyway, yes, creating an item with with a custom interval that doesn't start till after the config has been fully reloaded doesn't exhibit the problem (i.e. the pre-preprocessing is applied on the first value already).

Also, I realized that this isn't LLD specific, and confirmed the behaviour with just a new regular item with a pre-processing step (it just happens that in our setup the non-discovered items don't trigger alerts for this problem).

I don't suppose there's any workaround, is there? (using custom intervals everywhere is not really feasible...)

Comment by Glebs Ivanovskis (Inactive) [ 2017 Dec 20 ]

Also, I realized that this isn't LLD specific

Yes, that isn't a race condition between LLD processing and configuration reload, it's a race condition between cache reload and data gathering + preprocessing. Issue title updated.

I don't think there is a workaround, but the fix should be coming out very soon, hopefully.

Comment by Andris Zeila [ 2017 Dec 20 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-13208

Comment by Vladislavs Sokurenko [ 2017 Dec 20 ]

Successfully tested, now items and preprocessing will be synced under same lock

Comment by A B [ 2017 Dec 20 ]

Excellent! Thanks for the quick fix! I've just grabbed r76187 from svn and applied it to our server. I'll post an update in a couple days.

Comment by Andris Zeila [ 2017 Dec 20 ]

Released in:

  • pre-3.4.5rc3 r76205
  • pre-4.0.0alpha1 r76206
Comment by A B [ 2018 Jan 11 ]

Apologies for getting back on this so late (holidays etc). Unfortunately we are still seeing the issue. Both when I had patched 3.4.4 manually, but also even after upgrading to 3.4.5. I verified the issue again by creating a simple item (no LLD involved) with a scaling factor and then triggering a reload of the config.

Let me know if you need more information.

Comment by Vladislavs Sokurenko [ 2018 Jan 11 ]

Fixed in development branch:
svn://svn.zabbix.com/branches/dev/ZBX-13208

wiper successfylly tested. Pelase review renaming of timestamp variable in r76781

vso thanks, reviewed

Comment by Vladislavs Sokurenko [ 2018 Jan 16 ]

Fixed in:

  • pre-3.4.7rc1 r76870
  • pre-4.0.0alpha3 r76871
Comment by A B [ 2018 Jan 20 ]

We've been running 3.4.6 with r76870 applied for several days now, and so far the issue has not resurfaced, so this looks good now. Many thanks.

Generated at Tue Jul 01 00:08:53 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.