[ZBXNEXT-4108] Ability to search problems by trigger name (Z4) Created: 2017 Sep 18 Updated: 2024 Apr 10 Resolved: 2017 Dec 18 |
|
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | API (A), Frontend (F), Server (S) |
Affects Version/s: | None |
Fix Version/s: | 4.0.0alpha1, 4.0 (plan) |
Type: | Change Request | Priority: | Trivial |
Reporter: | Rostislav Palivoda | Assignee: | Andris Zeila |
Resolution: | Fixed | Votes: | 3 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | monitoring-problems-sort-after.png monitoring-problems-sort-before.png performance-measurements.pdf widget-problems-sort-after.png widget-problems-sort-before.png zbx_export_templates(1).xml | ||||||||||||||||||||
Issue Links: |
|
||||||||||||||||||||
Team: | Team A | ||||||||||||||||||||
Sprint: | Sprint 17, Sprint 18, Sprint 19, Sprint 20, Sprint 21, Sprint 22, Sprint 23 | ||||||||||||||||||||
Story Points: | 8 |
Description |
Currently, problem and event names are generated on the fly in the frontend and on the server side. This causes severe performance issues and makes it impossible to see historical information about problems, especially when the trigger name changes or contains macros. The proposal leads to a better separation of triggers and problems, improves performance (although the problem/events tables will be larger), and maintains historical problem names. |
Comments |
Comment by Vitaly Zhuravlev [ 2017 Sep 18 ] | ||||||||||
As a side effect, this would make it much easier to generate reports using external tools such as JasperReports. | ||||||||||
Comment by Miks Kronkalns [ 2017 Oct 05 ] | ||||||||||
Frontend and API:
| ||||||||||
Comment by Andrea Biscuola (Inactive) [ 2017 Oct 16 ] | ||||||||||
Resolved in svn://svn.zabbix.com/branches/dev/ZBXNEXT-4108 (server side). The implementation spans commits r73305 to r73511 and, based on what was implemented, is divided into different "sections": the feature implementation (database patches and server changes), the introduction of the new macro EVENT.NAME, and other minor diffs. The server was modified to store the new 'name' field in both the problem and events tables just before flushing a series of events to the database. Some modifications were made for the correct expansion of the new EVENT.NAME macro.
To test:
| ||||||||||
Comment by Miks Kronkalns [ 2017 Oct 18 ] | ||||||||||
Currently I use name = 'search query' for performance reasons. See the attached file (performance-measurements.pdf) for how much speed would be lost by changing it to LIKE. Frontend RESOLVED in ^/branches/dev/ | ||||||||||
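The trade-off between exact matching and LIKE mentioned in the comment above can be sketched as follows. This is a minimal illustration using Python's sqlite3 module, not the Zabbix frontend code: the table is reduced to two columns, the helper names (build_demo_db, search_exact, search_substring, first_plan_step) are hypothetical, and the index on name is an assumption added only to make the query-plan difference visible (the ticket does not say whether such an index exists). The plan output is specific to SQLite and says nothing about the measurements in performance-measurements.pdf.

```python
import sqlite3

EXACT_SQL = "select eventid from events where name = ?"
LIKE_SQL = "select eventid from events where name like ?"


def build_demo_db():
    """Create an in-memory events table holding a few stored event names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (eventid integer primary key, name text not null default '')"
    )
    # Assumption: an index on name, present only for this illustration.
    conn.execute("create index events_name on events (name)")
    conn.executemany(
        "insert into events (eventid, name) values (?, ?)",
        [
            (1, "Ping loss detected on web01"),
            (2, "Ping loss is too high on web01"),
            (3, "Processor load is too high on db01"),
        ],
    )
    return conn


def search_exact(conn, query):
    """Return event IDs whose stored name equals the query exactly."""
    rows = conn.execute(EXACT_SQL + " order by eventid", (query,))
    return [row[0] for row in rows]


def search_substring(conn, query):
    """Return event IDs whose stored name contains the query.

    '%' and '_' inside the query are not escaped in this sketch.
    """
    rows = conn.execute(LIKE_SQL + " order by eventid", ("%" + query + "%",))
    return [row[0] for row in rows]


def first_plan_step(conn, sql, params):
    """Return the text of the first step of SQLite's query plan."""
    rows = conn.execute("explain query plan " + sql, params).fetchall()
    return rows[0][3]  # the fourth column holds the human-readable detail
```

With this data, an exact search for "Ping loss" finds nothing, while the substring search finds events 1 and 2. In SQLite the exact match is planned as an index SEARCH and the leading-wildcard LIKE as a SCAN, which is one plausible reason for the speed difference; other database backends may plan these queries differently.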
Comment by Miks Kronkalns [ 2017 Oct 18 ] | ||||||||||
(1) No translation string changes. iivs CLOSED | ||||||||||
Comment by Rostislav Palivoda [ 2017 Nov 02 ] | ||||||||||
Please test server side - wiper | ||||||||||
Comment by Andris Zeila [ 2017 Nov 08 ] | ||||||||||
(4) [S] Changes to the tables in database upgrade must be split into separate patches - as small as possible. For example DBpatch_3050000() should be split into two patches - one for events table the other for problem table. abs RESOLVED in 74357 wiper CLOSED | ||||||||||
Comment by Andris Zeila [ 2017 Nov 08 ] | ||||||||||
(5) [S] Missing filter by object type (object=0) in "update events set name='%s' where objectid=%d and source=%d" "update problem set name='%s' where objectid=%d and source=%d" abs RESOLVED in r74364 Added the additional filter by "object" for EVENT_OBJECT_TRIGGER (0) wiper CLOSED | ||||||||||
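Why the missing object filter matters can be shown with a small sketch. It uses Python's sqlite3 module and a reduced events table, not the actual upgrade patch. The numeric constants for the internal event source (3) and the item object type (4) are given as I recall them from Zabbix and should be treated as assumptions; only EVENT_OBJECT_TRIGGER (0) is confirmed by the comment above. The function names are hypothetical.

```python
import sqlite3

EVENT_SOURCE_INTERNAL = 3  # assumption: value of the internal event source
EVENT_OBJECT_TRIGGER = 0   # stated in the review comment
EVENT_OBJECT_ITEM = 4      # assumption: value of the item object type


def make_events():
    """Two internal events sharing objectid 100: one for a trigger, one for an item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (eventid integer primary key, source integer,"
        " object integer, objectid integer, name text not null default '')"
    )
    conn.executemany(
        "insert into events (eventid, source, object, objectid) values (?, ?, ?, ?)",
        [
            (1, EVENT_SOURCE_INTERNAL, EVENT_OBJECT_TRIGGER, 100),
            (2, EVENT_SOURCE_INTERNAL, EVENT_OBJECT_ITEM, 100),
        ],
    )
    return conn


def rename_without_object_filter(conn, name, objectid, source):
    """The statement as originally reviewed: no filter on object."""
    conn.execute(
        "update events set name=? where objectid=? and source=?",
        (name, objectid, source),
    )


def rename_with_object_filter(conn, name, objectid, source):
    """The statement with the additional filter on the trigger object type."""
    conn.execute(
        "update events set name=? where objectid=? and source=? and object=?",
        (name, objectid, source, EVENT_OBJECT_TRIGGER),
    )


def names(conn):
    """Map eventid to the stored name."""
    return dict(conn.execute("select eventid, name from events"))
```

Without the filter, the item event that happens to share the ID 100 is renamed together with the trigger event; with the filter, only the trigger event changes.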
Comment by Andris Zeila [ 2017 Nov 08 ] | ||||||||||
(6) [S] As all existing internal trigger events will have the same name they can be updated with one sql statement similarly to internal item events. abs RESOLVED in r74368 Moved the internal trigger event and problem updates outside of the wiper CLOSED | ||||||||||
Comment by Andris Zeila [ 2017 Nov 08 ] | ||||||||||
(7) DBbegin_multiple_update() must be used when using multiple update statements in one patch. abs RESOLVED in r74391 Changed the multiple updates inside while loops for using the wiper CLOSED | ||||||||||
Comment by Andris Zeila [ 2017 Nov 08 ] | ||||||||||
(8) It might be better to explicitly set event names to empty strings instead of leaving them NULL and relying on NULL fields being converted to strings somewhere during the insertion process. abs RESOLVED in r74393 Explicitly pass the empty string ("") to the add_event() calls that must not store wiper Passing an empty error string and using it as the name would work for internal OK events, though it might be a bit strange. However, for discovery and autoregistration events the event name would still be NULL. Maybe the simplest way would be to use the ZBX_NULL2EMPTY_STR() macro for the event name in the db_insert parameters. abs RESOLVED in r74402 and r74405 Reverted the previous change to the original behavior in r74402 and use wiper CLOSED | ||||||||||
Comment by Andrea Biscuola (Inactive) [ 2017 Nov 09 ] | ||||||||||
(9) All the update patches need to be split into separate chunks. abs RESOLVED in r74390 All the database upgrades are performed one by one now in different wiper CLOSED | ||||||||||
Comment by Andris Zeila [ 2017 Nov 09 ] | ||||||||||
(10) DBpatch_3050006, DBpatch_3050007 must escape trigger name before updating events/problem table. abs RESOLVED in r74400 Escape the descriptions through DBdyn_escape_string(), no size wiper Fixed memory leak. Not related to this issue, but also changed the default internal event names to match style of trigger/item error messages. abs Looks OK. CLOSED | ||||||||||
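The need for escaping can be illustrated with a short sketch, again using Python's sqlite3 module rather than the Zabbix database layer. escape_sql_string() is a hypothetical stand-in for DBdyn_escape_string(): it only doubles single quotes, which is enough for SQLite but is an assumption here, since escaping rules depend on the database backend. The statement text follows the pattern quoted in item (5).

```python
import sqlite3

UPDATE_SQL = "update events set name='%s' where objectid=%d and source=%d and object=%d"
NAME = "Host's CPU load is too high"  # trigger name containing a single quote


def escape_sql_string(value):
    """Hypothetical stand-in for DBdyn_escape_string(): double single quotes."""
    return value.replace("'", "''")


def make_events():
    """Create a one-row events table for trigger 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (eventid integer primary key, source integer,"
        " object integer, objectid integer, name text not null default '')"
    )
    conn.execute("insert into events (eventid, source, object, objectid) values (1, 0, 0, 1)")
    return conn


def update_name_unescaped(conn, name, objectid):
    """Format the name into the statement as-is; a quote in the name breaks the SQL."""
    conn.execute(UPDATE_SQL % (name, objectid, 0, 0))


def update_name_escaped(conn, name, objectid):
    """Escape the name before formatting it into the statement."""
    conn.execute(UPDATE_SQL % (escape_sql_string(name), objectid, 0, 0))
```

With this name, the unescaped variant fails with sqlite3.OperationalError because the quote ends the string literal early, while the escaped variant stores the name unchanged. In application code that has access to parameter binding, binding the value would normally be preferable to manual escaping.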
Comment by Andris Zeila [ 2017 Nov 10 ] | ||||||||||
Server side tested | ||||||||||
Comment by Ivo Kurzemnieks [ 2017 Nov 23 ] | ||||||||||
(18) [D] API documentation must be updated. Miks.Kronkalns Updated API examples and object descriptions in:
RESOLVED iivs CLOSED | ||||||||||
Comment by Andrea Biscuola (Inactive) [ 2017 Dec 04 ] | ||||||||||
Released in
| ||||||||||
Comment by Andrea Biscuola (Inactive) [ 2017 Dec 05 ] | ||||||||||
Please assign to who should verify the fixes | ||||||||||
Comment by Andrey Melnikov [ 2017 Dec 05 ] | ||||||||||
r75329 broke event descriptions: the problem widget now shows events with unresolved macros.
> select * from events where name like "%{%" order by clock desc limit 15;
+---------+--------+--------+----------+------------+-------+--------------+-----------+---------------------------------------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        | name                                  |
+---------+--------+--------+----------+------------+-------+--------------+-----------+---------------------------------------+
| 6064644 |      0 |      0 |    29304 | 1512495673 |     0 |            0 | 197491002 | Disc sg1 tempearture {ITEM.LASTVALUE} |
| 6064601 |      0 |      0 |    29304 | 1512493873 |     1 |            0 | 856409455 | Disc sg1 tempearture {ITEM.LASTVALUE} |
| 6064592 |      0 |      0 |    27446 | 1512492959 |     0 |            0 | 629279695 | Ping loss detected on {HOST.NAME}     |
| 6064591 |      0 |      0 |    26791 | 1512492953 |     0 |            0 | 499233859 | Ping loss detected on {HOST.NAME}     |
| 6064590 |      0 |      0 |    26795 | 1512492952 |     0 |            0 | 587624767 | Ping loss detected on {HOST.NAME}     |
| 6064589 |      0 |      0 |    27447 | 1512492899 |     0 |            0 | 573972479 | Ping loss is too high on {HOST.NAME}  |
| 6064588 |      0 |      0 |    26792 | 1512492893 |     0 |            0 | 240886016 | Ping loss is too high on {HOST.NAME}  |
| 6064587 |      0 |      0 |    26796 | 1512492892 |     0 |            0 | 526952880 | Ping loss is too high on {HOST.NAME}  |
| 6064582 |      0 |      0 |    27447 | 1512492599 |     1 |            0 | 861435755 | Ping loss is too high on {HOST.NAME}  |
| 6064583 |      0 |      0 |    27446 | 1512492599 |     1 |            0 | 861435755 | Ping loss detected on {HOST.NAME}     |
| 6064580 |      0 |      0 |    26792 | 1512492593 |     1 |            0 | 457567907 | Ping loss is too high on {HOST.NAME}  |
| 6064581 |      0 |      0 |    26791 | 1512492593 |     1 |            0 | 457567907 | Ping loss detected on {HOST.NAME}     |
| 6064578 |      0 |      0 |    26796 | 1512492592 |     1 |            0 | 836207547 | Ping loss is too high on {HOST.NAME}  |
| 6064579 |      0 |      0 |    26795 | 1512492592 |     1 |            0 | 836207547 | Ping loss detected on {HOST.NAME}     |
| 6064563 |      0 |      0 |    29304 | 1512490873 |     0 |            0 |  38660506 | Disc sg1 tempearture {ITEM.LASTVALUE} |
+---------+--------+--------+----------+------------+-------+--------------+-----------+---------------------------------------+
15 rows in set (0.43 sec)
And how are triggers with {ITEM.LASTVALUE} macros currently shown in the web interface? | ||||||||||
Comment by Andrea Biscuola (Inactive) [ 2017 Dec 06 ] | ||||||||||
When we started to implement this, there was a discussion on how | ||||||||||
Comment by Andrey Melnikov [ 2017 Dec 06 ] | ||||||||||
This change removed the expansion of trigger/event descriptions, and now all old events in the widget are shown as 'Disc sg1 tempearture ITEM.LASTVALUE'. | ||||||||||
Comment by Andrea Biscuola (Inactive) [ 2017 Dec 06 ] | ||||||||||
Regarding the change in behavior of ITEM.LASTVALUE, I went on to check it Thanks for pointing it out. | ||||||||||
Comment by Andris Zeila [ 2017 Dec 11 ] | ||||||||||
(29) [S] The post database upgrade event/problem name update is implemented in svn://svn.zabbix.com/branches/dev/ZBXNEXT-4108_2 It basically supersedes the server fixes in svn://svn.zabbix.com/branches/dev/ZBXNEXT-4108. I will review and port any relevant commits to the new branch shortly. Some rough performance data: 1M events were converted in 2m 20s. During conversion, 17 MB of shared memory was used to cache historical (uint64) data. vso CLOSED | ||||||||||
Comment by Andris Zeila [ 2017 Dec 12 ] | ||||||||||
Released in:
| ||||||||||
Comment by Andrey Melnikov [ 2017 Dec 12 ] | ||||||||||
In real life, upgrading the tables takes ages.
2525:20171212:175227.373 completed 21% of event name update
2525:20171212:175227.373 In substitute_simple_macros() data:'Ping loss ({#ITEM.VALUE})', type=16
2525:20171212:175227.373 End substitute_simple_macros() data:'Ping loss ({#ITEM.VALUE})'
2525:20171212:175227.373 query [txnlev:1] [select eventid,source,object,objectid,clock,value,acknowledged,ns,name from events where source=0 and object=0 and objectid=26613 order by eventid]
2525:20171212:175227.387 In substitute_simple_macros() data:'Ping loss ({ITEM.VALUE})', type=16
2525:20171212:175227.387 In DBitem_value()
2525:20171212:175227.387 In get_N_itemid() expression:'({TRIGGER.VALUE}=0 and {51863}>33) or ({TRIGGER.VALUE}=1 and {51864}>0)' N_functionid:1
2525:20171212:175227.387 End of get_N_itemid():SUCCEED
2525:20171212:175227.387 query [txnlev:1] [select value_type,valuemapid,units from items where itemid=102185]
2525:20171212:175227.387 In zbx_vc_get_value() itemid:102185 value_type:0 timestamp:1456407361.374212810
2525:20171212:175227.387 In zbx_history_get_values() itemid:102185 value_type:0 start:1456407360 count:0 end:1513090347
2525:20171212:175227.387 query [txnlev:1] [select clock,ns,value from history where itemid=102185 and clock>1456407360 and clock<=1513090347]
2525:20171212:175507.765 End of zbx_history_get_values():SUCCEED values:268761
2525:20171212:175507.790 In zbx_history_get_values() itemid:102185 value_type:0 start:0 count:1 end:1456407360
2525:20171212:175507.790 query [txnlev:1] [select clock,ns,value from history where itemid=102185 and clock>0 and clock<=1456407360 order by clock desc limit 1]
2525:20171212:175507.806 End of zbx_history_get_values():SUCCEED values:0
2525:20171212:175507.806 In zbx_history_get_values() itemid:102185 value_type:0 start:1513082760 count:0 end:1513082761
2525:20171212:175507.806 query [txnlev:1] [select clock,ns,value from history where itemid=102185 and clock=1513082761]
2525:20171212:175507.807 End of zbx_history_get_values():SUCCEED values:1
2525:20171212:175507.835 End of zbx_vc_get_value():FAIL cache_used:1
2525:20171212:175507.835 End of DBitem_value():FAIL
2525:20171212:175507.835 cannot resolve macro '{ITEM.VALUE}'
2525:20171212:175507.835 End substitute_simple_macros() data:'Ping loss (*UNKNOWN*)'
2525:20171212:175507.835 In zbx_vc_clean()
2525:20171212:175507.835 End of zbx_vc_clean()
2525:20171212:175507.835 In substitute_simple_macros() data:'Ping loss ({#ITEM.VALUE})', type=16
2525:20171212:175507.835 End substitute_simple_macros() data:'Ping loss ({#ITEM.VALUE})'
MariaDB [zabbix]> select eventid,source,object,objectid,clock,value,acknowledged,ns,name from events where source=0 and object=0 and objectid=26613 order by eventid;
+---------+--------+--------+----------+------------+-------+--------------+-----------+--------------------------+
| eventid | source | object | objectid | clock      | value | acknowledged | ns        | name                     |
+---------+--------+--------+----------+------------+-------+--------------+-----------+--------------------------+
| 5303507 |      0 |      0 |    26613 | 1456407361 |     0 |            0 | 374212810 | Ping loss ({ITEM.VALUE}) |
+---------+--------+--------+----------+------------+-------+--------------+-----------+--------------------------+
1 row in set (0.02 sec)
There is one event in the table, but the server fetches ALL values (268761) from the history table. What for?
vso Thank you for your report. So it was caching 656 days of history to calculate the item value at the time of the event, and it took 3 minutes. This does not look good. This issue also looks similar to | ||||||||||
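For comparison, resolving the item value for a single event only needs the most recent history row at or before the event timestamp. The sketch below shows such a lookup with Python's sqlite3 module. It is an illustration under assumptions, not the Zabbix value cache or the change later made by the commenter: the history table is reduced to four columns, and make_history() and value_at() are hypothetical names.

```python
import sqlite3


def make_history(rows):
    """Create an in-memory history table from (itemid, clock, ns, value) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table history (itemid integer, clock integer, ns integer, value real)"
    )
    conn.execute("create index history_1 on history (itemid, clock)")
    conn.executemany("insert into history values (?, ?, ?, ?)", rows)
    return conn


def value_at(conn, itemid, clock, ns):
    """Return the latest value recorded at or before (clock, ns), or None.

    Only one row is read, regardless of how much newer history exists.
    """
    row = conn.execute(
        "select value from history"
        " where itemid=? and (clock<? or (clock=? and ns<=?))"
        " order by clock desc, ns desc limit 1",
        (itemid, clock, clock, ns),
    ).fetchone()
    return None if row is None else row[0]
```

For example, with values recorded at clocks 100, 200 and 300, value_at() for clock 250 returns the value stored at clock 200, and a timestamp earlier than all history returns None, which would correspond to an unresolved macro.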
Comment by Andrey Melnikov [ 2017 Dec 12 ] | ||||||||||
Standard rotational SATA drives in a RAID-1 set. I slightly hacked the value cache, and the upgrade process on the same database took:
4579:20171212:222245.160 query [txnlev:0] [select taskid from task where type=5 and status=1]
4579:20171212:222245.179 query [txnlev:1] [begin;]
4579:20171212:222245.180 starting event name update forced by database upgrade
4579:20171212:222245.180 query [txnlev:1] [select count(*) from triggers]
4579:20171212:222245.181 query [txnlev:1] [select triggerid,description,expression,priority,comments,url,recovery_expression,recovery_mode,value from triggers order by triggerid]
4579:20171212:222245.186 In substitute_simple_macros() data:'Processor load is too high on {HOST.NAME}', type=16
.....
4579:20171212:222314.374 event name update completed
4579:20171212:222314.374 query [txnlev:1] [delete from task where taskid=1]
4579:20171212:222314.602 query [txnlev:1] [commit;]
30 seconds. | ||||||||||
Comment by Vladislavs Sokurenko [ 2017 Dec 12 ] | ||||||||||
That's great, what did you do? | ||||||||||
Comment by Andris Zeila [ 2017 Dec 13 ] | ||||||||||
Yes, this is a known design flaw. Normally it works okayish, but it can cause problems (mostly wasted memory) with large timeshift ranges in trigger functions. The cache was used to improve processing of subsequent events of the same trigger. However, in hindsight such a situation (when one trigger generates enough events to justify value caching) is quite rare, and it would be better to turn the value cache off (which can be done manually in the configuration files for now). | ||||||||||
Comment by Andris Zeila [ 2017 Dec 14 ] | ||||||||||
Did some 'stress' testing with 1m events, 1m history and trigger having 2 functions and description having {ITEM.VALUE1}, {ITEM.VALUE2} macros. Event/problem update took ~10 minutes (i7 cpu, ssd). | ||||||||||
Comment by Andris Zeila [ 2017 Dec 14 ] | ||||||||||
Released in:
Note that the previous release incorrectly expanded {ITEM.VALUEN} macros for N>1. If somebody has already applied the update and wants to have the event/problem names recalculated, it can be forced with the following steps:
| ||||||||||
Comment by MATSUDA Daiki [ 2018 Oct 16 ] | ||||||||||
The document has a typo. https://www.zabbix.com/documentation/4.0/manual/introduction/whatsnew400#problem_name_generation "Now problem and event names are stored directly in the event and problem tables at the moment when an" The correct wording is 'events and problem tables'. Miks.Kronkalns Thank you! I have fixed it. |