[ZBX-15428] PCRE regexp seems use much more CPU resources than POSIX Created: 2019 Jan 09 Updated: 2024 Apr 10 Resolved: 2019 Jan 18 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 4.0.3 |
Fix Version/s: | 4.0.4rc1, 4.2.0alpha3, 4.2 (plan) |
Type: | Problem report | Priority: | Critical |
Reporter: | Oleksii Zagorskyi | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 1 |
Labels: | pcre, regexps | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | 8.42_with_study_but_jit_not_available.png ZBX-15428-2.diff ZBX-15428-3.4.diff pcre.png pcre_8.0_with_study.png pcre_8.42_from_package_study_without_compile.png posix.png posix_compile_each_time.png regexp_performance.png regexp_posix.png results.png study_and_compile_each_execution.png with_pcre_study.png with_pcre_study_and_jit_exec.png without_pcre_study.png | ||||||||
Issue Links: |
|
||||||||
Team: | Team A | ||||||||
Sprint: | Sprint 48, Jan 2019 | ||||||||
Story Points: | 2 |
Description |
Looks like PCRE regexps use much more CPU resources than regexp in previous zabbix versions! We use regexps like "some|one|other|another" in trigger and its a trouble. This should be investigated and addressed.
There is also suggestion to move to pcre2 in |
Comments |
Comment by Glebs Ivanovskis [ 2019 Jan 09 ] |
Title says that "PCRE is slower than POSIX ERE", but Decsription compares PCRE to multiple str() calls. I'm pretty sure that POSIX ERE will be slower than equivalent str() calls as well. Where is the proof that PCRE is slower than POSIX ERE? |
Comment by Oleksii Zagorskyi [ 2019 Jan 09 ] |
Glebs, we did some quick tests, which proved that. |
Comment by Glebs Ivanovskis [ 2019 Jan 09 ] |
Slightly better workaround would be to use global regular expression with Expression type "Any character string included":
Underneath there is the same strcmp() function as in str(). |
Comment by Vladislavs Sokurenko [ 2019 Jan 09 ] |
Here is CPU usage of all execution when using PCRE: And POSIX: |
Comment by Glebs Ivanovskis [ 2019 Jan 09 ] |
What was 100%? |
Comment by Vladislavs Sokurenko [ 2019 Jan 09 ] |
I have another picture that might provide some insights about 100% (reading lots of values from cache), but it was long time ago and I might need to repeat tests again to refresh in my memory, main point is that changing to 4 regular expressions or str solved the issue for the user. |
Comment by Glebs Ivanovskis [ 2019 Jan 09 ] |
Maybe some performance can be gained using pcre_study()? Maybe Zabbix should optimize trigger expressions like {host:item.count(1m,<pattern>,regexp)}>3 and stop evaluation of count() early because exact value (100500 or 100501) does not matter? Maybe Zabbix server should at least log slow triggers? |
Comment by Vladislavs Sokurenko [ 2019 Jan 09 ] |
Use case here was to find if there are one of 4 strings in line received from log item and it was done for each incoming value with range of 5 minutes, this causes lots of pcre_exec. I think we should try pcre2 and see if it solves this issue first and then take it from there. |
Comment by Vladislavs Sokurenko [ 2019 Jan 09 ] |
Here is with study (actually study took 0% as it is done only when compiling but same compiled expression is reused): Here is with study and jit_exec Here is PCRE 8.42 without study: PCRE 8.0 with study but JIT not available: PCRE 8.42 with study but JIT not available: PCRE 8.42 from package with only extra requested by study:
Study and compile each time when executing PCRE: Study and compile each time when executing POSIX: Actual costs can be provided on demand Some info: |
Comment by Andris Zeila [ 2019 Jan 09 ] |
Another option would be to make patterns from the start of the input data, for example with syslog you can try ^[A-Z][a-z]+ +?[0-9]+ [0-9]+:[0-9]+:[0-9]+ [a-z]+ (some|one|other|another) or even ^.\{21}(some|one|other|another) Quick tests showed more than 10x performance increase.
|
Comment by Andris Mednis [ 2019 Jan 09 ] |
Five invaluable techniques to improve regex performance might help. |
Comment by Vladislavs Sokurenko [ 2019 Jan 10 ] |
Please try attached patch ZBX-15428-2.diff it will request the JIT compiler to generate code for complete matches. This should be faster if compiled regular expression is used for lots of values for example trigger: {Zabbix server:foo.regexp(some|one|other|another,5m)}=0 And foo item has 10000 values of length 300 bytes each zalex_ua Thanks, trying to test on affected zabbix server. |
Comment by Glebs Ivanovskis [ 2019 Jan 10 ] |
Do you mean PCRE2? (10.x) |
Comment by Vladislavs Sokurenko [ 2019 Jan 10 ] |
Sorry, this was with study but without JIT, so using study without JIT also gives performance improvement. |
Comment by Vladislavs Sokurenko [ 2019 Jan 10 ] |
Fixed in development branch: |
Comment by Glebs Ivanovskis [ 2019 Jan 12 ] |
Call graphs look cool, but are difficult to read information from. A summary table with numbers like here would be perfect, I think. |
Comment by Vladislavs Sokurenko [ 2019 Jan 13 ] |
Dear cyclone, thanks allot for your suggestions, pcre_study() does perform better based on graphs, regarding summary tables, I think there are allot of regex engine comparisons available online already, not sure if there will be anything more to add. |
Comment by Andris Zeila [ 2019 Jan 15 ] |
Tested the following patterns:
Tested 50k iterations with the following data Jan 15 09:36:30 wbox acvpnagent[1703]: Function: getProfilePath File: ../../vpn/CommonCrypt/Certificates/NSSCertUtils.cpp Line: 567 Invoked Function: getpwnam Return Code: -31457275 (0xFE200005) Description : CERTSTORE_ERROR_NULL_POINTER Jan 15 09:36:30 wbox acvpnagent[1703]: Function: InitNSS File: ../../vpn/CommonCrypt/Certificates/NSSCertUtils.cpp Line: 397 Invoked Function: CNSSCertUtils::getProfilePath Return Code: -31457275 (0xFE200005) Description: CERTSTORE_ERROR_NULL_POINTER Jan 15 09:36:30 wbox acvpnagent[1703]: Function: CNSSCertStore File: ../../vpn/CommonCrypt/Certificates/NSSCertStore.cpp Line: 76 Invoked Function: CNSSCertUtils::InitNSS Return Code: -31457275 (0xFE200005) Description: CERTSTORE_ERROR_NULL_POINTER Jan 15 09:36:30 wbox acvpnagent[1703]: Function: addNSSStore File: ../../vpn/CommonCrypt/Certificates/CollectiveCertStore.cpp Line: 1766 Invoked Function: CNSSCertStore::CNSSCertStore Return Code: -31457275 (0xFE200005) Description: CERTSTORE_ERROR_NULL_POINTER Jan 15 09:36:30 wbox acvpnagent[1703]: Function: OpenStores File: ../../vpn/CommonCrypt/Certificates/CollectiveCertStore.cpp Line: 423 Invoked Function: CCollectiveCertStore::addNSSStore Return Code: -31457275 (0xFE200005) Description: CERTSTORE_ERROR_NULL_POINTER Jan 15 09:36:30 wbox acvpnagent[1703]: Function: VerifyCertName File: ../../vpn/CommonCrypt/Certificates/VerifyServerName.cpp Line: 150 Certificate name verification has failed. Server Name: 129.33.247.60 Subject Alternate Name(s): vpnisv.isv.ihost.com Jan 15 09:36:30 wbox acvpnagent[1703]: Function: VerifyCertName File: ../../vpn/CommonCrypt/Certificates/OpenSSLCertUtils.cpp Line: 1421 Invoked Function: CVerifyServerName::VerifyCertName Return Code: -313 91725 (0xFE210013) Description: CERTIFICATE_ERROR_VERIFY_NAME_FAILED Jan 15 09:36:30 wbox acvpnagent[1703]: Function: analyzeHttpResponse File: ../../vpn/Agent/NetEnvironment.cpp Line: 1596 Invoked Function: CCertHelper::VerifyServerCertificate Return Code: -31391725 (0xFE210013) Description: CERTIFICATE_ERROR_VERIFY_NAME_FAILED server name: 1.1.1.1 Jan 15 09:36:30 wbox acvpnagent[1703]: Server certificate validation failed with the following errors: #011Certificate does not match the server name. Jan 15 09:36:30 wbox acvpnagent[1703]: Function: TestAccessToSG File: ../../vpn/Agent/NetEnvironment.cpp Line: 1298 Invoked Function: CNetEnvironment::analyzeHttpResponse Return Code: -28966897 (0xFE46000F) Description: NETENVIRONMENT_ERROR_CERT_VERIFICATION_FAILED:The server cert verification performed after the HTTPS probe has failed Jan 15 09:36:32 wbox acvpnagent[1703]: Function: OnOpenRequestComplete File: ../../vpn/Common/IP/HttpProbeAsync.cpp Line: 304 Invoked Function: CHttpSessionAsync::OnOpenRequestComplete Return Code: -31588307 (0xFE1E002D) Description: SOCKETTRANSPORT_ERROR_CONNECT_CANCELED:An asynchronous connection has been canceled during its initiation. Jan 15 09:36:32 wbox acvpnagent[1703]: Function: logProbeFailure File: ../../vpn/Agent/NetEnvironment.cpp Line: 1437 Invoked Function: CHttpProbeAsync::SendProbe Return Code: -27066354 (0xFE63000E) Description: HTTP_PROBE_ASYNC_ERROR_CANNOT_CONNECT HTTP (host: 1.1.1.1) Jan 15 09:36:32 wbox acvpnagent[1703]: Function: TestAccessToSG File: ../../vpn/Agent/NetEnvironment.cpp Line: 1390 Invoked Function: CNetEnvironment::analyzeHttpResponse Return Code: -28966899 (0xFE46000D) Description: NETENVIRONMENT_ERROR_PROBE_INCOMPLETE:Network Probe could not contact target Jan 15 09:36:32 wbox acvpnagent[1703]: Function: testNetwork File: ../../vpn/Agent/NetEnvironment.cpp Line: 783 Invoked Function: CNetEnvironment::IsSGAccessible Return Code: -28966899 (0xFE46000D) Description: NETENVIRONMENT_ERROR_PROBE_INCOMPLETE:Network Probe could not contact target Jan 15 09:36:32 wbox acvpnagent[1703]: Function: TestNetEnv File: ../../vpn/Agent/NetEnvironment.cpp Line: 268 Incomplete probe count is 3 Jan 15 09:36:32 wbox acvpnagent[1703]: Function: TestNetEnv File: ../../vpn/Agent/NetEnvironment.cpp Line: 370 Network accessible. Retesting connectivity to the secure gateway in 30 seconds. Jan 15 09:37:03 wbox acvpnagent[1703]: Function: getDnsConfiguration File: ../../vpn/Common/Utility/NetInterface_unix.cpp Line: 1253 nm_ip4_config_get_domains: Unable to get domain list. Jan 15 09:37:03 wbox acvpnagent[1703]: Function: getDnsConfiguration File: ../../vpn/Common/Utility/NetInterface_unix.cpp Line: 1266 nm_ip6_config_get_nameservers: Unable to get name server list. Jan 15 09:37:03 wbox acvpnagent[1703]: Function: EnumerateInterfaces File: ../../vpn/Common/Utility/NetInterface_unix.cpp Line: 1385 Invoked Function: CNetInterface::getDnsConfiguration Return Code: -32636923 (0xFE0E0005) Description: NETINTERFACE_ERROR_NULL_POINTER eno1 Jan 15 09:37:03 wbox acvpnagent[1703]: Function: getDnsConfiguration File: ../../vpn/Common/Utility/NetInterface_unix.cpp Line: 1198 nm_ip4_config_get_domains: Unable to get domain list. Total 1m matches performed, batch size is the number of matches without regular expression recompilation. |
Comment by Glebs Ivanovskis [ 2019 Jan 15 ] |
Looks like users will have to choose themselves which strategy to use... Which is somewhat logical if users can direct access to regexp engine and can use any pattern imaginable. |
Comment by Andris Zeila [ 2019 Jan 15 ] |
Study seems to have minimal perormance impact in simple cases and gives real improvement in some corner cases. While JIT seem to be better with batches of 8-32 values, depending on pattern. For trigger expressions a future optimization could be switching to JIT if regexp match has to be performed on 30+ values. wiper: Just to clarify, we are using pcre_study without jit now. |
Comment by Vladislavs Sokurenko [ 2019 Jan 15 ] |
Fixed in:
|
Comment by Glebs Ivanovskis [ 2019 Feb 01 ] |
Similar issue: |