[#ZBX-9038] VMware item processing takes a lot of time

[ZBX-9038] VMware item processing takes a lot of time Created: 2014 Nov 13 Updated: 2024 Apr 10 Resolved: 2018 Dec 02
Status:	Closed
Project:	ZABBIX BUGS AND ISSUES
Component/s:	Proxy (P), Server (S)
Affects Version/s:	None
Fix Version/s:	4.0.3rc1, 4.2.0alpha2, 4.2 (plan)

Type:	Problem report
Priority:	Trivial
Reporter:	Andris Zeila
Assignee:	Michael Veksler
Resolution:	Fixed
Votes:	
Labels:	performance, vmware
Remaining Estimate:	Not Specified
Time Spent:	Not Specified
Original Estimate:	Not Specified

Attachments:	vmware-optimized-requests.diff
Issue Links:
Duplicate
is duplicated by	~~ZBX-9226~~	Improve vmware data collection perfor...	Closed

Team:	Team A
Sprint:	Sprint 46, Nov 2018
Story Points:	

Description

Currently most of the data retrieved from the VMware service (vCenter or hypervisor) is stored as received, in XML format. The parsing is done every time a poller requests a value.

There are two problems with it:

- the vmware collector is locked while a poller parses the required value. This is especially bad for discovery requests, where data for multiple hypervisors or virtual machines is parsed. However, it also adds up when single values are parsed.
- data is usually retrieved from VMware services less frequently than pollers query it, so the same value is parsed multiple times.

To fix this, the vmware collector must parse the data when it is received and store the already parsed values. This will speed up value retrieval and also reduce the shared memory requirements.
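A minimal sketch of the parse-once idea, not the actual Zabbix code: the collector calls `props_parse()` a single time when data arrives, and pollers then read the stored values with `props_get()` without touching the XML. All names here (`parsed_props_t`, `props_parse`, `props_get`, `extract_tag`) are hypothetical, and the string-search extractor is a toy stand-in for a real XML parser.

```c
#include <stdio.h>
#include <string.h>

#define PROP_MAX 64

/* hypothetical property ids, used as indexes into the parsed value table */
enum { PROP_NAME, PROP_OVERALL_STATUS, PROP_COUNT };

static const char *prop_tags[PROP_COUNT] = { "name", "overallStatus" };

typedef struct {
    char values[PROP_COUNT][PROP_MAX];
} parsed_props_t;

/* toy extractor: copies the text between <tag> and </tag>; returns 0 on success */
static int extract_tag(const char *xml, const char *tag, char *out, size_t out_size)
{
    char open[32], close[32];
    const char *start, *end;
    size_t len;

    snprintf(open, sizeof(open), "<%s>", tag);
    snprintf(close, sizeof(close), "</%s>", tag);

    if (NULL == (start = strstr(xml, open)))
        return -1;
    start += strlen(open);
    if (NULL == (end = strstr(start, close)))
        return -1;
    len = (size_t)(end - start);
    if (len >= out_size)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* collector side: parse every known property once, when the data is received */
static void props_parse(const char *xml, parsed_props_t *props)
{
    for (int i = 0; i < PROP_COUNT; i++)
    {
        /* a missing property is stored as an empty string */
        if (0 != extract_tag(xml, prop_tags[i], props->values[i], PROP_MAX))
            props->values[i][0] = '\0';
    }
}

/* poller side: constant-time lookup, no XML parsing while the lock is held */
static const char *props_get(const parsed_props_t *props, int id)
{
    return (0 <= id && id < PROP_COUNT) ? props->values[id] : NULL;
}
```

With this layout the time a poller holds the collector lock no longer depends on the size of the received document, and the raw XML does not need to stay in shared memory once it has been parsed.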

Other places to improve:

- increase the lookup speed of performance entity counters (zbx_vmware_perf_entity_t:counters): use vector bsearch instead of linear search, or switch to a hashset
- store performance counters as an integer type instead of strings
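The binary search option could look roughly like the sketch below. It uses the standard `qsort()` and `bsearch()` functions rather than the Zabbix vector API, and the `perf_counter_t` layout and function names are assumptions made for illustration, not the real `zbx_vmware_perf_entity_t` definition.

```c
#include <stdint.h>
#include <stdlib.h>

/* hypothetical counter record: both the id and the value are integers, not strings */
typedef struct {
    uint64_t counterid;
    uint64_t value;
} perf_counter_t;

/* orders counters by id; shared by qsort() and bsearch() */
static int perf_counter_compare(const void *a, const void *b)
{
    const perf_counter_t *c1 = a, *c2 = b;

    if (c1->counterid < c2->counterid)
        return -1;
    if (c1->counterid > c2->counterid)
        return 1;
    return 0;
}

/* sort once, after the counter list has been built or changed */
static void perf_counters_sort(perf_counter_t *counters, size_t num)
{
    qsort(counters, num, sizeof(perf_counter_t), perf_counter_compare);
}

/* O(log n) lookup in the sorted array; returns NULL if the id is not present */
static const perf_counter_t *perf_counter_find(const perf_counter_t *counters, size_t num,
        uint64_t counterid)
{
    perf_counter_t key = { counterid, 0 };

    return bsearch(&key, counters, num, sizeof(perf_counter_t), perf_counter_compare);
}
```

A sorted array keeps the memory layout compact and is cheap to rebuild if the counter set changes rarely. A hashset would trade some memory for constant-time lookups.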

Comments

Comment by Andris Zeila [ 2014 Nov 28 ]

Based on VMware support response regarding "-1" performance counter values

The value "-1" in VPX_STAT_X tables means that performance data were not available or the value of counter is out of valid range (e.g. any negative value).
You can safely ignore this value and treat this as data not available for this time.

Regarding possible causes what comes to my mind is external environment issue like network issue during the host is sending statistics to Vcenter server.

We should ignore -1 values and store performance counter values as unsigned 64 bit integers.
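A hedged sketch of how such a conversion might be written with standard C only. The function name `perf_value_parse` and its return convention are assumptions for illustration. Anything that is not a plain non-negative decimal number, including "-1", is reported as "no data".

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Converts a performance counter value received as text into an unsigned */
/* 64 bit integer. Returns 0 and sets *value on success, or -1 when the   */
/* value must be ignored ("-1", other negatives, empty or malformed text, */
/* out of range). *value is left untouched on failure.                    */
static int perf_value_parse(const char *text, uint64_t *value)
{
    char *end;
    unsigned long long parsed;

    /* require a leading digit: rejects "-1", signs, whitespace and empty input */
    if (NULL == text || 0 == isdigit((unsigned char)*text))
        return -1;

    errno = 0;
    parsed = strtoull(text, &end, 10);

    /* reject overflow (ERANGE) and trailing garbage */
    if (0 != errno || '\0' != *end)
        return -1;

    *value = (uint64_t)parsed;
    return 0;
}
```

A caller would simply skip the sample when the function returns -1, treating it as data not available for that interval.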

Comment by Andris Zeila [ 2015 Feb 02 ]

Currently we retrieve top-level properties. For example, overallStatus, name, vm, summary, parent and datastore are retrieved for hypervisors. However, we often need only a small subset of the retrieved data. For example, we aren't using summary/runtime, which is quite large. So instead of retrieving the whole summary property, we should specify detailed property paths:

            <urn:propSet>
               <urn:type>HostSystem</urn:type>
               <urn:pathSet>overallStatus</urn:pathSet>
               <urn:pathSet>name</urn:pathSet>
               <urn:pathSet>vm</urn:pathSet>
               <urn:pathSet>summary.quickStats</urn:pathSet>
               <urn:pathSet>summary.config</urn:pathSet>
               <urn:pathSet>summary.hardware</urn:pathSet>
               <urn:pathSet>parent</urn:pathSet>
               <urn:pathSet>datastore</urn:pathSet>
            </urn:propSet>

wiper The largest data requests were optimized in the vmware-optimized-requests.diff patch. Tests showed that the retrieved configuration data size dropped from 250k to 112k (1 hypervisor, 12 virtual machines). Given the simplicity of this patch, maybe we should move it to a separate issue and also apply it to version 2.2.

wiper Based on this, a new issue (~~ZBX-9279~~) was created.

Comment by Andris Zeila [ 2016 Jan 26 ]

Hypervisor and virtual machine property lists were introduced in ~~ZBXNEXT-3106~~. Now the vmware collector parses the received data and stores it as easily accessible property lists rather than as a single XML document that has to be parsed for every vmware item check.

While ~~ZBXNEXT-3106~~ deals with the most frequently used data, there are still a few places where the XML parsing could be optimized further:

- the following keys still parse the stored XML data at runtime to retrieve values:
  - vmware.cluster.status
  - vmware.eventlog
  - vmware.version
  - vmware.fullname
- when creating hypervisor/virtual machine objects, the corresponding XML data is loaded into the XML parser multiple times. This loading is quite expensive, so the code should be restructured to load the XML document once and then pass it to the corresponding initialization functions.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Dec 05 ]

vmware.eventlog took a step in the correct direction in ~~ZBX-12497~~.

Comment by Michael Veksler [ 2018 Nov 28 ]

Available in:

4.0.3rc1 r87392
4.2.0alpha2 (trunk) r87393

Generated at Wed Apr 24 01:11:51 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.
