[ZBXNEXT-1804] Server-proxy data exchange optimization Created: 2013 Jun 27  Updated: 2024 Apr 10  Resolved: 2017 Mar 29

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.0.6
Fix Version/s: 3.4.0alpha1, 3.4 (plan)

Type: Change Request Priority: Minor
Reporter: Corey Shaw Assignee: Unassigned
Resolution: Fixed Votes: 15
Labels: performance, slow, synchronization
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
causes ZBX-12974 Incorrect handling of passive proxy w... Closed
Duplicate
duplicates ZBX-5448 Zabbix proxy datasender doesn't manag... Closed
is duplicated by ZBX-12964 Priority of Data Processing from Prox... Closed
Team: Team C
Sprint: Sprint 1, Sprint 2, Sprint 3, Sprint 4

 Description   

High latency links can cause significant problems with the proxy sending data to the Zabbix server if the proxy receives large amounts of data. Here's an example:

1. I have a Zabbix server located in Texas, USA.
2. I have a Zabbix proxy (virtual machine - 2 CPU 4GB RAM, sqlite3) located in Texas, USA.
3. I have another Zabbix proxy (virtual machine - 4 CPU 8GB RAM, tried with sqlite3 and mysql) located in Singapore.

With testing, I found that the proxy in Texas can support at least 1500 nvps. I didn't have any more monitoring to throw at it, but it was able to handle the incoming data as well as send it to the Zabbix server without any backlog.

The proxy in Singapore could receive large amounts of data for its region (it was collecting 500 nvps), but it was unable to send data to the Zabbix server fast enough to prevent a backlog. With 229ms of latency between it and the server in Texas, it was only able to send 1000 values (the current hardcoded max) to the Zabbix server every 2-3 seconds. Inspecting a packet dump showed what was happening. The high latency made the transfer slow for two reasons:

1. SYN/ACK packets were obviously delayed by the distance, so establishing each connection was slow.
2. TCP send/receive windows cause multiple ACK round trips during the transfer, and they also keep the initial transfer rate low.
3. As a result of #1 and #2, and because Zabbix does not keep a persistent connection, each upload of 1000 values to the Zabbix server takes roughly 2-3 seconds.
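The numbers above are enough to quantify the backlog. A minimal sketch, assuming the time per 1000-value connection is the only bottleneck; the function names are illustrative only:

```c
/* Effective upload rate when each connection carries `batch` values
 * and one connection (connect + transfer + close) takes `secs_per_batch`. */
static double upload_rate_nvps(int batch, double secs_per_batch)
{
    return (double)batch / secs_per_batch;
}

/* Values added to the proxy backlog per second: a positive result means
 * the backlog grows, zero or negative means the proxy keeps up. */
static double backlog_growth_nvps(double incoming_nvps, double upload_nvps)
{
    return incoming_nvps - upload_nvps;
}
```

With 1000 values per 2-3 seconds the proxy uploads roughly 333-500 nvps, so at 500 nvps collected the backlog grows by anywhere from 0 to about 167 values per second.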

What I propose here (and what Richlv mentioned on IRC) is a configuration value for the number of values the Zabbix proxy sends in one connection.

Another option (one that would be far more difficult to implement, I'm sure) would be a persistent connection from the proxy to the Zabbix server (but that should be another ticket).



 Comments   
Comment by Alexei Vladishev [ 2013 Jul 19 ]

I believe it should be resolved by allowing the server to receive incoming data with delayed processing. A configuration parameter for adjusting the number of values would help as well; however, it may lead to problems if set too high. The Zabbix server could be overloaded if we send, for example, 100M values in one go. Some smart logic must be implemented on the server side to avoid that.
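One way to picture that "smart logic" is an admission check in front of a delayed-processing queue: the server accepts a proxy packet only if it fits into what it is willing to buffer. This is a hypothetical sketch, not Zabbix code; the type, function names and capacity are assumptions:

```c
#include <stddef.h>

/* Hypothetical queue of values accepted from proxies but not yet processed.
 * Invariant: queued <= capacity. */
typedef struct
{
    size_t queued;    /* values currently waiting for processing */
    size_t capacity;  /* upper bound the server is willing to buffer */
}
pending_queue_t;

/* Accept a packet only if it fits entirely; otherwise the proxy keeps
 * the data and retries later. Returns 1 on accept, 0 on reject. */
static int pending_queue_admit(pending_queue_t *q, size_t values)
{
    if (values > q->capacity - q->queued)
        return 0;

    q->queued += values;
    return 1;
}

/* Called by the processing side after it has handled `values` entries. */
static void pending_queue_release(pending_queue_t *q, size_t values)
{
    q->queued = (values > q->queued) ? 0 : q->queued - values;
}
```

Rejecting the whole packet (rather than part of it) keeps the proxy side simple: it either gets a success and deletes the batch, or resends the same batch later.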

Comment by Marc [ 2014 Apr 15 ]

What about having support for multiple data sender processes?
Right order/timing could become a challenge, though.
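A common way to keep ordering manageable with several senders is to route all records of one item to the same sender, so ordering only has to hold within each sender. A minimal sketch, assuming records carry a numeric item ID; the function name is hypothetical:

```c
#include <stdint.h>

/* Pick the sender process for a history record. All records of one item
 * map to the same sender, so their relative order is preserved as long as
 * each sender uploads its own queue in order. Ordering across different
 * items is not preserved by this scheme. */
static unsigned int sender_for_item(uint64_t itemid, unsigned int senders)
{
    return (unsigned int)(itemid % senders);
}
```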

Comment by Florian Requardt [ 2014 Apr 15 ]

This seems to be the bottleneck of my proxy as well, so a big +1 from my side too...

Wouldn't support for multiple data sender processes open the door to building load-balanced active/active Zabbix server/proxy clusters?

Comment by Oleksii Zagorskyi [ 2016 Apr 14 ]

I think it will be OK to close this one as a duplicate of ZBX-5448; closing.

Comment by Alexander Vladishev [ 2016 Sep 24 ]

This issue wasn't fixed in ZBX-5448. Reopening.

Comment by Andris Zeila [ 2016 Nov 16 ]

The number of packets sent (and connections made) from proxy to server was reduced by:

  • sending history, discovery, auto registration and host availability data in one packet
  • increasing the records per packet limit to 10000 and also limiting packet by maximum size (128M is the largest packet accepted by server)
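The two limits interact: a packet is closed as soon as either the record count or the byte size would be exceeded. A minimal sketch of that rule, assuming records are already serialized and their sizes known, and reading "128M" as 128 MiB; the function name is hypothetical:

```c
#include <stddef.h>

#define MAX_RECORDS_PER_PACKET  10000
#define MAX_PACKET_SIZE         ((size_t)128 * 1024 * 1024)  /* assumed 128 MiB */

/* Given the serialized sizes of pending records, return how many of them
 * (from the start) fit into one packet without breaking either limit.
 * `overhead` is the size of the packet envelope (request name, clock, ...).
 * Returns 0 if even the first record does not fit; the caller must handle it. */
static size_t records_in_packet(const size_t *sizes, size_t count, size_t overhead)
{
    size_t n = 0, total = overhead;

    while (n < count && n < MAX_RECORDS_PER_PACKET && total + sizes[n] <= MAX_PACKET_SIZE)
    {
        total += sizes[n];
        n++;
    }

    return n;
}
```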

A new request, "proxy data", was added to provide this functionality. However, the server will still support the old server-proxy data exchange protocol based on the host availability, history data, discovery data and auto registration requests, providing backwards compatibility with older proxies.

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-1804

Comment by Oleksii Zagorskyi [ 2016 Nov 17 ]

(1) [D] Protocol documentation changes should be listed here http://zabbix.org/wiki/Docs/protocols#Protocol_changes
New page should be created here http://zabbix.org/wiki/Docs/protocols/zabbix_proxy/3.4

martins-v I think these pages could be updated by community resources.

Comment by Oleksii Zagorskyi [ 2016 Nov 17 ]

I wonder whether the new limit of 10000 values will be sufficient for all cases, or whether we still need a new ZBXNEXT to make it adjustable at the user level without recompiling the proxy...

Comment by dimir [ 2016 Nov 17 ]

There are 3 options:

  1. leave it as it is
  2. make configurable in proxy config
  3. make configurable in front-end

Making it configurable could be useful, as sasha suggested, on small embedded devices with little memory. Making it configurable in the config file would be the easiest, but that would require a proxy restart. Making it configurable in the front-end sounds too complicated. I think option 2 would be a good choice to implement right in the scope of this task.
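Option 2 boils down to reading one integer from the proxy configuration file at start-up and validating its range. A minimal sketch; the parameter name and bounds are hypothetical (the value eventually stayed hard-coded and ZBXNEXT-3566 was opened for it):

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bounds for a "records per packet" proxy setting. */
#define RECORDS_PER_PACKET_MIN  100
#define RECORDS_PER_PACKET_MAX  100000

/* Parse the value of a hypothetical config line such as
 * "RecordsPerPacket=5000". Returns 0 and stores the value on success,
 * -1 if the string is not a plain integer or is out of range. */
static int parse_records_per_packet(const char *str, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(str, &end, 10);

    if (0 != errno || end == str || '\0' != *end)
        return -1;

    if (value < RECORDS_PER_PACKET_MIN || value > RECORDS_PER_PACKET_MAX)
        return -1;

    *out = value;
    return 0;
}
```

Rejecting out-of-range values at start-up (rather than clamping them) makes a typo in the config file visible immediately.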

zalex_ua If a proxy is running on an embedded device, then it presumably does not collect a lot of NVPS, but we need to keep in mind connection loss/recovery and flushing the backlog, where you could be right.
Yes, I support making the value configurable, but in a short discussion with wiper during spec preparation he answered that we are not going to make it configurable. I'm not sure why.
That's why I asked about a new ZBXNEXT.
I'll be more than happy if it is implemented in this task, especially since it's a trivial sub-task for the Zabbix C devs.
Also, keeping such changes together in a single major version would be much more logical than splitting them among different major versions...

<dimir> It was decided to keep the maximum history data records hard-coded, so feel free to create a ZBXNEXT.

zalex_ua Created ZBXNEXT-3566.
CLOSED

Comment by richlv [ 2016 Nov 17 ]

if it's not implemented here, i'd vote on a new feature request to make it configurable.

Comment by dimir [ 2016 Nov 21 ]

(2) [G] Debug info should be removed from src/zabbix_agent/zabbix_agentd.c:

-                       PARM_OPT,       SEC_PER_MIN,            SEC_PER_HOUR},
+                       PARM_OPT,       1,              SEC_PER_HOUR},

wiper RESOLVED in r64202

<dimir> CLOSED

Comment by dimir [ 2016 Nov 22 ]

From this point on, the server will track the proxy version and support the old data exchange format (3.2 vs 3.4). But this will be an undocumented feature.
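Tracking the proxy version lets the server pick the exchange format per proxy. A minimal sketch, assuming the version arrives as a "major.minor.patch" string; the function names and the integer encoding are illustrative, not the server's actual code:

```c
#include <stdio.h>

/* Encode "major.minor" as major * 100 + minor, e.g. "3.2.4" -> 302.
 * Returns -1 if the string does not start with two dot-separated numbers. */
static int proxy_version_code(const char *version)
{
    int major, minor;

    if (2 != sscanf(version, "%d.%d", &major, &minor))
        return -1;

    return major * 100 + minor;
}

/* Proxies older than 3.3 only understand the separate requests (history
 * data, discovery data, auto registration, host availability); 3.3 and later
 * use the combined "proxy data" request. Unknown versions are treated as old. */
static int proxy_uses_combined_request(const char *version)
{
    int code = proxy_version_code(version);

    return -1 != code && 303 <= code;
}
```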

Comment by dimir [ 2016 Nov 23 ]

(3) [PS] Just a suggestion. Currently we have 2 sets of functions for parsing data from proxies: one for versions up to 3.2 and one for 3.3 (and later). The newer set has the suffix *_33(). I suggest suffixing the older ones with *_32() and removing the suffix from the new functionality instead. What do you think?

<dimir> We decided against it, because if we ever need a new set for another version, we would have to rename again.

CLOSED

Comment by dimir [ 2016 Nov 23 ]

(4) [PS] Fixed some uninitialized variable warnings in r63960. Please review.

wiper Thanks, CLOSED

Comment by dimir [ 2016 Nov 25 ]

(5) [S] In proxypoller.c:proxy_send_configuration() it seems error is never freed.

wiper RESOLVED in r64203

<dimir> CLOSED
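Sub-issue (5) is the usual leak of an error string allocated by a callee. A minimal sketch of the single-exit cleanup style (a `clean:` label, as in the diff for sub-issue (10)); `collect_data()` and `send_configuration()` are hypothetical stand-ins, not the real proxypoller.c functions:

```c
#include <stdlib.h>
#include <string.h>

/* Portable string copy (strdup is POSIX, not ISO C). */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (NULL != copy)
        memcpy(copy, s, len);

    return copy;
}

/* Hypothetical callee: on failure it allocates a message into *error. */
static int collect_data(int fail, char **error)
{
    if (0 != fail)
    {
        *error = dup_string("cannot collect data");
        return -1;
    }

    return 0;
}

/* Every exit path goes through "clean", so the error string is freed
 * exactly once whether or not the callee set it (free(NULL) is a no-op). */
static int send_configuration(int fail)
{
    char *error = NULL;
    int ret = -1;

    if (0 != collect_data(fail, &error))
        goto clean;    /* real code would log or send `error` here first */

    ret = 0;
clean:
    free(error);
    return ret;
}
```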

Comment by dimir [ 2016 Nov 25 ]

(6) [S] In proxypoller.c:proxy_send_configuration() zbx_json_open() is called without checking return value.

wiper RESOLVED in r64205

<dimir> We should discuss if invalid json on return is considered a network error.

REOPENED

<dimir> It was decided to do nothing in this case because, first, it is too unlikely to ever happen and, secondly, handling this situation properly would add complexity to the code.

CLOSED

Comment by dimir [ 2016 Nov 28 ]

(7) [I] I don't know if we still need to set it, but it seems the new files do not have the svn:eol-style property set:

  • src/zabbix_server/trapper/proxydata.c
  • src/zabbix_server/trapper/proxydata.h

wiper RESOLVED in r64206

<dimir> CLOSED

Comment by dimir [ 2016 Dec 06 ]

Comparison of old and new server-proxy data exchange functionality with different network conditions. The values were sent from proxy to server right after server start-up:

network conditions   encryption   values   time (old)   time (new)
-                    -            100000   4s           4s
20 Kbit limit        -            100000   11m27s       6m18s
200ms latency        -            100000   3m17s        28s
200ms latency        yes          100000   3m23s        58s
1s latency           -            100000   10m13s       1m59s

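Dividing the old time by the new one gives the speed-up per scenario. A small helper that parses the table's duration format; the function names are illustrative:

```c
#include <stdio.h>

/* Convert durations written like "4s", "58s" or "11m27s" to seconds.
 * Returns -1 for strings in any other format. */
static int duration_to_seconds(const char *text)
{
    int minutes, seconds, consumed = 0;

    /* %n records how many characters were consumed; it is reached only if the
     * trailing 's' matched, and it does not count towards sscanf's result. */
    if (2 == sscanf(text, "%dm%ds%n", &minutes, &seconds, &consumed) && '\0' == text[consumed])
        return minutes * 60 + seconds;

    consumed = 0;
    if (1 == sscanf(text, "%ds%n", &seconds, &consumed) && '\0' == text[consumed])
        return seconds;

    return -1;
}

/* Ratio old/new; values above 1.0 mean the new protocol is faster. */
static double speedup(const char *old_time, const char *new_time)
{
    int old_s = duration_to_seconds(old_time);
    int new_s = duration_to_seconds(new_time);

    if (0 >= old_s || 0 >= new_s)
        return -1.0;

    return (double)old_s / new_s;
}
```

Applied to the five rows this gives roughly 1.0x, 1.8x, 7.0x, 3.5x and 5.2x: the largest gains are on the high-latency links described in the original report.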
Comment by dimir [ 2016 Dec 06 ]

Tested.

Comment by Andris Zeila [ 2016 Dec 07 ]

Released in:

  • pre-3.3.0 r64235
Comment by Andris Zeila [ 2016 Dec 07 ]

Documentation:

<dimir> Looks good.

sasha CLOSED

Comment by Andris Zeila [ 2016 Dec 08 ]

(8) Coverity errors:


** CID 154898:  Null pointer dereferences  (FORWARD_NULL)
/src/libs/zbxdbhigh/proxy.c: 2328 in process_history_data_value()


________________________________________________________________________________________________________
*** CID 154898:  Null pointer dereferences  (FORWARD_NULL)
/src/libs/zbxdbhigh/proxy.c: 2328 in process_history_data_value()
2322            if (SUCCEED == in_maintenance_without_data_collection(item->host.maintenance_status,
2323                            item->host.maintenance_type, item->type) &&
2324                            item->host.maintenance_from <= value->ts.sec)
2325                    return FAIL;
2326
2327            /* empty values are only allowed for meta information update packets */
>>>     CID 154898:  Null pointer dereferences  (FORWARD_NULL)
>>>     Comparing "value->value" to null implies that "value->value" might be null.
2328            if (NULL == value->value && 0 == value->meta)
2329            {
2330                    zabbix_log(LOG_LEVEL_DEBUG, "item %s value is empty", item->key_orig);
2331                    return FAIL;
2332            }
2333

** CID 154897:  Error handling issues  (CHECKED_RETURN)
/src/libs/zbxdbhigh/proxy.c: 3189 in process_discovery_data_contents()


________________________________________________________________________________________________________
*** CID 154897:  Error handling issues  (CHECKED_RETURN)
/src/libs/zbxdbhigh/proxy.c: 3189 in process_discovery_data_contents()
3183                            goto json_parse_error;
3184
3185                    if (SUCCEED == zbx_json_value_by_name(&jp_row, ZBX_PROTO_TAG_PORT, tmp, sizeof(tmp)))
3186                            port = atoi(tmp);
3187
3188                    zbx_json_value_by_name_dyn(&jp_row, ZBX_PROTO_TAG_VALUE, &value, &value_alloc);
>>>     CID 154897:  Error handling issues  (CHECKED_RETURN)
>>>     Calling "zbx_json_value_by_name" without checking return value (as is done elsewhere 47 out of 48 times).
3189                    zbx_json_value_by_name(&jp_row, ZBX_PROTO_TAG_DNS, dns, sizeof(dns));
3190
3191                    if (SUCCEED == zbx_json_value_by_name(&jp_row, ZBX_PROTO_TAG_STATUS, tmp, sizeof(tmp)))
3192                            status = atoi(tmp);
3193
3194                    if (0 == last_druleid || drule.druleid != last_druleid)

While both are false positives, the code should be changed to avoid relying on hidden business logic.

RESOLVED in r64285

<dimir> CLOSED
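For CID 154897 the idea is to make the "this tag is optional" decision visible in the code instead of silently ignoring a return value. A minimal sketch of that style, with a hypothetical `get_tag()` in place of `zbx_json_value_by_name()` and local SUCCEED/FAIL definitions for this sketch only; the actual change in r64285 may look different:

```c
#include <string.h>

#define SUCCEED 0   /* local definitions for this sketch */
#define FAIL    -1

/* Hypothetical stand-in for a JSON lookup: copies `src` into `buf`
 * if present, returns FAIL when the tag is missing. */
static int get_tag(const char *src, char *buf, size_t size)
{
    if (NULL == src || 0 == size)
        return FAIL;

    strncpy(buf, src, size - 1);
    buf[size - 1] = '\0';
    return SUCCEED;
}

/* The DNS tag is optional: instead of ignoring the return value,
 * state explicitly that a missing tag means an empty DNS name. */
static void read_dns(const char *src, char *dns, size_t size)
{
    if (SUCCEED != get_tag(src, dns, size) && 0 != size)
        dns[0] = '\0';
}
```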

Comment by Andrey Melnikov [ 2016 Dec 12 ]

proxy_data_sender() leaks 16K of memory allocated for JSON on every call.

Comment by dimir [ 2016 Dec 12 ]

Thank you!

Comment by dimir [ 2016 Dec 12 ]

(9) [S] Memory leak in datasender.c:proxy_data_sender() json object is never freed:

zbx_json_init(&j, 16 * ZBX_KIBIBYTE);

wiper RESOLVED in r64357

<dimir> Suggestion to free memory ASAP in r64369, otherwise looks good. Feel free to close.

wiper Thanks, CLOSED
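Sub-issue (9) is an init-without-free leak: each call allocated a 16 KiB buffer and never released it. A minimal sketch of the pairing, with a hypothetical `buf_t` in place of `struct zbx_json` (the real code pairs `zbx_json_init()` with `zbx_json_free()`):

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct zbx_json: just an owned buffer. */
typedef struct
{
    char   *buffer;
    size_t  size;
}
buf_t;

static int buf_init(buf_t *b, size_t size)
{
    b->buffer = malloc(size);
    b->size = (NULL != b->buffer) ? size : 0;
    return (NULL != b->buffer) ? 0 : -1;
}

static void buf_free(buf_t *b)
{
    free(b->buffer);
    b->buffer = NULL;
    b->size = 0;
}

/* One sender iteration: every successful init is matched by a free,
 * done right after the data has been used rather than at function exit. */
static int sender_iteration(void)
{
    buf_t j;

    if (0 != buf_init(&j, 16 * 1024))
        return -1;

    /* ... fill j.buffer and send it to the server ... */

    buf_free(&j);
    return 0;
}
```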

Comment by Andris Zeila [ 2016 Dec 14 ]

Last fixes released in:

  • pre-3.3.0 r64430
Comment by Andrey Melnikov [ 2017 Jan 31 ]

After this merge, zabbix_sender running on the server host can't update trapper items for any host that is monitored by a proxy.
Is this an intentional restriction?

Comment by richlv [ 2017 Jan 31 ]

lynxchaus, it has always been like this; it's unlikely to be related to this issue

Comment by Andris Zeila [ 2017 Feb 02 ]

(10) [S] An uninitialized (old) variable is used when logging proxy configuration sending notifications. It should be something like this:

Index: src/zabbix_server/trapper/proxyconfig.c
===================================================================
--- src/zabbix_server/trapper/proxyconfig.c	(revision 65497)
+++ src/zabbix_server/trapper/proxyconfig.c	(working copy)
@@ -38,7 +38,7 @@
 void	send_proxyconfig(zbx_socket_t *sock, struct zbx_json_parse *jp)
 {
 	const char	*__function_name = "send_proxyconfig";
-	char		host[HOST_HOST_LEN_MAX], *error = NULL;
+	char		*error = NULL;
 	struct zbx_json	j;
 	DC_PROXY	proxy;
 
@@ -68,18 +68,18 @@
 	{
 		zbx_send_response(sock, FAIL, error, CONFIG_TIMEOUT);
 		zabbix_log(LOG_LEVEL_WARNING, "cannot collect configuration data for proxy \"%s\" at \"%s\": %s",
-				host, sock->peer, error);
+				proxy.host, sock->peer, error);
 		goto clean;
 	}
 
 	zabbix_log(LOG_LEVEL_WARNING, "sending configuration data to proxy \"%s\" at \"%s\", datalen " ZBX_FS_SIZE_T,
-			host, sock->peer, (zbx_fs_size_t)j.buffer_size);
+			proxy.host, sock->peer, (zbx_fs_size_t)j.buffer_size);
 	zabbix_log(LOG_LEVEL_DEBUG, "%s", j.buffer);
 
 	if (SUCCEED != zbx_tcp_send_to(sock, j.buffer, CONFIG_TRAPPER_TIMEOUT))
 	{
 		zabbix_log(LOG_LEVEL_WARNING, "cannot send configuration data to proxy \"%s\" at \"%s\": %s",
-				host, sock->peer, zbx_socket_strerror());
+				proxy.host, sock->peer, zbx_socket_strerror());
 	}
 clean:
 	zbx_json_free(&j);

wiper RESOLVED in r65539

vjaceslavs CLOSED

Comment by Alexander Vladishev [ 2017 Feb 06 ]

(11) [S] Errors in log file after server upgrade (proxy still 3.2.x).

 19367:20170206:103410.225 server #23 started [proxy poller #1]
...
 19367:20170206:105158.255 obtained data from proxy "proxy-dmz": [{"data":[
  {"host":"svnzbx-r","key":"icmppingsec[,3,500,,,max]","clock":1486371114,"ns":387710248,"value":"0.000250"},
  {"host":"svnzbx-r","key":"icmppingloss[,3,500]","clock":1486371114,"ns":387710248,"value":"0.000000"},
  {"host":"svnzbx-r","key":"icmpping[,3,500]","clock":1486371114,"ns":387710248,"value":"1"},
  {"host":"svnzbx-r","key":"icmppingsec[,3,500,,,avg]","clock":1486371114,"ns":387710248,"value":"0.000223"},
  {"host":"svnzbx-r","key":"icmppingsec[,3,500,,,min]","clock":1486371114,"ns":387710248,"value":"0.000180"},
  {"host":"exam.zabbix.com","key":"net.if.in[eth1]","clock":1486371115,"ns":131635027,"value":"0"},
  {"host":"svnzbx-rw","key":"system.cpu.load[percpu,avg1]","clock":1486371115,"ns":132622153,"value":"0.000000"},
  {"host":"org","key":"net.if.in[lo,bytes]","clock":1486371115,"ns":133533140,"value":"951031252.000000"},
  {"host":"jira","key":"system.cpu.util[,iowait,avg1]","clock":1486371115,"ns":134311177,"value":"2.377155"},
  {"host":"mail","key":"vm.memory.size[free]","clock":1486371115,"ns":135024033,"value":"25526272"},
  {"host":"exam.zabbix.com","key":"net.if.in[lo]","clock":1486371115,"ns":550569082,"value":"6448870610"},
  {"host":"org","key":"system.uptime","clock":1486371115,"ns":551448114,"value":"507912"},
  {"host":"org","key":"system.cpu.load[percpu,avg15]","clock":1486371115,"ns":552268948,"value":"0.310000"},
  {"host":"svnzbx-r","key":"system.cpu.load[percpu,avg1]","clock":1486371115,"ns":553174704,"value":"0.000000"},
  {"host":"svnzbx-rw","key":"system.cpu.util[,nice,avg1]","clock":1486371115,"ns":554018719,"value":"0.000000"},
  {"host":"mail","key":"log[/var/log/syslog,\"mysqld.*ERROR\",,,skip]","clock":1486371114,"ns":793753477,"lastlogsize":1561663,"mtime":0},
  {"host":"mail","key":"system.cpu.util[,guest,avg1]","clock":1486371116,"ns":555107826,"value":"0.000000"},
  {"host":"jira","key":"system.cpu.util[,nice,avg1]","clock":1486371116,"ns":555664433,"value":"0.000000"},
  {"host":"org","key":"vfs.dev.read[,operations]","clock":1486371116,"ns":556530617,"value":"2805417.000000"},
  {"host":"svnzbx-r","key":"net.if.total[lo,errors]","clock":1486371116,"ns":557520991,"value":"0.000000"},
  {"host":"org","key":"net.if.out[eth0,bytes]","clock":1486371116,"ns":793532393,"value":"7674616109.000000"},
  {"host":"svnzbx-rw","key":"system.cpu.util[,softirq,avg1]","clock":1486371116,"ns":794280027,"value":"0.000000"},
  {"host":"org","key":"system.cpu.load[percpu,avg1]","clock":1486371116,"ns":820927512,"value":"0.660000"},
  {"host":"exam.zabbix.com","key":"net.if.out[eth0]","clock":1486371116,"ns":932013940,"value":"21528143720"},
  {"host":"mail","key":"vm.memory.size[pfree]","clock":1486371116,"ns":794916105,"state":1,"value":"Invalid first parameter."}
 ],"clock":1486371117,"ns":149706051}]
 19367:20170206:105158.255 End of recv_data_from_proxy():SUCCEED
 19367:20170206:105158.255 In zbx_send_response()
 19367:20170206:105158.255 zbx_send_response() '{"response":"success"}'
 19367:20170206:105158.255 End of zbx_send_response():SUCCEED
 19367:20170206:105158.255 In disconnect_proxy()
 19367:20170206:105158.255 End of disconnect_proxy()
 19367:20170206:105158.255 End of get_data_from_proxy():SUCCEED
 19367:20170206:105158.255 In process_client_history_data()
 19367:20170206:105158.255 In parse_history_data()
 19367:20170206:105158.255 End of parse_history_data():SUCCEED processed:25/25
 19367:20170206:105158.255 In process_history_data()
zabbix_server [19367]: ERROR [file:proxy.c,line:2400] Something impossible has just happened.
 19367:20170206:105158.255 delay period [120/6-7,00:00-24:00]
 19367:20170206:105158.255 120 sec at 6-7,00:00-24:00
 19367:20170206:105158.255 In check_time_period() period:'6-7,00:00-24:00'

wiper Item metadata processing was messed up. To reproduce, create a log item with regexp filtering and add non-matching lines to the log file. This will cause the proxy to send metadata update packets (without an item value), and the server will throw the above error.
RESOLVED in r65540

vjaceslavs CLOSED
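The failing case in (11) is a record that carries only log metadata (lastlogsize/mtime) and no value, like the log[...] entry in the dump above. A minimal sketch of the rule from the Coverity excerpt in (8), where a missing value is allowed only for meta updates; the struct and function are simplified, hypothetical stand-ins:

```c
#include <stddef.h>

/* Simplified history record as received from a proxy. */
typedef struct
{
    const char     *value;        /* NULL when the record has no value */
    int             meta;         /* non-zero if lastlogsize/mtime are present */
    unsigned long   lastlogsize;
}
history_record_t;

typedef enum
{
    RECORD_INVALID,
    RECORD_VALUE,      /* store the value (and metadata, if any) */
    RECORD_META_ONLY   /* only update lastlogsize/mtime */
}
record_kind_t;

static record_kind_t classify_record(const history_record_t *r)
{
    if (NULL != r->value)
        return RECORD_VALUE;

    /* empty values are only allowed for meta information update packets */
    return (0 != r->meta) ? RECORD_META_ONLY : RECORD_INVALID;
}
```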

Comment by Alexander Vladishev [ 2017 Feb 06 ]

(12) [PS] After the server and proxy upgrade:

 19369:20170206:105750.251 proxy "proxy-dmz" at "192.168.***.***" returned invalid host availability data: Can't find pair with name "data"
 19367:20170206:105751.654 proxy "proxy-dmz" at "192.168.***.***" returned invalid host availability data: Can't find pair with name "data"
 19367:20170206:105752.656 proxy "proxy-dmz" at "192.168.***.***" returned invalid host availability data: Can't find pair with name "data"

wiper The proxy version was updated only after sending configuration data, so the server kept spamming the proxy with the now-unsupported 'host availability' request, which resulted in an error (Deprecated request) response.
I added a proxy version update and better proxy response handling for the other requests as well. Now there will be only a single warning:

23370:20170206:145550.581 proxy "Active proxy" at "127.0.0.1" returned invalid host availability data: Deprecated request

Technically we could hide it, but I'm not sure it's a good idea.

RESOLVED in r65542

vjaceslavs CLOSED

Comment by Vladislavs Sokurenko [ 2017 Feb 07 ]

(13) [PS] There are several unused parameter warnings related to this development.

proxypoller.c: In function ‘proxy_check_error_response’:
proxypoller.c:259:55: warning: unused parameter ‘proxy’ [-Wunused-parameter]
 static int proxy_check_error_response(const DC_PROXY *proxy, const struct zbx_json_parse *jp, char **error)
trapper.c: In function ‘send_proxyhistory’:
trapper.c:153:67: warning: unused parameter ‘ts’ [-Wunused-parameter]
 static void send_proxyhistory(zbx_socket_t *sock, zbx_timespec_t *ts)
proxy.c: In function ‘get_active_proxy_from_request’:
proxy.c:298:82: warning: unused parameter ‘sock’ [-Wunused-parameter]
 int get_active_proxy_from_request(struct zbx_json_parse *jp, const zbx_socket_t *sock, DC_PROXY *proxy,
                                                                                  ^
proxy.c: In function ‘proxy_item_validator’:
proxy.c:2860:62: warning: unused parameter ‘sock’ [-Wunused-parameter]
 static int proxy_item_validator(DC_ITEM *item, zbx_socket_t *sock, void *args, char **error)
                                                              ^
proxy.c:2860:87: warning: unused parameter ‘error’ [-Wunused-parameter]
 static int proxy_item_validator(DC_ITEM *item, zbx_socket_t *sock, void *args, char **error)
dbcache.c: In function ‘dc_add_history’:
dbcache.c:2576:56: warning: unused parameter ‘value_type’ [-Wunused-parameter]
 void dc_add_history(zbx_uint64_t itemid, unsigned char value_type, unsigned char item_flags, AGENT_RESULT *result,

wiper One was already fixed in trunk; the rest are fixed in r65585, r65586 and r65587.
RESOLVED

vjaceslavs CLOSED

Comment by Andris Zeila [ 2017 Feb 09 ]

The latest fixes released in:

  • pre-3.3.0 r65607
Comment by Alexander Vladishev [ 2017 Feb 09 ]

sub-issue still open: (1)

Comment by Andris Zeila [ 2017 Feb 10 ]

Waiting on a decision about what documentation must be updated/published, and where.

martins-v A summary of the changes should also be added to 'What's new' and the upgrade notes.

Comment by Andris Zeila [ 2017 Mar 08 ]

Updated protocol description in https://www.zabbix.com/documentation/3.4/manual/appendix/protocols/server_proxy

Comment by Andris Zeila [ 2017 Mar 09 ]

Please review

Comment by dimir [ 2017 Mar 28 ]

Looks good.

Comment by Filipe Paternot [ 2017 Mar 28 ]

Since we are talking about server-proxy optimization, couldn't we include ZBX-11782 here as well? It looks closely related.

Comment by richlv [ 2017 Jun 22 ]

subissue (1) not closed - the last comment there is slightly surprising

martins-v Thanks, Rich. I just crossed out my comment there.

<dimir> So as I understand subissue (1) is still not resolved?

martins-v Some changes have been listed in http://zabbix.org/wiki/Docs/protocols#Protocol_changes. But there is no new page on proxy protocol for 3.4.

Comment by Oleksii Zagorskyi [ 2018 Jan 11 ]

Would it make sense to increase ZBX_MAX_HRECORDS (which is still 1000) in 3.4?
One Zabbix user said that it helped speed up proxy backlog synchronization.

wiper Zabbix reads history in batches of ZBX_MAX_HRECORDS records while the total number of records is less than ZBX_MAX_HRECORDS_TOTAL + ZBX_MAX_HRECORDS (it's also limited by the maximum size of packet server can accept).

If selecting 10k records at once is noticeably faster than selecting 1k records 10 times, then yes, it does. Apart from database access there should not be any difference.
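wiper's description maps to a loop roughly like the one below. This is a hypothetical sketch: `read_batch()` stands in for the database query, the packet size is estimated with a fixed per-record size, and the overall record cap is passed in as `max_total` instead of reproducing the exact ZBX_MAX_HRECORDS_TOTAL expression:

```c
#include <stddef.h>

#define BATCH_RECORDS   1000    /* ZBX_MAX_HRECORDS, per the question above */

/* Hypothetical data source: returns how many of the `want` records are
 * available starting at *cursor and advances the cursor. */
static size_t read_batch(size_t *cursor, size_t available, size_t want)
{
    size_t n = (available - *cursor < want) ? available - *cursor : want;

    *cursor += n;
    return n;
}

/* Collect records for one packet: read in batches of BATCH_RECORDS until the
 * backlog is exhausted, `max_total` records are reached, or the next batch
 * would push the estimated packet size past `max_bytes`. */
static size_t collect_packet(size_t available, size_t max_total,
        size_t record_bytes, size_t max_bytes)
{
    size_t cursor = 0, total = 0, n;

    while (total < max_total)
    {
        size_t want = (max_total - total < BATCH_RECORDS) ? max_total - total : BATCH_RECORDS;

        if ((total + want) * record_bytes > max_bytes)
            break;

        if (0 == (n = read_batch(&cursor, available, want)))
            break;

        total += n;

        if (n < want)    /* short batch: the backlog is empty */
            break;
    }

    return total;
}
```

A larger batch size fills the same packet with fewer `read_batch()` calls, which matches the point above that only database access should differ.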

Comment by Sudha K [ 2018 Jun 19 ]

Hello,

I want to know more about the ZBX_MAX_HRECORDS and ZBX_MAX_HRECORDS_TOTAL parameters in the proxy.h file of Zabbix proxy 3.4.9.

Could anyone please explain them?

Generated at Sat Apr 12 04:19:12 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.