This issue covers low-connection-count, high-data-throughput cases (namely sender -> server/proxy, active agent -> server/proxy, and proxy -> server communications). The related issue for high-connection-count, low-data-throughput cases is ZBXNEXT-3214.
Initially both the agent and the proxy were designed to resend the same data until they received a reply confirming that the data had reached its destination and had been processed. Then ZBX-2285 changed their behaviour to the opposite: they assumed that if data was written to the socket, the server would eventually receive and process it. Later (I couldn't find the exact point in time) the proxy's behaviour was switched back to prevent data loss, and now ZBX-10176 effectively requests the same for the agent.
Neither of these approaches is ideal. When the client fully relies on the TCP implementation, data may be lost if the server hits a timeout while reading data from the TCP buffer, or even earlier if something goes wrong at the TCP level. When the client always waits for a reply from the server, it may hit a timeout on its side if the server is busy. This leads to data retransmission by the client and reprocessing by the server, which makes it really complicated for Zabbix to recover from network downtimes.
Ideally the client should send a short request message and wait for the server's response before sending the data itself. The server's response should contain an estimate of how much data it will be able to process. The client should then send the agreed amount of data and wait for the server to confirm that the data was successfully received and processed.
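The proposed exchange could be sketched roughly as below. This is a minimal simulation, not the Zabbix protocol: the JSON message fields ("request", "capacity", "values", "processed") and the function names are hypothetical, and the network round-trips are replaced by direct function calls to keep the sketch self-contained.

```python
import json

def server_handle(msg, free_slots):
    """Simulated server side: answer a capacity query or process a data batch.

    free_slots stands in for whatever metric the server would use to
    estimate how much data it can process right now (hypothetical).
    """
    req = json.loads(msg)
    if req["request"] == "capacity":
        # Step 1: tell the client how much data we can take.
        return json.dumps({"capacity": free_slots})
    if req["request"] == "data":
        # Step 2: confirm only what was actually processed, so the
        # client can retry the remainder instead of resending everything.
        accepted = min(len(req["values"]), free_slots)
        return json.dumps({"processed": accepted})

def client_send(values, free_slots):
    """Simulated client side: query capacity, send within the estimate,
    and keep anything unconfirmed queued for the next exchange."""
    reply = json.loads(server_handle(json.dumps({"request": "capacity"}), free_slots))
    batch = values[: reply["capacity"]]
    ack = json.loads(server_handle(
        json.dumps({"request": "data", "values": batch}), free_slots))
    # Only confirmed values are dropped from the client's queue.
    return values[ack["processed"]:]

remaining = client_send(list(range(10)), free_slots=4)
print(remaining)  # values 4..9 stay queued for the next exchange
```

The key property of this scheme is that the client never considers data delivered until the server explicitly confirms processing, so neither a TCP-level failure nor a server-side timeout silently loses values.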
This involves changes to the Zabbix protocols. Since daemons will spend more time communicating (or waiting for replies), timeouts will need some adjustment. This will negatively affect performance, so it is worth implementing a sort of connection manager at the same time.