[ZBX-20187] Zabbix Proxy is not sending data fast enough when using Galera cluster as storage backend Created: 2021 Nov 08 Updated: 2024 Apr 12 Resolved: 2023 Mar 16 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | None |
Affects Version/s: | 5.4.6 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Trivial |
Reporter: | Majd Sarhan | Assignee: | Igor Gorbach |
Resolution: | Cannot Reproduce | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | image-2021-11-08-17-34-45-704.png zabbix-proxy.rar |
Description |
Steps to reproduce:
Result: when increasing the debug level to 5, the following log line is repeated:
221:20211104:131507.233 proxy_get_history_data() 2 record(s) missing. Waiting 0.100000 sec, retrying.
The failing function is proxy_get_history_data. Because of this, the proxy cannot send data fast enough, so data stays stuck in the queue for hours or even days.
Expected: data sender process not 100% busy |
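To quantify how often the data sender hits this condition, the debug log can be summarized with a short script. This is a minimal sketch, assuming the log lines have the shape quoted in this report; the function name `summarize` and the returned keys are hypothetical and not part of Zabbix.

```python
import re

# Matches the debug message quoted in this report. The "PID:date:time" prefix
# is not required, so output filtered through grep/journalctl also matches.
MISSING_RE = re.compile(
    r"proxy_get_history_data\(\) (?P<count>\d+) record\(s\) missing\. "
    r"(?:Waiting (?P<wait>[\d.]+) sec, retrying|No more retries)"
)


def summarize(lines):
    """Count retry and give-up events and add up the time spent waiting."""
    retries = 0
    give_ups = 0
    seconds_waited = 0.0
    for line in lines:
        match = MISSING_RE.search(line)
        if match is None:
            continue  # unrelated log line
        if match.group("wait") is not None:
            retries += 1
            seconds_waited += float(match.group("wait"))
        else:
            give_ups += 1
    return {
        "retries": retries,
        "give_ups": give_ups,
        "seconds_waited": seconds_waited,
    }
```

Feeding it the lines of the proxy log file (for example `summarize(open(path))`, with `path` pointing at the proxy log) gives a rough picture of how much of the data sender's time goes into waiting.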
Comments |
Comment by Igor Gorbach [ 2021 Nov 09 ] |
Hello! Does the same issue occur on older Zabbix versions, or do you see it only on 5.4.6? |
Comment by Majd Sarhan [ 2021 Nov 09 ] |
Hi, thanks a lot. Actually, this is the first time we have used the Zabbix proxy, so yes. Before, we only used the Zabbix server.
|
Comment by Majd Sarhan [ 2021 Nov 10 ] |
Hi,
|
Comment by Igor Gorbach [ 2021 Nov 10 ] |
Hello! |
Comment by Majd Sarhan [ 2021 Nov 11 ] |
Please find attached the proxy logs at debug level 5. When I run a single MariaDB instance, it works fine. |
Comment by Majd Sarhan [ 2021 Nov 17 ] |
Any update on this issue? |
Comment by Igor Gorbach [ 2023 Jan 17 ] |
Tried to reproduce on a 3-node Galera master-master cluster: no issues in 6.0/6.2. Zabbix 5.4 is an unsupported version. If the problem also occurs on 6.0/6.2, please let us know the exact steps to reproduce. |
Comment by Ash Crosby [ 2023 Feb 01 ] |
I hit exactly this issue with 6.2.6, against a 3-node MariaDB 10.5.18 Galera cluster, using an identical config for my servers. The servers run fine, but the proxies deliver a steady 10 items / second, which causes the proxy queues to rapidly fill up.
After confirming that the bottleneck wasn't in the network or the databases, I increased the logging level for the data sender process, which revealed logs identical to the above - 10 sets of:
2 record(s) missing. No more retries.
2 record(s) missing. Waiting 0.100000 sec, retrying.
After which it appears to send 10 records.
I've replaced the databases with single instances now and the issue has disappeared, but it was consistently reproducible until I dropped Galera. If others can't recreate it, I'll spin up some more test instances and get more complete logs. |
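The figures in the comment above are consistent with a simple back-of-the-envelope estimate. This is a rough sketch that assumes the retry waits dominate the time spent per batch, which the thread does not confirm; the function name `estimated_throughput` is hypothetical.

```python
def estimated_throughput(records_per_batch, waits_per_batch, wait_seconds):
    """Records per second, assuming retry waits dominate the time per batch."""
    seconds_per_batch = waits_per_batch * wait_seconds
    return records_per_batch / seconds_per_batch


# Numbers taken from the comment above: 10 waits of 0.1 s, then 10 records sent.
rate = estimated_throughput(records_per_batch=10, waits_per_batch=10, wait_seconds=0.1)
print(f"{rate:.1f} records/second")
```

Under that assumption, ten waits of 0.1 s per ten records works out to roughly 10 records per second, which matches the rate reported above.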
Comment by Igor Gorbach [ 2023 Feb 02 ] |
Please also provide details about your Galera cluster: its configuration, and what you are using to redirect requests in case of failover. |
Comment by Rath [ 2024 Apr 12 ] |
Greetings! We see the same issue with a MariaDB Galera setup. The proxy is pulling data from its agents (5k+ items), but never sending it to the server.
Zabbix-Proxy: 6.4.13-1 MariaDB Server: 10.11.7
Even if there is just one node in the cluster, we hit this error (at DebugLevel=4):
> journalctl -u zabbix-proxy.service -f | grep proxy_get_history_data
> proxy_get_history_data() 1 record(s) missing. No more retries
We traced its source to this area in the code: https://github.com/zabbix/zabbix/tree/master/src/libs/zbxproxybuffer
I think it occurs at pb_history.c => pb_history_get_rows_db
It feels like the proxy is stuck in an endless retry loop. Maybe because of some insert delay caused by Galera? (As far as I understand it, the proxy writes data to the DB and checks that all records that were written actually exist before sending them to the server.)
See also the comment inside the code:
> At least one record is missing. It can happen if some DB syncer process has started but not yet committed a transaction or a rollback occurred in a DB syncer.
For now we rolled back the proxy to a standalone MariaDB instance, as the proxy's data is not (that) important. |
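The behaviour described above can be illustrated with a simplified model. This is only a sketch under stated assumptions, not the Zabbix implementation: it assumes that a "missing" record means a gap in otherwise consecutive record IDs, that the wait is 0.1 s as in the log lines, and that the retry limit is 10. The names `fetch_with_gap_retry`, `read_ids`, and `max_retries` are hypothetical.

```python
import time


def fetch_with_gap_retry(read_ids, last_id, max_retries=10, wait_seconds=0.1,
                         sleep=time.sleep):
    """Read record IDs above last_id, waiting and retrying while IDs are missing.

    Returns (ids, retries_used). Hypothetical model, not Zabbix code.
    """
    retries = 0
    while True:
        ids = sorted(i for i in read_ids() if i > last_id)
        # With no missing records, the IDs run consecutively from last_id + 1.
        expected = list(range(last_id + 1, last_id + 1 + len(ids)))
        has_gap = ids != expected
        if not has_gap or retries >= max_retries:
            # Complete, or "No more retries": send what is there.
            return ids, retries
        retries += 1
        sleep(wait_seconds)  # "Waiting 0.100000 sec, retrying."


# A gap that never closes uses up every retry before anything is sent.
waits = []
ids, retries = fetch_with_gap_retry(lambda: [1, 2, 4], last_id=0, sleep=waits.append)
print(ids, retries, len(waits))
```

In this model a transient gap (an uncommitted transaction) costs one or two short waits, while a gap that never closes costs the full retry budget on every read, which would look like the endless retry loop described above.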