[ZBX-27039] HTTP Agent items are getting randomly stuck/queued Created: 2025 Sep 29 Updated: 2025 Nov 05 |
|
| Status: | Need info |
| Project: | ZABBIX BUGS AND ISSUES |
| Component/s: | Server (S) |
| Affects Version/s: | 7.0.18 |
| Fix Version/s: | None |
| Type: | Incident report | Priority: | Trivial |
| Reporter: | Danilo G. Baio | Assignee: | Zabbix Support Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
FreeBSD 14.3-RELEASE-p3 amd64 $ curl -V $ zabbix_server --version Copyright (C) 2025 Zabbix SIA Compiled with OpenSSL 3.0.16 11 Feb 2025 |
||
| Attachments: |
|
| Description |
|
Hello, After upgrading our Zabbix infrastructure from 6 to 7.0.18, we noticed that many HTTP Agent items got stuck and are queuing indefinitely. For instance, we have many HTTP Agent items collected every minute, and randomly, they get stuck and are never crawled/updated again until we restart the server instance. We tried deploying a proxy to handle those HTTP Agent items, but the issue remains the same. As Zabbix 7 supports threads in all pollers, we tweaked our configuration and even set them as default, but the behavior remains the same, the HTTP Agent items continue to be queued randomly. Do you have any advice here, or could this be an issue with FreeBSD/curl that we're having? Thank you in advance for your help. |
| Comments |
| Comment by Alexey Pustovalov [ 2025 Sep 30 ] |
|
Hello! Please switch "http agent poller" to debug mode, wait at least a couple of minutes when some stuck items must be checked, then share information about "stuck" items and log file:
zabbix_server -R log_level_increase="http agent poller"
|
| Comment by Danilo G. Baio [ 2025 Sep 30 ] |
|
Hi dotneft Here it is: Log file with debug: Example of four items that got stuck: And thank you in advance for taking a look at this. ps. I will delete those linked files once this issue is resolved/closed. |
| Comment by Danilo G. Baio [ 2025 Oct 01 ] |
|
Hi again, I checked the logs again and followed an item that got stuck. The history stops, and the last log that matches this item's entry is this: Itemid: 452411 81622:20250930:100011.330 finished processing itemid:339251 81622:20250930:100011.330 End of process_httpagent_result() 81622:20250930:100011.330 End of curl_perform() 81622:20250930:100011.331 In zbx_ipc_socket_write() 81622:20250930:100011.331 requeue num:1 81622:20250930:100011.331 End of zbx_ipc_socket_write():SUCCEED 81622:20250930:100011.331 requeue items nextcheck:1759237213 81622:20250930:100011.331 In zbx_ipc_async_socket_recv() timeout:0 81622:20250930:100011.331 End of zbx_ipc_async_socket_recv():0 81622:20250930:100011.340 In curl_perform() 81622:20250930:100011.340 In handle_socket() action:4 81622:20250930:100011.340 End of handle_socket() 81622:20250930:100011.340 running_handles:10 81622:20250930:100011.340 End of curl_perform() 81622:20250930:100011.340 In zbx_ipc_async_socket_recv() timeout:0 81622:20250930:100011.340 End of zbx_ipc_async_socket_recv():0 81622:20250930:100011.363 In curl_perform() 81622:20250930:100011.364 In handle_socket() action:1 81622:20250930:100011.364 End of handle_socket() 81622:20250930:100011.364 In handle_socket() action:4 81622:20250930:100011.364 End of handle_socket() 81622:20250930:100011.364 start_timeout() timeout:29935 81622:20250930:100011.364 running_handles:9 81622:20250930:100011.364 pending cURL messages:0 81622:20250930:100011.364 In process_httpagent_result() 81622:20250930:100011.364 name 'content-type' value 'application/json; charset=utf-8' amount:1 index:0 origin:134217729 81622:20250930:100011.364 In zbx_int_in_list() list:'200' value:200 81622:20250930:100011.364 End of zbx_int_in_list():SUCCEED 81622:20250930:100011.364 In zbx_preprocess_item_value() 81622:20250930:100011.364 End of zbx_preprocess_item_value() 81622:20250930:100011.364 finished processing itemid:452411 81622:20250930:100011.364 End of process_httpagent_result() 81622:20250930:100011.364 End of curl_perform() 81622:20250930:100011.364 In zbx_ipc_socket_write() 81622:20250930:100011.364 requeue num:1 81622:20250930:100011.365 End of zbx_ipc_socket_write():SUCCEED 81622:20250930:100011.365 In zbx_ipc_async_socket_recv() timeout:0 81622:20250930:100011.365 requeue items nextcheck:1759237213 81622:20250930:100011.365 End of zbx_ipc_async_socket_recv():0 seems to me, requeue items nextcheck:1759237213 this is always in the past. See the history: |
| Comment by Alexey Pustovalov [ 2025 Oct 01 ] |
|
What timezone do you have on server and frontend? |
| Comment by Danilo G. Baio [ 2025 Oct 01 ] |
|
Both UTC-3, America/Sao_Paulo |
| Comment by Danilo G. Baio [ 2025 Oct 01 ] |
|
Hi, now another example: item: 472580
67349:20251001:113420.182 finished processing itemid:472580
67349:20251001:113420.182 End of process_httpagent_result()
67349:20251001:113420.182 End of curl_perform()
67349:20251001:113420.182 In zbx_ipc_socket_write()
67349:20251001:113420.182 requeue num:1
67349:20251001:113420.182 End of zbx_ipc_socket_write():SUCCEED
67349:20251001:113420.182 requeue items nextcheck:1759329261 Wednesday, October 1, 2025 11:34:21
[...]
67349:20251001:113520.181 finished processing itemid:472580
67349:20251001:113520.181 End of process_httpagent_result()
67349:20251001:113520.181 End of curl_perform()
67349:20251001:113520.181 In zbx_ipc_socket_write()
67349:20251001:113520.181 requeue num:1
67349:20251001:113520.181 End of zbx_ipc_socket_write():SUCCEED
67349:20251001:113520.181 requeue items nextcheck:1759329321 Wednesday, October 1, 2025 11:35:21
67349:20251001:113520.181 In zbx_ipc_async_socket_recv() timeout:0
67349:20251001:113520.181 End of zbx_ipc_async_socket_recv():0
[...]
67349:20251001:113620.185 finished processing itemid:472580
67349:20251001:113620.186 End of process_httpagent_result()
67349:20251001:113620.186 End of curl_perform()
67349:20251001:113620.186 In zbx_ipc_socket_write()
67349:20251001:113620.186 requeue num:1
67349:20251001:113620.186 End of zbx_ipc_socket_write():SUCCEED
67349:20251001:113620.186 requeue items nextcheck:1759329381 Wednesday, October 1, 2025 11:36:21
67349:20251001:113620.186 In zbx_ipc_async_socket_recv() timeout:0
67349:20251001:113620.186 End of zbx_ipc_async_socket_recv():0
67349:20251001:113621.160 In zbx_dc_config_get_poller_items() poller_type:7
67349:20251001:113621.160 zbx_setproctitle() title:'http agent poller #1 [got 19 values, queued 19 in 5 sec, awaiting 7]'
Then, it stops showing in the logs, and the item gets stuck in the queue. I've downgraded until Zabbix 7.0.7, and the same happens. Now it's back on 7.0.18. These are the web server logs for the three requests mentioned above: 2025-10-01 11:34:19 10.5.11.206 GET /api/monitor/process - 443 - my-zabbix-server-ip - - 200 0 0 16 2025-10-01 11:35:19 10.5.11.206 GET /api/monitor/process - 443 - my-zabbix-server-ip - - 200 0 0 16 2025-10-01 11:36:20 10.5.11.206 GET /api/monitor/process - 443 - my-zabbix-server-ip - - 200 0 0 50 |
| Comment by Danilo G. Baio [ 2025 Oct 01 ] |
|
Ah, this doesn't happen all of a sudden... it usually takes an hour and a half before it starts clogging the queue with many HTTP agent items. |
| Comment by Alexey Pustovalov [ 2025 Oct 02 ] |
|
Sorry, but according history screenshot I do not see delays or missed data points. Do you have such examples? |
| Comment by Danilo G. Baio [ 2025 Oct 02 ] |
|
Several examples I can show. ...
Shows four items 'HTTP agent' queued/delayed:
In the most host/latest data, you can see that the last check was 24 minutes ago.
History of the item 388505: *screenshot
In the logs (with zabbix_server -R log_level_increase="http agent poller") , I have a record of checking this item every two minutes, but after 10:33:05, it no longer appears. 15024:20251002:103305.610 End of zbx_ipc_async_socket_recv():0 15024:20251002:103305.610 In curl_perform() 15024:20251002:103305.610 running_handles:5 15024:20251002:103305.610 End of curl_perform() 15024:20251002:103305.610 In zbx_ipc_async_socket_recv() timeout:0 15024:20251002:103305.610 End of zbx_ipc_async_socket_recv():0 15024:20251002:103305.629 In curl_perform() 15024:20251002:103305.630 In handle_socket() action:1 15024:20251002:103305.630 End of handle_socket() 15024:20251002:103305.630 In handle_socket() action:4 15024:20251002:103305.630 End of handle_socket() 15024:20251002:103305.630 running_handles:4 15024:20251002:103305.630 pending cURL messages:0 15024:20251002:103305.630 In process_httpagent_result() 15024:20251002:103305.630 name 'content-type' value 'application/json; charset=utf-8' amount:1 index:0 origin:134217729 15024:20251002:103305.630 In zbx_int_in_list() list:'200' value:200 15024:20251002:103305.630 End of zbx_int_in_list():SUCCEED 15024:20251002:103305.630 In zbx_preprocess_item_value() 15024:20251002:103305.630 End of zbx_preprocess_item_value() 15024:20251002:103305.630 finished processing itemid:388505 15024:20251002:103305.630 End of process_httpagent_result() 15024:20251002:103305.630 End of curl_perform() 15024:20251002:103305.630 In zbx_ipc_socket_write() 15024:20251002:103305.630 End of zbx_ipc_socket_write():SUCCEED 15024:20251002:103305.630 In zbx_ipc_async_socket_recv() timeout:0 15024:20251002:103305.630 requeue num:1 15024:20251002:103305.630 End of zbx_ipc_async_socket_recv():0 15024:20251002:103305.630 requeue items nextcheck:1759411987 --> Thursday, October 2, 2025 10:33:07 AM I only noticed this issue because I have triggers set to alert me if the data are not collected; otherwise, it could possibly go unnoticed. Is there any way to investigate a specific item, what happened to it, and why the server is no longer trying to collect it? Thank you in advance.
|
| Comment by Danilo G. Baio [ 2025 Oct 03 ] |
|
For more information, when the item is stuck and logging is enabled with `zabbix_server -R log_level_increase="http agent poller"`, and I force an `Execute now` for a particular item stuck, the message `finished processing itemid:XXXXX` does not appear in the logs, unlike for the other items.
Item config.
|
| Comment by Vladislavs Sokurenko [ 2025 Oct 07 ] |
|
Please try updating curl version to latest and see if issue persists |
| Comment by Vladislavs Sokurenko [ 2025 Oct 07 ] |
|
Thank you for providing logs issue can be due to following: Please try lower timeout, 30 seconds might be too much. |
| Comment by Danilo G. Baio [ 2025 Oct 07 ] |
|
Hi, I've just upgraded curl from curl-8.15.0 to curl-8.16.0 (also rebuilt zabbix-serer 7.0.19), lowered the timeouts to 5s and 15s, but this intensified the issue. I don't know how to debug this further, any tips/advice? Thank you in advance. |
| Comment by Vladislavs Sokurenko [ 2025 Oct 08 ] |
|
So it appears to be problem when using flexible interval ? Unfortunately log is not available by the link provided. |
| Comment by Danilo G. Baio [ 2025 Nov 05 ] |
|
Hi vso, I set up a new Linux VM and moved only the zabbix_server processes to it. The main differences are these: FreeBSD 14.3, curl 8.15.0, OpenSSL 3.0.16, nghttp2 1.66.0 AlmaLinux 9.6, curl 7.76.1, OpenSSL 3.2.2, nghttp2/1.43.0 Could this be related to curl, or something else in the threads FreeBSD vs Linux? |
| Comment by Vladislavs Sokurenko [ 2025 Nov 05 ] |
|
Yes, there are bug reports on some curl versions about this, better to use some stable version |
| Comment by Danilo G. Baio [ 2025 Nov 05 ] |
|
Unfortunately, I can't easily downgrade cURL on FreeBSD. |
| Comment by Vladislavs Sokurenko [ 2025 Nov 05 ] |
|
Maybe it’s possible to upgrade, anyway, thank you for feedback. |