[ZBX-19398] Sqlite deadlock between history syncer or conf syncer. Created: 2021 May 18 Updated: 2024 Apr 10 Resolved: 2021 Jun 14 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P) |
Affects Version/s: | 5.4.0 |
Fix Version/s: | 5.4.1rc2, 6.0.0alpha1, 6.0 (plan) |
Type: | Problem report | Priority: | Blocker |
Reporter: | Andrei Gushchin (Inactive) | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 17 |
Labels: | None |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Sprint: | Sprint 76 (May 2021), Sprint 77 (Jun 2021) |
Story Points: | 1 |
Description |
Steps to reproduce:
1261:20210518:152725.569 == locks diagnostic information ==
1261:20210518:152725.578 locks:
1261:20210518:152725.587 ZBX_MUTEX_LOG:0x7fbb6fecd000
1261:20210518:152725.596 ZBX_MUTEX_CACHE:0x7fbb6fecd028
1261:20210518:152725.604 ZBX_MUTEX_TRENDS:0x7fbb6fecd050
1261:20210518:152725.617 ZBX_MUTEX_CACHE_IDS:0x7fbb6fecd078
1261:20210518:152725.626 ZBX_MUTEX_SELFMON:0x7fbb6fecd0a0
1261:20210518:152725.640 ZBX_MUTEX_CPUSTATS:0x7fbb6fecd0c8
1261:20210518:152725.652 ZBX_MUTEX_DISKSTATS:0x7fbb6fecd0f0
1261:20210518:152725.661 ZBX_MUTEX_ITSERVICES:0x7fbb6fecd118
1261:20210518:152725.671 ZBX_MUTEX_VALUECACHE:0x7fbb6fecd140
1261:20210518:152725.680 ZBX_MUTEX_VMWARE:0x7fbb6fecd168
1261:20210518:152725.689 ZBX_MUTEX_SQLITE3:0x7fbb6fecd190
1261:20210518:152725.698 ZBX_MUTEX_PROCSTAT:0x7fbb6fecd1b8
1261:20210518:152725.707 ZBX_MUTEX_PROXY_HISTORY:0x7fbb6fecd1e0
1261:20210518:152725.717 ZBX_MUTEX_MODBUS:0x7fbb6fecd208
1261:20210518:152725.725 ZBX_MUTEX_TREND_FUNC:0x7fbb6fecd230
1261:20210518:152725.735 ZBX_RWLOCK_CONFIG:0x7fbb6fecd258
1261:20210518:152725.744 ZBX_RWLOCK_VALUECACHE:0x7fbb6fecd290
1261:20210518:152725.753 ==
root@s01:~$ ps ax | grep sync
1263 ? S 1:06 /usr/sbin/zabbix_proxy: configuration syncer [loading configuration]
1332 ? S 0:02 /usr/sbin/zabbix_proxy: history syncer #1 [processed 0 values in 0.000029 sec, syncing history]
1333 ? S 0:02 /usr/sbin/zabbix_proxy: history syncer #2 [processed 6090 values in 0.397928 sec, syncing history]
1334 ? S 0:02 /usr/sbin/zabbix_proxy: history syncer #3 [processed 0 values in 0.000036 sec, syncing history]
1335 ? S 0:02 /usr/sbin/zabbix_proxy: history syncer #4 [processed 7000 values in 0.397221 sec, syncing history]
13236 pts/4 S+ 0:00 grep --color=auto sync
root@s01:~$ strace -p 1332
strace: Process 1332 attached
futex(0x7fbb6fecd190, FUTEX_WAIT, 2, NULL^Cstrace: Process 1332 detached
 <detached ...>
root@s01:~$ strace -p 1263
strace: Process 1263 attached
futex(0x7fbb6fecd190, FUTEX_WAIT, 2, NULL^Cstrace: Process 1263 detached
 <detached ...>
Result: |
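The futex address in the strace output can be matched against the lock table printed by the diagnostic information to see which lock each process is waiting on. The following is a minimal sketch of that lookup, not part of Zabbix: the helper names `parse_locks` and `futex_lock_name` are hypothetical, and the lock table is only an excerpt of the log above.

```python
import re
from typing import Dict, Optional

# Excerpt of the "locks diagnostic information" lines from the description.
DIAGINFO = """\
1261:20210518:152725.596 ZBX_MUTEX_CACHE:0x7fbb6fecd028
1261:20210518:152725.689 ZBX_MUTEX_SQLITE3:0x7fbb6fecd190
1261:20210518:152725.707 ZBX_MUTEX_PROXY_HISTORY:0x7fbb6fecd1e0
1261:20210518:152725.735 ZBX_RWLOCK_CONFIG:0x7fbb6fecd258
"""

LOCK_LINE = re.compile(r"(ZBX_(?:MUTEX|RWLOCK)_\w+):(0x[0-9a-f]+)")
FUTEX_CALL = re.compile(r"futex\((0x[0-9a-f]+),")


def parse_locks(diaginfo: str) -> Dict[int, str]:
    """Map each lock address found in the diaginfo text to its lock name."""
    return {int(addr, 16): name for name, addr in LOCK_LINE.findall(diaginfo)}


def futex_lock_name(strace_line: str, locks: Dict[int, str]) -> Optional[str]:
    """Return the name of the lock a futex() call waits on, if it is known."""
    match = FUTEX_CALL.search(strace_line)
    if match is None:
        return None
    return locks.get(int(match.group(1), 16))


locks = parse_locks(DIAGINFO)
# The same futex line was captured for PID 1332 and PID 1263.
waiting_on = futex_lock_name("futex(0x7fbb6fecd190, FUTEX_WAIT, 2, NULL", locks)
print(waiting_on)  # ZBX_MUTEX_SQLITE3
```

Both the history syncer (1332) and the configuration syncer (1263) are therefore blocked on the SQLite3 mutex.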
Comments |
Comment by Adrian Kirchner [ 2021 May 21 ] |
I'm experiencing this on two proxies. It happens roughly every 12 hours.
$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
$ dpkg -l | grep zabbix
ii zabbix-agent 1:5.4.0-1+ubuntu20.04 amd64 Zabbix network monitoring solution - agent
ii zabbix-proxy-sqlite3 1:5.4.0-1+ubuntu20.04 amd64 Zabbix network monitoring solution - proxy (SQLite3)
ii zabbix-sender 1:5.4.0-1+ubuntu20.04 amd64 Zabbix network monitoring solution - sender
|
Comment by Max Ried [ 2021 May 25 ] |
I guess I'm seeing the same issue. It doesn't happen regularly, so I've tried to avoid leaving the maximum debug level enabled all the time. While the proxy is in this state, executing zabbix_proxy --runtime-control diaginfo produces no log output, and increasing the log level at runtime doesn't change anything either. When I send a config_cache_reload, it displays "configuration cache reloading is already in progress". It's on Debian 10, amd64, zabbix-proxy-sqlite 5.4.0. systemd can't restart it unless you first stop it with --signal=SIGKILL. |
Comment by Guilherme Xavier [ 2021 May 30 ] |
I am also facing this problem. After upgrading from version 5.0 to 5.4, my Zabbix proxy is unable to send data to the Zabbix server. I use Debian 10, with the proxy, server and database installed on different servers. I tried recreating the proxy's SQLite database, but without success. Stopping the service or restarting the host takes a very long time. I have 350 vps on average.
zabbix-proxy-sqlite3 1:5.4.0-2+debian10
zabbix_server.conf AlertScriptsPath=/usr/lib/zabbix/alertscripts
zabbix_proxy.conf CacheSize=512M |
Comment by Ted Serreyn [ 2021 May 31 ] |
It looks like I am also seeing this issue on Zabbix proxy with SQLite3 on Raspberry Pi. I have several of these proxies, and most of them have experienced this problem. The restart issue is also a fairly big deal, since a simple restart doesn't easily fix it.
|
Comment by Bruno Scota de Carvalho [ 2021 May 31 ] |
Oh! It's not happening only to me! It started after upgrading from 5.2 to 5.4, using a containerized Zabbix proxy. I have 3 proxies at separate locations, all with the same symptoms (200 vps each). |
Comment by Max Ried [ 2021 Jun 01 ] |
I only have 20 and 40 values per second respectively, so it does not seem to have to do with this. The 40 vps one locks up more often, though. |
Comment by Vladislavs Sokurenko [ 2021 Jun 01 ] |
It seems to have been caused by the following:
0x00007f747f7a829c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f747f7a829c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f747f7a1714 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x0000561e93929f4c in __zbx_mutex_lock ()
#3 0x0000561e9397667c in zbx_db_vselect ()
#4 0x0000561e9395e75a in DBselect ()
#5 0x0000561e938da887 in zbx_dbsync_compare_item_tags ()
#6 0x0000561e938b9727 in DCsync_configuration ()
#7 0x0000561e9396d27b in process_proxyconfig ()
#8 0x0000561e937d4fac in ?? ()
#9 0x0000561e937d5172 in proxyconfig_thread ()
#10 0x0000561e9392a274 in zbx_thread_start ()
#11 0x0000561e937cc793 in MAIN_ZABBIX_ENTRY ()
#12 0x0000561e93924c31 in daemon_start ()
#13 0x0000561e937cbda1 in main ()
As you can see, zbx_dbsync_compare_item_tags is called under the configuration cache lock: the configuration cache lock is held for writing, and then the sqlite mutex is locked for database queries.
The other process is the database syncer, which locks the sqlite3 mutex and then takes the configuration cache write lock, so a deadlock occurs.
0x00007f747f7a4025 in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb)
#0 0x00007f747f7a4025 in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x0000561e93929d48 in __zbx_rwlock_wrlock ()
#2 0x0000561e938cb23d in DCconfig_items_apply_changes ()
#3 0x0000561e938a5379 in ?? ()
#4 0x0000561e938a7099 in ?? ()
#5 0x0000561e938a8276 in zbx_sync_history_cache ()
#6 0x0000561e937cd7f5 in dbsyncer_thread ()
#7 0x0000561e9392a274 in zbx_thread_start ()
#8 0x0000561e937cca12 in MAIN_ZABBIX_ENTRY ()
#9 0x0000561e93924c31 in daemon_start ()
#10 0x0000561e937cbda1 in main () |
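The two backtraces describe a lock-order inversion: one process takes the configuration cache lock and then the SQLite mutex, the other takes them in the opposite order. A minimal sketch of how such an inversion can be detected is shown below. It models each code path as a nested list of lock names and looks for a cycle in the resulting lock-order graph. The function names are hypothetical, and the `consistent` ordering is only an illustration of an inversion-free order; the source does not say that the actual fix works this way.

```python
from collections import defaultdict


def lock_order_edges(paths):
    """Build edges held -> wanted for every lock taken while another is held."""
    edges = defaultdict(set)
    for path in paths:
        for i, held in enumerate(path):
            for wanted in path[i + 1:]:
                edges[held].add(wanted)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the lock-order graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: two paths disagree on lock order
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(edges))


# Nested acquisition order read off the two backtraces above.
observed = [
    ["ZBX_RWLOCK_CONFIG", "ZBX_MUTEX_SQLITE3"],  # configuration syncer
    ["ZBX_MUTEX_SQLITE3", "ZBX_RWLOCK_CONFIG"],  # history (database) syncer
]

# Hypothetical ordering in which both paths agree (assumption, not the real fix).
consistent = [
    ["ZBX_MUTEX_SQLITE3", "ZBX_RWLOCK_CONFIG"],
    ["ZBX_MUTEX_SQLITE3", "ZBX_RWLOCK_CONFIG"],
]

print(has_cycle(lock_order_edges(observed)))    # True
print(has_cycle(lock_order_edges(consistent)))  # False
```

A cycle means that two processes can each hold one lock while waiting for the other, which is the state captured by the strace and gdb output in this issue.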
Comment by Vladislavs Sokurenko [ 2021 Jun 01 ] |
Fixed in pull request feature/ZBX-19398-5.4 |
Comment by Rostislav Palivoda (Inactive) [ 2021 Jun 02 ] |
Releasing in 5.4.1rc2 today. |
Comment by Vladislavs Sokurenko [ 2021 Jun 02 ] |
Fixed in:
|
Comment by Konstantīns Ošmjans [ 2021 Jun 09 ] |
In the headers of this Jira issue:
Is it really fixed or unresolved? |
Comment by Alex Kalimulin [ 2021 Jun 09 ] |
constantin.oshmyan, it's fixed and available in the recently released 5.4.1. |