Bug #119473 InnoDB Mutex Deadlock Cause Mysql Crash
Submitted: 29 Nov 8:17 Modified: 29 Nov 8:24
Reporter: fei yang Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.41 OS:Any
Assigned to: CPU Architecture:Any

[29 Nov 8:17] fei yang
Description:
We deployed MySQL version 8.0.41 on a server with a specification of 16c64g. When we set loose-innodb_buffer_pool_size to 1G, we ran the following Sysbench command::

./sysbench --db-driver=mysql --report-interval=1 --tables=128 --table-size=25000 --mysql-db=white_paper --mysql-host=** --mysql-port=3306 --mysql-user=root --mysql-password=** --threads=100 ./oltp_common.lua prepare

on two other servers with the same 16c64g specification, we executed:
/sysbench --db-driver=mysql --report-interval=1 --tables=128 --table-size=25000 --mysql-db=white_paper --mysql-host=** --mysql-port=3306 --mysql-user=root --mysql-password=** --time=300 --threads=15000 --percentile=95 --db-ps-mode=disable --forced-shutdown --mysql-ignore-errors=1062,1213 ./oltp_read_write.lua run

./sysbench --db-driver=mysql --report-interval=1 --tables=128 --table-size=25000 --mysql-db=white_paper --mysql-host=** --mysql-port=3306 --mysql-user=root --mysql-password=** --time=300 --threads=2000 --percentile=95 --db-ps-mode=disable --forced-shutdown --mysql-ignore-errors=1062,1213 ./oltp_read_write.lua run

After running for a while, we occasionally encountered a mutex deadlock issue as described below:
########################################
DEADLOCK of threads detected!
--Thread 281422091242112 has waited at buf0buf.cc line 5406 for 0 seconds the semaphore:
Mutex at 0xffff66040050, Mutex BUF_POOL_LRU_LIST created buf0buf.cc:1231, locked by 281415897959040

--Thread 281415897959040 has waited at fil0fil.cc line 2233 for 0 seconds the semaphore:
Mutex at 0xffff655e50f0, Mutex FIL_SHARD created fil0fil.cc:2014, locked by 281422091242112

--Thread 281410351712896 has waited at buf0buf.cc line 5016 for 0 seconds the semaphore:
Mutex at 0xffff66040050, Mutex BUF_POOL_LRU_LIST created buf0buf.cc:1231, locked by 281415897959040

2025-11-27T13:25:22.879491-00:00 0 281470832659072 [ERROR] [MY-012982] [InnoDB] [FATAL] ######################################## Deadlock Detected!
2025-11-27T13:25:22.879547-00:00 0 281470832659072 [ERROR] [MY-013183] [InnoDB] Assertion failure: sync0arr.cc:678:ib::fatal triggered thread 281470832659072
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2025-11-27T13:25:22Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=b07863a4716ac5a1fea7ef405fb8c3d75342f37f
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
/opt/taurusdb/app/engine/bin/taurus_server(my_print_stacktrace(unsigned char const*, unsigned long)+0x44) [0x1ce6364]
/opt/taurusdb/app/engine/bin/taurus_server(print_fatal_signal(int)+0x340) [0x11cb140]
/opt/taurusdb/app/engine/bin/taurus_server(my_server_abort()+0x7c) [0x11cb2ac]
/opt/taurusdb/app/engine/bin/taurus_server(my_abort()+0x14) [0x1ce4bf4]
/opt/taurusdb/app/engine/bin/taurus_server(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x188) [0x1dbb808]
/opt/taurusdb/app/engine/bin/taurus_server(ib::fatal::~fatal()+0) [0x1dd0790]
/opt/taurusdb/app/engine/bin/taurus_server() [0xf3ec5c]
/opt/taurusdb/app/engine/bin/taurus_server(sync_array_detect_deadlock()+0xfc) [0xbd1b5c]
/opt/taurusdb/app/engine/bin/taurus_server(srv_error_monitor_thread()+0xc8) [0xbd15c8]
/opt/taurusdb/app/engine/bin/taurus_server() [0xb86a1c]
/opt/taurusdb/app/engine/lib/libstdc++.so.6(+0xcebcc) [0xffff6c15ebcc]
/usr/lib64/libpthread.so.0(+0x878c) [0xffff6e27878c]
/usr/lib64/libc.so.6(+0xd508c) [0xffff6bee508c]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

How to repeat:
Referring to the steps described above, may be this is not a consistently reproducible issue.

Suggested fix:

We investigated the core file and found that the mutex deadlock was due to the following stack trace. 
It is clear that while exiting the connection and truncating the temporary tablespace, fil_space_get was called twice. 
The first call occurred in the Tablespace_pool::free_ts function, while the second call to fil_space_get got stuck, as shown in the stack trace below:
[Switching to thread 649 (LWP 1596642)]
#0  0x0000ffff6e27e5a0 in pthread_cond_wait () from /usr/lib64/libpthread.so.0
(gdb) bt
#0  0x0000ffff6e27e5a0 in pthread_cond_wait () from /usr/lib64/libpthread.so.0
#1  0x0000000000b89160 in os_event::wait (this=<optimized out>) at ../storage/innobase/../../../../storage/innobase/include/sync0types.h:555
#2  os_event::wait_low (this=0xffff66148b80, reset_sig_count=<optimized out>) at ../storage/innobase/../../../../storage/innobase/os/os0event.cc:366
#3  0x0000000000bc9610 in os_event_wait_low (reset_sig_count=<optimized out>, event=<optimized out>)
    at ../storage/innobase/../../../../storage/innobase/os/os0event.cc:590
#4  sync_array_wait_event (arr=0xffff661a0020, cell=@0xfff23ea1b5f0: 0xffff64b7c618)
    at ../storage/innobase/../../../../storage/innobase/sync/sync0arr.cc:338
#5  0x00000000009353c0 in TTASEventMutex<GenericPolicy>::wait (spin=4, line=2233, filename=0x2e8a6a0 "../../../../storage/innobase/fil/fil0fil.cc", 
    this=<optimized out>) at ../storage/innobase/../../../../storage/innobase/include/sync0arr.ic:71
#6  TTASEventMutex<GenericPolicy>::spin_and_try_lock (line=2233, filename=0x2e8a6a0 "../../../../storage/innobase/fil/fil0fil.cc", max_delay=6, 
    max_spins=90, this=0xffff655e50f0) at ../storage/innobase/../../../../storage/innobase/include/ib0mutex.h:542
#7  TTASEventMutex<GenericPolicy>::enter (line=2233, filename=0x2e8a6a0 "../../../../storage/innobase/fil/fil0fil.cc", max_delay=6, max_spins=30, 
    this=<optimized out>) at ../storage/innobase/../../../../storage/innobase/include/ib0mutex.h:443
#8  PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (line=2233, name=0x2e8a6a0 "../../../../storage/innobase/fil/fil0fil.cc", n_delay=6, n_spins=30, 
    this=<optimized out>) at ../storage/innobase/../../../../storage/innobase/include/ib0mutex.h:637
#9  Fil_shard::acquire (line=2233, this=<optimized out>) at ../storage/innobase/../../../../storage/innobase/fil/fil0fil.cc:686
#10 fil_space_get (space_id=4243769263) at ../storage/innobase/../../../../storage/innobase/fil/fil0fil.cc:2233
#11 0x0000000000ba7870 in buf_page_t::space_id_changed (this=0xffff5a95ed10) at ../storage/innobase/../../../../storage/innobase/include/buf0types.h:239
#12 buf_page_t::reset_page_id (this=this@entry=0xffff5a95ed10, new_page_id=...) at ../storage/innobase/../../../../storage/innobase/include/buf0buf.h:1251
#13 0x0000000000ba7598 in buf_block_set_file_page (page_id=..., block=0xffff5a95ed10)
    at ../storage/innobase/../../../../storage/innobase/include/buf0buf.ic:434
#14 buf_page_init (buf_pool=buf_pool@entry=0xffff66040020, page_id=..., page_size=..., block=block@entry=0xffff5a95ed10)
    at ../storage/innobase/../../../../storage/innobase/buf/buf0buf.cc:4737
#15 0x0000000000ba70ac in buf_page_create (page_id=..., page_size=..., rw_latch=rw_latch@entry=RW_SX_LATCH, mtr=mtr@entry=0xfff23ea1b9e8)
    at ../storage/innobase/../../../../storage/innobase/buf/buf0buf.cc:5071
#16 0x0000000000ba6810 in fsp_header_init (space_id=4243769263, size=5, mtr=0xfff23ea1b9e8)
    at ../storage/innobase/../../../../storage/innobase/fsp/fsp0fsp.cc:1003
#17 0x0000000000ddfcf8 in ibt::Tablespace::truncate (this=0xfff44c46d360) at ../storage/innobase/../../../../storage/innobase/srv/srv0tmp.cc:150
#18 0x0000000000ddfbd4 in ibt::Tablespace_pool::free_ts (this=0xffff6559efc0, ts=<optimized out>)
    at ../storage/innobase/../../../../storage/innobase/srv/srv0tmp.cc:232
#19 0x0000000000c59d7c in innodb_session_t::~innodb_session_t (this=<optimized out>, this=<optimized out>)
    at ../storage/innobase/../../../../storage/innobase/include/sess0sess.h:95
#20 0x0000000000c59760 in ut::delete_<innodb_session_t> (ptr=0xfff449d74460) at ../storage/innobase/../../../../include/my_compiler.h:56
#21 innobase_close_connection (hton=<optimized out>, thd=0xfff378ead800) at ../storage/innobase/../../../../storage/innobase/handler/ha_innodb.cc:6271
#22 0x0000000000b2c65c in closecon_handlerton (thd=0xfff378ead800, plugin=<optimized out>) at ../../../sql/handler.cc:1007
#23 0x0000000000924cd4 in plugin_foreach_with_mask (thd=thd@entry=0xfff378ead800, funcs=funcs@entry=0xfff23ea1c0a8, type=type@entry=1, 
--Type <RET> for more, q to quit, c to continue without paging--
    state_mask=4294967287, state_mask@entry=8, arg=arg@entry=0x0) at ../../../sql/sql_plugin.cc:2786
#24 0x0000000000b2b89c in plugin_foreach_with_mask (thd=thd@entry=0xfff378ead800, 
    func=func@entry=0xb2c610 <closecon_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=8, arg=arg@entry=0x0)
    at ../../../sql/sql_plugin.cc:2797
#25 0x0000000000b2b868 in ha_close_connection (thd=thd@entry=0xfff378ead800) at ../../../sql/handler.cc:1019
#26 0x0000000000b22818 in THD::release_resources (this=0xfff378ead800) at ../../../sql/sql_class.cc:1390
#27 0x0000000000f38794 in threadpool_remove_connection (thd=0xfff378ead800) at ../plugin/threadpool/../../../../plugin/threadpool/threadpool_common.cc:235
#28 0x0000000000e82258 in connection_abort (connection=0xfff3820130b0) at ../plugin/threadpool/../../../../plugin/threadpool/threadpool_unix.cc:1516
#29 handle_event (connection=0xfff3820130b0) at ../plugin/threadpool/../../../../plugin/threadpool/threadpool_unix.cc:1741
#30 worker_main (param=param@entry=0x474d800 <g_all_groups+84992>) at ../plugin/threadpool/../../../../plugin/threadpool/threadpool_unix.cc:1779
#31 0x0000000000b697e8 in pfs_spawn_thread (arg=<optimized out>) at ../storage/perfschema/../../../../storage/perfschema/pfs.cc:3050
#32 0x0000ffff6e27878c in ?? () from /usr/lib64/libpthread.so.0
#33 0x0000ffff6bee508c in ?? () from /usr/lib64/libc.so.6
(gdb) f 11
#11 0x0000000000ba7870 in buf_page_t::space_id_changed (this=0xffff5a95ed10) at ../storage/innobase/../../../../storage/innobase/include/buf0types.h:239
239	../storage/innobase/../../../../storage/innobase/include/buf0types.h: No such file or directory.
(gdb) p *this
Cannot access memory at address 0xffff5a95ed10
(gdb) f 4
#4  sync_array_wait_event (arr=0xffff661a0020, cell=@0xfff23ea1b5f0: 0xffff64b7c618)
    at ../storage/innobase/../../../../storage/innobase/sync/sync0arr.cc:338
338	../storage/innobase/../../../../storage/innobase/sync/sync0arr.cc: No such file or directory.
(gdb) p *cell
$1 = {latch = {lock = 0xffff655e50f0, mutex = 0xffff655e50f0, bpmutex = 0xffff655e50f0}, request_type = 1, 
  file = 0x2e8a6a0 "../../../../storage/innobase/fil/fil0fil.cc", line = 2233, thread_id = {_M_thread = 281415897959040}, waiting = true, 
  signal_count = 152, reservation_time = {__d = {__r = 419656909187655}}, last_scan = 628}

Perhaps we should check mutex_owned in the fil_space_get function. If it is held, we should avoid calling shard->mutex_acquire() again?
[29 Nov 8:24] fei yang
The issue https://bugs.mysql.com/bug.php?id=116552
 may be similar to the one we encountered.