Bug #113473 InnoDB: Assertion failure in thread 139747416012544 in file ut0ut.cc line 947
Submitted: 20 Dec 2023 11:29 Modified: 28 Dec 2023 12:19
Reporter: Jingyan Liu Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7.25 OS:CentOS (7.3)
Assigned to: CPU Architecture:Any
Tags: crash, dead lock, semaphore wait, slave

[20 Dec 2023 11:29] Jingyan Liu
Description:
we have a master-slave mysql cluster, version: 5.7.25;
slave also have some read query on it, and the slave crash with error logs below:

2023-12-20T00:24:01.050430+08:00 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139747105769216 has waited at trx0rseg.ic line 48 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f2c2e671a10 created in file buf0buf.cc line 1425
a writer (thread id 139747399227136) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0server-5.7.25/storage/innobase/include/trx0rseg.ic line 48
2023-12-20T00:24:01.050497+08:00 0 [Warning] InnoDB:
Last time read locked in file not yet reserved line 0
Last time write locked in file /tmp/deploy/percona- A long semaphore wait:
--Thread 139747399227136 has waited at trx0undo.ic line 171 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f28f81d9b70 created in file buf0buf.cc line 1425
a writer (thread id 139747105769216) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file trx0undo.ic line 190
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0undo.ic line 171
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0

=====================================
2023-12-20 00:24:16 0x7f18edffb700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 61334337 srv_active, 0 srv_shutdown, 349881 srv_idle
srv_master_thread log flush and writes: 61681841
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 20529595
--Thread 139747099547392 has waited at trx0trx.cc line 1192 for 253.00 seconds the semaphore:
Mutex at 0x414d890, Mutex REDO_RSEG created trx0rseg.cc:211, lock var 1

--Thread 139747098945280 has waited at trx0trx.cc line 1192 for 250.00 seconds the semaphore:
Mutex at 0x414d890, Mutex REDO_RSEG created trx0rseg.cc:211, lock var 1

--Thread 139747105769216 has waited at trx0rseg.ic line 48 for 256.00 seconds the semaphore:
X-lock on RW-latch at 0x7f2c2e671a10 created in file buf0buf.cc line 1425
a writer (thread id 139747399227136) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0rseg.ic line 48
--Thread 139745223276288 has waited at buf0flu.cc line 1229 for 226.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f2661abd990 created in file buf0buf.cc line 1425
a writer (thread id 139747105769216) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0sel.cc line 5189
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0undo.ic line 171
--Thread 139747399227136 has waited at trx0undo.ic line 171 for 256.00 seconds the semaphore:
X-lock on RW-latch at 0x7f28f81d9b70 created in file buf0buf.cc line 1425
a writer (thread id 139747105769216) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file trx0undo.ic line 190
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0undo.ic line 171
OS WAIT ARRAY INFO: signal count 27151915
RW-shared spins 0, rounds 54808837, OS waits 17398973
RW-excl spins 0, rounds 76412291, OS waits 524144
RW-sx spins 4500145, rounds 47782943, OS waits 295248
Spin rounds per wait: 54808837.00 RW-shared, 76412291.00 RW-excl, 10.62 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 15346024984
Purge done for trx's n:o < 15346024703 undo n:o < 0 state: running
History list length 75
LIST OF TRANSACTIONS FOR EACH SESSION:
.......
---TRANSACTION 15346024761, ACTIVE (PREPARED) 256 sec committing
3 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 2
MySQL thread id 17509818, OS thread handle 139747105769216, query id 9108196723 System lock

.......................
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
2023-12-20T00:35:53.079935+08:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2023-12-20 00:35:53 0x7f197b16f700  InnoDB: Assertion failure in thread 139747416012544 in file ut0ut.cc line 947
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:35:53 UTC - mysqld got signal 6 ;

key_buffer_size=33554432
read_buffer_size=65536
max_used_connections=1648
max_threads=100003
thread_count=1584
connection_count=1574
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 32177930 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/home/work/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0xee68bc]
/home/work/mysql/bin/mysqld(handle_fatal_signal+0x461)[0xd32be1]
/lib64/libpthread.so.0(+0xf370)[0x7f2edbc46370]
/lib64/libc.so.6(gsignal+0x37)[0x7f2eda4251d7]
/lib64/libc.so.6(abort+0x148)[0x7f2eda4268c8]
/home/work/mysql/bin/mysqld[0x78ec54]
/home/work/mysql/bin/mysqld(_ZN2ib5fatalD1Ev+0xee)[0x11781be]
/home/work/mysql/bin/mysqld(srv_error_monitor_thread+0xae0)[0x11152f0]
/lib64/libpthread.so.0(+0x7dc5)[0x7f2edbc3edc5]
/lib64/libc.so.6(clone+0x6d)[0x7f2eda4e773d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.

==========================================
with the error log, it's a dead lock between thread 139747105769216 and  139747399227136;

and there has logs:

---TRANSACTION 15346024761, ACTIVE (PREPARED) 256 sec committing
3 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 2
MySQL thread id 17509818, OS thread handle 139747105769216, query id 9108196723 System lock

so, check thread id 17509818, from our monitor:

2023-12-20 00:13:02	17509818	system user			Connect	0	System lock	
2023-12-20 00:14:01	17509818	system user			Connect	0	System lock	INSERT INTO xx.xx
2023-12-20 00:15:01	17509818	system user			Connect	0	Waiting for an event from Coordinator	
2023-12-20 00:16:02	17509818	system user			Connect	0	Waiting for an event from Coordinator	
2023-12-20 00:17:01	17509818	system user			Connect	1	Waiting for an event from Coordinator	

17509818 is a work thread which execute DML read from relay log;

our slave options:
slave-parallel-type = LOGICAL_CLOCK
slave-parallel-workers = 8
slave_pending_jobs_size_max                                     = 134217728
slave_preserve_commit_order                                     = 0
transaction_write_set_extraction                                = XXHASH64
binlog_transaction_dependency_tracking                          = WRITESET

How to repeat:
can't repeat, we meet this problem last time is 2023/09/25;
[20 Dec 2023 13:25] MySQL Verification Team
Hi Mr. Liu,

Thank you for your bug report.

However, version 5.7 is not supported for a long time.

If you manage to repeat the problem with 8.0.35 or 8.2.0, please file a new bug report and set it under the category of "Group Replication'.

Unsupported.
[28 Dec 2023 12:19] Jingyan Liu
I found the bug alread fix in 5.7.35. 

release note:

InnoDB: After upgrading from MySQL 5.6 to MySQL 5.7 and starting the server with undo log truncation enabled (innodb_undo_log_truncate=ON), a deadlock occurred when an undo tablespace truncate operation was initiated. The deadlock caused a long semaphore wait and an eventual failure. A direct upgrade from MySQL 5.6 to MySQL 5.7.35 or later avoids this potential issue. (Bug #32800020)

commit:
https://github.com/mysql/mysql-server/commit/be9954a5233bc5d24a9c613dae42c7be44623cff
[8 Jan 2024 10:26] MySQL Verification Team
Thank you, Mr Liu.