Description:
we have a master-slave mysql cluster, version: 5.7.25;
slave also have some read query on it, and the slave crash with error logs below:
2023-12-20T00:24:01.050430+08:00 0 [Warning] InnoDB: A long semaphore wait:
--Thread 139747105769216 has waited at trx0rseg.ic line 48 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f2c2e671a10 created in file buf0buf.cc line 1425
a writer (thread id 139747399227136) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0server-5.7.25/storage/innobase/include/trx0rseg.ic line 48
2023-12-20T00:24:01.050497+08:00 0 [Warning] InnoDB:
Last time read locked in file not yet reserved line 0
Last time write locked in file /tmp/deploy/percona- A long semaphore wait:
--Thread 139747399227136 has waited at trx0undo.ic line 171 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x7f28f81d9b70 created in file buf0buf.cc line 1425
a writer (thread id 139747105769216) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file trx0undo.ic line 190
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0undo.ic line 171
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
=====================================
2023-12-20 00:24:16 0x7f18edffb700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 15 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 61334337 srv_active, 0 srv_shutdown, 349881 srv_idle
srv_master_thread log flush and writes: 61681841
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 20529595
--Thread 139747099547392 has waited at trx0trx.cc line 1192 for 253.00 seconds the semaphore:
Mutex at 0x414d890, Mutex REDO_RSEG created trx0rseg.cc:211, lock var 1
--Thread 139747098945280 has waited at trx0trx.cc line 1192 for 250.00 seconds the semaphore:
Mutex at 0x414d890, Mutex REDO_RSEG created trx0rseg.cc:211, lock var 1
--Thread 139747105769216 has waited at trx0rseg.ic line 48 for 256.00 seconds the semaphore:
X-lock on RW-latch at 0x7f2c2e671a10 created in file buf0buf.cc line 1425
a writer (thread id 139747399227136) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file not yet reserved line 0
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0rseg.ic line 48
--Thread 139745223276288 has waited at buf0flu.cc line 1229 for 226.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f2661abd990 created in file buf0buf.cc line 1425
a writer (thread id 139747105769216) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0sel.cc line 5189
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0undo.ic line 171
--Thread 139747399227136 has waited at trx0undo.ic line 171 for 256.00 seconds the semaphore:
X-lock on RW-latch at 0x7f28f81d9b70 created in file buf0buf.cc line 1425
a writer (thread id 139747105769216) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file trx0undo.ic line 190
Last time write locked in file /tmp/deploy/percona-server-5.7.25/storage/innobase/include/trx0undo.ic line 171
OS WAIT ARRAY INFO: signal count 27151915
RW-shared spins 0, rounds 54808837, OS waits 17398973
RW-excl spins 0, rounds 76412291, OS waits 524144
RW-sx spins 4500145, rounds 47782943, OS waits 295248
Spin rounds per wait: 54808837.00 RW-shared, 76412291.00 RW-excl, 10.62 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 15346024984
Purge done for trx's n:o < 15346024703 undo n:o < 0 state: running
History list length 75
LIST OF TRANSACTIONS FOR EACH SESSION:
.......
---TRANSACTION 15346024761, ACTIVE (PREPARED) 256 sec committing
3 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 2
MySQL thread id 17509818, OS thread handle 139747105769216, query id 9108196723 System lock
.......................
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
2023-12-20T00:35:53.079935+08:00 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2023-12-20 00:35:53 0x7f197b16f700 InnoDB: Assertion failure in thread 139747416012544 in file ut0ut.cc line 947
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:35:53 UTC - mysqld got signal 6 ;
key_buffer_size=33554432
read_buffer_size=65536
max_used_connections=1648
max_threads=100003
thread_count=1584
connection_count=1574
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 32177930 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x30000
/home/work/mysql/bin/mysqld(my_print_stacktrace+0x2c)[0xee68bc]
/home/work/mysql/bin/mysqld(handle_fatal_signal+0x461)[0xd32be1]
/lib64/libpthread.so.0(+0xf370)[0x7f2edbc46370]
/lib64/libc.so.6(gsignal+0x37)[0x7f2eda4251d7]
/lib64/libc.so.6(abort+0x148)[0x7f2eda4268c8]
/home/work/mysql/bin/mysqld[0x78ec54]
/home/work/mysql/bin/mysqld(_ZN2ib5fatalD1Ev+0xee)[0x11781be]
/home/work/mysql/bin/mysqld(srv_error_monitor_thread+0xae0)[0x11152f0]
/lib64/libpthread.so.0(+0x7dc5)[0x7f2edbc3edc5]
/lib64/libc.so.6(clone+0x6d)[0x7f2eda4e773d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
==========================================
with the error log, it's a dead lock between thread 139747105769216 and 139747399227136;
and there has logs:
---TRANSACTION 15346024761, ACTIVE (PREPARED) 256 sec committing
3 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 2
MySQL thread id 17509818, OS thread handle 139747105769216, query id 9108196723 System lock
so, check thread id 17509818, from our monitor:
2023-12-20 00:13:02 17509818 system user Connect 0 System lock
2023-12-20 00:14:01 17509818 system user Connect 0 System lock INSERT INTO xx.xx
2023-12-20 00:15:01 17509818 system user Connect 0 Waiting for an event from Coordinator
2023-12-20 00:16:02 17509818 system user Connect 0 Waiting for an event from Coordinator
2023-12-20 00:17:01 17509818 system user Connect 1 Waiting for an event from Coordinator
17509818 is a work thread which execute DML read from relay log;
our slave options:
slave-parallel-type = LOGICAL_CLOCK
slave-parallel-workers = 8
slave_pending_jobs_size_max = 134217728
slave_preserve_commit_order = 0
transaction_write_set_extraction = XXHASH64
binlog_transaction_dependency_tracking = WRITESET
How to repeat:
can't repeat, we meet this problem last time is 2023/09/25;