Bug #104273 Replication sql_thread hung while executing XA START in System lock state
Submitted: 11 Jul 2021 16:33 Modified: 16 Jul 2021 22:45
Reporter: Ji Zhang Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.7.30 OS:Red Hat (7.5)
Assigned to: MySQL Verification Team CPU Architecture:x86

[11 Jul 2021 16:33] Ji Zhang
Description:

We have noticed multiple slave hung issues on mysql 5.7.30 version.
The sql thread hung in "System lock" state while executing "XA Start" command;Other sql threads are waiting for this thread,Status is "Waiting for preceding transaction to commit";
And stopping replication slave using "stop slave" didnt work as the mysql is waiting for the replication sql_thread to exit. 
The only solution is to restart mysqld instance.

mysql>   select * from information_schema.processlist where command!='Sleep' order by time;
+-------+-------------+-----------------+--------------------+---------+--------+---------------------------------------------------+-----------------------------------------------------------------------------------+
| ID    | USER        | HOST            | DB                 | COMMAND | TIME   | STATE                                             | INFO                                                                              |
+-------+-------------+-----------------+--------------------+---------+--------+---------------------------------------------------+-----------------------------------------------------------------------------------+
| 10297 | root        | 127.0.0.1:48130 | performance_schema | Query   |      0 | executing                                         | select * from information_schema.processlist where command!='Sleep' order by time |
|    38 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    36 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    31 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    28 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    29 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    27 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    25 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    26 | system user |                 | NULL               | Connect |   5173 | System lock                                       | NULL                                                                              |
|    23 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    24 | system user |                 | NULL               | Connect |   5173 | Waiting for preceding transaction to commit       | NULL                                                                              |
|    22 | system user |                 | NULL               | Connect |   5174 | Waiting for slave workers to process their queues | NULL                                                                              |
|    39 | system user |                 | NULL               | Connect | 933155 | Queueing master event to the relay log            | NULL                                                                              |
+-------+-------------+-----------------+--------------------+---------+--------+---------------------------------------------------+-----------------------------------------------------------------------------------+
19 rows in set (0.00 sec)

The following is the pstack information of the sql thread in question:

#             cat 82840.pstack
Thread 1 (process 82840):
#0  0x00007fe2cd5cbeed in nanosleep () from /lib64/libpthread.so.0
#1  0x00000000011991b0 in os_thread_sleep (tm=<optimized out>) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/storage/innobase/os/os0thread.cc:287
#2  0x000000000112dd8b in wait (trx=0x7fe2c0ddb750) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/storage/innobase/include/trx0trx.h:1572
#3  TrxInInnoDB::enter (trx=0x7fe2c0ddb750, disable=<optimized out>) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/storage/innobase/include/trx0trx.h:1492
#4  0x000000000111feb6 in begin_stmt (trx=0x7fe2c0ddb750) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/storage/innobase/include/trx0trx.h:1423
#5  ha_innobase::external_lock (this=0x7fc84caacdb0, thd=0x7fc84c0008c0, lock_type=<optimized out>) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/storage/innobase/handler/ha_innodb.cc:15786
#6  0x000000000084deda in handler::ha_external_lock (this=0x7fc84caacdb0, thd=0x7fc84c0008c0, lock_type=1) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/handler.cc:8009
#7  0x0000000000e8297c in lock_external (count=1, tables=<optimized out>, thd=0x7fc84c0008c0) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/lock.cc:396
#8  mysql_lock_tables (thd=0x7fc84c0008c0, tables=0x7fc84c005d48, count=1, flags=0) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/lock.cc:332
#9  0x0000000000cf3595 in lock_tables (thd=0x7fc84c0008c0, tables=0x7fc84cb5e620, count=<optimized out>, flags=<optimized out>) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/sql_base.cc:6762
#10 0x0000000000cfc26a in open_and_lock_tables (thd=0x7fc84c0008c0, tables=0x7fc84cb5e620, flags=0, prelocking_strategy=<optimized out>) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/sql_base.cc:6580
#11 0x0000000000f140ee in open_and_lock_tables (flags=0, tables=<optimized out>, thd=<optimized out>) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/sql_base.h:486
#12 Rows_log_event::do_apply_event (this=0x7fc85c091780, rli=0x7fc85c1586a0) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/log_event.cc:11009
#13 0x0000000000f68e05 in slave_worker_exec_job_group (worker=0x7fc85c1586a0, rli=0xacf49230) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/rpl_rli_pdb.cc:2673
#14 0x0000000000f5430b in handle_slave_worker (arg=0x7fc85c1586a0) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/sql/rpl_slave.cc:6239
#15 0x00000000013f6034 in pfs_spawn_thread (arg=0x7fc85c15c360) at /export/home/pb2/build/sb_0-38465026-1584987238.22/mysql-5.7.30/storage/perfschema/pfs.cc:2197
#16 0x00007fe2cd5c4dd5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007fe2cc07eb3d in clone () from /lib64/libc.so.6

How to repeat:
None.

Suggested fix:
Restart mysqld.
[11 Jul 2021 16:34] Ji Zhang
All thread pstack info

Attachment: all_thread_pstack_info.txt (text/plain), 142.89 KiB.

[11 Jul 2021 16:36] Ji Zhang
Performance schema information of sql thread No. 55

Attachment: thread_history_info.txt (text/plain), 90.08 KiB.

[16 Jul 2021 22:45] MySQL Verification Team
Hi,

I'm not able to reproduce this neither on 5.7 nor 8.0.

Can you provide a repeatable test case?

thanks