Bug #111863 Assertion failed in tp_wait_end(THD *thd)
Submitted: 24 Jul 2023 14:37  Modified: 26 Jul 2023 12:21
Reporter: ziyao lian  Email Updates:
Status: Not a Bug  Impact on me: None
Category: MySQL Server  Severity: S6 (Debug Builds)
Version: 8.0.25  OS: Linux
Assigned to:  CPU Architecture: x86

[24 Jul 2023 14:37] ziyao lian
Description:
I got an assertion failure:

THD_WAIT_NONE != connection->wait_type && 0 == connection->num_nested_waits

and the values at the time of the failure are connection->wait_type = THD_WAIT_BL_DUMPER and connection->num_nested_waits == 1.

num_nested_waits is decremented only after the assertion.
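
To make the failure mode easier to follow, below is a minimal, simplified model of the nested-wait bookkeeping that this assertion appears to check. Only the field names and the asserted condition come from the report; the begin/end functions and their branch layout are assumptions for illustration, not the actual plugin source.

#include <cassert>

// Illustrative stand-ins for the plugin's wait types and connection state.
enum thd_wait_type_t { THD_WAIT_NONE, THD_WAIT_BL_DUMPER };

struct connection_t {
  thd_wait_type_t wait_type = THD_WAIT_NONE;
  int num_nested_waits = 0;
};

// Hypothetical counterpart of tp_wait_begin(): the outermost wait records
// its type; any further wait while already waiting bumps the nesting counter.
void tp_wait_begin_model(connection_t *c, thd_wait_type_t type) {
  if (c->wait_type == THD_WAIT_NONE)
    c->wait_type = type;
  else
    ++c->num_nested_waits;
}

// Hypothetical counterpart of tp_wait_end(): the assertion quoted in the
// report comes first, and the counter is only adjusted after it.
void tp_wait_end_model(connection_t *c) {
  assert(THD_WAIT_NONE != c->wait_type && 0 == c->num_nested_waits);
  if (c->num_nested_waits > 0)
    --c->num_nested_waits;        // the decrement that happens after the assert
  else
    c->wait_type = THD_WAIT_NONE; // the "else block" a nested wait never reaches
}

int main() {
  connection_t conn;
  tp_wait_begin_model(&conn, THD_WAIT_BL_DUMPER); // BINLOG DUMP thread starts waiting
  tp_wait_begin_model(&conn, THD_WAIT_BL_DUMPER); // nested wait on the socket (vio_io_wait)
  tp_wait_end_model(&conn);                       // assertion fires: counter is 1
}

In this model the BINLOG DUMP thread's outer wait records THD_WAIT_BL_DUMPER, the socket wait from vio_io_wait() nests on top of it, and tp_wait_end() then sees num_nested_waits == 1, exactly the values printed above.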

(gdb) bt
#0  0x0000ffff9fffb470 in raise () from /usr/lib64/libc.so.6
#1  0x0000ffff9fffc81c in abort () from /usr/lib64/libc.so.6
#2  0x0000ffff9fff491c in ?? () from /usr/lib64/libc.so.6
#3  0x0000ffff9fff499c in __assert_fail () from /usr/lib64/libc.so.6
#4  0x0000000002e54dd4 in tp_wait_end (thd=0xfffe00b0c000) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/plugin/threadpool/threadpool_unix.cc:1448
#5  0x000000000144b280 in scheduler_wait_net_end () at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/conn_handler/connection_handler_manager.cc:108
#6  0x0000000002faf304 in vio_io_wait (vio=0xfffe00aa6730, event=<optimized out>, timeout=60000) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/vio/viosocket.cc:970
#7  0x0000000002faf388 in vio_socket_io_wait (vio=<optimized out>, event=<optimized out>) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/vio/viosocket.cc:129
#8  0x0000000002faf958 in vio_write (vio=0xfffe00aa6730, buf=0xfffe00b5c164 "\215\001", size=14860) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/vio/viosocket.cc:313
#9  0x000000000135bb24 in net_write_raw_loop (net=net@entry=0xfffe00b0dd58, buf=0xfffe00b5c164 "\215\001", buf@entry=0xfffe00b5bb70 "\001", count=14860)
    at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql-common/net_serv.cc:1025
#10 0x000000000135e75c in net_write_packet (net=0xfffe00b0dd58, packet=0xfffe00b5bb70 "\001", length=<optimized out>) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql-common/net_serv.cc:1346
#11 0x000000000135e92c in net_write_buff (net=net@entry=0xfffe00b0dd58, packet=packet@entry=0xfffe0db0a030 "", len=len@entry=8179)
    at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql-common/net_serv.cc:984
#12 0x000000000135ecf0 in my_net_write (net=0xfffe00b0dd58, packet=0xfffe0db0a030 "", len=8179) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql-common/net_serv.cc:506
#13 0x000000000227d430 in Binlog_sender::send_packet (this=this@entry=0xfffe353b24f0) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/include/sql_string.h:249
#14 0x000000000227a0f8 in Binlog_sender::send_events (this=0xfffe353b24f0, reader=0xfffe353b1ef0, end_pos=47641592) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/rpl_binlog_sender.cc:670
#15 0x000000000227c1fc in Binlog_sender::send_binlog (this=0xfffe353b24f0, reader=0xfffe353b1ef0, start_pos=<optimized out>)
    at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/rpl_binlog_sender.cc:521
#16 0x000000000227c690 in Binlog_sender::run (this=this@entry=0xfffe353b24f0) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/rpl_binlog_sender.cc:419
#17 0x0000000002272720 in mysql_binlog_send (thd=0xfffe00b0c000, log_ident=0xfffe353b30d0 "", pos=4, slave_gtid_executed=0xfffe353b2f10, flags=0)
    at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/rpl_master.cc:1025
#18 0x0000000002277144 in com_binlog_dump_gtid (thd=thd@entry=0xfffe00b0c000, packet=packet@entry=0xfffe00b5bb71 "", packet_length=<optimized out>)
    at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/rpl_master.cc:1004
#19 0x00000000011ce724 in dispatch_command (thd=0xfffe00b0c000, com_data=0xfffe353b3d70, command=COM_BINLOG_DUMP_GTID) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/protocol_classic.h:198
#20 0x00000000011cfae0 in do_command (thd=thd@entry=0xfffe00b0c000) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/sql/sql_parse.cc:1335
#21 0x0000000002e4fd88 in threadpool_process_request (thd=0xfffe00b0c000) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/plugin/threadpool/threadpool_common.cc:241
#22 0x0000000002e53d50 in handle_event (connection=0xfffe00aefd30) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/plugin/threadpool/threadpool_unix.cc:1574
#23 0x0000000002e53e68 in worker_main (param=param@entry=0x5d47600 <all_groups+1536>) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/plugin/threadpool/threadpool_unix.cc:1620
#24 0x0000000002d74fe8 in pfs_spawn_thread (arg=0xffff9fa4a4f0) at /data/fuxi_ci_workspace/63ecd945cb305611deaa6918/storage/perfschema/pfs.cc:2899
#25 0x0000ffffa05d67b0 in ?? () from /usr/lib64/libpthread.so.0
#26 0x0000ffffa009a56c in ?? () from /usr/lib64/libc.so.6

How to repeat:
Hard to repeat; it happens when a shutdown occurs while the thread is waiting for the output buffer to become writable.

Suggested fix:
Modify the assertion.
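
The report does not spell out how the assertion should be changed. Purely as an illustration of one possible reading, and not a reviewed or recommended fix, the check could tolerate a non-zero nesting counter for the binlog-dump wait type (reusing the illustrative types from the sketch above):

#include <cassert>

// Hypothetical reading of "modify the assertion", illustrative only.
void tp_wait_end_relaxed_model(connection_t *c) {
  // Condition as quoted in the report, relaxed to allow a nested wait
  // when the wait type is THD_WAIT_BL_DUMPER:
  assert(THD_WAIT_NONE != c->wait_type &&
         (0 == c->num_nested_waits ||
          THD_WAIT_BL_DUMPER == c->wait_type));
  if (c->num_nested_waits > 0)
    --c->num_nested_waits;
  else
    c->wait_type = THD_WAIT_NONE;
}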
[24 Jul 2023 14:58] MySQL Verification Team
Hi Mr. lian,

Thank you for your bug report.

However, .........

We are not able to repeat your problem at all. We tried both release and debug binaries, with 8.0.34 and 8.1.0.

You simply have a case where the other host is off-line or TCP/IP is too slow, so you get a timeout.

This is not a bug.
[24 Jul 2023 14:59] MySQL Verification Team
This sometimes happens when binary log entries are sent to another, but unreachable, remote or overloaded server.

This is not a bug.
[25 Jul 2023 2:23] ziyao lian
In tp_wait_end(), a connection whose wait type is THD_WAIT_BL_DUMPER hits this even in the normal case: the BINLOG DUMP thread's connection triggers the assertion and never reaches the else block.
Because of this we cannot run our workload.
[25 Jul 2023 12:17] MySQL Verification Team
Hi Mr. lian,

First of all, you are using an old release of our product. Second, and more importantly:

We have noticed that you are using our Thread Pool Plugin.

The Thread Pool Plugin is a commercial product/plugin, and the community Bugs DB is not an appropriate place to report issues against this product/plugin. If you are an Oracle customer, please open an SR and we will assist you via our Support team.

This is not the appropriate forum.
[26 Jul 2023 12:21] ziyao lian
I have reproduced it.

First, set up the replication relationship: one primary database and two standby databases.

master's my.cnf:

slave's my.cnf:

slave2's my.cnf:
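
(Note: the my.cnf contents were not included above. Purely as an assumed illustration of the kind of settings a one-master/two-slave setup needs, and not the reporter's actual configuration:)

# Illustrative values only, not taken from the report.
[mysqld]
server_id = 1                   # must be unique; e.g. 2 and 3 on the two slaves
log_bin = binlog                # binary logging must be enabled on the master
gtid_mode = ON                  # COM_BINLOG_DUMP_GTID in the backtrace suggests GTID-based replication
enforce_gtid_consistency = ON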

Then, create the database test, table t1, and table t2.

create database test;
use test;
create table t1 ( id int, a int)engine=innodb;
create table t2 (id int, a int)engine=innodb;

Open a client of the master and run the following stored procedure:
delimiter ;;
create procedure idata()
begin
  declare i int;
  set i = 1;
  while (i <= 100000) do
    insert into t1 values (i, i+1);
    set i = i + 1;
  end while;
end;;
delimiter ;
call idata();

Open another client of the master and run the following stored procedure:
delimiter ;;
create procedure idata2()
begin
  declare i int;
  set i = 1;
  while (i <= 100000) do
    insert into t2 values (i, i+1);
    set i = i + 1;
  end while;
end;;
delimiter ;
call idata2();

After a while, the master crashes, the replication status becomes abnormal, and the I/O thread of the slave stops.
[26 Jul 2023 12:34] MySQL Verification Team
Hi Mr. lian,

We have to repeat our last comment here.

First of all, try 8.0.34 and 8.1.0.

More importantly ......

We have noticed that you are using our Thread Pool Plugin.

The Thread Pool Plugin is a commercial product/plugin, and the community
Bugs DB is not an appropriate place to report issues against an Enterprise
product/plugin. If you are an Oracle customer, please open an SR and we will
assist you via our Support team.

This is not the appropriate forum for our Enterprise products ....