Bug #99577 mysql-5.7 fails to recover from a crash
Submitted: 14 May 2020 18:18
Modified: 18 May 2020 12:06
Reporter: Mohit Joshi
Status: Not a Bug
Category: MySQL Server: InnoDB storage engine
Severity: S6 (Debug Builds)
Version: 5.7.30
OS: Any
Assigned to:
CPU Architecture: Any

[14 May 2020 18:18] Mohit Joshi
During crash-recovery testing on MySQL 5.7.30, the server fails to recover and crashes with the following stack trace:

#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x00000000019892c2 in my_write_core (sig=6) at /home/mohit.joshi/upstream-5.7/mysys/stacktrace.c:261
#2  0x0000000000f16dc4 in handle_fatal_signal (sig=6) at /home/mohit.joshi/upstream-5.7/sql/signal_handler.cc:227
#3  <signal handler called>
#4  0x00007f2267a77428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00007f2267a7902a in __GI_abort () at abort.c:89
#6  0x0000000001c37393 in ut_dbg_assertion_failed (expr=0x223f008 "!ret", file=0x2233f68 "/home/mohit.joshi/upstream-5.7/storage/innobase/handler/ha_innodb.cc", line=20692) at /home/mohit.joshi/upstream-5.7/storage/innobase/ut/ut0dbg.cc:75
#7  0x00000000019f5874 in innobase_init_vc_templ (table=0x7f2204086ed0) at /home/mohit.joshi/upstream-5.7/storage/innobase/handler/ha_innodb.cc:20692
#8  0x0000000001b71e0c in row_purge_parse_undo_rec (node=0x5785010, undo_rec=0x7f2214066640 ".\332\016\070\201", <incomplete sequence \351>, updated_extern=0x7f221abfddc6, thr=0x5784f48) at /home/mohit.joshi/upstream-5.7/storage/innobase/row/row0purge.cc:921
#9  0x0000000001b723a7 in row_purge (node=0x5785010, undo_rec=0x7f2214066640 ".\332\016\070\201", <incomplete sequence \351>, thr=0x5784f48) at /home/mohit.joshi/upstream-5.7/storage/innobase/row/row0purge.cc:1063
#10 0x0000000001b72683 in row_purge_step (thr=0x5784f48) at /home/mohit.joshi/upstream-5.7/storage/innobase/row/row0purge.cc:1145
#11 0x0000000001af1b4f in que_thr_step (thr=0x5784f48) at /home/mohit.joshi/upstream-5.7/storage/innobase/que/que0que.cc:1057
#12 0x0000000001af1d60 in que_run_threads_low (thr=0x5784f48) at /home/mohit.joshi/upstream-5.7/storage/innobase/que/que0que.cc:1119
#13 0x0000000001af1f29 in que_run_threads (thr=0x5784f48) at /home/mohit.joshi/upstream-5.7/storage/innobase/que/que0que.cc:1159
#14 0x0000000001bc6c33 in srv_task_execute () at /home/mohit.joshi/upstream-5.7/storage/innobase/srv/srv0srv.cc:2479
#15 0x0000000001bc6dd8 in srv_worker_thread (arg=0x0) at /home/mohit.joshi/upstream-5.7/storage/innobase/srv/srv0srv.cc:2529
#16 0x00007f22686b46ba in start_thread (arg=0x7f221abfe700) at pthread_create.c:333
#17 0x00007f2267b4941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

How to repeat:
git clone https://github.com/Percona-QA/pstress pstress_5.7
cd pstress_5.7
cmake . -DBASEDIR=<path_to_5.7_basedir> -DMYSQL=ON
make

cd pstress
./pstress-run.sh pstress-run.conf

Before running the tool, edit the pstress-run.conf file and set the following variables:

DYNAMIC_QUERY_PARAMETER="--tables 20 --records 700 --log-all-queries --log-failed-queries"

BASEDIR=<path of binaries>
[15 May 2020 11:58] MySQL Verification Team
Hi Mr. Joshi,

Thank you for your bug report.

However, we do not use pstress. 

Can you repeat this behaviour with sysbench or mysqlslap?

If not, are pstress binaries available for macOS?

Many thanks in advance.
[15 May 2020 14:31] Mohit Joshi
Hi Sinisa,

The same steps can be used to build the binary and run the test on macOS.

Mohit Joshi
[18 May 2020 12:06] MySQL Verification Team
Hi Mr. Joshi,

We have worked out what pstress essentially does. These are the steps:

1. Launch mysqld.
2. Run some DML + DDL in multiple threads.
3. kill -9 mysqld.
4. Repeat 50 times.

At some point InnoDB crash recovery apparently fails with the debug assertion that you provided.
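The launch / load / kill -9 / repeat cycle described above can be sketched as a small bash loop. This is a hypothetical illustration and not pstress's actual code. The function name `crash_cycle`, its arguments, and the exit-status check are all assumptions, and the server and workload commands are placeholders that you would supply. The key assumption is that every restart reuses the same data directory, so each new launch has to perform crash recovery on whatever the previous `kill -9` left behind.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the launch / load / kill -9 / repeat cycle; not pstress code.
set -u

# crash_cycle SERVER_CMD WORKLOAD_CMD [ITERATIONS]
#   SERVER_CMD   runs the server in the foreground. It must reuse the same data
#                directory every time so that each restart performs crash recovery.
#   WORKLOAD_CMD runs the multi-threaded DML + DDL load against that server.
crash_cycle() {
  local server_cmd=$1 workload_cmd=$2 iterations=${3:-50}
  local i pid status
  for ((i = 1; i <= iterations; i++)); do
    # Step 1: launch the server in the background. Word splitting on
    # $server_cmd is intentional so that its arguments are passed through.
    $server_cmd &
    pid=$!
    # Step 2: run the load. A real harness would first wait until the
    # server accepts connections.
    $workload_cmd
    # Step 3: kill the server with no clean shutdown. If it has already died
    # on its own, kill may have nothing to signal, so its error is discarded.
    kill -9 "$pid" 2>/dev/null
    wait "$pid"
    status=$?
    # bash reports death by signal N as 128 + N, so SIGKILL gives 137.
    if [ "$status" -eq 137 ]; then
      echo "iteration $i: killed with SIGKILL (status $status)"
    else
      # Any other status means the server exited before step 3, for example
      # 134 (128 + SIGABRT) after a failed assertion such as the one above.
      echo "iteration $i: server exited on its own (status $status)"
      return 1
    fi
  done
}
```

As a usage example, you might call `crash_cycle "bin/mysqld --defaults-file=/path/to/my.cnf" "./load.sh" 50`, where `load.sh` is a placeholder for whatever generates the concurrent DML and DDL. The loop stops at the first iteration in which the server dies before it can be killed. That is the point where a recovery failure like the one in this report would surface.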

Using an entire framework (pstress) for this seems like overkill. Moreover, it does not represent a typical production environment.

Last but not least, we do NOT accept pstress as a standard stress-testing tool. Nor do we accept the harshest kills (kill -9), because with them we cannot check whether the entire setup is 100% ACID compliant.

Not a bug.