Bug #93170 undo truncation in 8.0.13 is not crash safe
Submitted: 13 Nov 2018 7:47 Modified: 7 May 9:05
Reporter: zhai weixiang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:8.0.13 OS:Any
Assigned to: CPU Architecture:Any

[13 Nov 2018 7:47] zhai weixiang
Description:
In 8.0.13. two checkpoint before and after undo tablespace was removed and insteadly, it uses flush oberver to remove releated dirty page before truncation, and flush related pages after truncation. 

The problem is that, the checkpoint lsn may be very old and there maybe some log entries belonging to the truncated area of undo tablespace. If the server is crashed after undo truncation, the crash recovery may fail to perform crash recovery because the requested page doesn't exist anymore..

How to repeat:
run test case innodb_undo.truncate_recover and it fails with assertion like this:

2018-11-13T07:44:57.787206Z 1 [Note] [MY-012533] [InnoDB] 100%
2018-11-13T07:44:57.787898Z 0 [ERROR] [MY-013183] [InnoDB] Assertion failure: log0recv.cc:1873:0 || offs == 0 + (FIL_PAGE_DATA + 36 + 2 * 10) + 0 || offs == 0 + (FIL_PAGE_DATA + 36 + 2 * 10) + 4 || offs == 36 + FIL_PAGE_DATA || offs == 36 + FIL_PAGE_DATA + FIL_ADDR_PAGE || offs == 36 + FIL_PAGE_DATA + FIL_ADDR_PAGE + FIL_ADDR_SIZE || offs == 36 + FIL_PAGE_DATA + 4 || offs == 36 + FIL_PAGE_DATA + 0 || offs == (36 + 10) + FIL_PAGE_DATA + 4 || offs == (36 + 10) + FIL_PAGE_DATA + 0 || offs == 36 + FIL_PAGE_DATA + FIL_ADDR_PAGE + 0 || offs == 36 + FIL_PAGE_DATA + FIL_ADDR_PAGE + FIL_ADDR_SIZE thread 140115214153472
InnoDB: We intentionally generate a memory trap.

backtrace:

[Thread debugging using libthread_db enabled]
Core was generated by `/u01/code/mysql-server/runtime_output_directory/mysqld --defaults-group-suffix='.
Program terminated with signal 6, Aborted.
#0  0x00007f6f2cff397c in pthread_kill () from /lib64/libpthread.so.0
#0  0x00007f6f2cff397c in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000005437f43 in my_write_core (sig=6) at /u01/code/mysql-server/mysys/stacktrace.cc:278
#2  0x0000000002f4a283 in handle_fatal_signal (sig=6) at /u01/code/mysql-server/sql/signal_handler.cc:249
#3  <signal handler called>
#4  0x00007f6f2b84f625 in raise () from /lib64/libc.so.6
#5  0x00007f6f2b850e05 in abort () from /lib64/libc.so.6
#6  0x0000000005ccb034 in ut_dbg_assertion_failed (expr=0x7367f00 "0 || offs == 0 + (FIL_PAGE_DATA + 36 + 2 * 10) + 0 || offs == 0 + (FIL_PAGE_DATA + 36 + 2 * 10) + 4 || offs == 36 + FIL_PAGE_DATA || offs == 36 + FIL_PAGE_DATA + FIL_ADDR_PAGE || offs == 36 + FIL_PAGE"..., file=0x7367888 "/u01/code/mysql-server/storage/innobase/log/log0recv.cc", line=1873) at /u01/code/mysql-server/storage/innobase/ut/ut0dbg.cc:90
#7  0x00000000059a8946 in recv_parse_or_apply_log_rec_body (type=MLOG_4BYTES, ptr=0x7f6f168074e0 "", end_ptr=0x7f6f168074e7 "", space_id=4294967279, page_no=6, block=0x7f6f166d1c68, mtr=0x7f6f1d911bf0, parsed_bytes=18446744073709551615) at /u01/code/mysql-server/storage/innobase/log/log0recv.cc:1857
#8  0x00000000059acd9e in recv_recover_page_func (just_read_in=true, block=0x7f6f166d1c68) at /u01/code/mysql-server/storage/innobase/log/log0recv.cc:2540
#9  0x0000000005dca9f1 in buf_page_io_complete (bpage=0x7f6f166d1c68, evict=false) at /u01/code/mysql-server/storage/innobase/buf/buf0buf.cc:4786
#10 0x0000000005f5db14 in fil_aio_wait (segment=2) at /u01/code/mysql-server/storage/innobase/fil/fil0fil.cc:7430
#11 0x0000000005bdc97e in io_handler_thread (segment=2) at /u01/code/mysql-server/storage/innobase/srv/srv0start.cc:267
#12 0x0000000005bf93b7 in std::__invoke_impl<void, void (*&)(unsigned long), unsigned long&> (__f=@0x7f6f1d912b50, __args#0=@0x7f6f1d912b58) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/invoke.h:60
#13 0x0000000005bf9254 in std::__invoke<void (*&)(unsigned long), unsigned long&> (__fn=@0x7f6f1d912b50, __args#0=@0x7f6f1d912b58) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/invoke.h:95
#14 0x0000000005bf9082 in std::_Bind<void (*(long unsigned int))(long unsigned int)>::__call<void, 0>(std::tuple<> &, std::_Index_tuple<0>) (this=0x7f6f1d912b50, __args=empty std::tuple) at /opt/rh/devtoolset-7/root/usr/include/c++/7/functional:467
#15 0x0000000005bf8dc0 in std::_Bind<void (*(long unsigned int))(long unsigned int)>::operator()<>(void) (this=0x7f6f1d912b50) at /opt/rh/devtoolset-7/root/usr/include/c++/7/functional:551
#16 0x0000000005bf898e in Runnable::operator()<void (*)(unsigned long), unsigned long> (this=0x7f6f183ebe88, f=@0x7f6f183ebe80, args#0=@0x7f6f183ebe78) at /u01/code/mysql-server/storage/innobase/include/os0thread-create.h:92
#17 0x0000000005bf7d08 in std::__invoke_impl<void, Runnable, void (*)(unsigned long), unsigned long> (__f=..., __args#0=@0x7f6f183ebe80, __args#1=@0x7f6f183ebe78) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/invoke.h:60
#18 0x0000000005bf474d in std::__invoke<Runnable, void (*)(unsigned long), unsigned long> (__fn=..., __args#0=@0x7f6f183ebe80, __args#1=@0x7f6f183ebe78) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/invoke.h:95
#19 0x0000000005bf9617 in std::thread::_Invoker<std::tuple<Runnable, void (*)(unsigned long), unsigned long> >::_M_invoke<0, 1, 2> (this=0x7f6f183ebe78) at /opt/rh/devtoolset-7/root/usr/include/c++/7/thread:234
#20 0x0000000005bf94f3 in std::thread::_Invoker<std::tuple<Runnable, void (*)(unsigned long), unsigned long> >::operator() (this=0x7f6f183ebe78) at /opt/rh/devtoolset-7/root/usr/include/c++/7/thread:243
#21 0x0000000005bf94ae in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Runnable, void (*)(unsigned long), unsigned long> > >::_M_run (this=0x7f6f183ebe70) at /opt/rh/devtoolset-7/root/usr/include/c++/7/thread:186
#22 0x000000000659d48f in execute_native_thread_routine ()
#23 0x00007f6f2cfeeaa1 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f6f2b90593d in clone () from /lib64/libc.so.6

Suggested fix:
N/A
[13 Nov 2018 8:43] Umesh Shastry
Hello Zhai Weixiang,

Thank you for the report and feedback!

regards,
Umesh
[13 Nov 2018 8:44] Umesh Shastry
test results

Attachment: 93170_8.0.13.results (application/octet-stream, text), 150.76 KiB.

[11 Feb 13:27] zhai weixiang
I believe this bug should be fixed by WL#9508. After this commit, the undo truncation will behavior like drop and create a new space with new space id.
[7 May 9:05] Erlend Dahl
The particular assert that this bug points out does not happen after WL9508
was introduced in 8.0.14.
 
This is because WL9508 changes the undo tablespace truncation process so that
the each time a space is truncated it gets a new space_id.  In this way, old
redo log entries are referring to a space_id that has been deleted.
 
Hence this bug was fixed in 8.0.14.