Bug #93760 InnoDB got assert failure while figuring out space id at startup
Submitted: 28 Dec 2018 23:44 Modified: 8 Feb 2019 6:57
Reporter: Fungo Wang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S6 (Debug Builds)
Version:8.0.13 OS:Any
Assigned to: CPU Architecture:Any

[28 Dec 2018 23:44] Fungo Wang
Description:
We spotted the crash while running mtr regression test, the corresponding testcase is innodb.innodb_redo_debug_1. This crash is an assert failure at startup after unmormally shutdown, with debug version. The assert failure could not be stably repeated, but fortunately we have the coredump, and after analyzing the stack, we beleive this is a bug.

This is the crash stack from coredump, with file name and line number.

```
#0  0x00007fcdc4a5297c in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000054d4e47 in my_write_core (sig=6) at 8.0.13/mysys/stacktrace.cc:278
#2  0x0000000002fc83f9 in handle_fatal_signal (sig=6) at 8.0.13/sql/signal_handler.cc:249
#3  <signal handler called>                                                                                                                              
#4  0x00007fcdc30b0625 in raise () from /lib64/libc.so.6
#5  0x00007fcdc30b1e05 in abort () from /lib64/libc.so.6
#6  0x0000000005cce858 in ut_dbg_assertion_failed (expr=0x74e0fd3 "header.m_version == 1", file=0x74e0f28 "8.0.13/storage/innobase/os/file.cc", line=137) at 8.0.13/storage/innobase/ut/ut0dbg.cc:90
#7  0x0000000005ce168c in Compression::deserialize (dblwr_recover=false, src=0x7fcdb02eaa00 "\373!\200", dst=0x0, dst_len=2048) at 8.0.13/storage/innobase/os/file.cc:137
#8  0x0000000005ce1fbe in os_file_decompress_page (dblwr_recover=false, src=0x7fcdb02eaa00 "\373!\200", dst=0x0, dst_len=2048) at 8.0.13/storage/innobase/os/file.cc:264
#9  0x00000000059d3f81 in os_file_io_complete (type=..., fh=4, buf=0x7fcdb02eaa00 "\373!\200", scratch=0x0, src_len=2048, offset=90112, len=2048) at 8.0.13/storage/innobase/os/os0file.cc:1682
#10 0x00000000059dd7d3 in os_file_io (in_type=..., file=4, buf=0x7fcdb02eaa00, n=2048, offset=90112, err=0x7fcdb716f2ec) at 8.0.13/storage/innobase/os/os0file.cc:4909
#11 0x00000000059dec2a in os_file_pread (type=..., file=4, buf=0x7fcdb02eaa00, n=2048, offset=90112, err=0x7fcdb716f2ec) at 8.0.13/storage/innobase/os/os0file.cc:5053
#12 0x00000000059dee3b in os_file_read_page (type=..., file=4, buf=0x7fcdb02eaa00, offset=90112, n=2048, o=0x0, exit_on_err=true) at 8.0.13/storage/innobase/os/os0file.cc:5084
#13 0x00000000059e0fa5 in os_file_read_func (type=..., file=4, buf=0x7fcdb02eaa00, offset=90112, n=2048) at 8.0.13/storage/innobase/os/os0file.cc:5491
#14 0x0000000006000ad2 in pfs_os_file_read_func (type=..., file=..., buf=0x7fcdb02eaa00, offset=90112, n=2048, src_file=0x7555b50 "8.0.13/storage/innobase/fsp/fsp0file.cc", src_line=765) at 8.0.13/storage/innobase/include/os0file.ic:258
#15 0x000000000600620c in Datafile::find_space_id (this=0x7fcdb7170400) at 8.0.13/storage/innobase/fsp/fsp0file.cc:765
#16 0x0000000005fa0285 in Fil_system::get_tablespace_id (filename=...) at 8.0.13/storage/innobase/fil/fil0fil.cc:10032
#17 0x0000000005fa06d5 in Tablespace_dirs::duplicate_check (this=0x7fcdb003a008, start=..., end=..., thread_id=0, mutex=0x7fcdb71714d0, unique=0x7fcdb7171530, duplicates=0x7fcdb7171500) at 8.0.13/storage/innobase/fil/fil0fil.cc:10066                                                                                                                      

#18 0x0000000005fd6940 in std::__invoke_impl<void, void () at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/invoke.h:73                                                                                                                                              

#19 0x0000000005fd2d1a in std::__invoke<void () at /usr/include/c++/7/bits/invoke.h:95
#20 0x0000000005fccf4d in std::_Bind<void () at /usr/include/c++/7/functional:467                                                                                                        
#21 0x0000000005fc4313 in std::_Bind<void () at /usr/include/c++/7/functional:551                                        
#22 0x0000000005fbac68 in std::_Function_handler<void () at /usr/include/c++/7/bits/std_function.h:316
#23 0x0000000005fbb880 in std::function<void () at /usr/include/c++/7/bits/std_function.h:706
#24 0x0000000005fb3a89 in par_for<std::vector< at 8.0.13/storage/innobase/include/os0thread-create.h:190

#25 0x0000000005fa3007 in Tablespace_dirs::scan (this=0x7fcdb003a008, in_directories=...) at 8.0.13/storage/innobase/fil/fil0fil.cc:10256
#26 0x0000000005fac5ad in Fil_system::scan (this=0x7fcdb0039fc8, directories=...) at 8.0.13/storage/innobase/fil/fil0fil.cc:1388
#27 0x0000000005fa398e in fil_scan_for_tablespaces (directories=...) at 8.0.13/storage/innobase/fil/fil0fil.cc:10287
#28 0x0000000005bdaa4b in srv_start (create_new_db=false, scan_directories=...) at 8.0.13/storage/innobase/srv/srv0start.cc:1724
#29 0x00000000056e7a24 in innobase_init_files (dict_init_mode=DICT_INIT_CHECK_FILES, tablespaces=0x7fcdb7179760) at 8.0.13/storage/innobase/handler/ha_innodb.cc:4508
#30 0x00000000057052b9 in innobase_ddse_dict_init (dict_init_mode=DICT_INIT_CHECK_FILES, version=80013, tables=0x7fcdb7179780, tablespaces=0x7fcdb7179760) at 8.0.13/storage/innobase/handler/ha_innodb.cc:11011

```

I think these 2 functions should be foucsed, 
1. Fil_system::get_tablespace_id(), 
2. Datafile::find_space_id()

In Datafile::find_space_id(), page sizes from UNIV_ZIP_SIZE_MIN(1K) to UNIV_PAGE_SIZE_MAX(64K) are tried to parse the page.

For an actual 16K page, if InnoDB try to use 2K page size to parse, and for the 2nd - 8th round parsing, the page header (InnoDB thought) is actually part of the page body. It's possible the *is_compressed_page(src)* is TRUE (depend the data value), and in our coredump it is.

>>> f 7
#7  0x0000000005ce168c in Compression::deserialize (dblwr_recover=false, src=0x7fcdb02eaa00 "\373!\200", dst=0x0, dst_len=2048) 
137       ut_ad(header.m_version == 1);
>>> p src + 24
$11 = (unsigned char *) 0x7fcdb02eaa18 ""
>>> p *(src + 24)
$12 = 0 '\000'
>>> p *(src + 25)
$13 = 14 '\016'

What a coincidence!

The header version is not 1, so assert failure.
>>> p header.m_version
$14 = 127 '\177'

The reason why Fil_system::get_tablespace_id() falled into the heavy `file.find_space_id()`, is that the first 3 pages with different space ids:

>>> f 16
#16 0x0000000005fa0285 in Fil_system::get_tablespace_id (filename=...) 
>>> p *(space_ids._M_impl._M_start)@3
$20 = {[0] = 2, [1] = 0, [2] = 2}

This could probably caused by abnormal shutdown, and the 2nd page is not fully written to disk.

Unfortunately I can not provide the coredump file, but if you want more information, I'm happy to provide :)

How to repeat:
The crash is reported by running innodb.innodb_redo_debug_1 testcase, but it's not stably repeated. The straight way is to read the code.

Suggested fix:
Remove the ut_ad(header.m_version == 1); assert in Compression::deserialize().
[16 Jan 2019 15:13] MySQL Verification Team
Hi,

Thank you for your bug report.

Unfortunately, I have to agree with you, since that field can take, indeed, different values.

I also had to set a Severity that is compatible with debug builds.

Thank you very much.
[16 Jan 2019 15:16] MySQL Verification Team
I had a typo.
[23 Jan 2019 13:59] Erlend Dahl
This has been seen in internal testing, but very seldom (once per month maybe in standard per-push runs).
[8 Feb 2019 6:57] Erlend Dahl
[23 Jan 2019 15:29] Daniel T Price

Fixed as of the upcoming 8.0.16 release, and here's the changelog entry:

Invalid assertions were raised at startup after a failure to retrieve the
space ID of a compressed file-per-table tablespace. The invalid assertion
code was removed.