Bug #79378 buf_block_align() makes incorrect assumptions about chunk size
Submitted: 23 Nov 2015 5:01
Modified: 20 Jul 2016 12:20
Reporter: Alexey Kopytov
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.7
OS: Any
Assigned to:
CPU Architecture: Any

[23 Nov 2015 5:01] Alexey Kopytov
Description:
This is likely related to bug #74775, but since that report discussed
multiple potential issues, I'm reporting this one separately.

The following code in buf_block_align() has been written with the
assumption that srv_buf_pool_chunk_unit is the total size of all pages
in a buffer pool chunk:

	if (ptr < reinterpret_cast<byte*>(srv_buf_pool_chunk_unit)) {
		it = chunk_map->upper_bound(0);
	} else {
		it = chunk_map->upper_bound(
			ptr - srv_buf_pool_chunk_unit);
	}

That is, to find the chunk corresponding to the given pointer, step back
by srv_buf_pool_chunk_unit bytes and use std::map::upper_bound() to find
the first chunk in the map whose key is > the resulting pointer.

However, the real size of a chunk (and thus, the total size of its
pages) may differ from the value configured with
innodb_buffer_pool_chunk_size due to rounding up to the OS page
size. There's also a comment about it in buf_chunk_init():

	/* Align a pointer to the first frame.  Note that when
	os_large_page_size is smaller than UNIV_PAGE_SIZE,
	we may allocate one fewer block than requested.  When
	it is bigger, we may allocate more blocks than requested. */

The thing is, this applies not only to large pages, because regular
pages can vary in size on some architectures. I see crashes that look
very similar to those reported in bug #74775 with a non-default OS page
size of 64K, and I have tracked them down to that incorrect assumption
in buf_block_align(). We indeed allocate more blocks than requested in
buf_chunk_init(), so srv_buf_pool_chunk_unit is not the total size of
pages in each chunk, and thus buf_block_align() fails when used on a
pointer belonging to one of those "extra" pages in a chunk.
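
To make the failure concrete, here is a minimal sketch of the same lookup on
plain integer addresses. It is a model, not InnoDB code: ChunkMap,
locate_chunk(), the chunk ids and all addresses and sizes are hypothetical and
chosen only for illustration. The map is keyed by the address of the first
frame of each chunk, as in the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model: first-frame address of a chunk -> chunk id.
// These are not InnoDB's real types.
using ChunkMap = std::map<std::uintptr_t, int>;

// Mirrors the lookup quoted in the report: step back by `unit` bytes, then
// take the first chunk whose start address is strictly greater than the
// result. Returns the chunk id, or -1 when upper_bound() returns end().
int locate_chunk(const ChunkMap& chunks, std::uintptr_t ptr,
                 std::uintptr_t unit) {
    assert(unit > 0);
    const std::uintptr_t bound = ptr < unit ? 0 : ptr - unit;
    const auto it = chunks.upper_bound(bound);
    return it == chunks.end() ? -1 : it->second;
}

// Illustrative numbers only: the configured unit is 128 KiB, but each chunk
// really holds 192 KiB of pages, so the last 64 KiB are "extra" pages.
const std::uintptr_t kConfiguredUnit = 0x20000;
const std::uintptr_t kRealChunkSize = 0x30000;
const ChunkMap kChunks = {{0x100000, 0}, {0x200000, 1}};
```

In this model, a pointer within the first kConfiguredUnit bytes of a chunk
resolves to that chunk. A pointer into the extra pages of chunk 0 resolves to
chunk 1, and a pointer into the extra pages of the last chunk makes
upper_bound() return end(), which matches the symptom described here.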

I'm not sure if this is possible to repeat on x86. Perhaps with
--use-large-pages or transparent huge pages, but it may be tricky.

How to repeat:
Code analysis.

Suggested fix:
In buf_block_align() do not assume that srv_buf_pool_chunk_unit is the
total size of all pages in a chunk. The real size (and thus the number
of pages) may differ from the configured value due to rounding up (or
down) to the OS page size. Instead store the real size during buffer
pool initialization and use that value.
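
The suggestion above could look roughly like the following sketch. It is not
the submitted patch: BufferPoolModel, add_chunk(), locate() and
make_example_pool() are hypothetical names, the addresses are made up, and the
sketch assumes that every chunk ends up with the same real size and that
chunks do not overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model: first-frame address of a chunk -> chunk id.
using ChunkMap = std::map<std::uintptr_t, int>;

struct BufferPoolModel {
    ChunkMap chunks;
    // Size actually obtained at initialization, not the configured value.
    std::uintptr_t real_chunk_size = 0;

    void add_chunk(std::uintptr_t start, std::uintptr_t real_size, int id) {
        assert(real_size > 0);
        // Assumption of this sketch: all chunks have the same real size.
        assert(real_chunk_size == 0 || real_chunk_size == real_size);
        real_chunk_size = real_size;
        chunks[start] = id;
    }

    // Same step-back-and-upper_bound lookup, but stepping back by the
    // recorded real size. Returns the chunk id, or -1 if none is found.
    int locate(std::uintptr_t ptr) const {
        const std::uintptr_t bound =
            ptr < real_chunk_size ? 0 : ptr - real_chunk_size;
        const auto it = chunks.upper_bound(bound);
        return it == chunks.end() ? -1 : it->second;
    }
};

// Two 192 KiB chunks at illustrative addresses.
BufferPoolModel make_example_pool() {
    BufferPoolModel pool;
    pool.add_chunk(0x100000, 0x30000, 0);
    pool.add_chunk(0x200000, 0x30000, 1);
    return pool;
}
```

With the real size recorded, pointers anywhere inside a chunk, including its
last pages, resolve to that chunk in this model. If chunks could differ in
size, stepping back by a single value would no longer be safe and a per-chunk
range check would be needed instead.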
[23 Nov 2015 5:31] Stewart Smith
A 64K page size is the default on POWER systems.
[31 Dec 2015 20:57] OCA Admin
Contribution submitted via Github - Bug #79378: buf_block_align() makes incorrect assumptions about chunk size 
(*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/44#issuecomment-168211556): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_54827445.txt (text/plain), 1.79 KiB.

[12 Apr 2016 1:13] Daniel Black
Reproduced on Power with version 5.7.12, in test case innodb.table_encrypt_kill:

Thread 1 (Thread 0x3fff6c2ff170 (LWP 30409)):
#0  0x00003fff95802dbc in __pthread_kill (threadid=<optimised out>, signo=<optimised out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
#1  0x0000000010bbcd80 in my_write_core (sig=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/mysys/stacktrace.c:247
#2  0x00000000103775b4 in handle_fatal_signal (sig=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/signal_handler.cc:220
#3  <signal handler called>
#4  0x00003fff94f2f2dc in __GI_raise (sig=<optimised out>) at ../sysdeps/unix/sysv/linux/raise.c:55
#5  0x00003fff94f31db4 in __GI_abort () at abort.c:89
#6  0x000000001033e8d0 in ut_dbg_assertion_failed (expr=0x112ac490 "it != chunk_map->end()", file=0x112ac448 "/home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/buf/buf0buf.cc", line=3863) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/ut/ut0dbg.cc:67
#7  0x0000000010e14610 in buf_block_from_ahi (ptr=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/buf/buf0buf.cc:3863
#8  0x0000000010e028ec in btr_search_guess_on_hash (index=0x3fff480220d8, info=0x3fff48023aa8, tuple=0x3fff280b4858, mode=4, latch_mode=2, cursor=0x3fff6c2fa9e0, has_search_latch=0, mtr=0x3fff6c2f9ed0) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/btr/btr0sea.cc:1023
#9  0x0000000010df4bf4 in btr_cur_search_to_nth_level (index=0x3fff480220d8, level=0, tuple=0x3fff280b4858, mode=<optimised out>, latch_mode=2, cursor=0x3fff6c2fa9e0, has_search_latch=0, file=0x11298cf8 "/home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc", line=2344, mtr=0x3fff6c2f9ed0) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/btr/btr0cur.cc:926
#10 0x0000000010cf528c in btr_pcur_open_low (level=0, file=0x11298cf8 "/home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc", mtr=0x3fff6c2f9ed0, line=2344, cursor=0x3fff6c2fa9e0, latch_mode=2, mode=PAGE_CUR_LE, tuple=0x3fff280b4858, index=0x3fff480220d8) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/include/btr0pcur.ic:465
#11 row_ins_clust_index_entry_low (flags=0, mode=2, index=0x3fff480220d8, n_uniq=1, entry=0x3fff280b4858, n_ext=0, thr=0x3fff280b3460, dup_chk_only=false) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc:2344
#12 0x0000000010cf9ff4 in row_ins_clust_index_entry (index=0x3fff480220d8, entry=0x3fff280b4858, thr=0x3fff280b3460, n_ext=0, dup_chk_only=false) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc:3162
#13 0x0000000010cfa98c in row_ins_index_entry (thr=<optimised out>, entry=<optimised out>, index=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc:3292
#14 row_ins_index_entry_step (thr=<optimised out>, node=0x3fff280b3088) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc:3442
#15 row_ins (thr=0x3fff280b3460, node=0x3fff280b3088) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc:3584
#16 row_ins_step (thr=0x3fff280b3460) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0ins.cc:3769
#17 0x0000000010d0d800 in row_insert_for_mysql_using_ins_graph (mysql_rec=<optimised out>, prebuilt=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/row/row0mysql.cc:1732
#18 0x0000000010c057f4 in ha_innobase::write_row (this=0x3fff280aeb30, record=0x3fff280aef40 "\360\361\t") at /home/danielgb/mysql-5.7-POWER_FIXES/storage/innobase/handler/ha_innodb.cc:7484
#19 0x00000000103f0d68 in handler::ha_write_row (this=0x3fff280aeb30, buf=0x3fff280aef40 "\360\361\t") at /home/danielgb/mysql-5.7-POWER_FIXES/sql/handler.cc:7844
#20 0x0000000010acfbe4 in write_record (thd=0x3fff28000b30, table=0x3fff280ae190, info=0x3fff6c2fb4f0, update=0x3fff6c2fb470) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_insert.cc:1860
#21 0x0000000010ad1448 in Sql_cmd_insert::mysql_insert (this=0x3fff2802d4e0, thd=0x3fff28000b30, table_list=0x3fff2802c438) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_insert.cc:780
#22 0x0000000010ad1b8c in Sql_cmd_insert::execute (this=0x3fff2802d4e0, thd=0x3fff28000b30) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_insert.cc:3092
#23 0x000000001090dea0 in mysql_execute_command (thd=0x3fff28000b30, first_level=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_parse.cc:3520
#24 0x0000000010878094 in sp_instr_stmt::exec_core (this=0x3fff2802d618, thd=0x3fff28000b30, nextp=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sp_instr.cc:948
#25 0x000000001087ad84 in sp_lex_instr::reset_lex_and_exec_core (this=0x3fff2802d618, thd=0x3fff28000b30, nextp=0x3fff6c2fccd8, open_tables=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sp_instr.cc:411
#26 0x000000001087b7e4 in sp_lex_instr::validate_lex_and_execute_core (this=0x3fff2802d618, thd=0x3fff28000b30, nextp=0x3fff6c2fccd8, open_tables=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sp_instr.cc:676
#27 0x000000001087c914 in sp_instr_stmt::execute (this=0x3fff2802d618, thd=0x3fff28000b30, nextp=0x3fff6c2fccd8) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sp_instr.cc:859
#28 0x0000000010872e5c in sp_head::execute (this=0x3fff2801d7a0, thd=0x3fff28000b30, merge_da_on_success=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sp_head.cc:789
#29 0x0000000010876d40 in sp_head::execute_procedure (this=0x3fff2801d7a0, thd=0x3fff28000b30, args=0x3fff28003030) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sp_head.cc:1522
#30 0x000000001090e640 in mysql_execute_command (thd=0x3fff28000b30, first_level=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_parse.cc:4506
#31 0x000000001091381c in mysql_parse (thd=0x3fff28000b30, parser_state=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_parse.cc:5519
#32 0x000000001091427c in dispatch_command (thd=0x3fff28000b30, com_data=<optimised out>, command=COM_QUERY) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_parse.cc:1429
#33 0x0000000010915e24 in do_command (thd=0x3fff28000b30) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/sql_parse.cc:997
#34 0x00000000109fa350 in handle_connection (arg=<optimised out>) at /home/danielgb/mysql-5.7-POWER_FIXES/sql/conn_handler/connection_handler_per_thread.cc:301
#35 0x0000000010f69398 in pfs_spawn_thread (arg=0x10038605490) at /home/danielgb/mysql-5.7-POWER_FIXES/storage/perfschema/pfs.cc:2188
#36 0x00003fff957f82cc in start_thread (arg=0x3fff6c2ff170) at pthread_create.c:336
#37 0x00003fff95023f04 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

The code changes in https://github.com/mysql/mysql-server/commit/19849a47fc28c93a73317bb4f0454a1a1b55f420 mean that a new patch is required.
[17 May 2016 13:36] OCA Admin
Contribution submitted via Github - Bug #79378: buf_block_align() makes incorrect assumptions about chunk size 
(*) Contribution by Alexey Kopytov (Github akopytov, mysql-server/pull/73#issuecomment-219698120): I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_70239416.txt (text/plain), 1.58 KiB.

[17 May 2016 22:53] Daniel Black
I've tested Alexey's latest patch. It corrects a segfault in the innodb.table_encrypt_kill test that I was able to produce reliably on Power8.
[20 Jul 2016 12:20] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 5.7.15 release, and here's the changelog entry:

In some cases, code that locates a buffer pool chunk corresponding to
a given pointer returned the wrong chunk.

Thanks to Alexey Kopytov for the patch.