Bug #96236 Big intrinsic tmp InnoDB table makes InnoDB engine get totally stuck
Submitted: 17 Jul 2019 14:56
Modified: 18 Jul 2019 5:18
Reporter: Fungo Wang (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S1 (Critical)
Version: 5.7.25, 8.0.16
OS: Any
Assigned to:
CPU Architecture: Any
Tags: buf_fix_count, Contribution, innodb, intrinsic

[17 Jul 2019 14:56] Fungo Wang
Description:
A single query that generates a big intrinsic temporary InnoDB table can make the InnoDB engine get totally stuck.

Even queries against other InnoDB tables cannot be executed.

How to repeat:
Configuration:
innodb_buffer_pool_size = 128M
internal_tmp_disk_storage_engine = InnoDB

Note: the BP size can be any large value; we use a small BP size just to reproduce quickly.

## Prepare 2 x 1M sysbench tables

./src/sysbench oltp_read_write --tables=2  --table_size=1000000  prepare

## Update the sysbench tables, changing the pad of every record to the same value

update sbtest1 set pad = '22195207048-70116052123-74140395089-76317954521-98694025897';
update sbtest2 set pad = '22195207048-70116052123-74140395089-76317954521-98694025897';

## Create a view that does a "union all" of the 2 sysbench tables
create view view_sb_2 as select * from sbtest1 union all select * from sbtest2;

## Execute query 
## This query will create an intrinsic InnoDB table, with an index on pad

select * from view_sb_2 where pad = '22195207048-70116052123-74140395089-76317954521-98694025897' order by id desc limit 0, 10;

InnoDB will get stuck after the query has been running for a while; you cannot do anything, not even query another InnoDB table.

And in the error log, you'll see error messages like this:

2019-07-17T22:42:19.068364+08:00 4 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (2892 search iterations)! 2892 failed attempts to flush a page! Consider increasing the buffer pool size. It is also possible that in your Unix version fsync is very slow, or completely frozen inside the OS kernel. Then upgrading to a newer version of your operating system may help. Look at the number of fsyncs in diagnostic info below. Pending flushes (fsync) log: 0; buffer pool: 0. 39220 OS file reads, 103258 OS file writes, 7 OS fsyncs. Starting InnoDB Monitor to print further diagnostics to the standard output.

Suggested fix:
The reason is that all the pages in the buffer pool are from the intrinsic temp table (ibtmp1) and all of them are pinned (buf_fix_count > 0) in the buffer pool (this is the culprit), while the query needs to load more temp pages from disk but there are no replaceable pages. The query gets stuck on itself.

Because all the bpages in the buffer pool are pinned, not only is the query itself stuck, but no other page can be loaded into the buffer pool either.
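
To make the failure mode concrete, here is a minimal illustrative sketch (not the real buf0lru.cc code; the names below are made up) of why a fully pinned LRU list leaves the free-block search with nothing to evict: a page can only be replaced once its buf_fix_count drops back to zero, so if every page is pinned the search just sleeps and retries, eventually printing the "Difficult to find free blocks" warning shown above.

```
// Illustrative sketch only -- not the actual InnoDB implementation.
#include <chrono>
#include <thread>
#include <vector>

struct PageSketch {          // made-up stand-in for buf_page_t
  unsigned buf_fix_count;    // > 0 while some mtr still references the page
  bool     io_in_progress;   // simplified stand-in for io_fix
};

// A page is replaceable only when nothing references it any more.
static bool can_evict(const PageSketch &page) {
  return page.buf_fix_count == 0 && !page.io_in_progress;
}

// Made-up stand-in for the free-block search: when every page on the LRU
// list is pinned (buf_fix_count > 0), no victim is ever found, so the caller
// sleeps and retries forever -- which is exactly the state the stack trace
// below shows (buf_LRU_get_free_block -> os_thread_sleep).
PageSketch *get_free_block(std::vector<PageSketch> &lru) {
  for (;;) {
    for (auto &page : lru) {
      if (can_evict(page)) {
        return &page;  // found a replaceable page
      }
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}
```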

The stack of the query is this:
1 nanosleep,os_thread_sleep(os0thread.cc:279),buf_LRU_get_free_block(buf0lru.cc:1409),buf_page_init_for_read(buf0buf.cc:5148),buf_read_page_low(buf0rea.cc:149),buf_read_page(buf0rea.cc:149),buf_page_get_gen(buf0buf.cc:4199),btr_block_get_func(btr0btr.ic:63),btr_pcur_move_to_next_page(btr0btr.ic:63),btr_pcur_move_to_next(btr0pcur.ic:360),row_search_traverse(btr0pcur.ic:360),row_search_no_mvcc(btr0pcur.ic:360),ha_innobase::general_fetch(ha_innodb.cc:9024),handler::ha_index_next_same(handler.cc:3255),handler::multi_range_read_next(handler.cc:6418),QUICK_RANGE_SELECT::get_next(opt_range.cc:11233),find_all_keys(filesort.cc:970),filesort(filesort.cc:970),create_sort_index(sql_executor.cc:3672),QEP_TAB::sort_table(sql_executor.cc:3672),join_init_read_record(sql_executor.cc:2472),sub_select(sql_executor.cc:1277),do_select(sql_executor.cc:950),JOIN::exec(sql_executor.cc:950),handle_query(sql_select.cc:184),execute_sqlcom_select(sql_parse.cc:5413),mysql_execute_command(sql_parse.cc:2972),mysql_parse(sql_parse.cc:5839),dispatch_command(sql_parse.cc:1628),do_command(sql_parse.cc:1087),handle_connection(connection_handler_per_thread.cc:307),pfs_spawn_thread(pfs.cc:2190),start_thread,clone

Using gdb to show some info from buf_pool_ptr:

>>> p buf_pool_ptr[0].free
$8 = {
  count = 0,
  start = 0x0,
  end = 0x0,
  node = &buf_page_t::list
}
>>> p buf_pool_ptr[0].LRU
$9 = {
  count = 8189,
  start = 0x7f2f00df1920,
  end = 0x7f2f00cad6c8,
  node = &buf_page_t::LRU
}
>>> p buf_pool_ptr[0].flush_list
$10 = {
  count = 4,
  start = 0x7f2f00cad3b8,
  end = 0x7f2f00b39548,
  node = &buf_page_t::list
}
>>> p buf_pool_ptr[0]->LRU.end.buf_fix_count
$11 = 1
>>> p buf_pool_ptr[0]->LRU.end.LRU.prev.buf_fix_count
$12 = 1
>>> p buf_pool_ptr[0]->LRU.end.LRU.prev.LRU.prev.buf_fix_count
$13 = 13611
>>> p buf_pool_ptr[0]->LRU.start.buf_fix_count
$14 = 26
>>> p buf_pool_ptr[0]->LRU.start.LRU.next.buf_fix_count
$15 = 26

>>> p buf_pool_ptr[0]->LRU.end.id
$16 = {
  m_space = 7719,
  m_page_no = 10804,
  m_fold = 3799009371
}
>>> p buf_pool_ptr[0]->LRU.start.id
$17 = {
  m_space = 7719,
  m_page_no = 10889,
  m_fold = 3799009456
}
>>

>>> p srv_tmp_space.m_space_id
$18 = 7719

We can see that there is no free page, and it's very likely that all the pages in the LRU list are intrinsic temp pages (space id 7719 matches srv_tmp_space.m_space_id) and their buf_fix_count is not 0.

The buf_fix_count is decreased on mtr_commit(), and normally the duration of an mtr is very short.
But for an intrinsic temp table this is a totally different story: the same mtr (index->last_sel_cur->mtr) can last until the statement/transaction ends:

```
/** Cache position of last inserted or selected record by caching record
and holding reference to the block where record resides.
Note: We don't commit mtr and hold it beyond a transaction lifetime as this is
a special case (intrinsic table) that are not shared accross connection. */
class last_ops_cur_t
{
public:
        /** Constructor */
        last_ops_cur_t()
```
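
For contrast, here is a schematic sketch (MtrSketch and PageSketch are made-up stand-ins for mtr_t and buf_page_t, reduced to just the pinning behaviour) of the two lifecycles: in the normal pattern each page is buf-fixed only for the duration of one short mini-transaction, while the intrinsic-table path keeps a single mtr (index->last_sel_cur->mtr) open across the whole scan, so every page it has touched stays pinned and its memo keeps growing.

```
// Schematic only: made-up stand-ins, just detailed enough to show the pinning.
#include <vector>

struct PageSketch { unsigned buf_fix_count = 0; };

struct MtrSketch {
  std::vector<PageSketch *> memo;        // pages this mtr has buf-fixed
  void access(PageSketch *p) {           // a page access registered in the mtr
    ++p->buf_fix_count;
    memo.push_back(p);
  }
  void commit() {                        // like mtr_commit(): unfix everything
    for (PageSketch *p : memo) --p->buf_fix_count;
    memo.clear();
  }
};

// Normal pattern: one short mtr per operation, pages become evictable quickly.
void short_lived_access(PageSketch *page) {
  MtrSketch mtr;
  mtr.access(page);   // buf_fix_count goes to 1
  /* read/modify the page */
  mtr.commit();       // buf_fix_count back to 0
}

// Intrinsic temp table pattern (the problem): the same mtr stays open for the
// whole statement, so every page it ever touched remains pinned and the memo
// keeps growing (639967 slots vs. 8189 LRU pages in the gdb output below).
void long_lived_scan(std::vector<PageSketch> &pages_of_scan) {
  MtrSketch mtr;                         // started once per statement
  for (PageSketch &p : pages_of_scan) {
    mtr.access(&p);                      // pins page after page ...
    /* process the rows on the page */
  }
  mtr.commit();                          // ... unpinned only at the very end
}
```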

Using gdb to show details of the mtr:
>>> p mtr->m_impl->m_memo->m_size
$25 = 10239472
>>> p mtr->m_impl->m_memo->m_size/(sizeof(mtr_memo_slot_t))
$26 = 639967

The number of memo slots (639967, i.e. 10239472 bytes of memo at 16 bytes per mtr_memo_slot_t) is much bigger than the total number of bpages on the LRU list (8189), which means many pages are pinned multiple times.

IMHO, this is a serious design bug of the intrinsic temp table: the mtr should not last for the whole transaction duration.
[17 Jul 2019 15:06] Fungo Wang
Added 8.0.16 to the version set; it is also affected.
[17 Jul 2019 15:10] Fungo Wang
To reproduce with 8.0.16, add these configuration settings to make sure the InnoDB intrinsic table is used:

temptable_max_ram=2097152
temptable_use_mmap=OFF
[18 Jul 2019 5:18] MySQL Verification Team
Hello Fungo Wang,

Thank you for the report.
Verified as described with 8.0.16 build.

regards,
Umesh
[18 Jul 2019 5:29] MySQL Verification Team
Test results - 8.0.16

Attachment: 96236_8.0.16.results (application/octet-stream, text), 18.09 KiB.

[24 Jul 2023 17:42] Dmitry Lenev
Hello!

I would like to contribute a patch that addresses this problem.

This fix solves the problem by ensuring that we commit the mini-transaction
which is used for access to the intrinsic temporary table after every
100 rows processed. This unfixes the buffer pages touched by the
mini-transaction, making them available for further use by this or
other statements.

This patch might lead to a 1-3% performance reduction in some scenarios
which heavily use intrinsic temporary tables. OTOH, we have also seen
significant performance improvements in other scenarios where big
intrinsic tables were used, so we consider this acceptable.

A better fix for this problem might be possible, but it is likely
to be more complex and, in my opinion, requires someone more proficient
with this code.
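
A rough, self-contained sketch of that idea (not the contributed patch itself; the types below are illustrative stand-ins for the persistent cursor and mtr used in the real code):

```
// Illustrative sketch of the "commit every N rows" idea -- not the real patch.
#include <cstdint>

struct MtrSketch {
  void start()  { /* begin a new mini-transaction */ }
  void commit() { /* unfix all buffer pages this mtr holds */ }
};

struct CursorSketch {
  void store_position() { /* remember the current record, as btr_pcur_store_position() does */ }
  void restore_position(MtrSketch *mtr) { /* re-latch the page under the new mtr */ }
  bool fetch_next(MtrSketch *mtr) { /* read the next row, pinning its page */ return false; }
};

// Commit the long-lived mtr every 100 processed rows so the pages it has
// buf-fixed become replaceable again, then reposition the cursor and continue.
void scan_intrinsic_table(CursorSketch &cur, MtrSketch &mtr) {
  constexpr std::uint32_t kCommitInterval = 100;  // interval described above
  std::uint32_t rows_since_commit = 0;

  mtr.start();
  while (cur.fetch_next(&mtr)) {
    if (++rows_since_commit == kCommitInterval) {
      cur.store_position();       // survive the mtr commit
      mtr.commit();               // unpins the pages accumulated so far
      mtr.start();                // fresh mtr for the next batch
      cur.restore_position(&mtr);
      rows_since_commit = 0;
    }
  }
  mtr.commit();
}
```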
[24 Jul 2023 17:43] Dmitry Lenev
A simple fix for bug 96236 which commits the mtr after each 100 rows processed.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: ps-8772-bug96236-contrib.patch (text/x-patch), 16.55 KiB.

[24 Jul 2023 17:44] Dmitry Lenev
The contributed patch is on top of the source code for the 8.0.34 version of the MySQL server.
[25 Jul 2023 5:44] MySQL Verification Team
Hello Dmitry,

Thank you for the contribution.

regards,
Umesh