Description:
A single query, which generate big intrinsic temp InnoDB table, will make InnoDB engine be totally stuck.
Even queries can not be executed.
How to repeat:
Configuration:
innodb_buffer_pool_size = 128M
internal_tmp_disk_storage_engine = InnoDB
Note: BP size can be any large value, we use small BP size just to quickly reproduce.
## Prepare 2 x 1M sysbench tables
./src/sysbench oltp_read_write --tables=2 --table_size=1000000 prepare
## update sysbench table, change pad of every rec to a same value
update sbtest1 set pad = '22195207048-70116052123-74140395089-76317954521-98694025897';
update sbtest2 set pad = '22195207048-70116052123-74140395089-76317954521-98694025897';
## create a view that "union all" the 2 sysben tables
create view view_sb_2 as select * from sbtest1 union all select * from sbtest2;
## Execute query
## This query will create an intrinsic InnoDB table, with an index on pad
select * from view_sb_2 where pad = '22195207048-70116052123-74140395089-76317954521-98694025897' order by id desc limit 0, 10;
InnoDB will get stuck after running the query for a while, you can not do anything, even query another InnoDB table.
And in the error log, you'll see some error msg like this:
2019-07-17T22:42:19.068364+08:00 4 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (2892 search iterations)! 2892 failed attempts to flush a page! Consider increasing the buffer pool size. It is also possible that in your Unix version fsync is very slow, or completely frozen inside the OS kernel. Then upgrading to a newer version of your operating system may help. Look at the number of fsyncs in diagnostic info below. Pending flushes (fsync) log: 0; buffer pool: 0. 39220 OS file reads, 103258 OS file writes, 7 OS fsyncs. Starting InnoDB Monitor to print further diagnostics to the standard output.
Suggested fix:
This reason is that all the pages in buffer pool are from intrinsic temp table (ibtmp1) and all the pages are pinned (buf_fix_count > 0) in buffer pool (this is the culprit part), and the query need to load more temp page from disk, but there are no replaceable pages. The query stuck itself.
Because all the bpages are pinned in buffer pool, not only the query itself is stuck, no other page can be loaded into buffer pool.
The stack of the query is this:
1 nanosleep,os_thread_sleep(os0thread.cc:279),buf_LRU_get_free_block(buf0lru.cc:1409),buf_page_init_for_read(buf0buf.cc:5148),buf_read_page_low(buf0rea.cc:149),buf_read_page(buf0rea.cc:149),buf_page_get_gen(buf0buf.cc:4199),btr_block_get_func(btr0btr.ic:63),btr_pcur_move_to_next_page(btr0btr.ic:63),btr_pcur_move_to_next(btr0pcur.ic:360),row_search_traverse(btr0pcur.ic:360),row_search_no_mvcc(btr0pcur.ic:360),ha_innobase::general_fetch(ha_innodb.cc:9024),handler::ha_index_next_same(handler.cc:3255),handler::multi_range_read_next(handler.cc:6418),QUICK_RANGE_SELECT::get_next(opt_range.cc:11233),find_all_keys(filesort.cc:970),filesort(filesort.cc:970),create_sort_index(sql_executor.cc:3672),QEP_TAB::sort_table(sql_executor.cc:3672),join_init_read_record(sql_executor.cc:2472),sub_select(sql_executor.cc:1277),do_select(sql_executor.cc:950),JOIN::exec(sql_executor.cc:950),handle_query(sql_select.cc:184),execute_sqlcom_select(sql_parse.cc:5413),mysql_execute_command(sql_parse.cc:2972),mysql_parse(sql_parse.cc:5839),dispatch_command(sql_parse.cc:1628),do_command(sql_parse.cc:1087),handle_connection(connection_handler_per_thread.cc:307),pfs_spawn_thread(pfs.cc:2190),start_thread,clone
using gdb to show some info from buf_pool_ptr:
>>> p buf_pool_ptr[0].free
$8 = {
count = 0,
start = 0x0,
end = 0x0,
node = &buf_page_t::list
}
>>> p buf_pool_ptr[0].LRU
$9 = {
count = 8189,
start = 0x7f2f00df1920,
end = 0x7f2f00cad6c8,
node = &buf_page_t::LRU
}
>>> p buf_pool_ptr[0].flush_list
$10 = {
count = 4,
start = 0x7f2f00cad3b8,
end = 0x7f2f00b39548,
node = &buf_page_t::list
}
>>> p buf_pool_ptr[0]->LRU.end.buf_fix_count
$11 = 1
>>> p buf_pool_ptr[0]->LRU.end.LRU.prev.buf_fix_count
$12 = 1
>>> p buf_pool_ptr[0]->LRU.end.LRU.prev.LRU.prev.buf_fix_count
$13 = 13611
>>> p buf_pool_ptr[0]->LRU.start.buf_fix_count
$14 = 26
>>> p buf_pool_ptr[0]->LRU.start.LRU.next.buf_fix_count
$15 = 26
>>> p buf_pool_ptr[0]->LRU.end.id
$16 = {
m_space = 7719,
m_page_no = 10804,
m_fold = 3799009371
}
>>> p buf_pool_ptr[0]->LRU.start.id
$17 = {
m_space = 7719,
m_page_no = 10889,
m_fold = 3799009456
}
>>
>>> p srv_tmp_space.m_space_id
$18 = 7719
We can see that there is no free page, and it's very likely that all the pages in LRU are intrinsic temp pages and the buf_fix_count is not 0.
The buf_fix_count is decreased on mtr_commit(), and normally the duration of a mtr is very short.
But for intrinsic temp table, this is a totally different story, the same mtr(index->last_sel_cur->mtr) can last until statement/transaction end:
```
803/** Cache position of last inserted or selected record by caching record
804and holding reference to the block where record resides.
805Note: We don't commit mtr and hold it beyond a transaction lifetime as this is
806a special case (intrinsic table) that are not shared accross connection. */
807class last_ops_cur_t
808{
809public:
810 /** Constructor */
811 last_ops_cur_t()
```
Using gdb to show detail of mtr:
>>> p mtr->m_impl->m_memo->m_size
$25 = 10239472
>>> p mtr->m_impl->m_memo->m_size/(sizeof(mtr_memo_slot_t))
$26 = 639967
the number of slot is much bigger then total bpages on LRU list (8189), which means many pages are pinned many times.
IMHO, this is a serious design bug of intrinsic temp table, the mtr should not last for transaction duration.