Bug #100630 buf_pool_is_obsolete is not thread safe
Submitted: 25 Aug 2020 3:06
Modified: 25 Aug 2020 6:25
Reporter: Baolin Huang
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 8.0.18, 5.7
OS: Any
Assigned to:
CPU Architecture: Any
Tags: buffer pool resize, restore_position

[25 Aug 2020 3:06] Baolin Huang
Description:
The function buf_pool_is_obsolete uses buf_pool_withdrawing and buf_withdraw_clock to indicate that the buffer pool is currently being resized or has been resized earlier.

In btr_pcur_t::restore_position, buf_pool_is_obsolete is called like this:
```
    if (!buf_pool_is_obsolete(m_withdraw_clock) &&
        btr_cur_optimistic_latch_leaves(m_block_when_stored, m_modify_clock,
                                        &latch_mode, &m_btr_cur, file, line,
                                        mtr))
``` 

If the buffer pool is shrunk in the window between the buf_pool_is_obsolete check and the call to btr_cur_optimistic_latch_leaves, an invalid block pointer is accessed, which leads to a bad result.
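
The window described above can be modeled in a few lines. This is a minimal single-threaded sketch, not InnoDB code: `withdrawing`, `withdraw_clock`, `pool_is_obsolete`, `shrink_pool`, and `simulate_race` are hypothetical stand-ins for the names in this report, and the interleaving is forced by calling the shrink between the check and the point where the block would be used.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins for the globals named in the report.
static std::atomic<bool> withdrawing{false};
static std::atomic<std::size_t> withdraw_clock{0};

struct Block { int modify_clock = 0; };
static std::unique_ptr<Block> pool_block;  // memory owned by the toy "buffer pool"

// Assumed shape of the check: obsolete if a resize is running or has happened.
static bool pool_is_obsolete(std::size_t stored_clock) {
  return withdrawing.load() || withdraw_clock.load() != stored_clock;
}

// Toy shrink: frees the block and bumps the clock.
static void shrink_pool() {
  withdrawing.store(true);
  pool_block.reset();
  withdraw_clock.fetch_add(1);
  withdrawing.store(false);
}

struct Outcome {
  bool check_passed;        // the obsolete check said "still valid"
  bool block_alive_at_use;  // the block still exists when it would be used
  bool obsolete_after;      // the same check, repeated after the shrink
};

// Forces the interleaving: check, then shrink, then (would-be) use.
static Outcome simulate_race() {
  assert(!withdrawing.load());
  pool_block.reset(new Block());
  const std::size_t stored_clock = withdraw_clock.load();

  const bool passed = !pool_is_obsolete(stored_clock);  // step 1: check
  shrink_pool();                                        // step 2: resize in the window
  const bool alive = static_cast<bool>(pool_block);     // step 3: a stored raw pointer
                                                        // would now be dangling
  return Outcome{passed, alive, pool_is_obsolete(stored_clock)};
}
```

In this model the check passes, yet the block is gone by the time it would be used. Repeating the check afterwards does report the pool as obsolete, which is why the flags alone cannot protect the access that follows them.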

How to repeat:
Actually, the probability of this happening is small.

I added some debug code at the beginning of btr_cur_optimistic_latch_leaves.

```
bool btr_cur_optimistic_latch_leaves(buf_block_t *block,
                                      const char *file, ulint line, mtr_t *mtr) {
   ulint mode;
   page_no_t left_page_no;
-
+  DBUG_EXECUTE_IF("between_obsolete_optimistic", os_thread_sleep(60 * 1000000););
   switch (*latch_mode) {
     case BTR_SEARCH_LEAF:
``` 
Use two connections:
1. In the first connection, run a SELECT, which goes into the sleep.
2. In the second connection, change the buffer pool size from 2G to 128M.
3. After the first connection completes the sleep, the server crashes. Here is the stack:

```
#0   pthread_kill () from /lib64/libpthread.so.0
#1   my_write_core (sig=11) at mysys/stacktrace.cc:305
#2   handle_fatal_signal (sig=11) at sql/signal_handler.cc:169
#4   PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::pfs_begin_lock (this=0x7f347451ad60, state=0x7f34043e7dd0,    name=0x7352aa8 "storage/innobase/buf/buf0buf.cc", line=4255) at storage/innobase/include/ib0mutex.h:861
#5   PolicyMutex<TTASEventMutex<BlockMutexPolicy> >::enter (this=0x7f347451ad60, n_spins=100, n_delay=30,    name=0x7352aa8 "storage/innobase/buf/buf0buf.cc", line=4255) at storage/innobase/include/ib0mutex.h:761
#6   buf_page_optimistic_get (rw_latch=1, block=0x7f347451aa80, modify_clock=0, fetch_mode=NORMAL,    file=0x72dc2d0 "storage/innobase/row/row0sel.cc", line=3458, mtr=0x7f34043e87d0)    at storage/innobase/buf/buf0buf.cc:4255
#7   btr_cur_optimistic_latch_leaves (block=0x7f347451aa80, modify_clock=0, latch_mode=0x7f34043e7fd0, cursor=0x7f32cc02e6e0,    file=0x72dc2d0 "storage/innobase/row/row0sel.cc", line=3458, mtr=0x7f34043e87d0)    at storage/innobase/btr/btr0cur.cc:344
#8   btr_pcur_t::restore_position (this=0x7f32cc02e6e0, latch_mode=1, mtr=0x7f34043e87d0,    file=0x72dc2d0 "storage/innobase/row/row0sel.cc", line=3458) at storage/innobase/btr/btr0pcur.cc:174
#9   sel_restore_position_for_mysql (same_user_rec=0x7f34043e8cc8, latch_mode=1, pcur=0x7f32cc02e6e0, moves_up=1, mtr=0x7f34043e87d0)    at storage/innobase/row/row0sel.cc:3458
```

I will try to add a test case later.

Suggested fix:
Relying only on the buf_pool_withdrawing/buf_withdraw_clock flags is unsafe; a lock may be needed to protect the check together with the block access that follows it.
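
To illustrate the idea of a lock, here is a minimal sketch under stated assumptions. It is not InnoDB code and not necessarily how the problem was eventually fixed: `Pool`, `read_if_current`, and `shrink` are hypothetical names. The point is only that the validity check and the block access happen under the same mutex that the resize takes, so a shrink cannot slip in between them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

struct Block { int payload = 42; };

// Toy pool: one block, one resize clock, one mutex guarding both.
class Pool {
 public:
  Pool() : block_(new Block()) {}

  // Clock value a cursor would store together with its position.
  std::size_t clock() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return clock_;
  }

  // Check and use under one lock. Returns false when the stored position
  // is stale and the caller has to fall back to a fresh search.
  bool read_if_current(std::size_t stored_clock, int *out) const {
    std::lock_guard<std::mutex> guard(mutex_);
    if (clock_ != stored_clock) return false;
    assert(block_ != nullptr);  // unchanged clock implies the block is alive
    *out = block_->payload;
    return true;
  }

  // The resize takes the same mutex, so it waits for in-flight readers.
  void shrink() {
    std::lock_guard<std::mutex> guard(mutex_);
    block_.reset();
    ++clock_;
  }

 private:
  mutable std::mutex mutex_;
  std::unique_ptr<Block> block_;
  std::size_t clock_ = 0;
};
```

A plain mutex serializes all readers. A reader-writer lock, with readers taking it shared and the resize taking it exclusive, would keep the same guarantee while letting readers run in parallel.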
[25 Aug 2020 6:20] MySQL Verification Team
Hello Baolin Huang,

Thank you for the report and feedback.

regards,
Umesh
[25 Aug 2020 6:25] MySQL Verification Team
This turned out to be a duplicate of the internally reported defect Bug#31036301, which is fixed as of the upcoming 5.7.32 and 8.0.22 releases. More details will be available once the change log is published at
https://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-32.html
https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-22.html