Bug #104787 need wait buf_fix_count =0 in buf_read_page_handle_error
Submitted: 1 Sep 2021 8:57
Modified: 22 Aug 2023 12:48
Reporter: alex xing (OCA)
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 8.0
OS: Any
Assigned to:
CPU Architecture: Any

[1 Sep 2021 8:57] alex xing
Description:
If a page is corrupted, MySQL uses buf_read_page_handle_error to free the page, but another thread may be waiting to read the same page in buf_wait_for_read.
The waiting thread therefore needs to be notified to abandon its wait before the page can be freed in buf_read_page_handle_error. Otherwise the buf_fix_count == 0 assertion fails, as in this stack trace:

#0  0x00007f454ac670c1 in __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:61
#1  0x0000000001e76477 in my_write_core (sig=sig@entry=6) at /mysql-8.0.19/mysys/stacktrace.cc:306
#2  0x000000000112ea5d in handle_fatal_signal (sig=6) at /mysql-8.0.19/sql/signal_handler.cc:169
#3  <signal handler called>
#4  0x00007f45476a2067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5  0x00007f45476a3448 in __GI_abort () at abort.c:89
#6  0x0000000002156d13 in ut_dbg_assertion_failed (expr=expr@entry=0x2e746be "buf_fix_count == 0", file=file@entry=0x2db8ad8 "/mysql-8.0.19/storage/innobase/buf/buf0buf.cc",
    line=line@entry=5123) at /mysql-8.0.19/storage/innobase/ut/ut0dbg.cc:98
#7  0x00000000021bbdbb in buf_read_page_handle_error (bpage=bpage@entry=0x7f450ca60d40) at /mysql-8.0.19/storage/innobase/buf/buf0buf.cc:5123
#8  0x00000000021c34ad in buf_page_io_complete (bpage=0x7f450ca60d40, evict=evict@entry=false, sync=sync@entry=false) at /mysql-8.0.19/storage/innobase/buf/buf0buf.cc:5292
#9  0x00000000022916a1 in fil_aio_wait (segment=segment@entry=2) at /mysql-8.0.19/storage/innobase/fil/fil0fil.cc:7830
#10 0x00000000020f0b40 in io_handler_thread (segment=2) at /mysql-8.0.19/storage/innobase/srv/srv0start.cc:280
#11 0x00000000020f0e4a in __call<void, 0ul> (__args=<optimized out>, this=<synthetic pointer>) at /gcc-6.1.0-install/include/c++/6.1.0/functional:943
#12 operator()<> (this=<synthetic pointer>) at /gcc-6.1.0-install/include/c++/6.1.0/functional:1002
#13 operator()<void (*)(long unsigned int), long unsigned int> (f=<unknown type in /mysql-8.0.19/bld/runtime_output_directory/mysqld, CU 0x14b23972, DIE 0x14bff4dd>,
    this=0x7f4530449f98) at /mysql-8.0.19/storage/innobase/include/os0thread-create.h:101
#14 _M_invoke<0ul, 1ul> (this=0x7f4530449f88) at /gcc-6.1.0-install/include/c++/6.1.0/functional:1400
#15 operator() (this=0x7f4530449f88) at /gcc-6.1.0-install/include/c++/6.1.0/functional:1389
#16 std::thread::_State_impl<std::_Bind_simple<Runnable (void (*)(unsigned long), unsigned long)> >::_M_run() (this=0x7f4530449f80) at /gcc-6.1.0-install/include/c++/6.1.0/thread:196
#17 0x00007f4547fe7812 in std::execute_native_thread_routine (__p=0x7f4530449f80) at ../../../.././libstdc++-v3/src/c++11/thread.cc:83
#18 0x00007f454ac62064 in start_thread (arg=0x7f449f7fe700) at pthread_create.c:309
#19 0x00007f454775562d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

How to repeat:
Just read the code.

Suggested fix:
If a page is corrupted, MySQL uses buf_read_page_handle_error to free the page, but another thread may be waiting to read the same page in buf_wait_for_read.
The waiting thread therefore needs to be notified to abandon its wait, after which the page can be freed in buf_read_page_handle_error.

I made a simple patch against 8.0.19, which I believe works for the latest code as well.
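
The ordering proposed above (wake the waiter, let it drop its pin, then free the page) can be sketched as a standalone C++ model. This is only an illustration and is not the attached patch or InnoDB code: ModelPage, wait_for_read, handle_read_error and run_model are hypothetical names, and a plain std::mutex plus std::condition_variable stand in for InnoDB's own latching.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a buffer page; NOT InnoDB's buf_page_t.
struct ModelPage {
  std::mutex mtx;
  std::condition_variable cv;
  int fix_count = 0;       // models buf_fix_count (number of pins)
  bool io_pending = true;  // the read is still in flight
  bool io_failed = false;  // the read ended with a corrupted page
  bool freed = false;      // the page was handed back to the pool
};

// Models a reader such as buf_wait_for_read: pin the page, wait for the
// read to finish, and drop the pin again if the read failed.
bool wait_for_read(ModelPage &p) {
  std::unique_lock<std::mutex> lk(p.mtx);
  ++p.fix_count;
  p.cv.notify_all();  // lets the demo observe that the page is pinned
  p.cv.wait(lk, [&] { return !p.io_pending; });
  if (p.io_failed) {
    --p.fix_count;      // abandon the wait and release the pin
    p.cv.notify_all();  // wake the error handler waiting for zero pins
    return false;
  }
  return true;  // success: the caller keeps the pin and unpins later
}

// Models the proposed error path: report the failure to waiters first,
// wait until nobody pins the page, and only then free it.
void handle_read_error(ModelPage &p) {
  std::unique_lock<std::mutex> lk(p.mtx);
  p.io_failed = true;
  p.io_pending = false;
  p.cv.notify_all();
  p.cv.wait(lk, [&] { return p.fix_count == 0; });
  assert(p.fix_count == 0);  // the condition that failed in the trace
  p.freed = true;
}

struct Outcome {
  bool reader_got_page;
  bool freed;
  int final_fix_count;
};

// Drives one reader and one failing read in a fixed order.
Outcome run_model() {
  ModelPage page;
  bool reader_ok = true;
  std::thread reader([&] { reader_ok = wait_for_read(page); });
  {
    // Make sure the reader has pinned the page before the error path runs,
    // which is the interleaving described in the report.
    std::unique_lock<std::mutex> lk(page.mtx);
    page.cv.wait(lk, [&] { return page.fix_count == 1; });
  }
  handle_read_error(page);
  reader.join();
  return {reader_ok, page.freed, page.fix_count};
}
```

Calling run_model() from a small main() runs the interleaving once. In this model the reader returns false instead of a page, and the page is freed only after its pin count has dropped to zero. Whether a blocking wait is acceptable inside the real I/O completion path is a separate question that the model does not answer.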
[1 Sep 2021 8:58] alex xing
A simple patch based on 8.0.19. The same may apply to the latest version of the code.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bug_fix.patch (text/plain), 2.50 KiB.

[1 Sep 2021 9:42] alex xing
new patch

Attachment: bug_fix.patch (text/plain), 2.48 KiB.

[2 Sep 2021 11:51] MySQL Verification Team
Hi Mr. xing,

Thank you for your bug report.

Thank you, also, for your analysis and even more for your patch.

However, we have a small problem with your idea. The server has to assert immediately, since it has hit a page corruption. If it does not assert, your data on permanent storage could be corrupted. Under these conditions, what does it matter whether another thread is waiting on the page or not?

The assertion leads to an instant crash, so why all the hassle?
[2 Sep 2021 14:33] alex xing
Hi MySQL Verification Team,
  Thank you for your response.
  Is there a scenario with innodb_force_recovery=0 where only one page is corrupted, but the user does not want mysqld to dump core because of that page?
  The user just wants to get an error message from mysqld when accessing the corrupted page, with the error also recorded in the error log.
  If not, why not simply die when a corrupted page is found, instead of releasing the page via buf_read_page_handle_error --> buf_LRU_free_one_page?
[3 Sep 2021 11:45] MySQL Verification Team
Hi Mr. xing,

No, there is no scenario that is feasible in the case that you have described.

It is impossible just to return the error, because InnoDB is an ACID storage engine and all buffer pages that are changed have to be flushed to the tablespace(s). That is one of the essential, basic premises defined by the ACID standard, and the InnoDB SE adheres to it fully.

Not a bug.
[22 Aug 2023 2:07] Kang Wang
Calling buf_LRU_free_one_page from buf_read_page_handle_error is unsafe, since no check is done to make sure that no other thread holds a buf_fix_count on the page.

This may happen on a deleted tablespace:

buf_read_page_low {
  ...
  
  if (*err != DB_SUCCESS) {
    if (IORequest::ignore_missing(type) || *err == DB_TABLESPACE_DELETED) {
      buf_read_page_handle_error(bpage);
      return (0);
    }

    ut_error;
  }
...
}
[22 Aug 2023 2:10] alex xing
Hi Kang Wang, I agree with you.
[22 Aug 2023 12:35] MySQL Verification Team
Hi Mr. Wang,

We do not see the presence of the buf_LRU_free_one_page() function in your example code.
[22 Aug 2023 12:48] MySQL Verification Team
Hi Mr. xing,

We have analysed your report and your patch in far greater detail.

This is now a verified bug report.

Thank you for your patch.