Bug #97950 buf_read_page_handle_error can trigger assert failure
Submitted: 10 Dec 2019 15:51 Modified: 11 Feb 13:18
Reporter: Shu Lin Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:8.0.18 OS:Any
Assigned to: CPU Architecture:Any

[10 Dec 2019 15:51] Shu Lin
This is the commit of the code in which I hit the problem:

commit 91a17cedb1ee880fe7915fb14cfd74c04e8d6588
Author: mysql-builder@oracle.com <>
Date:   Fri Sep 20 10:09:56 2019 +0200

The assert ut_a(bpage->buf_fix_count == 0) fails:

1983 static bool buf_LRU_block_remove_hashed(buf_page_t *bpage, bool zip,
1984                                         bool ignore_content) {
1985   const buf_page_t *hashed_bpage;
1986   buf_pool_t *buf_pool = buf_pool_from_bpage(bpage);
1987   rw_lock_t *hash_lock;
1989   ut_ad(mutex_own(&buf_pool->LRU_list_mutex));
1990   ut_ad(mutex_own(buf_page_get_mutex(bpage)));
1992   hash_lock = buf_page_hash_lock_get(buf_pool, bpage->id);
1994   ut_ad(rw_lock_own(hash_lock, RW_LOCK_X));
1996   ut_a(buf_page_get_io_fix(bpage) == BUF_IO_NONE);
1997   ut_a(bpage->buf_fix_count == 0);

The code path is the following:


The assert fails because there is another thread wants to read the same page and put a fix_count on the page

3453 dberr_t Buf_fetch_normal::get(buf_block_t *&block) {
3454   /* Keep this path as simple as possible. */
3455   for (;;) {
3456     /* Lookup the page in the page hash. If it doesn't exist in the
3457     buffer pool then try and read it in from disk. */
3459     block = lookup();
3461     if (block != nullptr) {
3462       buf_block_fix(block);
3464       /* Now safe to release page_hash S lock. */
3465       rw_lock_s_unlock(m_hash_lock);
3466       break;
3467     }
3469     /* Page not in buf_pool: needs to be read from file */
3470     read_page();
3471   }

I believe this is the same problem reported in https://bugs.mysql.com/bug.php?id=90587, but #90587 is no longer active. I can't re-open it since I am not the original reporter. Hence I am opening a new bug report

How to repeat:
Short of encouting an actual page corruption, you will need to use debugger to force the code into buf_page_io_complete->buf_read_page_handle_error

When thread A finishes reading a page and enters buf_page_io_complete(), use debugger to force "is_corrupted" to true. Suspend thread A for now

5182     /* From version 3.23.38 up we store the page checksum
5183     to the 4 first bytes of the page end lsn field */
5184     bool is_corrupted;
5185     {
5186       BlockReporter reporter =
5187           BlockReporter(true, frame, bpage->size,
5188                         fsp_is_checksum_disabled(bpage->id.space()));
5189       is_corrupted = reporter.is_corrupted();
5190     }
5192     if (compressed_page || is_corrupted) {

Let thread B read the same page, and suspend thread B after it adds fix_count in Buf_fetch_normal::get()

Let thread A continue, it will hit the assert failure.
[11 Dec 2019 14:06] Sinisa Milivojevic
Hello Mr. Lin,

Thank you for your bug report.

However, what you report is not quite clear.

In general, what we require is a repeatable test case in the form of the set of SQL statements, or several concurrent sets of SQL statements, which when executed lead to the assert you report.

We can make an exception and accept a detailed code analysis which will show how InnoDB page gets corrupted all by itself, without any hardware or OS problems.

Thanks in advance.
[11 Dec 2019 14:52] Shu Lin
Page corruption can happen due to external reasons, for example, hardware failure or operating system bug.  This is clearly stated in the messages to the users:

bool buf_page_io_complete(buf_page_t *bpage, bool evict) {
5210         ib::info(ER_IB_MSG_82) << "It is also possible that your"
5211                                   " operating system has corrupted"
5212                                   " its own file cache and rebooting"
5213                                   " your computer removes the error."
5214                                   " If the corrupt page is an index page."
5215                                   " You can also try to fix the"
5216                                   " corruption by dumping, dropping,"
5217                                   " and reimporting the corrupt table."
5218                                   " You can use CHECK TABLE to scan"
5219                                   " your table for corruption. "
5220                                << FORCE_RECOVERY_MSG;
5221       }

As an reliable database management software, InnoDB should guard against page corruption, no matter what the source is. There is clearly a logic flaw in this error-handling code path, which can either cause software crash or let the page corruption spread unnoticed. Please address.
[11 Dec 2019 14:57] Sinisa Milivojevic

InnoDB can not prevent corruption due to the hardware errors, due to OS problems, due to running out of resources and other similar factors. All that it can do is assert, which it does. In that way, corrupted pages are not saved to disk.

That is fully documented in our Reference Manual.
[11 Dec 2019 15:44] Shu Lin
I know InnoDB can't prevent corruption, but it should prevent corruption from spreading. 

The intenion of buf_read_page_handle_error() is clearly trying to let the software continue (i.e not crashing) even if corruption is detected. But it doesn't do so correctly. As it stands right now, it can spread page corruption unnoticed. Please see my analysis of the race condition.

If the intention is to crash as soon as corruption is detected, then please fix the code to do that.
[12 Dec 2019 14:43] Sinisa Milivojevic

InnoDB already has TOO many checks for corruption. Adding more would not guarantee earlier catching of the corruption, nor would it lead to any improvements. It would only lead to significantly decreased performance.

These decisions were made during all these years and will not be changed.
[12 Dec 2019 22:41] Shu Lin
I am not asking you to write code to detect more page corruptions. 

I am asking you to handle the ALREADY-DETECTED corruption correctly! The current function that handles ALREADY-DETECTED corruption, namely buf_read_page_handle_error(), has logic flaws, as I've illustrated.
[13 Dec 2019 13:07] Sinisa Milivojevic

The function that you mention does exactly what it is designed to do, namely discover corruption while it is still in memory.

Hence, it unfixes the page, unlatches the page, removes it from page_hash and removes it from LRU.

That way corruption is stopped without assert.

Not a bug.
[13 Dec 2019 14:34] Shu Lin
Please keep in mind that InnoDB is a multi-threaded software. What you descried, is absolutely correct in a single-thread world. Now consider what will happen if another thread is accessing the same page at the same time? As I've described, the way it is coded right now, there are two race conditions that can happen

(1) the assert can cause InnoDB to crash

(2) the corruption can spread to other pages.

When a software detect a corruption, the most simple way to react is crashing immediately. The more sophisticated way is try to get the software to tolerate the corruption and continue running, but it has to do so safely, that is, WITHOUT SPREADING CORRUPTION.

buf_read_page_handle_error() is obviously trying to be sophisticated. But it has logic flaws. Please review my code analysis. 

Because this is an error-handling plus multi-thread race condition problem, it is difficult, if not impossible, to design a reproducible test case. Code analysis and manipulating threads (e.g. pause one thread, and let the other thread run, etc) in debugger is required to understand the race conditions.
[13 Dec 2019 14:43] Sinisa Milivojevic
Because our server is multi-threaded, we have all those mutex locks and other types of locks, that are taken in that function. That prevents simultaneous running of more then one thread over that code.

If you think that you can prove that multiple access of N threads is possible within the protected code, please send us a repeatable test case that will prove so. A repeatable test case is a set of SQL statements that can be run in one or N threads that would lead to several threads running within the protected code.
[13 Dec 2019 14:50] Shu Lin
It is possible to run into the race conditions, if you look at code flow carefully, and you can also reproduce that by pausing one thread and let the other thread run in debugger, as I've described in the bug report.

If you refuse to do that, and insist on a repeatable test case, then we can end this discussion.
[16 Dec 2019 13:48] Sinisa Milivojevic
I have tried it and it did not work ......
[16 Dec 2019 13:48] Sinisa Milivojevic
This bug is closed now.
[11 Feb 13:18] Sinisa Milivojevic
Hi Mr. Lin,

We are reporting back to you.

We have spent more time investigating your claims and seems that there is a potential race. 

I am verifying this bug, so that it could be tested further.

Thank you for your contribution.