Bug #97950 | buf_read_page_handle_error can trigger assert failure | ||
---|---|---|---|
Submitted: | 10 Dec 2019 15:51 | Modified: | 11 Feb 2020 13:18 |
Reporter: | Shu Lin | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 8.0.18 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[10 Dec 2019 15:51]
Shu Lin
[11 Dec 2019 14:06]
MySQL Verification Team
Hello Mr. Lin, Thank you for your bug report. However, what you report is not quite clear. In general, what we require is a repeatable test case in the form of a set of SQL statements, or several concurrent sets of SQL statements, which when executed lead to the assert you report. We can make an exception and accept a detailed code analysis which shows how an InnoDB page gets corrupted all by itself, without any hardware or OS problems. Thanks in advance.
[11 Dec 2019 14:52]
Shu Lin
Page corruption can happen due to external reasons, for example, hardware failure or an operating system bug. This is clearly stated in the messages to the users:

bool buf_page_io_complete(buf_page_t *bpage, bool evict) {
  ...
    ib::info(ER_IB_MSG_82) << "It is also possible that your"
                              " operating system has corrupted"
                              " its own file cache and rebooting"
                              " your computer removes the error."
                              " If the corrupt page is an index page."
                              " You can also try to fix the"
                              " corruption by dumping, dropping,"
                              " and reimporting the corrupt table."
                              " You can use CHECK TABLE to scan"
                              " your table for corruption. "
                           << FORCE_RECOVERY_MSG;
  }

As reliable database management software, InnoDB should guard against page corruption, no matter what the source is. There is clearly a logic flaw in this error-handling code path, which can either cause a software crash or let the page corruption spread unnoticed. Please address.
[11 Dec 2019 14:57]
MySQL Verification Team
Hi, InnoDB cannot prevent corruption due to hardware errors, OS problems, running out of resources and other similar factors. All that it can do is assert, which it does. In that way, corrupted pages are not saved to disk. That is fully documented in our Reference Manual.
[11 Dec 2019 15:44]
Shu Lin
I know InnoDB can't prevent corruption, but it should prevent corruption from spreading. The intention of buf_read_page_handle_error() is clearly to let the software continue (i.e. not crash) even if corruption is detected. But it doesn't do so correctly. As it stands right now, it can spread page corruption unnoticed. Please see my analysis of the race condition. If the intention is to crash as soon as corruption is detected, then please fix the code to do that.
[12 Dec 2019 14:43]
MySQL Verification Team
Hi, InnoDB already has TOO many checks for corruption. Adding more would not guarantee earlier catching of the corruption, nor would it lead to any improvements. It would only lead to significantly decreased performance. These decisions were made during all these years and will not be changed.
[12 Dec 2019 22:41]
Shu Lin
I am not asking you to write code to detect more page corruptions. I am asking you to handle the ALREADY-DETECTED corruption correctly! The current function that handles ALREADY-DETECTED corruption, namely buf_read_page_handle_error(), has logic flaws, as I've illustrated.
[13 Dec 2019 13:07]
MySQL Verification Team
Hi, The function that you mention does exactly what it is designed to do, namely discover corruption while it is still in memory. Hence, it unfixes the page, unlatches the page, removes it from page_hash and removes it from LRU. That way corruption is stopped without assert. Not a bug.
[13 Dec 2019 14:34]
Shu Lin
Please keep in mind that InnoDB is multi-threaded software. What you described is absolutely correct in a single-threaded world. Now consider what will happen if another thread is accessing the same page at the same time. As I've described, the way it is coded right now, there are two race conditions that can happen: (1) the assert can cause InnoDB to crash; (2) the corruption can spread to other pages. When software detects corruption, the simplest way to react is to crash immediately. The more sophisticated way is to get the software to tolerate the corruption and continue running, but it has to do so safely, that is, WITHOUT SPREADING CORRUPTION. buf_read_page_handle_error() is obviously trying to be sophisticated. But it has logic flaws. Please review my code analysis. Because this is an error-handling plus multi-thread race condition problem, it is difficult, if not impossible, to design a reproducible test case. Code analysis and manipulating threads (e.g. pausing one thread and letting the other thread run) in a debugger is required to understand the race conditions.
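The kind of interleaving discussed above can be illustrated with a minimal sketch. This is not InnoDB code and does not reproduce the reporter's analysis: `Page`, `PageCache`, `fix`, `unfix`, `handle_read_error` and `contains` are hypothetical names chosen for illustration. It assumes a cache where a second thread may pin a page while the first thread's read is still in flight, and shows one possible way for an error handler to avoid evicting a page that another thread still references: mark it corrupt and defer eviction to the last unpinner.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a page descriptor (not InnoDB's buf_page_t).
struct Page {
  std::uint32_t fix_count = 0;  // number of threads currently pinning the page
  bool corrupt = false;         // set once a read has detected corruption
};

// Hypothetical page cache keyed by page number.
class PageCache {
 public:
  // Pin the page so it cannot be evicted; creates the entry on first use.
  Page *fix(std::uint32_t page_no) {
    std::lock_guard<std::mutex> guard(mutex_);
    Page &page = pages_[page_no];
    ++page.fix_count;
    return &page;
  }

  // Drop one pin. The last unpinner evicts a page already marked corrupt.
  void unfix(std::uint32_t page_no) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = pages_.find(page_no);
    assert(it != pages_.end() && it->second.fix_count > 0);
    if (--it->second.fix_count == 0 && it->second.corrupt) {
      pages_.erase(it);
    }
  }

  // Called by the reading thread, which holds one pin, after a failed read.
  // Returns true if the page was evicted immediately, false if another
  // thread still pins it and eviction was deferred.
  bool handle_read_error(std::uint32_t page_no) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = pages_.find(page_no);
    assert(it != pages_.end() && it->second.fix_count > 0);
    it->second.corrupt = true;  // visible to any thread that still holds a pin
    if (--it->second.fix_count == 0) {
      pages_.erase(it);
      return true;
    }
    return false;
  }

  // True if the page is still present in the cache.
  bool contains(std::uint32_t page_no) {
    std::lock_guard<std::mutex> guard(mutex_);
    return pages_.count(page_no) != 0;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::uint32_t, Page> pages_;
};
```

In this sketch, a handler that unconditionally asserted `fix_count == 0` after dropping its own pin would abort whenever a second thread had pinned the page in the meantime, and one that erased the entry regardless would leave that thread with a dangling pointer. Whether either of these matches the actual behaviour of buf_read_page_handle_error() has to be checked against the server source.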
[13 Dec 2019 14:43]
MySQL Verification Team
Because our server is multi-threaded, we have all those mutex locks and other types of locks that are taken in that function. That prevents more than one thread from running over that code simultaneously. If you think that you can prove that multiple access of N threads is possible within the protected code, please send us a repeatable test case that will prove so. A repeatable test case is a set of SQL statements that can be run in one or N threads and that would lead to several threads running within the protected code.
[13 Dec 2019 14:50]
Shu Lin
It is possible to run into the race conditions if you look at the code flow carefully, and you can also reproduce them by pausing one thread and letting the other thread run in a debugger, as I've described in the bug report. If you refuse to do that and insist on a repeatable test case, then we can end this discussion.
[16 Dec 2019 13:48]
MySQL Verification Team
I have tried it and it did not work ......
[16 Dec 2019 13:48]
MySQL Verification Team
This bug is closed now.
[11 Feb 2020 13:18]
MySQL Verification Team
Hi Mr. Lin, We are reporting back to you. We have spent more time investigating your claims and it seems that there is a potential race. I am verifying this bug, so that it can be tested further. Thank you for your contribution.
[22 Oct 2020 13:43]
MySQL Verification Team
see https://bugs.mysql.com/bug.php?id=101271
[22 Oct 2020 14:16]
MySQL Verification Team
This bug is the original bug for the following one: https://bugs.mysql.com/bug.php?id=101271