| Bug #103914 | Thread running log_resume_writer_threads gets blocked | ||
|---|---|---|---|
| Submitted: | 4 Jun 2021 15:55 | Modified: | 6 Jun 2021 15:07 |
| Reporter: | Luis Donoso | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server | Severity: | S2 (Serious) |
| Version: | 8.0.25 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[5 Jun 2021 0:01]
MySQL Verification Team
Logging: ./mtr --parallel=20 --force --max-test-fail=0 --retry-failure=0 --repeat=500 --mysqld=--secure-file-priv= --debug-common --clean-vardir sys_vars.innodb_log_writer_threads_basic
MySQL Version 8.0.26
Checking supported features
- Binaries are debug compiled
Using 'all' suites
Collecting tests
<CUT>
[ 99%] sys_vars.innodb_log_writer_threads_basic w10 [ pass ] 591
[ 99%] sys_vars.innodb_log_writer_threads_basic w6 [ pass ] 694
[ 99%] sys_vars.innodb_log_writer_threads_basic w7 [ pass ] 95
[100%] shutdown_report w0 [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 193.703 of 1676 seconds executing testcases
Completed: All 501 tests were successful.
miguel@vb-ubuntu:~/dbsd/8.0/mysql-test$
[6 Jun 2021 15:07]
MySQL Verification Team
Thank you for the bug report. Co-worker was able to repeat with 8.0.25, however with most recent source server, 10 times tested I couldn't repeat:
miguel@vb-ubuntu:~/dbsd/8.0$ bin/mysqld --version
/home/miguel/dbsd/8.0/bin/mysqld Ver 8.0.26-debug for Linux on x86_64 (Source distribution BUILT: 2021-MAY-26)
[ 99%] sys_vars.innodb_log_writer_threads_basic w8 [ pass ] 358
[ 99%] sys_vars.innodb_log_writer_threads_basic w3 [ pass ] 321
[100%] shutdown_report w0 [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 190.894 of 1586 seconds executing testcases
Completed: All 501 tests were successful.
===============================================================================
[ 99%] sys_vars.innodb_log_writer_threads_basic w19 [ pass ] 431
[ 99%] sys_vars.innodb_log_writer_threads_basic w13 [ pass ] 434
[ 99%] sys_vars.innodb_log_writer_threads_basic w8 [ pass ] 615
[100%] shutdown_report w0 [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 190.091 of 1648 seconds executing testcases
Completed: All 501 tests were successful.
===============================================================================
[ 99%] sys_vars.innodb_log_writer_threads_basic w7 [ pass ] 429
[ 99%] sys_vars.innodb_log_writer_threads_basic w4 [ pass ] 87
[100%] shutdown_report w0 [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 190.060 of 1592 seconds executing testcases
Completed: All 501 tests were successful.
===============================================================================
[ 99%] sys_vars.innodb_log_writer_threads_basic w17 [ pass ] 165
[ 99%] sys_vars.innodb_log_writer_threads_basic w16 [ pass ] 155
[100%] shutdown_report w0 [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 195.205 of 1613 seconds executing testcases
Completed: All 501 tests were successful.
[7 Jun 2021 14:11]
Yura Sorokin
It looks like the problem was introduced in 8.0.22 in the following commit https://github.com/mysql/mysql-server/commit/9bfe6028224b6125c7e9104b7e91734e9b030952 While "log_resume_writer_threads()" implements an additional while loop to confirm that *_notifier_resume_lsn have been accepted, there is no similar logic in "log_pause_writer_threads()" making sure that notifiers has entered "pause if" ("if (UNIV_UNLIKELY( log.xxx_threads_paused.load(std::memory_order_acquire)))") and are waiting on "writer_threads_resume_event". This may have been fixed after 8.0.25 as you say that it is no longer reproducible, but please double check with Yasufumi / Pawel.
[30 Jul 2021 10:04]
Yura Sorokin
Fixed in 8.0.26 InnoDB: A failure occurred during testing of innodb_log_writer_threads variable configuration. The failure was caused by a race condition. (Bug #32129814) References: This issue is a regression of: Bug #30088404.

Description: The thread running `log_resume_writer_threads` might get waiting indefinitely for an ack. The reason is an ABA problem affecting the functions `log_write_notifier` and `log_flush_notifier`. The steps producing the problem are: - Initial state: `log.writer_threads_paused` is false - `log_resume_writer_threads` is called and busy wait for an ack. - `log_pause_writer_threads` is called. Please, mind this function doesn't wait for ack. - The threads running `log_write_notifier` and `log_flush_notifier` do not notice the request in `log_resume_writer_thread` so never acknoledge the request so the requesting thread gets stuck. How to repeat: The error is not deterministic but by using `--parallel` and `--repeat` almost always one instance gets stuck by running: ./mtr \ --parallel=20 \ --force \ --max-test-fail=0 \ --retry-failure=0 \ --repeat=500 \ --mysqld=--secure-file-priv="" \ --debug-common \ --clean-vardir \ sys_vars.innodb_log_writer_threads_basic Suggested fix: diff --git a/storage/innobase/log/log0write.cc b/storage/innobase/log/log0write.cc index 24370ed26a2..ff4ef4c564c 100644 --- a/storage/innobase/log/log0write.cc +++ b/storage/innobase/log/log0write.cc @@ -2667,6 +2667,11 @@ void log_write_notifier(log_t *log_ptr) { log.write_notifier_resume_lsn.store(0, std::memory_order_release); log_write_notifier_mutex_enter(log); + } else if (UNIV_UNLIKELY(log.write_notifier_resume_lsn.load( + std::memory_order_acquire)) != 0) { + /* There might be the case there has occurred an unset/set + * and log_resume_writer_threads is waiting for Ack */ + log.write_notifier_resume_lsn.store(0, std::memory_order_release); } LOG_SYNC_POINT("log_write_notifier_before_check"); @@ -2785,6 +2790,11 @@ void log_flush_notifier(log_t *log_ptr) { log.flush_notifier_resume_lsn.store(0, std::memory_order_release); log_flush_notifier_mutex_enter(log); + } else if (UNIV_UNLIKELY(log.flush_notifier_resume_lsn.load( + std::memory_order_acquire)) != 0) { + /* There might be the case there has occurred an unset/set + * and log_resume_writer_threads is waiting for Ack */ + log.flush_notifier_resume_lsn.store(0, std::memory_order_release); }