Bug #103914 Thread running log_resume_writer_threads gets blocked
Submitted: 4 Jun 2021 15:55 Modified: 6 Jun 2021 15:07
Reporter: Luis Donoso Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:8.0.25 OS:Any
Assigned to: CPU Architecture:Any

[4 Jun 2021 15:55] Luis Donoso
Description:
The thread running `log_resume_writer_threads` might get waiting
indefinitely for an ack.

The reason is an ABA problem affecting the functions `log_write_notifier` and
`log_flush_notifier`. The steps producing the problem are:

- Initial state: `log.writer_threads_paused` is false
- `log_resume_writer_threads` is called and busy wait for an ack.
- `log_pause_writer_threads` is called. Please, mind this function doesn't wait
  for ack.
- The threads running `log_write_notifier` and `log_flush_notifier` do not
  notice the request in `log_resume_writer_thread` so never acknoledge the request 
  so the requesting thread gets stuck.

How to repeat:
The error is not deterministic but by using `--parallel` and `--repeat` almost
always one instance gets stuck by running:

    ./mtr \
        --parallel=20 \
        --force \
        --max-test-fail=0 \
        --retry-failure=0 \
        --repeat=500 \
        --mysqld=--secure-file-priv="" \
        --debug-common \
        --clean-vardir \
        sys_vars.innodb_log_writer_threads_basic

Suggested fix:
diff --git a/storage/innobase/log/log0write.cc b/storage/innobase/log/log0write.cc
index 24370ed26a2..ff4ef4c564c 100644
--- a/storage/innobase/log/log0write.cc
+++ b/storage/innobase/log/log0write.cc
@@ -2667,6 +2667,11 @@ void log_write_notifier(log_t *log_ptr) {
       log.write_notifier_resume_lsn.store(0, std::memory_order_release);
 
       log_write_notifier_mutex_enter(log);
+    } else if (UNIV_UNLIKELY(log.write_notifier_resume_lsn.load(
+                   std::memory_order_acquire)) != 0) {
+      /* There might be the case there has occurred an unset/set
+       * and log_resume_writer_threads is waiting for Ack */
+      log.write_notifier_resume_lsn.store(0, std::memory_order_release);
     }
 
     LOG_SYNC_POINT("log_write_notifier_before_check");
@@ -2785,6 +2790,11 @@ void log_flush_notifier(log_t *log_ptr) {
       log.flush_notifier_resume_lsn.store(0, std::memory_order_release);
 
       log_flush_notifier_mutex_enter(log);
+    } else if (UNIV_UNLIKELY(log.flush_notifier_resume_lsn.load(
+                   std::memory_order_acquire)) != 0) {
+      /* There might be the case there has occurred an unset/set
+       * and log_resume_writer_threads is waiting for Ack */
+      log.flush_notifier_resume_lsn.store(0, std::memory_order_release);
     }
[5 Jun 2021 0:01] MySQL Verification Team
Logging: ./mtr  --parallel=20 --force --max-test-fail=0 --retry-failure=0 --repeat=500 --mysqld=--secure-file-priv= --debug-common --clean-vardir sys_vars.innodb_log_writer_threads_basic
MySQL Version 8.0.26
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
<CUT>
[ 99%] sys_vars.innodb_log_writer_threads_basic w10 [ pass ]    591                                  
[ 99%] sys_vars.innodb_log_writer_threads_basic w6  [ pass ]    694                                  
[ 99%] sys_vars.innodb_log_writer_threads_basic w7  [ pass ]     95                                  
[100%] shutdown_report                          w0  [ pass ]                                         
------------------------------------------------------------------------------                       
The servers were restarted 0 times                                                                   
The servers were reinitialized 0 times                                                               
Spent 193.703 of 1676 seconds executing testcases                                                    
                                                                                                     
Completed: All 501 tests were successful.                                                            
                                                                                                     
miguel@vb-ubuntu:~/dbsd/8.0/mysql-test$
[6 Jun 2021 15:07] MySQL Verification Team
Thank you for the bug report. Co-worker was able to repeat with 8.0.25, however with most recent source server, 10 times tested I couldn't repeat:

miguel@vb-ubuntu:~/dbsd/8.0$ bin/mysqld --version
/home/miguel/dbsd/8.0/bin/mysqld  Ver 8.0.26-debug for Linux on x86_64 (Source distribution BUILT: 2021-MAY-26)

[ 99%] sys_vars.innodb_log_writer_threads_basic w8  [ pass ]    358
[ 99%] sys_vars.innodb_log_writer_threads_basic w3  [ pass ]    321
[100%] shutdown_report                          w0  [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 190.894 of 1586 seconds executing testcases

Completed: All 501 tests were successful.
===============================================================================
[ 99%] sys_vars.innodb_log_writer_threads_basic w19 [ pass ]    431                
[ 99%] sys_vars.innodb_log_writer_threads_basic w13 [ pass ]    434                
[ 99%] sys_vars.innodb_log_writer_threads_basic w8  [ pass ]    615                
[100%] shutdown_report                          w0  [ pass ]                       
------------------------------------------------------------------------------     
The servers were restarted 0 times                                                 
The servers were reinitialized 0 times                                             
Spent 190.091 of 1648 seconds executing testcases                                  
                                                                                   
Completed: All 501 tests were successful.                                          
     
=============================================================================== 
[ 99%] sys_vars.innodb_log_writer_threads_basic w7  [ pass ]    429
[ 99%] sys_vars.innodb_log_writer_threads_basic w4  [ pass ]     87
[100%] shutdown_report                          w0  [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 190.060 of 1592 seconds executing testcases

Completed: All 501 tests were successful.
=============================================================================== 
[ 99%] sys_vars.innodb_log_writer_threads_basic w17 [ pass ]    165
[ 99%] sys_vars.innodb_log_writer_threads_basic w16 [ pass ]    155
[100%] shutdown_report                          w0  [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 195.205 of 1613 seconds executing testcases

Completed: All 501 tests were successful.
[7 Jun 2021 14:11] Yura Sorokin
It looks like the problem was introduced in 8.0.22 in the following commit
https://github.com/mysql/mysql-server/commit/9bfe6028224b6125c7e9104b7e91734e9b030952

While "log_resume_writer_threads()" implements an additional while loop to confirm that *_notifier_resume_lsn have been accepted, there is no similar logic in "log_pause_writer_threads()" making sure that notifiers has entered "pause if" ("if (UNIV_UNLIKELY(           log.xxx_threads_paused.load(std::memory_order_acquire)))") and are waiting on 
"writer_threads_resume_event".

This may have been fixed after 8.0.25 as you say that it is no longer reproducible, but please double check with Yasufumi / Pawel.
[30 Jul 2021 10:04] Yura Sorokin
Fixed in 8.0.26

InnoDB: A failure occurred during testing of innodb_log_writer_threads variable configuration. The failure was caused by a race condition. (Bug #32129814)

References: This issue is a regression of: Bug #30088404.