Bug #75014 gtid_executed table compression thread may not wake up
Submitted: 26 Nov 2014 21:41 Modified: 30 Apr 2015 15:34
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[26 Nov 2014 21:41] Sven Sandberg
Description:
In WL#6559, GTIDs were stored into a table.
The table is range compressed by a background thread, every N transactions, where N is specified by @@global.executed_gtids_compression_period.

The communication between a committing client thread and the waiting background thread is done using a global variable, which indicates whether the compression thread should wake up or not; a mutex and condition variable, which the the compression hangs on, and a signal, which the committing thread sends for the condition variable.

However, the protocol for signalling on condition variables is not followed. The committing thread does not protect the update of the global variable with the mutex. So when the compression thread wakes up, it is possible that the global variable has only been changed in a memory cache that is only visible to the committing thread. Then the compression thread will read the old value of the global variable, and think that the signal was only sent spuriously, and go back to sleep.

How to repeat:
Read the code in rpl_gtid_persist.cc:

In compress_gtid_table, it uses the correct protocol:

    mysql_mutex_lock(&LOCK_compress_gtid_table);
    if (terminate_compress_thread)
      break;
    THD_ENTER_COND(thd, &COND_compress_gtid_table,
                   &LOCK_compress_gtid_table,
                   &stage_suspending, NULL);
    /* Add the check to handle spurious wakeups from system. */
    while(!(should_compress || terminate_compress_thread))
      mysql_cond_wait(&COND_compress_gtid_table, &LOCK_compress_gtid_table);

In Gtid_table_persistor::save(THD *, Gtid *), it does not use the mutex:

    if ((executed_gtids_compression_period != 0) &&
        (m_count >= executed_gtids_compression_period ||
         DBUG_EVALUATE_IF("compress_gtid_table", 1, 0)))
    {
      m_count= 0;
      should_compress= true;
      mysql_cond_signal(&COND_compress_gtid_table);
    }

Suggested fix:
diff --git a/sql/rpl_gtid_persist.cc b/sql/rpl_gtid_persist.cc
index 080a910..6662c2f 100644
--- a/sql/rpl_gtid_persist.cc
+++ b/sql/rpl_gtid_persist.cc
@@ -321,9 +321,11 @@ end:
         (m_count >= executed_gtids_compression_period ||
          DBUG_EVALUATE_IF("compress_gtid_table", 1, 0)))
     {
+      mysql_mutex_lock(&LOCK_compress_gtid_table);
       m_count= 0;
       should_compress= true;
       mysql_cond_signal(&COND_compress_gtid_table);
+      mysql_mutex_unlock(&LOCK_compress_gtid_table);
     }
   }
[6 Mar 2015 8:36] Bill Qu
Posted by developer:
 
The committing thread and the compression thread are created in the same process,
they are using the same memory. So the global variable has only been changed in
a memory cache that is visible to both the committing thread and the compression
thread. Did I miss something?
[6 Mar 2015 14:15] Sven Sandberg
Posted by developer:
 
The two threads could be executing on different cores, thus see different memory caches, which could differ unless the lock is there.
[30 Apr 2015 15:34] David Moss
Noted in the 5.7.8 changelog:

When gtid_executed_compression_period is set to a number greater than 0, there is a thread that wakes up after every number of transactions specified by gtid_executed_compression_period to perform range compression on the mysql.gtid_executed table. There was a small chance that the thread would miss a signal and not wake up, so that one pass of the compression algorithm would be missed and the table left uncompressed. The fix ensures that the thread wakes up consistently.
[14 Jun 2017 8:26] MySQL Verification Team
https://bugs.mysql.com/bug.php?id=86692