Bug #77585 Slave_open_temp_tables wrong after RESET SLAVE when using multi-source
Submitted: 1 Jul 2015 15:23 Modified: 14 Sep 2015 16:28
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[1 Jul 2015 15:23] Sven Sandberg
Description:
The status variable Slave_open_temp_tables keeps track of the number of temporary tables that are opened by the replication slave. If multi-source is enabled, it is the total number of tables for all channels.

Problem 1:
However, RESET SLAVE FOR CHANNEL 'channel' will force the value to 0. This is wrong in case any other replication channel has open temporary tables, since those are not affected by the statement. If those other channels later drop their tables, the value 0 will decrease and wrap around to 2^32-1.

The variable is used for monitoring, and it also determines if a warning is printed when issuing a STOP SLAVE or CHANGE MASTER TO statement. Thus having the variable set wrong can cause spurious warnings or missed warnings.

Problem 2:
The function that modifies the counter is implemented like:
  static void modify_slave_open_temp_tables(THD *thd, int inc)
  {
    if (thd->system_thread == SYSTEM_THREAD_SLAVE_WORKER)
    {
      my_atomic_add32(&slave_open_temp_tables, inc);
    }
    else
    {
      slave_open_temp_tables += inc;
    }
  }
This code relies on two wrong assumptions: (1) when multi-threaded slave is not enabled the variable is only written by one thread; (2) it is safe to use non-atomic operations in single-writer-multiple-readers cases. Assumption (1) is wrong in multi-source. Assumption (2) is wrong on some architectures; stale values may live for very long time in thread-local caches.

How to repeat:
Test case for problem 1:

--let $rpl_multi_source=1
--let $rpl_topology= 1->2, 3->2
--source include/rpl_init.inc

--echo # Create temporary tables on two channels.
--connection server_1
CREATE TEMPORARY TABLE t1 (a INT);
--let $rpl_channel_name= channel_1
--let $sync_slave_connection= server_2
--source include/sync_slave_sql_with_master.inc

--connection server_3
CREATE TEMPORARY TABLE t3 (a INT);
--let $rpl_channel_name= channel_3
--let $sync_slave_connection= server_2
--source include/sync_slave_sql_with_master.inc

--echo # The value is correctly set to 2
SHOW STATUS LIKE 'Slave_open_temp_tables';

--echo # Stopping the replication thread does not delete tables and does not
--echo # affect the status variable. It also generates a warning because there
--echo # are open temporary tables. So far all is ok.
--let $rpl_channel_name= channel_1
--source include/stop_slave.inc
SHOW STATUS LIKE 'Slave_open_temp_tables';

--echo # RESET SLAVE FOR CHANNEL wrongly sets the status variable to 0, despite
--echo # it leaves a table open on channel_3.
RESET SLAVE FOR CHANNEL 'channel_1';
SHOW STATUS LIKE 'Slave_open_temp_tables';

--echo # STOP SLAVE should generate a warning because there are open temp
--echo # tables, but does not because the status variable is zero.
STOP SLAVE FOR CHANNEL 'channel_3';

--echo # CHANGE MASTER TO should generate a warning because there are open
--echo # temp tables, but does not because the status variable is zero.
eval CHANGE MASTER TO MASTER_PORT = $SERVER_MYPORT_3 FOR CHANNEL 'channel_3';

--echo # Dropping the table that is still open on the other channel will
--echo # decrease the value from zero. Internally it is stored as signed, but
--echo # it is displayed as unsigned, so it wraps around and shows 2<<32-1.
--let $rpl_channel_name= channel_3
--source include/start_slave.inc
--connection server_3
DROP TEMPORARY TABLE t3;
--let $rpl_channel_name= channel_3
--let $sync_slave_connection= server_2
--source include/sync_slave_sql_with_master.inc
SHOW STATUS LIKE 'Slave_open_temp_tables';

--echo # STOP SLAVE generates a warning despite there are no open temp
--echo # tables.
STOP SLAVE FOR CHANNEL 'channel_3';

--echo # Contrary to STOP SLAVE, CHANGE MASTER TO does not generate a
--echo # warning, this is because the variable is signed internally, so it
--echo # has value -1; and STOP SLAVE checks if the value is nonzero whereas
--echo # CHANGE MASTER checks if it is positive.
CHANGE MASTER TO MASTER_PORT = 4711 FOR CHANNEL 'channel_3';

SHOW STATUS LIKE 'Slave_open_temp_tables';

--exit

Problem 2 is harder to test.

Suggested fix:
Two suggestions to fix problem 1:
1. Decrement the counter once for each table that we close, rather than setting it to 0.
2. Store the number of open tables for the channel in the rli object, and decrease the counter by this number instead of reset it to 0.

Problem 2: use atomic operations unconditionally, both to read and to write.
[1 Jul 2015 15:26] Sven Sandberg
Posted by developer:
 
In case solution 2 is used for problem 1, suggest to refine the checks in CHANGE MASTER TO and STOP SLAVE, so that it reports a warning only if *the affected channel* has open temporary tables. It is safe to use CHANGE MASTER TO and STOP SLAVE on a channel that does not have open temporary tables even if some other channel has open temporary tables.
[14 Sep 2015 16:28] Jon Stephens
Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/
[14 Sep 2015 16:54] Jon Stephens
Documented fix in the 5.7.9 and 5.8.0 changelogs as follows:

    The status variable Slave_open_temp_tables keeps track of the
    number of temporary tables that are opened by the replication
    slave. If multi-source is enabled, it is the total number of
    temporary tables for all channels. This fix addresses the
    following issues relating to this variable:

      ·RESET SLAVE FOR CHANNEL channel forced the value of
      Slave_open_temp_tables to 0; in the event that some other
      replication channel had open temporary tables which were
      later dropped, the value wrapped around to a negative value
      (1-2³²). This also cause spurious or missed warnings when
      issuing a STOP SLAVE or CHANGE MASTER TO statement.

      ·The internal function that modifies Slave_open_temp_tables
      in such cases relies on two incorrect assumptions:

        1. The variable is updated by only one thread when
        multi-threaded slaves are not enabled, which is not true
        in the case of multi-source replication.

        2. Non-atomic operations are safe with a single writer and
        multiple readers, which is not necessarily true for all
        platforms supported by MySQL.

Closed.