Bug #96644 Set read_only on a master waiting for semi-sync ACK blocked on global read lock
Submitted: 25 Aug 2019 11:58 Modified: 13 Sep 2019 9:59
Reporter: Przemyslaw Malkowski Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.26, 5.7.27 OS:Any
Assigned to: CPU Architecture:Any

[25 Aug 2019 11:58] Przemyslaw Malkowski
Description:
When a semi-sync master lost connection with it's slave, thus is waiting for the rpl_semi_sync_master_timeout with a commit, it's impossible to set the [super_]read_only variable before the timeout. 

This makes it problematic with failover scripts when we need to promote a new master, but also make sure the old one stops accepting writes. In a high consistency required environments, where the timeouts is often seen much higher then default 10s, it makes it really difficult.

How to repeat:
Set up a simple semi-sync master-slave setup and block the communication later with iptables, as follows:

$ dbdeployer deploy replication --semi-sync --nodes=2 5.7.26
Installing and starting master
. sandbox server started
Installing and starting slave1
. sandbox server started
$HOME/sandboxes/rsandbox_5_7_26/initialize_slaves
initializing slave 1
Replication directory installed in $HOME/sandboxes/rsandbox_5_7_26
run 'dbdeployer usage multiple' for basic instructions'

master [localhost:19228] {msandbox} ((none)) > show global variables like 'rpl%';
+-------------------------------------------+------------+
| Variable_name                             | Value      |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled              | ON         |
| rpl_semi_sync_master_timeout              | 10000      |
| rpl_semi_sync_master_trace_level          | 32         |
| rpl_semi_sync_master_wait_for_slave_count | 1          |
| rpl_semi_sync_master_wait_no_slave        | ON         |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC |
| rpl_stop_slave_timeout                    | 31536000   |
+-------------------------------------------+------------+
7 rows in set (0.01 sec)

master [localhost:19228] {msandbox} ((none)) > show slave hosts;
+-----------+--------+-------+-----------+--------------------------------------+
| Server_id | Host   | Port  | Master_id | Slave_UUID                           |
+-----------+--------+-------+-----------+--------------------------------------+
|       200 | node-2 | 19229 |       100 | 00019229-2222-2222-2222-222222222222 |
+-----------+--------+-------+-----------+--------------------------------------+
1 row in set (0.00 sec)

master [localhost:19228] {msandbox} ((none)) > use test
Database changed

master [localhost:19228] {msandbox} (test) > create table t1 (id int primary key auto_increment, a datetime);
Query OK, 0 rows affected (0.04 sec)

master [localhost:19228] {msandbox} (test) > select @@port;
+--------+
| @@port |
+--------+
|  19228 |
+--------+
1 row in set (0.00 sec)

master [localhost:19228] {msandbox} ((none)) > show variables like '%read_only';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_read_only      | OFF   |
| read_only             | OFF   |
| super_read_only       | OFF   |
| transaction_read_only | OFF   |
| tx_read_only          | OFF   |
+-----------------------+-------+
5 rows in set (0.00 sec)

master [localhost:19228] {msandbox} (test) > set global rpl_semi_sync_master_timeout=20000;
Query OK, 0 rows affected (0.00 sec)

# iptables -I INPUT -p tcp -d 127.0.0.1 --dport 19228 -j DROP

-- session 1
master [localhost:19228] {msandbox} (test) > insert into t1 set a=now();
Query OK, 1 row affected (20.01 sec)

-- session 2
master [localhost:19228] {msandbox} ((none)) > set global super_read_only=1;
Query OK, 0 rows affected (15.80 sec)

-- session 3
master [localhost:19228] {msandbox} ((none)) > show processlist;
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+------------------------------+
| Id | User     | Host            | db   | Command     | Time | State                                                         | Info                         |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+------------------------------+
|  2 | rsandbox | localhost:41340 | NULL | Binlog Dump |   72 | Master has sent all binlog to slave; waiting for more updates | NULL                         |
|  3 | msandbox | localhost       | test | Query       |   20 | Waiting for semi-sync ACK from slave                          | insert into t1 set a=now()   |
|  4 | msandbox | localhost       | NULL | Query       |    0 | starting                                                      | show processlist             |
|  5 | msandbox | localhost       | NULL | Query       |   16 | Waiting for global read lock                                  | set global super_read_only=1 |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+------------------------------+
4 rows in set (0.01 sec

The same happens with normal "read_only":

master [localhost:19228] {msandbox} ((none)) > show processlist;
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+----------------------------+
| Id | User     | Host            | db   | Command     | Time | State                                                         | Info                       |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+----------------------------+
|  3 | msandbox | localhost       | test | Query       |   19 | Waiting for semi-sync ACK from slave                          | insert into t1 set a=now() |
|  4 | msandbox | localhost       | NULL | Query       |    0 | starting                                                      | show processlist           |
|  5 | msandbox | localhost       | NULL | Query       |    8 | Waiting for global read lock                                  | set global read_only=1     |
|  7 | rsandbox | localhost:42016 | NULL | Binlog Dump |   86 | Master has sent all binlog to slave; waiting for more updates | NULL                       |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+----------------------------+
4 rows in set (0.00 sec)

Killing the pending writing thread does not work out, as it keeps waiting for the previously set timeout, so there seems to be a no workaround to set down the rpl_semi_sync_master_timeout to  unblock the master. An example view from an instance with long timeout and kill attempt:

master [localhost:19228] {msandbox} (test) > show processlist;
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+----------------------------+
| Id | User     | Host            | db   | Command     | Time | State                                                         | Info                       |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+----------------------------+
|  3 | msandbox | localhost       | test | Killed      |  145 | Waiting for semi-sync ACK from slave                          | insert into t1 set a=now() |
|  8 | rsandbox | localhost:44390 | NULL | Binlog Dump |  161 | Master has sent all binlog to slave; waiting for more updates | NULL                       |
|  9 | msandbox | localhost       | test | Query       |   70 | query end                                                     | insert into t1 set a=now() |
| 10 | msandbox | localhost       | test | Query       |    0 | starting                                                      | show processlist           |
+----+----------+-----------------+------+-------------+------+---------------------------------------------------------------+----------------------------+
4 rows in set (0.00 sec)

Suggested fix:
This behaviour is documented in https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_read_only as:

"The attempt blocks while other clients have any ongoing statement, active LOCK TABLES WRITE, or ongoing commit, until the locks are released and the statements and transactions end."

However, there should be a workaround here to forcibly set the master to read only (with the consequence that all pending writes will be lost). Otherwise, it is just unsafe to use long rpl_semi_sync_master_timeout values, as we may end up with that long downtime.
[13 Sep 2019 9:59] MySQL Verification Team
Hello Przemyslaw,

Thank you for the report.

regards,
Umesh