Bug #79865 semi-sync can't run normally when many connections
Submitted: 7 Jan 2016 3:34 Modified: 13 Dec 2016 13:49
Reporter: steven cai
Status: Closed
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.7.10, 5.7.12 OS:Any
Assigned to: CPU Architecture:Any
Tags: semi-sync select listener FD_SET

[7 Jan 2016 3:34] steven cai
Description:
Since 5.7, semi-sync has an Ack_receiver thread that listens for slave acks using select(). But select() can only monitor socket fds below FD_SETSIZE (__FD_SETSIZE in glibc; 1024 on my OS). When a socket fd is FD_SETSIZE or larger, select() cannot monitor it and the ack from the slave is never received, so semi-sync can't run normally. Worse, select() stores fds in a fixed-size array (fd_set); calling FD_SET with an fd that is FD_SETSIZE or larger writes past the end of that array, so mysqld may crash.
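
The limit described above can be illustrated with a small C sketch. The helper names `fd_fits_in_fd_set` and `safe_fd_set` are hypothetical and are not taken from the MySQL source; they only show the bounds check that FD_SET itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: returns 1 if fd can be stored in an fd_set, else 0. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Hypothetical wrapper around FD_SET that refuses out-of-range fds
 * instead of writing past the end of the fd_set array.
 * Returns 0 on success, -1 if fd is out of range.
 */
int safe_fd_set(int fd, fd_set *set)
{
    assert(set != NULL);
    if (!fd_fits_in_fd_set(fd))
        return -1;
    FD_SET(fd, set);
    return 0;
}
```

A caller that gets -1 from `safe_fd_set` knows the fd cannot be watched with select() at all, which is the situation the Ack_receiver thread hits once the server holds more than about 1024 open descriptors.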

How to repeat:
==========
1. Run mysql-test-run.pl with --max-connections=2048.
2. Make sure the OS max open files limit is greater than 1024; my setting is 65536.
==========
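
The open files requirement in step 2 can also be checked from code. The following is a minimal sketch, assuming a POSIX system; the function names `soft_nofile_limit` and `can_exceed_fd_setsize` are hypothetical and only illustrate comparing the process limit against FD_SETSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/select.h>

/*
 * Hypothetical helper: stores the soft RLIMIT_NOFILE value in *soft_out.
 * Returns 0 on success, -1 if getrlimit() fails.
 */
int soft_nofile_limit(rlim_t *soft_out)
{
    struct rlimit rl;

    assert(soft_out != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft_out = rl.rlim_cur;
    return 0;
}

/*
 * Returns 1 if this process is allowed to hold fds that do not fit
 * in an fd_set, 0 if it is not, and -1 on error.
 */
int can_exceed_fd_setsize(void)
{
    rlim_t soft;

    if (soft_nofile_limit(&soft) != 0)
        return -1;
    return soft > (rlim_t)FD_SETSIZE ? 1 : 0;
}
```

If `can_exceed_fd_setsize()` returns 0, the process cannot open enough descriptors for the problem to show up, so the test below would not reproduce it.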

#Want to skip this test from daily Valgrind execution
--source include/no_valgrind_without_big.inc
source include/have_semisync_plugin.inc;
source include/not_embedded.inc;
source include/have_innodb.inc;
source include/master-slave.inc;
source include/not_gtid_enabled.inc;

# Suppress warnings that might be generated during the test
disable_query_log;
call mtr.add_suppression("Timeout waiting for reply of binlog");
call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("Semi-sync master failed on net_flush() before waiting for slave reply");
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
call mtr.add_suppression("Sending passwords in plain text without SSL/TLS is extremely insecure");
call mtr.add_suppression("Storing MySQL user name or password information in the master");
enable_query_log;

let $engine_type= InnoDB;

--echo #
--echo # start  semi-sync replication for master
--echo #

connection master;
echo [ on master ];
let $master_timeout_old= query_get_value(SHOW VARIABLES LIKE 'rpl_semi_sync_master_timeout', Value, 1);
let $master_max_connection_old= query_get_value(SHOW VARIABLES LIKE 'max_connections', Value, 1);
let $master_max_user_connection_old= query_get_value(SHOW VARIABLES LIKE 'max_user_connections', Value, 1);

set global rpl_semi_sync_master_timeout=10000;
set global max_connections= 2048;
set global max_user_connections= 2048;
echo [ enable semi-sync on master ]; 
set global rpl_semi_sync_master_enabled = 1;
show variables like 'rpl_semi_sync_master_enabled';

--echo #
--echo ############# create more than 1024 connection
--echo #
connection master;
let $count=1024;
while ($count)
{
    connect (conn$count,127.0.0.1,root,,);
    dec $count;
}

--echo #
--echo # start  semi-sync replication for slave
--echo #
connection slave;
source include/stop_slave.inc;
echo [ on slave ];

let $value= query_get_value(show variables like 'rpl_semi_sync_slave_enabled', Value, 1);
if ($value == No such row)
{
  disable_query_log;
  enable_query_log;
}

echo [ default state of semi-sync on slave should be OFF ];
show variables like 'rpl_semi_sync_slave_enabled';

echo [ enable semi-sync on slave ];
set global rpl_semi_sync_slave_enabled = 1;
show variables like 'rpl_semi_sync_slave_enabled';
source include/start_slave.inc;

--echo #
--echo # show semi-sync status before insert
--echo #
connection master;
echo [ on master ];
show status like 'Rpl_semi_sync_master_clients';
show status like 'Rpl_semi_sync_master_status';
show status like 'Rpl_semi_sync_master_no_tx';
show status like 'Rpl_semi_sync_master_yes_tx';

--echo #
--echo #  insert 100 rows
--echo #
let $i= 100;
disable_query_log;
eval create table t1 (a int) engine=$engine_type;
while ($i)
{
  eval insert into t1 values ($i);
  dec $i;
}
enable_query_log;

--sleep 10
sync_slave_with_master;
echo [slave has been sync with master];
echo [on slave];
select count(distinct a) from t1;
select min(a) from t1;
select max(a) from t1;

--echo #
--echo # after insert 100 rows
--echo #
connection master;
echo [ on master ];
show status like 'Rpl_semi_sync_master_clients';
show status like 'Rpl_semi_sync_master_status';
show status like 'Rpl_semi_sync_master_no_tx';
show status like 'Rpl_semi_sync_master_yes_tx';

--echo #
--echo # clean up
--echo #
connection master;
echo [ on master ];
enable_query_log;
eval set global rpl_semi_sync_master_timeout= $master_timeout_old;
eval set global max_connections= $master_max_connection_old;
eval set global max_user_connections= $master_max_user_connection_old;
drop table t1;
set global rpl_semi_sync_master_enabled= 0;

connection slave;
echo [ on slave ];
set global rpl_semi_sync_slave_enabled= 0;

--source include/rpl_end.inc

Suggested fix:
We can use poll() or epoll to listen for slave acks. In fact, I switched to poll() for listening for slave acks, and semi-sync then ran fine.
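
As a rough illustration of the suggested fix, the sketch below waits on a single descriptor with poll(), which takes an array of `struct pollfd` and so has no FD_SETSIZE cap on fd values. This is not the actual MySQL patch; `wait_readable` and `read_when_ready` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR)
 * or when the fd reports a hang-up or error with no data to read.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Hypothetical helper: waits for data, then reads up to len bytes.
 * Returns the number of bytes read, 0 on timeout or end of stream,
 * and -1 on error.
 */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);

    if (ready <= 0)
        return ready;
    return read(fd, buf, len);
}
```

Because the fd is passed by value in `pfd.fd` rather than used as an index into a bitmap, the same code works whether the slave's socket is fd 5 or fd 5000.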
[13 Jun 2016 12:51] MySQL Verification Team
Here is another test case to reproduce this problem. I can reproduce it on CentOS 6, but cannot reproduce it on CentOS 7.

(0) Confirm that the open files setting is greater than 1024 (especially on CentOS 6).
(1) Prepare a MySQL Sandbox environment and a MySQL 5.7.x generic tarball.
(2) Create a master/slave replication environment with the MySQL Sandbox command.

shell> make_replication_sandbox 5.7.13

(3) Configure semi-sync replication with the sandbox command:

./use_all "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_slave_enabled = 1;"

(4) Restart the slaves with these commands:

./use_all stop slave;
./use_all start slave;

(5) Increase the max_connections setting.

./use_all set global max_connections=2048;

(6) Log in to the master, create a table and a schema, and keep this connection open.

./m test
CREATE TABLE t1(i1 int not null primary key, v2 varchar(20)) engine = innodb;
create database mysqlslap;

(7) Make a 'slap' script by copying the 'use' script of MySQL Sandbox.

cp ./use ./slap

(8) Change two lines of 'slap' as follows.

[ -z "$MYSQL_EDITOR" ] && MYSQL_EDITOR="$BASEDIR/bin/mysqlslap"
   $MYSQL_EDITOR --defaults-file=$MY_CNF $MYCLIENT_OPTIONS "$@" --query="SELECT sleep(180);" --concurrency=1600 --iterations=1

(9) Stop the slaves.

./use_all stop slave

(10) Start the slaves with a 10-second delay.

sleep 10;./use_all start slave

(11) Before the slaves start (within the 10 seconds), run the 'slap' script.

./slap

(12) While slap holds its 1600 connections, run an INSERT on the connection from step (6).

insert into t1 values(1,'a');

(13) The problem occurs in the CentOS 6 environment.
[14 Jun 2016 5:12] MySQL Verification Team
Hello Chuck cai,

Thank you for the report and test case.

Thanks,
Umesh
[20 Jun 2016 6:30] MySQL Verification Team
In my test case, inserting a row causes all semi-sync status to switch to OFF!
This did not occur on CentOS 7 with the same procedure.
[5 Jul 2016 9:24] MySQL Verification Team
Bug #82059 marked as duplicate of this
[26 Sep 2016 6:58] Libing Song
Bug#82207 is a dup of this
[27 Sep 2016 5:56] Daniël van Eeden
Bug #83013 marked as a duplicate of this
[13 Dec 2016 13:49] Erlend Dahl
[9 Dec 2016 7:57] David Moss 

Thank you for your feedback, this has been fixed in upcoming versions and the
following was added to the 5.7.17 and 8.0.1 changelogs:
Using semisynchronous replication was not possible with more than 1024
simultaneous connections.