| Bug #79865 | semi-sync can't run normally when many connections | ||
|---|---|---|---|
| Submitted: | 7 Jan 2016 3:34 | Modified: | 13 Dec 2016 13:49 |
| Reporter: | steven cai | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
| Version: | 5.7.10, 5.7.12 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
| Tags: | semi-sync select listener FD_SET | ||
[13 Jun 2016 12:51]
MySQL Verification Team
Another test case to reproduce this problem. I can reproduce it on CentOS 6, but cannot reproduce it on CentOS 7.
(0) Confirm that the open files setting is greater than 1024 (especially on CentOS 6).
(1) Prepare a MySQL Sandbox environment and a MySQL 5.7.x generic tarball.
(2) Create a master/slave replication environment with the MySQL Sandbox command.
    shell> make_replication_sandbox 5.7.13
(3) Set up semi-sync replication with the sandbox command:
    ./use_all "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'; INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'; SET GLOBAL rpl_semi_sync_master_enabled = 1; SET GLOBAL rpl_semi_sync_slave_enabled = 1;"
(4) Restart the slaves with these commands:
    ./use_all stop slave;
    ./use_all start slave;
(5) Raise the max_connections setting.
    ./use_all set global max_connections=2048;
(6) Log in to the master, create a table and a schema, and keep this connection open.
    ./m test
    CREATE TABLE t1(i1 int not null primary key, v2 varchar(20)) engine = innodb;
    create database mysqlslap;
(7) Create a 'slap' script by copying the 'use' script of MySQL Sandbox.
    cp ./use ./slap
(8) Change two lines of 'slap' as follows.
    [ -z "$MYSQL_EDITOR" ] && MYSQL_EDITOR="$BASEDIR/bin/mysqlslap"
    $MYSQL_EDITOR --defaults-file=$MY_CNF $MYCLIENT_OPTIONS "$@" --query="SELECT sleep(180);" --concurrency=1600 --iterations=1
(9) Stop the slaves.
    ./use_all stop slave
(10) Start the slaves with a 10-second delay.
    sleep 10;./use_all start slave
(11) Before the slaves start (within the 10 seconds), run the 'slap' script.
    ./slap
(12) While slap holds its 1600 connections, run an insert on the connection from step (6).
    insert into t1 values(1,'a');
(13) The problem occurs in the CentOS 6 environment.
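Step (0) can also be checked from code. The following is a minimal sketch, not part of the original test case: it compares the calling process's RLIMIT_NOFILE soft limit with FD_SETSIZE. The function name is hypothetical, and the result describes the process that runs it (for example, one started from the same shell that starts mysqld), not a mysqld that is already running.

```c
#include <sys/resource.h>
#include <sys/select.h> /* FD_SETSIZE */

/* Hypothetical helper (not from the bug report).
 * Returns 1 if the soft open-files limit allows descriptors numbered
 * FD_SETSIZE or higher, 0 if it does not, -1 if the limit cannot be read. */
int nofile_limit_exceeds_fd_setsize(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return rl.rlim_cur > (rlim_t)FD_SETSIZE ? 1 : 0;
}
```

A return value of 0 means the process can never be handed a descriptor of FD_SETSIZE or above, so the problem described in this report should not be reachable under that limit.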
[14 Jun 2016 5:12]
MySQL Verification Team
Hello Chuck cai, Thank you for the report and test case. Thanks, Umesh
[20 Jun 2016 6:30]
MySQL Verification Team
In my test case, inserting a row switches semi-sync to OFF everywhere. This did not occur on CentOS 7 with the same procedure.
[5 Jul 2016 9:24]
MySQL Verification Team
Bug #82059 marked as duplicate of this
[26 Sep 2016 6:58]
Libing Song
Bug#82207 is a dup of this
[27 Sep 2016 5:56]
Daniël van Eeden
Bug #83013 marked as a duplicate of this
[13 Dec 2016 13:49]
Erlend Dahl
[9 Dec 2016 7:57] David Moss
Thank you for your feedback. This has been fixed in upcoming versions, and the following was added to the 5.7.17 and 8.0.1 changelogs:
Using semisynchronous replication was not possible with more than 1024 simultaneous connections.

Description:
Starting with 5.7, semi-sync adds an Ack_receiver thread that listens for slave acknowledgments using select(). However, select() can only monitor socket file descriptors below FD_SETSIZE (1024 on my OS). When a socket fd is equal to or greater than FD_SETSIZE, select() has no effect and never receives the ack from the slave, so semi-sync cannot run normally. Worse, select() stores the fds in a fixed-size array; calling FD_SET with an fd that is equal to or greater than FD_SETSIZE overflows that array, so mysqld may crash.

How to repeat:
==========
1. Run mysql-test-run.pl with --max-connections=2048.
2. Make sure the OS max open files limit is greater than 1024 (my setting is 65536).
==========
# Want to skip this test from daily Valgrind execution
--source include/no_valgrind_without_big.inc
source include/have_semisync_plugin.inc;
source include/not_embedded.inc;
source include/have_innodb.inc;
source include/master-slave.inc;
source include/not_gtid_enabled.inc;

# Suppress warnings that might be generated during the test
disable_query_log;
call mtr.add_suppression("Timeout waiting for reply of binlog");
call mtr.add_suppression("Read semi-sync reply");
call mtr.add_suppression("Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT.");
call mtr.add_suppression("Semi-sync master failed on net_flush() before waiting for slave reply");
call mtr.add_suppression("Master server does not support semi-sync");
call mtr.add_suppression("Semi-sync slave .* reply");
call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
call mtr.add_suppression("Sending passwords in plain text without SSL/TLS is extremely insecure");
call mtr.add_suppression("Storing MySQL user name or password information in the master");
enable_query_log;

let $engine_type= InnoDB;

--echo #
--echo # start semi-sync replication for master
--echo #
connection master;
echo [ on master ];
let $master_timeout_old= query_get_value(SHOW VARIABLES LIKE 'rpl_semi_sync_master_timeout', Value, 1);
let $master_max_connection_old= query_get_value(SHOW VARIABLES LIKE 'max_connections', Value, 1);
let $master_max_user_connection_old= query_get_value(SHOW VARIABLES LIKE 'max_user_connections', Value, 1);
set global rpl_semi_sync_master_timeout=10000;
set global max_connections= 2048;
set global max_user_connections= 2048;
echo [ enable semi-sync on master ];
set global rpl_semi_sync_master_enabled = 1;
show variables like 'rpl_semi_sync_master_enabled';

--echo #
--echo ############# create more than 1024 connection
--echo #
connection master;
let $count=1024;
while ($count)
{
  connect (conn$count,127.0.0.1,root,,);
  dec $count;
}

--echo #
--echo # start semi-sync replication for slave
--echo #
connection slave;
source include/stop_slave.inc;
echo [ on slave ];
let $value= query_get_value(show variables like 'rpl_semi_sync_slave_enabled', Value, 1);
if ($value == No such row)
{
  disable_query_log;
  enable_query_log;
}
echo [ default state of semi-sync on slave should be OFF ];
show variables like 'rpl_semi_sync_slave_enabled';
echo [ enable semi-sync on slave ];
set global rpl_semi_sync_slave_enabled = 1;
show variables like 'rpl_semi_sync_slave_enabled';
source include/start_slave.inc;

--echo #
--echo # show semi-sync status before insert
--echo #
connection master;
echo [ on master ];
show status like 'Rpl_semi_sync_master_clients';
show status like 'Rpl_semi_sync_master_status';
show status like 'Rpl_semi_sync_master_no_tx';
show status like 'Rpl_semi_sync_master_yes_tx';

--echo #
--echo # insert 100 rows
--echo #
let $i= 100;
disable_query_log;
eval create table t1 (a int) engine=$engine_type;
while ($i)
{
  eval insert into t1 values ($i);
  dec $i;
}
enable_query_log;
--sleep 10
sync_slave_with_master;
echo [slave has been sync with master];
echo [on slave];
select count(distinct a) from t1;
select min(a) from t1;
select max(a) from t1;

--echo #
--echo # after insert 100 rows
--echo
connection master;
echo [ on master ];
show status like 'Rpl_semi_sync_master_clients';
show status like 'Rpl_semi_sync_master_status';
show status like 'Rpl_semi_sync_master_no_tx';
show status like 'Rpl_semi_sync_master_yes_tx';

--echo #
--echo # clean up
--echo #
connection master;
echo [ on master ];
enable_query_log;
eval set global rpl_semi_sync_master_timeout= $master_timeout_old;
eval set global max_connections= $master_max_connection_old;
eval set global max_user_connections= $master_max_user_connection_old;
drop table t1;
set global rpl_semi_sync_master_enabled= 0;
connection slave;
echo [ on slave ];
set global rpl_semi_sync_slave_enabled= 0;
--source include/rpl_end.inc

Suggested fix:
We can use poll() or epoll to listen for the slave ack. In fact, I used poll() to listen for the slave ack, and semi-sync then ran fine.
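The suggested fix can be illustrated with a small standalone sketch. This is not the patch that went into MySQL, and none of the function names below come from the server source; they are hypothetical helpers. The sketch shows the range check that select() would need before FD_SET, and a poll()-based wait that takes the descriptor number directly and therefore has no FD_SETSIZE ceiling. The demo function assumes the process is allowed to open descriptors numbered FD_SETSIZE or higher and reports -2 when it is not.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/select.h> /* only for FD_SETSIZE */
#include <unistd.h>

/* select() can only track descriptors 0 .. FD_SETSIZE-1. Calling FD_SET
 * with any other value writes outside the fd_set. */
int fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: wait until fd is readable using poll().
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable_poll(int fd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc < 0 ? -1 : 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Demo: duplicate a pipe's read end onto a descriptor >= FD_SETSIZE,
 * write one byte, and wait for it with poll().
 * Returns 1 if poll() saw the data on the high descriptor,
 * -2 if no such descriptor could be obtained (open-files limit too low),
 * -1 on any other failure. */
int demo_poll_high_fd(void) {
    int fds[2];
    int result;

    if (pipe(fds) != 0)
        return -1;

    /* F_DUPFD returns the lowest free descriptor >= the third argument. */
    int high = fcntl(fds[0], F_DUPFD, FD_SETSIZE);
    if (high < 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    if (write(fds[1], "x", 1) != 1)
        result = -1;
    else if (!fits_in_fd_set(high) && wait_readable_poll(high, 1000) == 1)
        result = 1;
    else
        result = -1;

    close(high);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

With select(), the descriptor used in the demo could not be added to an fd_set safely at all. poll() takes an array of struct pollfd sized by the caller, so the descriptor value itself does not matter.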