Bug #43460 | STOP SLAVE unable to complete when slave thread is trying to reconnect to master | ||
---|---|---|---|
Submitted: | 6 Mar 2009 16:49 | Modified: | 23 Jan 2012 15:22 |
Reporter: | Jasmin Letendre | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.30, 5.1.32 | OS: | Windows |
Assigned to: | Sven Sandberg | CPU Architecture: | Any |
Tags: | Delay, start slave, stop slave, timeout |
[6 Mar 2009 16:49]
Jasmin Letendre
[6 Mar 2009 20:17]
Sveta Smirnova
Thank you for the report. I can not repeat delay provided. Also version 5.1.30 as a bit old. Please upgrade to current version 5.1.32, try with it and if problem still exists provide output of SHOW SLAVE STATUS \G before running STOP SLAVE
[6 Mar 2009 22:29]
Jasmin Letendre
Thanks for the quick response! I have updated to the latest release (5.1.32) and the problem is still there. Here are my logs. You'll see that I have call 'show process list' and 'show slave status' twice. The first pair is before I shutdown my master server. The second pair is after the shutdown. mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.0.0.140 Master_User: root Master_Port: 3306 Connect_Retry: 60 Master_Log_File: JAZZMAN2-bin.000113 Read_Master_Log_Pos: 16258 Relay_Log_File: JAZZMAN2-relay-bin.000003 Relay_Log_Pos: 254 Relay_Master_Log_File: JAZZMAN2-bin.000113 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 16258 Relay_Log_Space: 15636 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: 1 row in set (0.00 sec) mysql> show processlist \G *************************** 1. row *************************** Id: 8 User: root Host: localhost:3514 db: NULL Command: Query Time: 0 State: NULL Info: show processlist *************************** 2. row *************************** Id: 9 User: system user Host: db: NULL Command: Connect Time: 13 State: Waiting for master to send event Info: NULL *************************** 3. row *************************** Id: 10 User: system user Host: db: NULL Command: Connect Time: 13 State: Has read all relay log; waiting for the slave I/O thread to update it Info: NULL 3 rows in set (0.00 sec) mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Reconnecting after a failed master event read Master_Host: 10.0.0.140 Master_User: root Master_Port: 3306 Connect_Retry: 60 Master_Log_File: JAZZMAN2-bin.000113 Read_Master_Log_Pos: 16258 Relay_Log_File: JAZZMAN2-relay-bin.000003 Relay_Log_Pos: 254 Relay_Master_Log_File: JAZZMAN2-bin.000113 Slave_IO_Running: No Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 16258 Relay_Log_Space: 15636 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 2003 Last_IO_Error: error reconnecting to master 'root@10.0.0.140:3306' - retry-time: 60 retries: 86400 Last_SQL_Errno: 0 Last_SQL_Error: 1 row in set (0.00 sec) mysql> show processlist \G *************************** 1. row *************************** Id: 8 User: root Host: localhost:3514 db: NULL Command: Query Time: 0 State: NULL Info: show processlist *************************** 2. row *************************** Id: 9 User: system user Host: db: NULL Command: Connect Time: 45 State: Reconnecting after a failed master event read Info: NULL *************************** 3. row *************************** Id: 10 User: system user Host: db: NULL Command: Connect Time: 45 State: Has read all relay log; waiting for the slave I/O thread to update it Info: NULL 3 rows in set (0.00 sec) mysql> stop slave; Query OK, 0 rows affected (32.69 sec)
[9 Mar 2009 16:55]
MySQL Verification Team
I couldn't repeat this issue too on Windows Vista x64 running latest source servers.
[9 Mar 2009 17:18]
Jasmin Letendre
Hi, I've tried on 3 different machines. 2 were running WinXP 64 bits and 1 was running WinXP 32 bits. They had the same problem. Is it possible that the problem is related to the way I setup replication. To enable replication, I only changed my config file (that was generated by the MySQL Server Instance Configuration Wizard) by adding the following 2 lines: server-id=1 log-bin=JAZZMAN2_D-bin I then setup replication live by executing the following actions: - mysqldump --user=root --password=root --host=localhost --port=3307 --single-transaction --master-data=1 my_db > my_db_dump.sql - CHANGE MASTER TO MASTER_HOST='localhost', MASTER_PORT=3307, MASTER_USER='root', MASTER_PASSWORD='root'; - mysql --user=root --password=root --host=localhost --port=3306 my_db < my_db_dump.sql - START SLAVE; I have verified that replication is working correctly up to the point where I stop the master server. Just in case it could be useful, here is my config file content (I removed the comments to make it shorter). --- Start of config file [client] port=3306 [mysql] default-character-set=latin1 [mysqld] port=3306 basedir="C:/Program Files (x86)/MySQL/MySQL Server 5.1/" datadir="C:/Documents and Settings/All Users/Application Data/MySQL/MySQL Server 5.1/Data/" default-character-set=latin1 default-storage-engine=INNODB sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" max_connections=160 query_cache_size=31M table_cache=320 tmp_table_size=35M thread_cache_size=8 myisam_max_sort_file_size=100G myisam_sort_buffer_size=51M key_buffer_size=47M read_buffer_size=64K read_rnd_buffer_size=256K sort_buffer_size=256K innodb_additional_mem_pool_size=2M innodb_flush_log_at_trx_commit=1 innodb_log_buffer_size=1M innodb_buffer_pool_size=91M innodb_log_file_size=19M innodb_thread_concurrency=10 server-id=1 log-bin=JAZZMAN2_D-bin --- End of config file
[31 Mar 2009 14:55]
Susanne Ebrecht
This is the vise versa to bug #43217. Unfortunately the damage is much higher when we would let stop the slave immediately. Data lost would be the follow. If you want to get an immediate stop of the slave please use KILL. Will you have same problem when you are using KILL?
[2 Apr 2009 19:15]
Jasmin Letendre
Using KILL has exactly the same behaviour as using STOP SLAVE in my case. I don't think my bug is the vice versa to bug #43217. #43217 seems to be about waiting for a whole transaction to replicate before stopping. For my bug, there is nothing at all being replicated. The STOP SLAVE is not waiting on some replicating statement, but rather on the network timeout of the slave IO thread trying to reconnect to the master server. It seems to me like if the Slave IO thread has called connect() synchronously with a timeout of around 1 minute to reconnect to the master and this call is not interrupted when we issue STOP SLAVE. Is it possible that something like that is going on?
[3 Apr 2009 11:42]
Susanne Ebrecht
Which setup do you using? a) master and slave on same machine also mysql CLI on same machine b) master and slave on different machines and where did you run CLI? c) master and slave on different machines by using different OS and where did you run CLI? Did you connect by using "localhost" or by using IP-Adress "127.0.0.1"? Or how did you connect ... a) to master b) to slave c) between master and slave
[6 Apr 2009 15:17]
Jasmin Letendre
a)b)c) I have tested pretty much all the combinations and they all have the problem, as long as the DB servers are on Windows. For the DB server, I have tried: -both master and slave on same Windows host (bug) -master and slave on different Windows host (bug) -master and slave on same RedHat host (works well) -master and slave on different RedHat hosts (works well) -master and slave on same CentOS host (works well) -master and slave on different CentOS hosts (works well) For the CLI, I have tried (all of them from a Windows host): -mysql command line tool (bug) -mysql C client library (bug) -MyODBC (bug) -QueryBrowser (bug) The slave is always connecting to the master using the master's IP address. The client connects to the server using either localhost or the server's IP address.
[13 Jul 2009 7:04]
Sveta Smirnova
Thank you for the feedback. Verified as described: host:~/sveta/mysql-5.1.36-win32> ./bin/mysqld --defaults-file=my-small.ini --basedir=. --datadir=./data_slave/ --log-error --port=33512 --socket=/tmp/mysql_ssmirnova_slave.sock --server_id=2 --relay-log=slave-relay & [2] 3528 host:~/sveta/mysql-5.1.36-win32> ./bin/mysql -P33512 -uroot change master to master_port=3333, master_user='repl', master_password='replrepl', master_='local'; start slave; stop slave; select now(); start slave; select now(); stop slave; select now(); \q now() 2009-07-13 08:02:39 now() 2009-07-13 08:02:39 now() 2009-07-13 08:03:40
[17 Jul 2009 7:47]
Sveta Smirnova
With master-connect-retry=3 it runs forever.
[25 Aug 2009 13:47]
Jasmin Letendre
Any update on this? Any chance this will get fixed in a subsequent release of 5.1?
[20 Jan 2010 16:07]
C F
* bump * Seems to be experiencing similar behavior (master not available, long (>5min wait times)) +-------------------------+----------------------+ | Variable_name | Value | +-------------------------+----------------------+ | version | 5.1.37-2~bpo50+1-log | | version_comment | (Debian) | | version_compile_machine | i486 | | version_compile_os | debian-linux-gnu | +-------------------------+----------------------+
[12 May 2010 8:21]
Meiji KIMURA
I can reproduce this behavior on MySQL 5.0.58. I looked into this in detail with Debugger and Debug version of MySQL. This block took a time. (terminate_slave_thread() function in slave.cc) while (*slave_running) // Should always be true { DBUG_PRINT("loop", ("killing slave thread")); KICK_SLAVE(thd); /* There is a small chance that slave thread might miss the first alarm. To protect againts it, resend the signal until it reacts */ struct timespec abstime; set_timespec(abstime,2); pthread_cond_timedwait(term_cond, cond_lock, &abstime); }
[1 Dec 2010 3:13]
Zhenxing He
See also BUG#58546
[15 Jul 2011 3:02]
Meiji KIMURA
After fixing BUG#58546 , I check the behavior under MySQL 5.5.13 on Win64. But the behavior is same as before.
[23 Jan 2012 14:40]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html
[23 Jan 2012 14:42]
Jon Stephens
Documented as follows in the 5.5.21 changelog: On Windows slave hosts, STOP SLAVE took a overly long time to complete when the master was down. Closed.
[23 Jan 2012 15:22]
Jasmin Letendre
Thanks! Will test as soon as the package is out.
[7 Feb 2012 8:55]
Jon Stephens
Fix also documented in the 5.6.5 changelog. Status unchanged.