Bug #73039 | relay_log_recovery KO when CHANGE MASTER without FILE and SQL_THREAD not started | ||
---|---|---|---|
Submitted: | 18 Jun 2014 14:15 | Modified: | 29 Oct 2014 16:25 |
Reporter: | Jean-François Gagné | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.6.19, 5.6.20, 5.6.22 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Crash-safe replication |
[18 Jun 2014 14:15]
Jean-François Gagné
[19 Jun 2014 11:17]
MySQL Verification Team
Hello Jean, Thank you for the bug report and test case. Verified as described. Thanks, Umesh
[19 Jun 2014 11:18]
MySQL Verification Team
// How to steps for dev's [root@cluster-repo mysql-advanced-5.6.19]# more 73039M.cnf 73039S.cnf :::::::::::::: 73039M.cnf :::::::::::::: #Master [mysqld] user = root pid-file = /tmp/master/mysqld.pid socket = /tmp/master/mysql.sock port = 3306 basedir = /data/ushastry/server/mysql-advanced-5.6.19 datadir = /tmp/master tmpdir = /tmp log-bin = master-bin log-bin-index = master-bin.index server-id = 1 :::::::::::::: 73039S.cnf :::::::::::::: #Slave [mysqld] user = root pid-file = /tmp/slave/mysqld.pid socket = /tmp/slave/mysql.sock port = 3307 basedir = /data/ushastry/server/mysql-advanced-5.6.19 datadir = /tmp/slave tmpdir = /tmp relay-log-index = slave-relay-bin.index relay-log = slave-relay-bin server-id = 2 relay-log-recovery=1 // Initialize scripts/mysql_install_db --defaults-file=./73039M.cnf scripts/mysql_install_db --defaults-file=./73039S.cnf // Start [root@cluster-repo mysql-advanced-5.6.19]# bin/mysqld_safe --defaults-file=./73039M.cnf & [root@cluster-repo mysql-advanced-5.6.19]# bin/mysqld_safe --defaults-file=./73039S.cnf & [root@cluster-repo mysql-advanced-5.6.19]# ps aux|grep mysqld root 1966 0.0 0.0 106228 1400 pts/1 S 07:11 0:00 /bin/sh bin/mysqld_safe --defaults-file=./73039M.cnf root 2159 0.2 7.3 1167832 451080 pts/1 Sl 07:11 0:01 /data/ushastry/server/mysql-advanced-5.6.19/bin/mysqld --defaults-file=./73039M.cnf --basedir=/data/ushastry/server/mysql-advanced-5.6.19 --datadir=/tmp/master --plugin-dir=/data/ushastry/server/mysql-advanced-5.6.19/lib/plugin --user=root --log-error=/tmp/master/cluster-repo.err --pid-file=/tmp/master/mysqld.pid --socket=/tmp/master/mysql.sock --port=3306 root 2185 0.0 0.0 106228 1396 pts/1 S 07:13 0:00 /bin/sh bin/mysqld_safe --defaults-file=./73039S.cnf root 2390 0.2 7.3 1108360 450444 pts/1 Sl 07:13 0:01 /data/ushastry/server/mysql-advanced-5.6.19/bin/mysqld --defaults-file=./73039S.cnf --basedir=/data/ushastry/server/mysql-advanced-5.6.19 --datadir=/tmp/slave --plugin-dir=/data/ushastry/server/mysql-advanced-5.6.19/lib/plugin --user=root --log-error=/tmp/slave/cluster-repo.err --pid-file=/tmp/slave/mysqld.pid --socket=/tmp/slave/mysql.sock --port=3307 root 2475 0.0 0.0 103256 824 pts/3 S+ 07:20 0:00 grep mysqld // Setup replication * Create replication use on master * Point slave to master slave>CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='slavepass'; slave>START SLAVE IO_THREAD; // Start provisioning data on master cat << EOF | sudo bin/mysql -u root -S /tmp/master/mysql.sock CREATE DATABASE test_jfg; CREATE TABLE test_jfg.t1 ( userid BIGINT PRIMARY KEY, password BIGINT DEFAULT NULL); EOF seq -f "INSERT into test_jfg.t1 VALUES (%.0f, NULL);" 1 1000000 | sudo bin/mysql -u root -S /tmp/master/mysql.sock // Kill slave ( Not required to wait for the master to complete provisioning) [root@cluster-repo mysql-advanced-5.6.19]# kill -9 2185 2390 // Restart slave [root@cluster-repo mysql-advanced-5.6.19]# bin/mysqld_safe --defaults-file=./73039S.cnf & [1] 2486 [root@cluster-repo mysql-advanced-5.6.19]# 140621 07:23:12 mysqld_safe Logging to '/tmp/slave/cluster-repo.err'. 140621 07:23:13 mysqld_safe Starting mysqld daemon with databases from /tmp/slave ## Check slave status SHOW SLAVE STATUS\G .. .. Last_SQL_Errno: 1062 Last_SQL_Error: Error 'Duplicate entry '%-test-' for key 'PRIMARY'' on query. Default database: 'mysql'. Query: 'INSERT INTO db SELECT * FROM tmp_db WHERE @had_db_table=0;'
[10 Sep 2014 15:38]
MySQL Verification Team
Hello Jean-François, Imho that duplicate key error in my earlier note was mostly due to "bin-log" generated during initial setup on master: // Cross checked for tmp_db, and confirmed that it was generated at the time of initialization master post mysql_install_db call. bin/mysqlbinlog /tmp/master/master-bin.000001 |grep tmp_db CREATE TEMPORARY TABLE tmp_db LIKE db; INSERT INTO tmp_db VALUES ('%','test','','Y','Y','Y','Y','Y','Y','N','Y','Y','Y','Y','Y','Y','Y','Y','N','N','Y','Y'); INSERT INTO tmp_db VALUES ('%','test\_%','','Y','Y','Y','Y','Y','Y','N','Y','Y','Y','Y','Y','Y','Y','Y','N','N','Y','Y'); INSERT INTO db SELECT * FROM tmp_db WHERE @had_db_table=0; DROP TEMPORARY TABLE `tmp_db` /* generated by server */ Then,instructed slave to connect without "MASTER_FILE and MASTER_POS", which means it is replicating from the initial position: 2014-09-12 10:52:53 31297 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='127. 0.0.1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. 2014-09-12 10:53:00 31297 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER a nd PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information. 2014-09-12 10:53:00 31297 [Note] Slave I/O thread: connected to master 'repl@127.0.0.1:3306',replication started in log 'FIRST' at position 4 Any chance you can rerun your steps and please share the exact duplicate key error that you are seeing? Also, how did you "initialize" master instance, was the bin log enabled when mysql_install_db used? If you are seeing "Last_SQL_Error: Error 'Duplicate entry '%-test-' for key 'PRIMARY'' on query. Default database: 'mysql'. Query: 'INSERT INTO db SELECT * FROM tmp_db WHERE @had_db_table=0;'" then it is expected behavior because then the duplicate key is valid because it is conflicting with existing records in slave's mysql.db. if you are seeing DUP key issue in real table i.e in test_jfg.t1, then could you please share exact steps, conf file etc? Btw, I'm not seeing the DUP key error with below steps both in real table and in mysql.db(i.e when log-bin disabled when initializing master with mysql_install_db): [root@cluster-repo mysql-advanced-5.6.19]# scripts/mysql_install_db --basedir=/data/ushastry/server/mysql-advanced-5.6.19 --datadir=/tmp/master [root@cluster-repo mysql-advanced-5.6.19]# scripts/mysql_install_db --basedir=/data/ushastry/server/mysql-advanced-5.6.19 --datadir=/tmp/slave ## relay-log-recovery is not enabled on slave initially [root@cluster-repo mysql-advanced-5.6.19]# [root@cluster-repo mysql-advanced-5.6.19]# bin/mysqld_safe --defaults-file=./73039M.cnf & [1] 31747 [root@cluster-repo mysql-advanced-5.6.19]# 140912 11:10:36 mysqld_safe Logging to '/tmp/master/cluster-repo.err'. 140912 11:10:37 mysqld_safe Starting mysqld daemon with databases from /tmp/master [root@cluster-repo mysql-advanced-5.6.19]# vi 73039S.cnf [root@cluster-repo mysql-advanced-5.6.19]# bin/mysqld_safe --defaults-file=./73039S.cnf & [2] 31967 [root@cluster-repo mysql-advanced-5.6.19]# 140912 11:11:11 mysqld_safe Logging to '/tmp/slave/cluster-repo.err'. 140912 11:11:11 mysqld_safe Starting mysqld daemon with databases from /tmp/slave #### Setup 2 new instances, setup replication, CHANGE MASTER TO without a MASTER_FILE and MASTER_POS Slave> mysql> CHANGE MASTER TO -> MASTER_HOST='127.0.0.1', -> MASTER_PORT=3306, -> MASTER_USER='repl', -> MASTER_PASSWORD='slavepass'; Query OK, 0 rows affected (0.03 sec) // Start provisioning on master Master> at << EOF | sudo bin/mysql -u root -S /tmp/master/mysql.sock CREATE DATABASE test_jfg; CREATE TABLE test_jfg.t1 ( userid BIGINT PRIMARY KEY, password BIGINT DEFAULT NULL); EOF seq -f "INSERT into test_jfg.t1 VALUES (%.0f, NULL);" 1 1000000 | sudo bin/mysql -u root -S /tmp/master/mysql.sock // mysql> START SLAVE IO_THREAD; Query OK, 0 rows affected (0.00 sec) ========================================== // Kill -9 slave, start slave with relay-log-recovery=1 2014-09-12 11:18:15 32421 [Note] /data/ushastry/server/mysql-advanced-5.6.19/bin/mysqld: ready for connections. Version: '5.6.19-enterprise-commercial-advanced' socket: '/tmp/slave/mysql.sock' port: 3307 MySQL Enterprise Server - Advanced Edition (Commercial) 2014-09-12 11:18:15 32421 [Warning] Slave SQL: If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0 2014-09-12 11:18:15 32421 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './slave-relay-bin.000001' position: 4 Repeated provisioning for number of times but no duplicate key error reported.. Thanks, Umesh
[11 Sep 2014 13:44]
Jean-François Gagné
Hi Umesh, this is how I reproduce this bug: 1. Setup 2 MySQL, one to be a master (with binlogs), the other to be a slave; 2. On the slave: "STOP SLAVE; DROP DATABASE IF EXISTS test_jfg; RESET SLAVE ALL;"; 3. On the master: "DROP DATABASE IF EXISTS test_jfg; RESET MASTER;"; 4. On the master: "CREATE DATABASE test_jfg;"; 5. On the slave: "CHANGE MASTER TO MASTER_HOST='<master_hostname>', MASTER_USER='<user>', MASTER_PASSWORD='<password>'; START SLAVE IO_THREAD;"; 6. On the slave: "pkill -9 mysqld"; 7. On the slave: start mysqld with relay_log_recovery = 1. And you will have an error in a "SHOW SLAVE STATUS": Last_SQL_Error: Error 'Can't create database 'test_jfg'; database exists' on query. Default database: 'test_jfg'. Query: 'CREATE DATABASE test_jfg' Test done with MySQL 5.6.20. JFG
[12 Sep 2014 8:04]
MySQL Verification Team
Thank you for the feedback, steps, we are unable to repeat as your reported. Please could provide conf details from both master/slave, all the relay logs for further investigations? I'm sharing the steps performed at our end for this in 73039.txt.
[12 Sep 2014 8:04]
MySQL Verification Team
test results
Attachment: 73039.txt (text/plain), 16.12 KiB.
[16 Sep 2014 19:16]
Sveta Smirnova
Jean-François, please send us output of SHOW SLAVE STATUS\G, taken before and after you `kill -9` slave server.
[18 Sep 2014 16:32]
Jean-François Gagné
Hi Umesh If I use exactly your commands, I do not reproduce the bug. But if I start the slave adding those options, I do: --master-info-repository=TABLE --relay-log-info-repository=TABLE Thanks, JFG
[19 Sep 2014 6:15]
MySQL Verification Team
Thank you Jean for the feedback, I'm able to reproduce the issue now. Inclusion of --master-info-repository=TABLE --relay-log-info-repository=TABLE did the trick and able to see the DDL in 2nd 4th relay logs(which triggered Error 'Can't create database 'test_jfg'; database exists' on query. Default database: 'test_jfg'. Query: 'CREATE DATABASE test_jfg') Thanks, Umesh
[19 Sep 2014 6:16]
MySQL Verification Team
test results
Attachment: 73039_modified.txt (text/plain), 17.51 KiB.
[27 Oct 2014 17:59]
David Moss
Thanks for your report, the following was added to the 5.6.22 and 5.7.6 release notes with commit 4526: When restarting MySQL with relay_log_recovery enabled to recover from a crash, if the SQL thread had never been started, the position from which to start recovery was not correctly initialized because Relay_Master_Log_File was missing. This fix ensures that in such a situation each of the relay logs, starting from the first relay log file, is searched for a rotate event from the master, which specifies where replication started from. This rotate event is then used to set the SQL thread's Relay_Master_Log_File and Relay_Log_Pos and recovery continues as normal.
[10 Dec 2014 14:43]
Laurynas Biveinis
$ bzr log -r 6204 ------------------------------------------------------------ revno: 6204 committer: Sujatha Sivakumar <sujatha.sivakumar@oracle.com> branch nick: Bug19021091_mysql-5.6 timestamp: Thu 2014-10-16 11:22:41 +0530 message: Bug#19021091: RELAY_LOG_RECOVERY FAILS WHEN CHANGE MASTER WITHOUT FILE AND SQL_THD NOT STARTED Problem: ======== If, on an empty database: - CHANGE MASTER TO without a MASTER_FILE and MASTER_POS is used - the IO_THREAD is started WITHOUT starting the SQL_THREAD - MySQL crashed - MySQL is restarted with relay_log_recovery = 1. Crash recovery will not work as expected: - The IO_THREAD position will NOT be initialized to the SQL_THREAD position, - SQL_THREAD position will NOT be initialized to the new relay log. It looks like, when the SQL_THREAD does not have a Relay_Master_Log_File, relay_log_recovery does not work. Analysis: ======== When relay-log-recovery is set existing relay logs are ignored and IO_THREAD positions will be initialized to SQL_THREAD's positions and SQL_THREAD positions will be set to newly generated relay log. The old relay logs will be purged by the normal purge mechanism. There can be a special case where sql thread has never started all, in such case SQL_THREAD's positions will not be initialized. Existing recovery mechanism is tightly coupled with Relay_Master_Log_File's value if the file name is not set recovery will not happen. Fix: === During recovery if Relay_Master_Log_File is not set open the relaylog index file. Read the relay logs from the first relay log parse each of these files and look for Rotate event from the master which will specify from where replication has started. Extract master_log_file and master_log_pos from master's rotate event and set it as SQL_THREAD's Relay_Master_Log_File and position. $ bzr log -r 6205 ------------------------------------------------------------ revno: 6205 committer: Sujatha Sivakumar <sujatha.sivakumar@oracle.com> branch nick: Bug19021091_mysql-5.6 timestamp: Thu 2014-10-16 14:06:37 +0530 message: Bug#19021091: RELAY_LOG_RECOVERY FAILS WHEN CHANGE MASTER WITHOUT FILE AND SQL_THD NOT STARTED Fixing a post push compilation issue.