Bug #77187 | START SLAVE no error if sql thread running + GLOBAL SQL_SLAVE_SKIP_COUNTER > 1 | ||
---|---|---|---|
Submitted: | 29 May 2015 6:37 | Modified: | 29 Jun 2015 14:53 |
Reporter: | Simon Mudd (OCA) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.7.7rc, 5.7.8 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | 5.7, replication, sql_slave_skip_counter, start slave |
[29 May 2015 6:37]
Simon Mudd
[29 May 2015 6:56]
Simon Mudd
To rephrase what I said: This may break scripts which expect to get an error if the SQL thread is running if you try to start it. Now you get a warning which is not the same so you need to change the script's behaviour and 5.6 and 5.7 behaviour has changed.
[29 May 2015 7:26]
Simon Mudd
So to add some thoughts after talking with colleagues about this. If you have 1 channel: then current 5.6 behaviour makes sense. Try to maintain it. If you have > 1 replication channel then running start slave with GLOBAL SQL_SLAVE_SKIP_COUNTER > 1 is ambiguous, so probably should be disallowed. Probably you should be forced to specify the channel you wish to skip statements on. [ Even if you skip a statement on channels which are stopped, if you have more than 1 stopped channel, unless the behaviour is very clearly defined, this is like jumping into a minefield as it may not be clear which channel will be used. Basically don't allow that. ] I had a brief look at the 5.7 documentation regarding slave start and didn't notice any special comments regarding this type of scenario. So one on "error handling when multiple channels are in use" ... might be handy. Normally you don't want to skip statements, but sometimes this is necessary if for example replication delay is more important than slave consistency. So a section on what to do if replication breaks would be helpful for people who don't see this much and then think: "what am I going to do now?" Given the above I still think the new 5.7 behaviour is wrong, at least if a single channel is involved and documentation should be as clear as possible about this type of situation.
[29 May 2015 7:36]
MySQL Verification Team
Hello Simon, Thank you for the report. I'm not sure whether this is intended behavior, but 5.6 generate error where as 5.7 just warns and which contradicts manual quote - "This statement is valid only when the slave threads are not running. Otherwise, it produces an error." https://dev.mysql.com/doc/refman/5.7/en/set-global-sql-slave-skip-counter.html Thanks, Umesh
[29 May 2015 7:37]
MySQL Verification Team
/ 5.6.26 (Setup simple master-slave replication) // On running Slave slave> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; ERROR 1198 (HY000): This operation cannot be performed with a running slave; run STOP SLAVE first Query OK, 0 rows affected, 1 warning (0.00 sec) slave> show warnings; +-------+------+--------------------------+ | Level | Code | Message | +-------+------+--------------------------+ | Note | 1254 | Slave is already running | +-------+------+--------------------------+ 1 row in set (0.00 sec) // 5.7.8 (Setup simple master-slave replication) // On running slave slave> SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; Query OK, 0 rows affected (0.00 sec) Query OK, 0 rows affected, 1 warning (0.00 sec) slave> show warnings; +-------+------+------------------------------------------------------------+ | Level | Code | Message | +-------+------+------------------------------------------------------------+ | Note | 3083 | Replication thread(s) for channel '' are already runnning. | +-------+------+------------------------------------------------------------+ 1 row in set (0.01 sec)
[29 Jun 2015 14:12]
Sven Sandberg
Posted by developer: Thank you for highlighting this issue. However, this is expected behavior. We changed it in 5.7 so that: - sql_slave_skip_counter behaves like most other replication slave variables. - sql_slave_skip_counter works with multi-source replication. The logic is now described in https://dev.mysql.com/doc/refman/5.7/en/replication-options-slave.html#sysvar_sql_slave_sk... : "Setting the option has no immediate effect. The variable applies to the next START SLAVE command; the next START SLAVE command will also change the value back to 0. When this variable is set to a non-zero value and there are multiple replication channels configured, the START SLAVE command can only be used with the FOR CHANNEL channel clause." We are sorry this broke your script. However, you can fix it as follows: instead of checking if SET GLOBAL SQL_SLAVE_SKIP_COUNTER returns an error, check if START SLAVE returns a warning.
[29 Jun 2015 14:53]
Simon Mudd
Don't worry about breaking my script. That wasn't really the point. It was the change in behaviour in a major version release without clear documentation indicating the behavioural change and what the DBA could expect from the MySQL 5.7+ server. You have addressed both those issues so I'll go and look at adjusting my "convenient 1-liner script" to be a bit smarter and both 5.6 and 5.7+ aware.