Bug #88373 | Renaming a column breaks replication from 5.7 to 8.0 because of impl. collation | ||
---|---|---|---|
Submitted: | 6 Nov 2017 9:39 | Modified: | 29 Mar 2018 9:51 |
Reporter: | Daniël van Eeden (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 8.0.3 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[6 Nov 2017 9:39]
Daniël van Eeden
[6 Nov 2017 12:41]
MySQL Verification Team
Hello Daniël, Thank you for the report and feedback. I tried to reproduce issue using 5.7.18/5.7.20(master - non-gtid with default settings) -> 8.0.3(slave - non-gtid with default settings) with provided table, later by altering column on master(before that added few rows in order to break - I followed to see if it helps in reproducing duplicate issues http://mysqlserverteam.com/mysql-8-0-collations-the-devil-is-in-the-details/). Could you please provide exact data(subset of data in INSERT.. format) which could help me trigger this issue at my end? Thank you! Thanks, Umesh
[6 Nov 2017 15:01]
Daniël van Eeden
looks like my fix doesn't really work.. I can't share the real data but it looked similar to this: foobar@example.com (0x666F6F626172406578616D706C652E636F6D) foobar@example.comً (0x666F6F626172406578616D706C652E636F6DD98B)
[6 Nov 2017 15:11]
Daniël van Eeden
Also related to: Bug #80792 ALTER TABLE should support ALGORITHM=METADATA (this is a metadata only change on the 5.7 master, but not on the 8.0 slave)
[7 Nov 2017 6:34]
MySQL Verification Team
For point (2): https://bugs.mysql.com/bug.php?id=46239 (mysql doesn't show what collation was used if that collation is the default)
[7 Nov 2017 9:08]
MySQL Verification Team
Thank you, finally observed this at my end. i) 5.7.18(master) -> 8.0.3(slave) - I'm not seeing any issues i.e ALTER happily replicates etc no issues. ii) 5.7.18(master) -> 5.7.18(slave), upgrade slave to 8.0.3 and then ALTER on master triggers broken replication Last_SQL_Error: Error 'Duplicate entry 'foobar@example.comً' for key 'id_foo'' on query. Default database: 'test'. Query: 'ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL' Thanks, Umesh
[7 Nov 2017 9:22]
MySQL Verification Team
test results
Attachment: 88373_5.7.18_8.0.3.results (application/octet-stream, text), 18.45 KiB.
[15 Dec 2017 12:17]
Pedro Figueiredo
Posted by developer: The described behaviour is also identified when applying above provided steps to a single server. Steps to reproduce ------------------ 1) Run mysqld with version 5.7 2) Create a database CREATE DATABASE collation_test; USE collation_test; 3) Create table CREATE TABLE `foobar` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `id_foo` (`foo`) ) ENGINE=InnoDB AUTO_INCREMENT=12345 DEFAULT CHARSET=latin1; 4) Insert provided data INSERT INTO foobar (foo) VALUES (0x666F6F626172406578616D706C652E636F6D); INSERT INTO foobar (foo) VALUES (0x666F6F626172406578616D706C652E636F6DD98B); 5) Upgrade mysqld to version 8.0, attaching the 8.0 mysqld to the existent setup and running 'mysql_upgrade' for that instance, e.g. $ mysql_upgrade -u root --socket=/tmp/mysqld.sock 8) Change the column with the statement ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL; Expected result --------------- The expected result would be the changed row (behaviour verified with no upgrade): mysql> ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL; Query OK, 0 rows affected (0.02 sec) Obtained result --------------- The actual result is an error due to duplicate keys: mysql> ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL; ERROR 1062 (23000): Duplicate entry 'foobar@example.comً' for key 'id_foo'
[9 Jan 2018 20:52]
Sven Sandberg
Posted by developer: This affects: - column collation in CREATE TABLE and ALTER TABLE, - table collation in CREATE TABLE and ALTER TABLE, - database collation in CREATE DATABASE and ALTER DATABASE, - string literal collation, which may appear in any SQL statement that can contain expressions. Run the following test case in 5.7 and 8.0 to see the difference: --echo # Output from SHOW SHOW COLLATION WHERE `Default` = 'Yes' AND `Charset` = 'utf8mb4'; SHOW CHARACTER SET LIKE 'utf8mb4'; --echo # Column character set CREATE TABLE t1 (x VARCHAR(100) CHARACTER SET utf8mb4); SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 't1'; ALTER TABLE t1 CHANGE x x VARCHAR(100) CHARACTER SET utf8mb4; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 't1'; ALTER TABLE t1 MODIFY x VARCHAR(100) CHARACTER SET utf8mb4; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 't1'; --echo # Table character set CREATE TABLE t2 (x INT) CHARACTER SET utf8mb4; SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 't2'; ALTER TABLE t2 CHARACTER SET utf8mb4; SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 't2'; --echo # Database character set CREATE DATABASE d CHARACTER SET utf8mb4; SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'd'; ALTER DATABASE d CHARACTER SET utf8mb4; SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'd'; --echo # String literal character set CREATE TABLE t3 (a INT); # The special character is 0xD98B. it equals ' ' in the utf8mb4_general_ci # collation but is different from ' ' in the utf8mb4_0900_ai_ci collation. INSERT INTO t3 SELECT _utf8mb4'ً' = _utf8mb4' '; SELECT * FROM t3;
[9 Jan 2018 21:12]
Sven Sandberg
Posted by developer: In the test case, note that it applies also to normal INSERT, not just INSERT ... SELECT: INSERT INTO t3 VALUES (_utf8mb4'ً' = _utf8mb4' '); (Apparently bugdb masks the special characters. Edit this in a hexeditor and replace the question marks by 0xD98B).
[29 Mar 2018 9:51]
Margaret Fisher
Posted by developer: Added this into the changelog entry for Bug#27389878, which now reads: A new system variable, default_collation_for_utf8mb48, enables setting the default collation for the utf8mb4 character set. This variable is primarily intended to support replication from a MySQL 5.7 or older master server to a MySQL 8.0 slave server, or group replication with a MySQL 5.7 primary node and one or more MySQL 8.0 secondaries. The value of the variable is replicated to the slave so that the slave can correctly process data originating from a master with a different default collation for utf8mb4. It may be helpful because the default collation for utf8mb4 in MySQL 5.7 is utf8mb4_general_ci but utf8mb4_0900_ai_ci in MySQL 8.0. The variable value determines the default utf8mb4 collation for the following statements: * SHOW COLLATION and SHOW CHARACTER SET. * CREATE TABLE and ALTER TABLE having a CHARACTER SET utf8mb4 clause without a COLLATION clause, either for the table character set or for a column character set. * CREATE DATABASE and ALTER DATABASE having a CHARACTER SET utf8mb4 clause without a COLLATION clause. * Any statement containing a string literal of the form _utf8mb4'some text' without a COLLATE clause. - Also added to https://dev.mysql.com/doc/refman/8.0/en/replication-upgrade.html The change to the default collation for the utf8mb4 character set can also result in replication failure on an upgraded slave. The default collation for utf8mb4 in MySQL 5.7 is utf8mb4_general_ci, but it is utf8mb4_0900_ai_ci in MySQL 8.0. This can result in duplicate key errors for replicated CREATE TABLE, ALTER TABLE, CREATE DATABASE, and ALTER DATABASE statements. To prevent this issue, on the master, specify the default_collation_for_utf8mb4 system variable to name the default collation that the master uses for utf8mb4. The value of this system variable is replicated to the slave, and the slave uses it to identify the master's default collation for utf8mb4 and therefore to process the replicated data correctly.
[3 Apr 2018 11:41]
Simon Mudd
Can you please confirm in which version of 8.0 this is fixed? Given this is reported in 8.0.3 it's good if you mention the version where we can use (and set) this variable if needed. Thanks.
[15 May 2018 12:43]
Margaret Fisher
Posted by developer: The new system variable is added in 8.0.11. However, following discussions during implementation, the system variable has actually now been implemented for internal use and does not need to be set by the user. The new description reads: For internal use by replication. This system variable is set to the default collation for the utf8mb4 character set. The value of the variable is replicated from a master to a slave so that the slave can correctly process data originating from a master with a different default collation for utf8mb4. The variable is primarily intended to support replication from a MySQL 5.7 or older master server to a MySQL 8.0 slave server, or group replication with a MySQL 5.7 primary node and one or more MySQL 8.0 secondaries. The default collation for utf8mb4 in MySQL 5.7 is utf8mb4_general_ci but utf8mb4_0900_ai_ci in MySQL 8.0. The variable is not present in releases earlier than MySQL 8.0, so if the slave does not receive a value for the variable, it assumes the master is from an earlier release and sets the value to the previous default collation utf8mb4_general_ci. - So I've removed this from the upgrade docs as the user doesn't need to specify it, it should "just work" and remove the duplicate key errors.