Bug #88373 Renaming a column breaks replication from 5.7 to 8.0 because of impl. collation
Submitted: 6 Nov 2017 9:39 Modified: 29 Mar 2018 9:51
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:8.0.3 OS:Any
Assigned to: CPU Architecture:Any

[6 Nov 2017 9:39] Daniël van Eeden
Description:
Setup:
 - Master running 5.7.18
 - Slave running 8.0.3

Table (as shown on 5.7 by show create table):
CREATE TABLE `foobar` (
 `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
 `foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id_foo` (`foo`),
) ENGINE=InnoDB AUTO_INCREMENT=12345 DEFAULT CHARSET=latin1;

Then run:
ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL;

Expected result:
 - A renamed column

Actual result:
 - Replication broken with duplicate key error

Note:
 - 8.0 has ALTER TABLE .. RENAME COLUMN, but as the master runs 5.7 that's not available yet.

mysql80> show collation where `Default`='Yes' and Charset='utf8mb4';
+--------------------+---------+-----+---------+----------+---------+---------------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+--------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
+--------------------+---------+-----+---------+----------+---------+---------------+
1 row in set (0.00 sec)

myql57> show collation where `Default`='Yes' and Charset='utf8mb4';
+--------------------+---------+----+---------+----------+---------+
| Collation          | Charset | Id | Default | Compiled | Sortlen |
+--------------------+---------+----+---------+----------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes     | Yes      |       1 |
+--------------------+---------+----+---------+----------+---------+
1 row in set (0.00 sec)

The show create table output on 8.0 shows utf8mb4_general_ci is used on this column. That's the 5.7 default (so hidden from show create table output), but is not the default on 8.0.

As no collation is specified the 8.0.3 machine applies the default (utf8mb4_0900_ai_ci) which is different to the upstream master.

The 'fix' is to use:
- SET GLOBAL collation_server = 'utf8mb4_general_ci';
- SET PERSIST collation_server = 'utf8mb4_general_ci';

It would be nice if this would work out of the box.
Also this should be mentioned on https://dev.mysql.com/doc/refman/8.0/en/replication-upgrade.html

How to repeat:
See description.

Suggested fix:
1. Put the default collation in the binlogs for alter statements.
2. Always show the collation even if it is the default (helps to create a MODIFY COLUMN statement without side-effects)
3. Detect that the statement comes from a 5.7 master and apply 5.7 defaults.
4. Make the default
[6 Nov 2017 12:41] MySQL Verification Team
Hello Daniël,

Thank you for the report and feedback.
I tried to reproduce issue using 5.7.18/5.7.20(master - non-gtid with default settings) -> 8.0.3(slave - non-gtid with default settings) with provided table, later by altering column on master(before that added few rows in order to break - I followed to see if it helps in reproducing duplicate issues http://mysqlserverteam.com/mysql-8-0-collations-the-devil-is-in-the-details/).

Could you please provide exact data(subset of data in INSERT.. format) which could help me trigger this issue at my end? Thank you!

Thanks,
Umesh
[6 Nov 2017 15:01] Daniël van Eeden
looks like my fix doesn't really work..

I can't share the real data but it looked similar to this:
foobar@example.com (0x666F6F626172406578616D706C652E636F6D)
foobar@example.comً  (0x666F6F626172406578616D706C652E636F6DD98B)
[6 Nov 2017 15:11] Daniël van Eeden
Also related to:
Bug #80792 	ALTER TABLE should support ALGORITHM=METADATA

(this is a metadata only change on the 5.7 master, but not on the 8.0 slave)
[7 Nov 2017 6:34] MySQL Verification Team
For point (2):

https://bugs.mysql.com/bug.php?id=46239 
(mysql doesn't show what collation was used if that collation is the default)
[7 Nov 2017 9:08] MySQL Verification Team
Thank you, finally observed this at my end.

i) 5.7.18(master) -> 8.0.3(slave) - I'm not seeing any issues i.e ALTER happily replicates etc no issues.
ii) 5.7.18(master) -> 5.7.18(slave), upgrade slave to 8.0.3 and then ALTER on master triggers broken replication

              Last_SQL_Error: Error 'Duplicate entry 'foobar@example.comً' for key 'id_foo'' on query. Default database: 'test'. Query: 'ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL'

Thanks,
Umesh
[7 Nov 2017 9:22] MySQL Verification Team
test results

Attachment: 88373_5.7.18_8.0.3.results (application/octet-stream, text), 18.45 KiB.

[15 Dec 2017 12:17] Pedro Figueiredo
Posted by developer:
 
The described behaviour is also identified when applying above provided steps to a single server.

Steps to reproduce
------------------
1) Run mysqld with version 5.7
2) Create a database
	
    CREATE DATABASE collation_test;
    USE collation_test;

3) Create table

    CREATE TABLE `foobar` (
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `foo` varchar(100) CHARACTER SET utf8mb4 NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `id_foo` (`foo`)
    ) ENGINE=InnoDB AUTO_INCREMENT=12345 DEFAULT CHARSET=latin1;

4) Insert provided data

    INSERT INTO foobar (foo) 
           VALUES (0x666F6F626172406578616D706C652E636F6D);
    INSERT INTO foobar (foo) 
           VALUES (0x666F6F626172406578616D706C652E636F6DD98B);
 
5) Upgrade mysqld to version 8.0, attaching the 8.0 mysqld to 
   the existent setup and running 'mysql_upgrade' for that 
   instance, e.g.

   $ mysql_upgrade -u root --socket=/tmp/mysqld.sock

8) Change the column with the statement

    ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100) 
    CHARACTER SET utf8mb4 NOT NULL;

Expected result
---------------
The expected result would be the changed row (behaviour verified
with no upgrade):

    mysql> ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100)
           CHARACTER SET utf8mb4 NOT NULL;
    Query OK, 0 rows affected (0.02 sec)

Obtained result
---------------
The actual result is an error due to duplicate keys:

    mysql> ALTER TABLE `foobar` CHANGE `foo` `old_foo` varchar(100)
           CHARACTER SET utf8mb4 NOT NULL;                 
    ERROR 1062 (23000): Duplicate entry 'foobar@example.comً' for key 'id_foo'
[9 Jan 2018 20:52] Sven Sandberg
Posted by developer:
 
This affects:
- column collation in CREATE TABLE and ALTER TABLE,
- table collation in CREATE TABLE and ALTER TABLE,
- database collation in CREATE DATABASE and ALTER DATABASE,
- string literal collation, which may appear in any SQL statement that can
  contain expressions.

Run the following test case in 5.7 and 8.0 to see the difference:

--echo # Output from SHOW
SHOW COLLATION WHERE `Default` = 'Yes' AND `Charset` = 'utf8mb4';
SHOW CHARACTER SET LIKE 'utf8mb4';

--echo # Column character set
CREATE TABLE t1 (x VARCHAR(100) CHARACTER SET utf8mb4);
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 't1';
ALTER TABLE t1 CHANGE x x VARCHAR(100) CHARACTER SET utf8mb4;
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 't1';
ALTER TABLE t1 MODIFY x VARCHAR(100) CHARACTER SET utf8mb4;
SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 't1';

--echo # Table character set
CREATE TABLE t2 (x INT) CHARACTER SET utf8mb4;
SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 't2';
ALTER TABLE t2 CHARACTER SET utf8mb4;
SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 't2';

--echo # Database character set
CREATE DATABASE d CHARACTER SET utf8mb4;
SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'd';
ALTER DATABASE d CHARACTER SET utf8mb4;
SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'd';

--echo # String literal character set
CREATE TABLE t3 (a INT);
# The special character is 0xD98B. it equals ' ' in the utf8mb4_general_ci
# collation but is different from ' ' in the utf8mb4_0900_ai_ci collation.
INSERT INTO t3 SELECT _utf8mb4'ً' = _utf8mb4' ';
SELECT * FROM t3;
[9 Jan 2018 21:12] Sven Sandberg
Posted by developer:
 
In the test case, note that it applies also to normal INSERT, not just INSERT ... SELECT:

INSERT INTO t3 VALUES (_utf8mb4'ً' = _utf8mb4' ');

(Apparently bugdb masks the special characters. Edit this in a hexeditor and replace the question marks by 0xD98B).
[29 Mar 2018 9:51] Margaret Fisher
Posted by developer:
 
Added this into the changelog entry for Bug#27389878, which now reads:

A new system variable, default_collation_for_utf8mb48, enables
setting the default collation for the utf8mb4 character set. This
variable is primarily intended to support replication from a MySQL
5.7 or older master server to a MySQL 8.0 slave server, or group
replication with a MySQL 5.7 primary node and one or more MySQL 8.0
secondaries. The value of the variable is replicated to the
slave so that the slave can correctly process data originating
from a master with a different default collation for
utf8mb4. It may be helpful because the default collation for
utf8mb4 in MySQL 5.7 is utf8mb4_general_ci but utf8mb4_0900_ai_ci in
MySQL 8.0.

The variable value determines the default utf8mb4 collation for the
following statements:
* SHOW COLLATION and SHOW CHARACTER SET.
* CREATE TABLE and ALTER TABLE having a CHARACTER SET utf8mb4 clause
  without a COLLATION clause, either for the table character set or for
  a column character set.
* CREATE DATABASE and ALTER DATABASE having a CHARACTER SET utf8mb4
  clause without a COLLATION clause.
* Any statement containing a string literal of the form _utf8mb4'some
  text' without a COLLATE clause.

- Also added to
https://dev.mysql.com/doc/refman/8.0/en/replication-upgrade.html
The change to the default collation for the utf8mb4 character set can also result in replication failure on an upgraded slave. The default collation for utf8mb4 in MySQL 5.7 is utf8mb4_general_ci, but it is utf8mb4_0900_ai_ci in MySQL 8.0. This can result in duplicate key errors for replicated CREATE TABLE, ALTER TABLE, CREATE DATABASE, and ALTER DATABASE statements. To prevent this issue, on the master, specify the default_collation_for_utf8mb4 system variable to name the default collation that the master uses for utf8mb4. The value of this system variable is replicated to the slave, and the slave uses it to identify the master's default collation for utf8mb4 and therefore to process the replicated data correctly.
[3 Apr 2018 11:41] Simon Mudd
Can you please confirm in which version of 8.0 this is fixed? Given this is reported in 8.0.3 it's good if you mention the version where we can use (and set) this variable if needed.

Thanks.
[15 May 2018 12:43] Margaret Fisher
Posted by developer:
 
The new system variable is added in 8.0.11. However, following discussions during implementation, the system variable has actually now been implemented for internal use and does not need to be set by the user. The new description reads:

For internal use by replication. This system variable is set to the default
collation for the utf8mb4 character set. The value of the variable is
replicated from a master to a slave so that the slave can correctly process
data originating from a master with a different default collation for
utf8mb4. The variable is primarily intended to support replication from a
MySQL 5.7 or older master server to a MySQL 8.0 slave server, or group
replication with a MySQL 5.7 primary node and one or more MySQL 8.0
secondaries. The default collation for utf8mb4 in MySQL 5.7 is
utf8mb4_general_ci but utf8mb4_0900_ai_ci in MySQL 8.0. The variable is not
present in releases earlier than MySQL 8.0, so if the slave does not receive
a value for the variable, it assumes the master is from an earlier release
and sets the value to the previous default collation utf8mb4_general_ci.

- So I've removed this from the upgrade docs as the user doesn't need to specify it, it should "just work" and remove the duplicate key errors.