Bug #78758 Inserting (invalid) utf8 data into 5.6.25 master breaks 5.7 slave
Submitted: 8 Oct 2015 13:01 Modified: 1 Jul 2016 10:41
Reporter: Manuel Arostegui Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.9 OS:CentOS
Assigned to: CPU Architecture:Any

[8 Oct 2015 13:01] Manuel Arostegui
Description:
Table definition

CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `ids` int(10) unsigned DEFAULT NULL,
  `date1` date DEFAULT NULL,
  `date2` date DEFAULT NULL,
  `period` tinyint(3) unsigned DEFAULT NULL,
  `archived` datetime DEFAULT NULL,
  `value` mediumblob,
  PRIMARY KEY (`id`,`name`),
  KEY `index_period_archived` (`period`,`archived`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

When trying to insert the following via LOAD DATA on the replication thread we get broken replication:

Error 'Invalid utf8 character string: '"x'' on query. Default database: 'db'. Query: 'LOAD DATA INFILE '../tmp/SQL_LOAD-43156f1f-65c0-11e5-baa0-f0921c000d58-186068095-2.data' REPLACE INTO  TABLE `t1` FIELDS TERMINATED BY '\t' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n' (`id`, `ids`, `date1`, `date2`, `period`, `archived`, `name`, `value`)'

What we have in the master is:

*************************** 1. row ***************************
         id: 83
       name: Actions_actions
        ids: 1
      date1: 2015-10-07
      date2: 2015-10-07
     period: 1
   archived: 2015-10-08 07:09:42
      value: x��[n�

Note: the rest of "value" data has been truncated for the bug report.

SQL mode in both servers is set to:
+---------------+--------------------------------------------+
| Variable_name | Value                                      |
+---------------+--------------------------------------------+
| sql_mode      | STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION |
+---------------+--------------------------------------------+
1 row in set (0.00 sec)

In the 5.6.24 server the statement is accepted and inserted and when it gets replicated to 5.7.9 it isn't and breaks.

How to repeat:
Get a 5.6 master and attach a 5.7 slave.
Insert invalid utf8 data into an utf8 table.

Suggested fix:
5.7 should be either aware that the upstream is 5.6 and be lenient or 5.6 should not accept invalid data.
[9 Oct 2015 7:14] MySQL Verification Team
Hello Manuel Arostegui,

Thank you for the report.
I could not repeat this issue with 5.6.27->5.7(5.6.27->5.7.8/9/10) using the dummy data. Could you please provide load file used in your test case along with configuration file from master/slave? You may mark as private after uploading requested data.

Thanks,
Umesh
[9 Oct 2015 7:53] Manuel Arostegui
Information given and marked as private.
Thank you!
[9 Oct 2015 7:56] MySQL Verification Team
Thank you for the conf file, I'll be waiting for the load file from your end.

regards,
Umesh
[12 Oct 2015 7:10] MySQL Verification Team
When they upload their data,  try to use the binary charset by specifying CHARACTER SET clause for LOAD DATA INFILE.
https://dev.mysql.com/doc/refman/5.7/en/load-data.html

"If the contents of the input file use a character set that differs from the default, it is usually preferable to specify the character set of the file by using the CHARACTER SET clause. A character set of binary specifies “no conversion.”"
[14 Oct 2015 7:01] Manuel Arostegui
I have attached the file
[15 Oct 2015 6:17] MySQL Verification Team
Thank you for the requested data.
Confirmed that 5.7.9 slave(with master 5.6.24, 5.6.25 and 5.6.27) is affected.
But as Shane suggested, if CHARACTER SET binary is used with the LOAD statement, data is replicated properly and not breaking replication(pls see the attached file details).

Thanks,
Umesh
[15 Oct 2015 6:18] MySQL Verification Team
test results

Attachment: 78758.results (application/octet-stream, text), 32.21 KiB.

[16 Oct 2015 8:18] Manuel Arostegui
Thanks for confirming the bug and for providing the workaround. 
At this point we cannot go for the workaround for some reasons. 

The main point is to make sure we have consistency when inserting (in)valid data into 5.6 master that has a 5.7 slave. 
If 5.6 accepts that invalid data, 5.7 should do as well.

Thank you
[5 Nov 2015 3:38] Libing Song
Posted by developer:
 
A simplified test case for it, replication process has no problem, it happens just because 5.7 server considers the data is invalid.:

SET SESSION character_set_database = utf8;

CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL,
  `name` varchar(255) charset utf8 NOT NULL,
  `ids` int(10) unsigned DEFAULT NULL,
  `date1` date DEFAULT NULL,
  `date2` date DEFAULT NULL,
  `period` tinyint(3) unsigned DEFAULT NULL,
  `archived` datetime DEFAULT NULL,
  `value` VARCHAR(1024) CHARSET utf8,
  PRIMARY KEY (`id`,`name`),
  KEY `index_period_archived` (`period`,`archived`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

LOAD DATA INFILE '../../std_data/1.data' REPLACE INTO  TABLE `t1` FIELDS TERMINATED BY '\t' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n' (`id`, `ids`, `date1`, `date2`, `period`, `archived`, `name`, `value`);
[1 Jul 2016 10:41] Erlend Dahl
This has now been fixed in 5.5.51 and 5.6.32.
[28 Aug 2016 3:03] Fakhruddin Hamid
I observed the same issue in 5.6.32
Our setup
Server - 5.6.31
Slave - 5.6.32 (failed for load data file with invalid utf8 char)
other slave - 5.6.31, 5.6.25 (passed without errors)

One observation:
If I ran the query manually from the failed replication, it goes smooth and inserts all records but same query fails as part of replication.

For now I have downgraded my mysql version to 5.6.31 on affected slave.
[28 Aug 2016 3:07] Fakhruddin Hamid
Error 'Invalid utf8 character string: '"Lesly N'' on query. Default database: '********'. Query: 'LOAD DATA INFILE '/data/tmp/SQL_LOAD-ad1949c5-a002-11e4-9b98-22000a128c33-2-12594.data' IGNORE INTO  TABLE `**********` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`Customer_Email`, `Customer_UUID`, `Customer_Name`, `Sub_Plan_Items`, `Product_SKU`, `Return_Notified_Date`, `Cancellation_Date`, `Day_Left`, `Valid_Till_Date`, `FT1`, `FT7`, `FT10`, `Last_GoodBye_Status`, `Last_GoodBye_Date`, `Last_GoodBye_Balance`)'

Replication failed with above error, but I ran the same query as below it has no problem.

mysql> use <database>
mysql> LOAD DATA INFILE '/data/tmp/SQL_LOAD-ad1949c5-a002-11e4-9b98-22000a128c33-2-12594.data' IGNORE INTO  TABLE `**********` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`Customer_Email`, `Customer_UUID`, `Customer_Name`, `Sub_Plan_Items`, `Product_SKU`, `Return_Notified_Date`, `Cancellation_Date`, `Day_Left`, `Valid_Till_Date`, `FT1`, `FT7`, `FT10`, `Last_GoodBye_Status`, `Last_GoodBye_Date`, `Last_GoodBye_Balance`);
[28 Aug 2016 5:29] Venkatesh Duggirala
Post by the Developer:
======================

Hello Fakhruddin,

The original bug is happening while Master(5.6) is inserting the invalid Utf8 data. Since 5.7's behavior is to reject the invalid data, replication breaks.
As the reporter suggested above and after we internally decided, we made 5.6 (starting from 5.6.32) also behaves the same way as 5.7 i.e., reject the invalid Utf8 data. Now 5.6.32->5.7 replication, on Master itself the invalid
data will be rejected.

In your case, it seems when you are executing LOAD DATA directly on the slave (5.6.32), you are saying it is working  fine. That means it is *not* invalid utf8 data but if it is coming through replication channel, it is becoming invalid utf8. Is my understand correct?   If so, it is looks different bug than the original reported issue of this bug.

Could you please open another bug with all the information to reproduce the issue. 

Thanks & Regards,
Venkatesh.
[28 Aug 2016 6:06] Fakhruddin Hamid
Hi Venkatesh,

Yes your understanding is correct, but the error says that there is "invalid utf8 char"

Let me file a new bug for this.

-Fakhruddin Hamid