| Bug #78758 | Inserting (invalid) utf8 data into 5.6.25 master breaks 5.7 slave | ||
|---|---|---|---|
| Submitted: | 8 Oct 2015 13:01 | Modified: | 1 Jul 2016 10:41 | 
| Reporter: | Manuel Arostegui | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S2 (Serious) | 
| Version: | 5.7.9 | OS: | CentOS | 
| Assigned to: | CPU Architecture: | Any | |
   [9 Oct 2015 7:14]
   MySQL Verification Team        
  Hello Manuel Arostegui, Thank you for the report. I could not repeat this issue with 5.6.27->5.7(5.6.27->5.7.8/9/10) using the dummy data. Could you please provide load file used in your test case along with configuration file from master/slave? You may mark as private after uploading requested data. Thanks, Umesh
   [9 Oct 2015 7:53]
   Manuel Arostegui        
  Information given and marked as private. Thank you!
   [9 Oct 2015 7:56]
   MySQL Verification Team        
  Thank you for the conf file, I'll be waiting for the load file from your end. regards, Umesh
   [12 Oct 2015 7:10]
   MySQL Verification Team        
  When they upload their data, try to use the binary charset by specifying CHARACTER SET clause for LOAD DATA INFILE. https://dev.mysql.com/doc/refman/5.7/en/load-data.html "If the contents of the input file use a character set that differs from the default, it is usually preferable to specify the character set of the file by using the CHARACTER SET clause. A character set of binary specifies “no conversion.”"
   [14 Oct 2015 7:01]
   Manuel Arostegui        
  I have attached the file
   [15 Oct 2015 6:17]
   MySQL Verification Team        
  Thank you for the requested data. Confirmed that 5.7.9 slave(with master 5.6.24, 5.6.25 and 5.6.27) is affected. But as Shane suggested, if CHARACTER SET binary is used with the LOAD statement, data is replicated properly and not breaking replication(pls see the attached file details). Thanks, Umesh
   [15 Oct 2015 6:18]
   MySQL Verification Team        
  test results
Attachment: 78758.results (application/octet-stream, text), 32.21 KiB.
   [16 Oct 2015 8:18]
   Manuel Arostegui        
  Thanks for confirming the bug and for providing the workaround. At this point we cannot go for the workaround for some reasons. The main point is to make sure we have consistency when inserting (in)valid data into 5.6 master that has a 5.7 slave. If 5.6 accepts that invalid data, 5.7 should do as well. Thank you
   [5 Nov 2015 3:38]
   Libing Song        
  Posted by developer: A simplified test case for it, replication process has no problem, it happens just because 5.7 server considers the data is invalid.: SET SESSION character_set_database = utf8; CREATE TABLE `t1` ( `id` int(10) unsigned NOT NULL, `name` varchar(255) charset utf8 NOT NULL, `ids` int(10) unsigned DEFAULT NULL, `date1` date DEFAULT NULL, `date2` date DEFAULT NULL, `period` tinyint(3) unsigned DEFAULT NULL, `archived` datetime DEFAULT NULL, `value` VARCHAR(1024) CHARSET utf8, PRIMARY KEY (`id`,`name`), KEY `index_period_archived` (`period`,`archived`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; LOAD DATA INFILE '../../std_data/1.data' REPLACE INTO TABLE `t1` FIELDS TERMINATED BY '\t' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n' (`id`, `ids`, `date1`, `date2`, `period`, `archived`, `name`, `value`);
   [1 Jul 2016 10:41]
   Erlend Dahl        
  This has now been fixed in 5.5.51 and 5.6.32.
   [28 Aug 2016 3:03]
   Fakhruddin Hamid        
  I observed the same issue in 5.6.32 Our setup Server - 5.6.31 Slave - 5.6.32 (failed for load data file with invalid utf8 char) other slave - 5.6.31, 5.6.25 (passed without errors) One observation: If I ran the query manually from the failed replication, it goes smooth and inserts all records but same query fails as part of replication. For now I have downgraded my mysql version to 5.6.31 on affected slave.
   [28 Aug 2016 3:07]
   Fakhruddin Hamid        
  Error 'Invalid utf8 character string: '"Lesly N'' on query. Default database: '********'. Query: 'LOAD DATA INFILE '/data/tmp/SQL_LOAD-ad1949c5-a002-11e4-9b98-22000a128c33-2-12594.data' IGNORE INTO TABLE `**********` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`Customer_Email`, `Customer_UUID`, `Customer_Name`, `Sub_Plan_Items`, `Product_SKU`, `Return_Notified_Date`, `Cancellation_Date`, `Day_Left`, `Valid_Till_Date`, `FT1`, `FT7`, `FT10`, `Last_GoodBye_Status`, `Last_GoodBye_Date`, `Last_GoodBye_Balance`)' Replication failed with above error, but I ran the same query as below it has no problem. mysql> use <database> mysql> LOAD DATA INFILE '/data/tmp/SQL_LOAD-ad1949c5-a002-11e4-9b98-22000a128c33-2-12594.data' IGNORE INTO TABLE `**********` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`Customer_Email`, `Customer_UUID`, `Customer_Name`, `Sub_Plan_Items`, `Product_SKU`, `Return_Notified_Date`, `Cancellation_Date`, `Day_Left`, `Valid_Till_Date`, `FT1`, `FT7`, `FT10`, `Last_GoodBye_Status`, `Last_GoodBye_Date`, `Last_GoodBye_Balance`);
   [28 Aug 2016 5:29]
   Venkatesh Duggirala        
  Post by the Developer: ====================== Hello Fakhruddin, The original bug is happening while Master(5.6) is inserting the invalid Utf8 data. Since 5.7's behavior is to reject the invalid data, replication breaks. As the reporter suggested above and after we internally decided, we made 5.6 (starting from 5.6.32) also behaves the same way as 5.7 i.e., reject the invalid Utf8 data. Now 5.6.32->5.7 replication, on Master itself the invalid data will be rejected. In your case, it seems when you are executing LOAD DATA directly on the slave (5.6.32), you are saying it is working fine. That means it is *not* invalid utf8 data but if it is coming through replication channel, it is becoming invalid utf8. Is my understand correct? If so, it is looks different bug than the original reported issue of this bug. Could you please open another bug with all the information to reproduce the issue. Thanks & Regards, Venkatesh.
   [28 Aug 2016 6:06]
   Fakhruddin Hamid        
  Hi Venkatesh, Yes your understanding is correct, but the error says that there is "invalid utf8 char" Let me file a new bug for this. -Fakhruddin Hamid


Description: Table definition CREATE TABLE `t1` ( `id` int(10) unsigned NOT NULL, `name` varchar(255) NOT NULL, `ids` int(10) unsigned DEFAULT NULL, `date1` date DEFAULT NULL, `date2` date DEFAULT NULL, `period` tinyint(3) unsigned DEFAULT NULL, `archived` datetime DEFAULT NULL, `value` mediumblob, PRIMARY KEY (`id`,`name`), KEY `index_period_archived` (`period`,`archived`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 When trying to insert the following via LOAD DATA on the replication thread we get broken replication: Error 'Invalid utf8 character string: '"x'' on query. Default database: 'db'. Query: 'LOAD DATA INFILE '../tmp/SQL_LOAD-43156f1f-65c0-11e5-baa0-f0921c000d58-186068095-2.data' REPLACE INTO TABLE `t1` FIELDS TERMINATED BY '\t' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n' (`id`, `ids`, `date1`, `date2`, `period`, `archived`, `name`, `value`)' What we have in the master is: *************************** 1. row *************************** id: 83 name: Actions_actions ids: 1 date1: 2015-10-07 date2: 2015-10-07 period: 1 archived: 2015-10-08 07:09:42 value: x��[n� Note: the rest of "value" data has been truncated for the bug report. SQL mode in both servers is set to: +---------------+--------------------------------------------+ | Variable_name | Value | +---------------+--------------------------------------------+ | sql_mode | STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION | +---------------+--------------------------------------------+ 1 row in set (0.00 sec) In the 5.6.24 server the statement is accepted and inserted and when it gets replicated to 5.7.9 it isn't and breaks. How to repeat: Get a 5.6 master and attach a 5.7 slave. Insert invalid utf8 data into an utf8 table. Suggested fix: 5.7 should be either aware that the upstream is 5.6 and be lenient or 5.6 should not accept invalid data.