Bug #83950 LOAD DATA INFILE fails with an escape character followed by a multi-byte one
Submitted: 24 Nov 2016 10:58 Modified: 27 Jan 2017 16:15
Reporter: Valeriy Kravchuk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.7.16, 8.0.0, 5.6.34 OS:Any
Assigned to: CPU Architecture:Any
Tags: charset, LOAD DATA

[24 Nov 2016 10:58] Valeriy Kravchuk
Description:
In recent versions LOAD DATA INFILE fails to load data with an escape character followed by a valid multi-byte utf8 character. 

See https://jira.mariadb.org/browse/MDEV-11343 for the original report (by  Alexander Barkov) and more details on possible root cause.

How to repeat:
[openxs@fc23 5.7]$ echo "\ä" >/tmp/test.txt
[openxs@fc23 5.7]$ cat /tmp/test.txt
\ä
[openxs@fc23 5.7]$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 18420
Server version: 5.7.16 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
Query OK, 0 rows affected (0.36 sec)

mysql> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET utf8;
ERROR 1300 (HY000): Invalid utf8 character string: ''
mysql> insert into t1 values('\ä');
Query OK, 1 row affected (0.05 sec)

mysql> select * from t1;
+------+
| a    |
+------+
| ä    |
+------+
1 row in set (0.00 sec)

As you can see, INSERT proceses this case without any problems.

Suggested fix:
Fix LOAD DATA to ignore initial \ in this case and accept valid data the same way as INSERT does.
[24 Nov 2016 11:58] Umesh Shastry
Hello Valerii,

Thank you for the report and test case.
Verified as described.

Thanks,
Umesh
[24 Nov 2016 12:03] Umesh Shastry
-- 5.6.34 - affected

[umshastr@hod03]/export/umesh/server/binaries/GABuilds/mysql-5.6.34: bin/mysql -uroot -S /tmp/mysql_ushastry.sock                                                                                 Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.34-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

root@localhost [(none)]> use test
Database changed
root@localhost [test]> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET utf8;
ERROR 1300 (HY000): Invalid utf8 character string: ''
root@localhost [test]> insert into t1 values('\ä');
Query OK, 1 row affected (0.00 sec)

root@localhost [test]> select * from t1;
+------+
| a    |
+------+
| ä    |
+------+
1 row in set (0.00 sec)

root@localhost [test]> \s
--------------
bin/mysql  Ver 14.14 Distrib 5.6.34, for linux-glibc2.5 (x86_64) using  EditLine wrapper

Connection id:          1
Current database:       test
Current user:           root@localhost
SSL:                    Not in use
Current pager:          more
Using outfile:          ''
Using delimiter:        ;
Server version:         5.6.34-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysql_ushastry.sock
Uptime:                 31 sec

Threads: 1  Questions: 11  Slow queries: 0  Opens: 68  Flush tables: 1  Open tables: 61  Queries per second avg: 0.354
--------------

root@localhost [test]> show variables like 'sql_mode';
+---------------+------------------------+
| Variable_name | Value                  |
+---------------+------------------------+
| sql_mode      | NO_ENGINE_SUBSTITUTION |
+---------------+------------------------+
1 row in set (0.00 sec)
[2 Jan 2017 10:33] Valeriy Kravchuk
It seems to be still repeatable with 5.7.17:

openxs@ao756:~/dbs/5.7$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.17 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> \! cat /tmp/test.txt
\ä
mysql> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0,18 sec)

mysql> CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
Query OK, 0 rows affected (0,30 sec)

mysql> LOAD DATA LOCAL INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET utf8;
ERROR 1300 (HY000): Invalid utf8 character string: ''
[2 Jan 2017 11:07] Valeriy Kravchuk
Seems to be fixed in 5.5.54 though:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.5.54    |
+-----------+
1 row in set (0.00 sec)

mysql> show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `a` varchar(10) CHARACTER SET utf8 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> \! cat /tmp/test.txt
\ä
mysql> LOAD DATA LOCAL INFILE '/tmp/test.txt' INTO TABLE t CHARACTER SET utf8;
Query OK, 1 row affected (0.06 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from t;
+------+
| a    |
+------+
| ä    |
+------+
1 row in set (0.00 sec)
[27 Jan 2017 16:15] Paul Dubois
Posted by developer:
 
Noted in 8.0.1 changelog.

LOAD DATA failed to accept multibyte characters that followed an
escape sequence.
[27 Jan 2017 16:32] Jean-François Gagné
Thanks for fixing that in 8.0.1.
Will that be fixed in 5.6 and 5.7 ?
Thanks,
JFG
[13 Jun 2017 5:53] Umesh Shastry
Re-verification on 5.7.18 as requested internally

Attachment: 83950_5.7.18.results (application/octet-stream, text), 3.11 KiB.