Bug #76237 LOAD DATA INFILE ignores a specific row silently under Db Charset is utf8
Submitted: 10 Mar 2015 9:21 Modified: 17 Apr 2015 14:48
Reporter: Tsubasa Tanaka (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.7.5, 5.7.6, 5.7.7 OS:Linux (CentOS 6.5)
Assigned to: Tor Didriksen CPU Architecture:Any
Tags: LOAD DATA INFILE, utf8

[10 Mar 2015 9:21] Tsubasa Tanaka
Description:
MySQL 5.7's "LOAD DATA INFILE" statement ignores a specific row.
Please see "How to repeat".
I'll provide sample data for reproducing.

How to repeat:
mysql57> SELECT @@version;
+---------------+
| @@version     |
+---------------+
| 5.7.6-m16-log |
+---------------+
1 row in set (0.00 sec)

mysql57> create table t1 (val text);
Query OK, 0 rows affected (0.09 sec)

mysql57> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1;
Query OK, 0 rows affected (0.02 sec)

mysql57> SELECT * FROM t1;
Empty set (0.01 sec)

mysql56> SELECT @@version;
+------------+
| @@version  |
+------------+
| 5.6.23-log |
+------------+
1 row in set (0.00 sec)

mysql56> create table t1 (val text);
Query OK, 0 rows affected (0.05 sec)

mysql56> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql56> SELECT * FROM t1;
.. # snip #
1 row in set (0.00 sec)
[10 Mar 2015 11:34] MySQL Verification Team
Thank you for the report.
I just tried on latest build and not seeing this issue:

root@localhost [(none)] > SELECT @@version;
+-----------------------------------------+
| @@version                               |
+-----------------------------------------+
| 5.7.7-rc-enterprise-commercial-advanced |
+-----------------------------------------+
1 row in set (0.00 sec)

root@localhost [(none)] > use test
Database changed
root@localhost [test] > create table t1 (val text);
Query OK, 0 rows affected (0.01 sec)

root@localhost [test] > LOAD DATA INFILE '/tmp/file' INTO TABLE t1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

root@localhost [test] > SELECT * FROM t1;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| val                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "RT @niouzechun: ç§?ã?¯ãƒ?ッピーエンド厨ã?ªã‚“ã? ã?‘ã‚Œã?©ã€?アンãƒ?ッピーエンドや惨劇的ã?ªã?Šè©±ã‚’ã?‚ã?¾ã‚Šç©?極的ã?«æ‘‚å?–ã?—ã?ªã?„ç?†ç”±ã?¯ã€?ç?¾å®Ÿã?®äººç”Ÿã?¯åŸºæœ¬çš„ã?«ã?†ã?¾ã??ã?„ã?‹ã?ªã?„ã?—ä¸?平等ã? ã?—ç?†ä¸?å°½ã? ã?—è¾›ã?„                                                                                                                                                                                                                                                                                           |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Let me check with  5.7.5/5.7.6 builds and confirm this.
[10 Mar 2015 16:14] MySQL Verification Team
C:\dbs>c:\dbs\5.7\bin\mysql -uroot --port=3570 --prompt="mysql 5.7 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.7-rc Source distribution PULL: 2015.MAR04

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.7 > use test
Database changed
mysql 5.7 > create table t1 (val text);
Query OK, 0 rows affected (0.19 sec)

mysql 5.7 >  LOAD DATA INFILE 'c:/tmp/test.txt' INTO TABLE t1;
Query OK, 1 row affected (0.05 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql 5.7 > SELECT * FROM t1;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------+
| val
               |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------+
| "RT @niouzechun: ç§?ã?¯ãƒ?ッãƒ?ーã?¨ãƒ³ãƒ?å?¨ã?ªã??ã? ã??ã??ã?©ã??ã?¢ãƒ³ãƒ?ッãƒ?ーã?¨ãƒ³ãƒ?ã??惨å??ç??ã?ªã??話ã??ã??ã?¾ã??ç©?極ç??ã?«æ??å??ã??ã?ªã??ç??ç?±ã?¯ã??ç?¾å®?ã?®äººç??ã?¯å?ºæ?¬ç??ã?«ã??ã?¾ã??ã??ã??ã?ªã??ã??ä¸?å¹³ç­?ã? ã??ç??ä¸?å
°½ã? ã??è¾?ã?? |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------+
1 row in set (0.00 sec)

mysql 5.7 >
[10 Mar 2015 16:17] MySQL Verification Team
C:\dbs>5.7\bin\mysql -uroot
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 27
Server version: 5.7.5-m15 MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

root@localhost [(none)] > USE d
Database changed
root@localhost [d] > create table t1 (val text);
Query OK, 0 rows affected (0.34 sec)

root@localhost [d] >  LOAD DATA INFILE 'c:/tmp/test.txt' INTO TABLE t1;
Query OK, 1 row affected (0.11 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

root@localhost [d] > SELECT * FROM t1;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------+
| val
               |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------+
| "RT @niouzechun: ç§?ã?¯ãƒ?ッãƒ?ーã?¨ãƒ³ãƒ?å?¨ã?ªã??ã? ã??ã??ã?©ã??ã?¢ãƒ³ãƒ?ッãƒ?ーã?¨ãƒ³ãƒ?ã??惨å??ç??ã?ªã??話ã??ã??ã?¾ã??ç©?極ç??ã?«æ??å??ã??ã?ªã??ç??ç?±ã?¯ã??ç?¾å®?ã?®äººç??ã?¯å?ºæ?¬ç??ã?«ã??ã?¾ã??ã??ã??ã?ªã??ã??ä¸?å¹³ç­?ã? ã??ç??ä¸?å
°½ã? ã??è¾?ã?? |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------+
1 row in set (0.00 sec)
[10 Mar 2015 16:48] Tsubasa Tanaka
Hi Umesh, Miguel,

I confirmed this problem can't appear under the character set of latin1, I'm sorry for I forgot to tell you the character set which I was using.

Can you check using utf8?

mysql57> create database d1 charset latin1;
Query OK, 1 row affected (0.00 sec)

mysql57> use d1
Database changed

mysql57> create table t1 (val text);
Query OK, 0 rows affected (0.04 sec)

mysql57> status
--------------
/usr/mysql/5.7.6/bin/mysql  Ver 14.14 Distrib 5.7.6-m16, for Linux (x86_64) using  EditLine wrapper

Connection id:          3
Current database:       d1
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.6-m16-log Source distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /usr/mysql/5.7.6/data/mysql.sock
Uptime:                 5 min 11 sec

Threads: 1  Questions: 60  Slow queries: 0  Opens: 114  Flush tables: 1  Open tables: 104  Queries per second avg: 0.192
--------------

mysql57> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE d1.t1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql57> SELECT * FROM t1;
.. ## snip, row is loaded correctly ##
1 row in set (0.00 sec)

mysql57> create database d2 charset utf8;
Query OK, 1 row affected (0.00 sec)

mysql57> use d2
Database changed

mysql57> create table t1 (val text);
Query OK, 0 rows affected (0.04 sec)

mysql57> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE d1.t1;
Query OK, 0 rows affected (0.00 sec)

mysql57> status
--------------
/usr/mysql/5.7.6/bin/mysql  Ver 14.14 Distrib 5.7.6-m16, for Linux (x86_64) using  EditLine wrapper

Connection id:          3
Current database:       d2
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.6-m16-log Source distribution
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    utf8
Client characterset:    latin1
Conn.  characterset:    latin1
UNIX socket:            /usr/mysql/5.7.6/data/mysql.sock
Uptime:                 4 min 39 sec

Threads: 1  Questions: 54  Slow queries: 0  Opens: 113  Flush tables: 1  Open tables: 103  Queries per second avg: 0.193
--------------

mysql57> SELECT * FROM t1;
Empty set (0.00 sec)

And I confirm this behavior depends on "Db charset".

mysql57> use d1;
Database changed
mysql57> SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| latin1                   |
+--------------------------+
1 row in set (0.00 sec)

mysql57> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE d1.t1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql57> use d2;
Database changed

mysql57> SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8                     |
+--------------------------+
1 row in set (0.00 sec)

mysql57> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE d1.t1;
Query OK, 0 rows affected (0.00 sec)

Thanks,
[10 Mar 2015 16:49] Tsubasa Tanaka
I updated Synopsis and Category.
[11 Mar 2015 9:11] MySQL Verification Team
Thank you for the feedback, tanaka-san.
Observed similar behavior at my end.
Imho server uses the character set indicated by the character_set_database system variable to interpret the information in the file and in that case it should not have skipped for utf8

[umshastr@hod03]/export/umesh/mysql-5.7.7: bin/mysql -uroot -p -S /tmp/mysql_ushastry.sock
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.7-rc-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
..
,
root@localhost [(none)] > use test
root@localhost [test] > create database d1 charset latin1;
Query OK, 1 row affected (0.00 sec)

root@localhost [test] >  use d1
Database changed
root@localhost [d1] > create table t1 (val text);
Query OK, 0 rows affected (0.00 sec)

root@localhost [d1] > LOAD DATA INFILE '/tmp/file' INTO TABLE t1;
Query OK, 1 row affected (0.00 sec)   <--------------- stored
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

root@localhost [d1] > create database d2 charset utf8;
Query OK, 1 row affected (0.00 sec)

root@localhost [d1] > use d2
Database changed
root@localhost [d2] > create table t1 (val text);
Query OK, 0 rows affected (0.01 sec)

root@localhost [d2] > LOAD DATA INFILE '/tmp/file' INTO TABLE t1;
Query OK, 0 rows affected (0.00 sec) <--------------- skipped

root@localhost [d2] > SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8                     |
+--------------------------+
1 row in set (0.00 sec)

root@localhost [d2] > \s
--------------
bin/mysql  Ver 14.14 Distrib 5.7.7-rc, for Linux (x86_64) using  EditLine wrapper

Connection id:          2
Current database:       d2
Current user:           root@localhost
SSL:                    Cipher in use is DHE-RSA-AES256-SHA
Current pager:          more
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.7-rc-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysql_ushastry.sock
Uptime:                 2 min 41 sec

Threads: 1  Questions: 28  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.173
--------------

root@localhost [d2] > use d1
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
root@localhost [d1] > \s
--------------
bin/mysql  Ver 14.14 Distrib 5.7.7-rc, for Linux (x86_64) using  EditLine wrapper

Connection id:          2
Current database:       d1
Current user:           root@localhost
SSL:                    Cipher in use is DHE-RSA-AES256-SHA
Current pager:          more
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.7-rc-enterprise-commercial-advanced MySQL Enterprise Server - Advanced Edition (Commercial)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysql_ushastry.sock
Uptime:                 2 min 55 sec

Threads: 1  Questions: 38  Slow queries: 0  Opens: 110  Flush tables: 1  Open tables: 103  Queries per second avg: 0.217
[26 Mar 2015 8:31] Tor Didriksen
Posted by developer:
 
Regression source
Bug#14653594 "INSTALL PLUGIN" HANGS SERVER WITH INVALID UTF8 CHARACTERS IN PATH

The patch for that bug set an internal error flag, without calling any of the error reporting routines.
Try running the same input data with a debug server, it will assert.

Note that the "real" bug exists in older versions as well:
an internal buffer used for parsing the input is too small.
Mysql 5.6 and below will fallback to reading byte-by-byte, rather than as three-byte chunks in utf8.
[17 Apr 2015 14:48] Paul DuBois
Noted in 5.7.8, 5.8.0 changelogs.

For multibyte character sets, LOAD DATA could fail to allocate space
correctly and ignore input rows.
[20 Jun 2016 16:49] Paul DuBois
Posted by developer:
 
Noted in 5.5.51, 5.6.32 changelogs.