Bug #31677 SELECT INTO OUTFILE never escapes multibyte character
Submitted: 17 Oct 2007 23:10 Modified: 19 Aug 2009 16:53
Reporter: Gleb Shchepa Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Parser Severity:S3 (Non-critical)
Version:5.0 and up OS:Any
Assigned to: Marc ALFF CPU Architecture:Any

[17 Oct 2007 23:10] Gleb Shchepa
Description:
Occurrences of the FIELD ENCLOSED BY, FIELD ESCAPED BY etc. character within a field value should be escaped by prefixing them with the ESCAPED BY character.

However the server never does it if the character (FIELD ENCLOSED BY etc.) is multibyte.

So, symmetrical LOAD DATA INFILE statement loads corrupted data.

How to repeat:
mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 5.0.52-debug | 
+--------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE t1 (c1 VARCHAR(256)) CHARACTER SET utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SET NAMES latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO t1 VALUES ('-а-');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1;
+------+
| c1   |
+------+
| -а-  | 
+------+
1 row in set (0.00 sec)

mysql> TRUNCATE t1;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA INFILE 'x.txt' INTO TABLE t1 FIELDS ESCAPED BY 'а';
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM t1;
+------+
| c1   |
+------+
| -ц│- | 
+------+
1 row in set (0.00 sec)
[17 Oct 2007 23:12] Gleb Shchepa
test case

Attachment: 31677.test (application/octet-stream, text), 720 bytes.

[16 Nov 2007 20:17] Gleb Shchepa
Test case above is not precise. See correct test below.

Also there is a test for single-byte 8bit termination/enclosed character and
single-byte data: LOAD DATA INFILE is 8bit clear, but SELECT INTO OUTFILE is not, so encode/decode process in destructive (data corruption):

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 5.0.52-debug | 
+--------------+
1 row in set (0.00 sec)

mysql> 
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SET character_set_database= utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> CREATE TABLE t1 (c1 VARCHAR(256), c2 VARCHAR(256)) CHARACTER SET utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t1 VALUES (0xC381, 0xC381);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM t1;
+------+------+
| c1   | c2   |
+------+------+
| ц   | ц   | 
+------+------+
1 row in set (0.00 sec)

mysql> SELECT * INTO OUTFILE 't.txt' FROM t1;
Query OK, 1 row affected (0.00 sec)

mysql> TRUNCATE t1;
Query OK, 0 rows affected (0.00 sec)

#
# Escaping is not necessary, correct result:
#

mysql> LOAD DATA INFILE 't.txt' INTO TABLE t1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM t1;
+------+------+
| c1   | c2   |
+------+------+
| ц   | ц   | 
+------+------+
1 row in set (0.00 sec)

#########################################################

mysql> SELECT * INTO OUTFILE 't.txt' FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81 FROM t1;
Query OK, 1 row affected (0.00 sec)

mysql> TRUNCATE t1;
Query OK, 0 rows affected (0.00 sec)

#
# Escaping is required, incorrect result:
#

mysql> LOAD DATA INFILE 't.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81;
Query OK, 1 row affected, 1 warning (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 1

mysql> SELECT * FROM t1;
+--------+------+
| c1     | c2   |
+--------+------+
| ццц | NULL | 
+--------+------+
1 row in set (0.00 sec)

#########################################################

#
# Data encoding is not 8bit clear:
#

mysql> CREATE TABLE t1 (c1 VARCHAR(256));
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO t1 VALUES (0xC3);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT HEX(c1) FROM t1;
+---------+
| HEX(c1) |
+---------+
| C3      | 
+---------+
1 row in set (0.00 sec)

mysql> SELECT * INTO OUTFILE 't.txt' FIELDS ENCLOSED BY 0xC3 FROM t1;
Query OK, 1 row affected (0.00 sec)

mysql> TRUNCATE t1;
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> LOAD DATA INFILE 't.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3;
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT HEX(c1) FROM t1;
+---------+
| HEX(c1) |
+---------+
| C3C30A  | 
+---------+
1 row in set (0.00 sec)
[16 Nov 2007 20:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/37979

ChangeSet@1.2566, 2007-11-16 23:44:55+04:00, gshchepa@gleb.loc +4 -0
  Fixed bug #31677.
  1. Multi-byte characters was never escaped, because they
     was always skipped before escaping process.
  
  2. 8bit escape characters, termination and enclosed characters
     were silently ignored by SELECT INTO query, but LOAD DATA INFILE
     algorithm is 8bit-clear, so data was corrupted during 
     encoding.
[19 Nov 2007 13:39] Sergei Golubchik
This will be fixed in 5.1 after bug#30946 is fixed
[19 Aug 2009 16:35] Marc ALFF
With the following script:
------------------
select version();
drop table if exists t1;

SET NAMES utf8;
SET character_set_database= utf8;
CREATE TABLE t1 (c1 VARCHAR(256), c2 VARCHAR(256)) CHARACTER SET utf8;
INSERT INTO t1 VALUES (0xC381, 0xC381);
SELECT * FROM t1;
SELECT * INTO OUTFILE 'ta.txt' FROM t1;
TRUNCATE t1;
LOAD DATA INFILE 'ta.txt' INTO TABLE t1;
SELECT * FROM t1;

SELECT * INTO OUTFILE 'tb.txt' FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81 FROM t1;
TRUNCATE t1;
LOAD DATA INFILE 'tb.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81;
SELECT * FROM t1;

drop table t1;

------------------
the result is with 5.4.4:

select version();
version()
5.4.4-alpha-debug-log
drop table if exists t1;
Warnings:
Note    1051    Unknown table 't1'
SET NAMES utf8;
SET character_set_database= utf8;
CREATE TABLE t1 (c1 VARCHAR(256), c2 VARCHAR(256)) CHARACTER SET utf8;
INSERT INTO t1 VALUES (0xC381, 0xC381);
SELECT * FROM t1;
c1      c2
Á       Á
SELECT * INTO OUTFILE 'ta.txt' FROM t1;
TRUNCATE t1;
LOAD DATA INFILE 'ta.txt' INTO TABLE t1;
SELECT * FROM t1;
c1      c2
Á       Á
SELECT * INTO OUTFILE 'tb.txt' FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81 FROM t1;
Warnings:
Warning 1638    Non-ASCII separator arguments are not fully supported
TRUNCATE t1;
LOAD DATA INFILE 'tb.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3 TERMINATED BY 0x81;
Warnings:
Warning 1638    Non-ASCII separator arguments are not fully supported
Warning 1261    Row 1 doesn't contain data for all columns
SELECT * FROM t1;
c1      c2
ÁÁÁ     NULL
drop table t1;