Bug #32533 SELECT INTO/LOAD DATA INFILE with FIELDS ENCLOSED BY 8bit char corrupts data
Submitted: 20 Nov 2007 15:57 Modified: 14 Jan 2008 19:05
Reporter: Gleb Shchepa Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0 and up OS:Any
Assigned to: Gleb Shchepa CPU Architecture:Any

[20 Nov 2007 15:57] Gleb Shchepa
Description:
SELECT INTO OUTFILE does not escape 8bit ENCLOSED BY characters, but LOAD DATA INFILE is trying to un-escapes them, so SELECT INTO/LOAD DATA INFILE pair corrupts data.

4.1 is not affected.

How to repeat:
mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 5.0.52-debug | 
+--------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE t1 (c1 VARCHAR(256));
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO t1 VALUES (0xC3);
Query OK, 1 row affected (0.01 sec)

mysql> SELECT HEX(c1) FROM t1;
+---------+
| HEX(c1) |
+---------+
| C3      | 
+---------+
1 row in set (0.01 sec)

mysql> SELECT * INTO OUTFILE 't.txt' FIELDS ENCLOSED BY 0xC3 FROM t1;
Query OK, 1 row affected (0.00 sec)

mysql> TRUNCATE t1;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA INFILE 't.txt' INTO TABLE t1 FIELDS ENCLOSED BY 0xC3;
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT HEX(c1) FROM t1;
+---------+
| HEX(c1) |
+---------+
| C3C30A  | 
+---------+
1 row in set (0.00 sec)

Suggested fix:
Make SELECT INTO OUTFILE algorithm 8bit-clean.
[20 Nov 2007 16:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/38150

ChangeSet@1.2566, 2007-11-20 20:15:20+04:00, gshchepa@gleb.loc +3 -0
  Fixed bug #32533.
  8bit escape characters, termination and enclosed characters
  were silently ignored by SELECT INTO query, but LOAD DATA INFILE
  algorithm is 8bit-clean, so data was corrupted during 
  encoding.
[14 Dec 2007 8:14] Bugs System
Pushed into 5.0.54
[14 Dec 2007 8:17] Bugs System
Pushed into 5.1.23-rc
[14 Dec 2007 8:20] Bugs System
Pushed into 6.0.5-alpha
[14 Jan 2008 19:05] Jon Stephens
Documented in the 5.0.54, 5.1.23, and 6.0.5 changelogs as follows:

        Using SELECT INTO OUTFILE with 8-bit
        ENCLOSED BY characters led to corrupted data
        when the data was reloaded using LOAD DATA INFILE. This was
        because SELECT INTO OUTFILE failed to escape
        the 8-bit characters.