Bug #29294 Special characters and SELECT INTO OUTFILE/LOAD DATA INFILE
Submitted: 22 Jun 2007 6:16 Modified: 19 Jul 2007 18:53
Reporter: Lachezar Balev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.1.17, 5.0, 4.1 OS:Linux (FC5)
Assigned to: Gleb Shchepa

[22 Jun 2007 6:16] Lachezar Balev
Description:
I noticed that the result of SELECT INTO OUTFILE and then LOAD DATA INFILE can be inconsistent. This can be observed when fields are enclosed by a character, that is special when escaped, e.g. 'r'.

The documentation states:

To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character.

But this is not exactly what happens when selecting into outfile. The character is escaped and this damages data when importing back.

CREATE TABLE `test` 
(
  `id` int(20) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into test values (1, 'Lucho'), (2, 'Lily'), (3, 'Kiro');
select id, name into outfile '/tmp/test.txt' fields enclosed by 'r' from test;
truncate test;

Now we look at the dump file to see the enclosed by 'r' fields:
# cat test.txt
r1r     rLuchor
r2r     rLilyr
r3r     rKi\ror

The 'r' in Kiro is not doubled but rather is escaped.

Thus when we import back:
load data infile '/tmp/test.txt'  into table test fields enclosed by 'r';

data (Kiro) is damaged:
+------+-------+
| id   | name  |
+------+-------+
|    1 | Lucho |
|    2 | Lily  |
o  | 3 | Ki
+------+-------+

How to repeat:
The problem can be always repeated.

Suggested fix:
Maybe the characters should be doubled.
[22 Jun 2007 8:00] Sveta Smirnova
Thank you for the report.

Verified as described. All versions are affected.
[27 Jun 2007 18:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29772

ChangeSet@1.2671, 2007-06-27 23:37:13+05:00, gshchepa@gleb.loc +5 -0
  Fixed bug #29294.
  Actually there are was 2 different bugs:
  1. SELECT INTO OUTFILE ... FIELDS ENCLOSED BY <special character>
  escaped strings with <special character> inside with the value from
  an ESCAPED BY clause. So, for the FIELDS ENCLOSED BY 'r' the string
  of value "...r..." was encoded as "...\\r..." (here the value of
  ESCAPED BY is default '\\' character). On the LOAD DATA query this
  string was correctly decoded as "...\r...", not as the "...r...".
  2. LOAD DATA INFILE was trying to "unescape" strings like "\\r"
  as the '\r' ignoring the fact that the 'r' character is an escaped
  value of ENCLOSED BY clause and must be decoded as "r".
[28 Jun 2007 8:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29820

ChangeSet@1.2671, 2007-06-28 13:54:55+05:00, gshchepa@gleb.loc +5 -0
  Fixed bug #29294.
  Actually there were 2 different bugs:
  
  1. The `SELECT INTO OUTFILE ... FIELDS ENCLOSED BY 'r' ' statement
  encoded the "r" string to a 2 byte string of value "\\r".
  The LOAD DATA statement decoded this string to a 1 byte string of
  value "\r" instead of "r".
  The same error also happened with the FIELDS ENCLOSED BY clause
  followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'.
  
  NOTE, changed behaviour:
  Now the `SELECT INTO OUTFILE' statement with the `FIELDS ENCLOSED BY'
  clause followed by special character encodes this special character
  itself by doubling it ("r" --> "rr"), not by prepending it with an
  escape character.
  
  2. The `LOAD DATA INFILE ... FIELDS ESCAPED BY 'r' ' statement
  decoded 2 byte string "rr" to a 1 byte string of value "\r"
  instead of "r".
  The same error also happened with the FIELDS ESCAPED BY clause
  followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'.
[29 Jun 2007 11:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29940

ChangeSet@1.2671, 2007-06-29 16:40:46+05:00, gshchepa@gleb.loc +4 -0
  Fixed bug #29294.
  The `SELECT 'r' INTO OUTFILE ... FIELDS ENCLOSED BY 'r' ' statement
  encoded the 'r' string to a 4 byte string of value x'725c7272'
  (sequence of 4 characters: r\rr).
  The LOAD DATA statement decoded this string to a 1 byte string of
  value x'0d' (ASCII Carriage Return character) instead of the original
  'r' character.
  The same error also happened with the FIELDS ENCLOSED BY clause
  followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'.
  
  NOTE 1: This is a result of the undocumented feature: the LOAD DATA INFILE
  recognises 2-byte input sequences like \n, \t, \r and \Z in addition
  to documented 2-byte sequences: \0 and \N. This feature should be
  documented (here backspace character is a default ESCAPED BY character,
  in the real-life example it may be any ESCAPED BY character).
  
  NOTE 2, changed behaviour:
  Now the `SELECT INTO OUTFILE' statement with the `FIELDS ENCLOSED BY'
  clause followed by one of: 'n', 't', 'r', 'b', '0', 'Z' or 'N' characters
  encodes this special character itself by doubling it ('r' --> 'rr'),
  not by prepending it with an escape character.
[3 Jul 2007 23:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30184

ChangeSet@1.2671, 2007-07-03 19:37:46+05:00, gshchepa@gleb.loc +5 -0
  Fixed bug #29294.
  The `SELECT 'r' INTO OUTFILE ... FIELDS ENCLOSED BY 'r' ' statement
  encoded the 'r' string to a 4 byte string of value x'725c7272'
  (sequence of 4 characters: r\rr).
  The LOAD DATA statement decoded this string to a 1 byte string of
  value x'0d' (ASCII Carriage Return character) instead of the original
  'r' character.
  The same error also happened with the FIELDS ENCLOSED BY clause
  followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'.
  
  NOTE 1: This is a result of the undocumented feature: the LOAD DATA INFILE
  recognises 2-byte input sequences like \n, \t, \r and \Z in addition
  to documented 2-byte sequences: \0 and \N. This feature should be
  documented (here backspace character is a default ESCAPED BY character,
  in the real-life example it may be any ESCAPED BY character).
  
  NOTE 2, changed behaviour:
  Now the `SELECT INTO OUTFILE' statement with the `FIELDS ENCLOSED BY'
  clause followed by one of: 'n', 't', 'r', 'b', '0', 'Z' or 'N' characters
  encodes this special character itself by doubling it ('r' --> 'rr'),
  not by prepending it with an escape character.
[3 Jul 2007 23:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30223

ChangeSet@1.2672, 2007-07-03 21:45:20+05:00, gshchepa@gleb.loc +2 -0
  loaddata.result, loaddata.test:
    Test case update for bug #29294.
[8 Jul 2007 17:28] Bugs System
Pushed into 5.1.21-beta
[8 Jul 2007 17:30] Bugs System
Pushed into 5.0.46
[8 Jul 2007 17:33] Bugs System
Pushed into 4.1.24
[19 Jul 2007 18:53] Paul Dubois
Noted in 4.1.24, 5.0.46, 5.1.21 changelogs.

SELECT ... INTO OUTFILE followed by LOAD DATA could result in
garbled characters when the FIELDS ENCLOSED BY clause named
a delimiter of '0', 'b', 'n', 'r', 't', 'N, or 'Z' due to an
interaction of character encoding and doubling for data values
containing the enclosed-by character.
[19 Jul 2007 18:55] Paul Dubois
Also updated the LOAD DATA section to note the additional escape sequences that are recognized for input values.