Bug #29294 | Special characters and SELECT INTO OUTFILE/LOAD DATA INFILE | ||
---|---|---|---|
Submitted: | 22 Jun 2007 6:16 | Modified: | 19 Jul 2007 18:53 |
Reporter: | Lachezar Balev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 5.1.17, 5.0, 4.1 | OS: | Linux (FC5) |
Assigned to: | Gleb Shchepa | CPU Architecture: | Any |
[22 Jun 2007 6:16]
Lachezar Balev
[22 Jun 2007 8:00]
Sveta Smirnova
Thank you for the report. Verified as described. All versions are affected.
[27 Jun 2007 18:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29772 ChangeSet@1.2671, 2007-06-27 23:37:13+05:00, gshchepa@gleb.loc +5 -0 Fixed bug #29294. Actually there are was 2 different bugs: 1. SELECT INTO OUTFILE ... FIELDS ENCLOSED BY <special character> escaped strings with <special character> inside with the value from an ESCAPED BY clause. So, for the FIELDS ENCLOSED BY 'r' the string of value "...r..." was encoded as "...\\r..." (here the value of ESCAPED BY is default '\\' character). On the LOAD DATA query this string was correctly decoded as "...\r...", not as the "...r...". 2. LOAD DATA INFILE was trying to "unescape" strings like "\\r" as the '\r' ignoring the fact that the 'r' character is an escaped value of ENCLOSED BY clause and must be decoded as "r".
[28 Jun 2007 8:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29820 ChangeSet@1.2671, 2007-06-28 13:54:55+05:00, gshchepa@gleb.loc +5 -0 Fixed bug #29294. Actually there were 2 different bugs: 1. The `SELECT INTO OUTFILE ... FIELDS ENCLOSED BY 'r' ' statement encoded the "r" string to a 2 byte string of value "\\r". The LOAD DATA statement decoded this string to a 1 byte string of value "\r" instead of "r". The same error also happened with the FIELDS ENCLOSED BY clause followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'. NOTE, changed behaviour: Now the `SELECT INTO OUTFILE' statement with the `FIELDS ENCLOSED BY' clause followed by special character encodes this special character itself by doubling it ("r" --> "rr"), not by prepending it with an escape character. 2. The `LOAD DATA INFILE ... FIELDS ESCAPED BY 'r' ' statement decoded 2 byte string "rr" to a 1 byte string of value "\r" instead of "r". The same error also happened with the FIELDS ESCAPED BY clause followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'.
[29 Jun 2007 11:40]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/29940 ChangeSet@1.2671, 2007-06-29 16:40:46+05:00, gshchepa@gleb.loc +4 -0 Fixed bug #29294. The `SELECT 'r' INTO OUTFILE ... FIELDS ENCLOSED BY 'r' ' statement encoded the 'r' string to a 4 byte string of value x'725c7272' (sequence of 4 characters: r\rr). The LOAD DATA statement decoded this string to a 1 byte string of value x'0d' (ASCII Carriage Return character) instead of the original 'r' character. The same error also happened with the FIELDS ENCLOSED BY clause followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'. NOTE 1: This is a result of the undocumented feature: the LOAD DATA INFILE recognises 2-byte input sequences like \n, \t, \r and \Z in addition to documented 2-byte sequences: \0 and \N. This feature should be documented (here backspace character is a default ESCAPED BY character, in the real-life example it may be any ESCAPED BY character). NOTE 2, changed behaviour: Now the `SELECT INTO OUTFILE' statement with the `FIELDS ENCLOSED BY' clause followed by one of: 'n', 't', 'r', 'b', '0', 'Z' or 'N' characters encodes this special character itself by doubling it ('r' --> 'rr'), not by prepending it with an escape character.
[3 Jul 2007 23:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/30184 ChangeSet@1.2671, 2007-07-03 19:37:46+05:00, gshchepa@gleb.loc +5 -0 Fixed bug #29294. The `SELECT 'r' INTO OUTFILE ... FIELDS ENCLOSED BY 'r' ' statement encoded the 'r' string to a 4 byte string of value x'725c7272' (sequence of 4 characters: r\rr). The LOAD DATA statement decoded this string to a 1 byte string of value x'0d' (ASCII Carriage Return character) instead of the original 'r' character. The same error also happened with the FIELDS ENCLOSED BY clause followed by special characters: 'n', 't', 'r', 'b', '0', 'Z' and 'N'. NOTE 1: This is a result of the undocumented feature: the LOAD DATA INFILE recognises 2-byte input sequences like \n, \t, \r and \Z in addition to documented 2-byte sequences: \0 and \N. This feature should be documented (here backspace character is a default ESCAPED BY character, in the real-life example it may be any ESCAPED BY character). NOTE 2, changed behaviour: Now the `SELECT INTO OUTFILE' statement with the `FIELDS ENCLOSED BY' clause followed by one of: 'n', 't', 'r', 'b', '0', 'Z' or 'N' characters encodes this special character itself by doubling it ('r' --> 'rr'), not by prepending it with an escape character.
[3 Jul 2007 23:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/30223 ChangeSet@1.2672, 2007-07-03 21:45:20+05:00, gshchepa@gleb.loc +2 -0 loaddata.result, loaddata.test: Test case update for bug #29294.
[8 Jul 2007 17:28]
Bugs System
Pushed into 5.1.21-beta
[8 Jul 2007 17:30]
Bugs System
Pushed into 5.0.46
[8 Jul 2007 17:33]
Bugs System
Pushed into 4.1.24
[19 Jul 2007 18:53]
Paul DuBois
Noted in 4.1.24, 5.0.46, 5.1.21 changelogs. SELECT ... INTO OUTFILE followed by LOAD DATA could result in garbled characters when the FIELDS ENCLOSED BY clause named a delimiter of '0', 'b', 'n', 'r', 't', 'N, or 'Z' due to an interaction of character encoding and doubling for data values containing the enclosed-by character.
[19 Jul 2007 18:55]
Paul DuBois
Also updated the LOAD DATA section to note the additional escape sequences that are recognized for input values.