Bug #50923 LOAD DATA INFILE replaces \b with backspace character
Submitted: 4 Feb 2010 16:32 Modified: 5 Feb 2010 17:32
Reporter: Miguel K Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S2 (Serious)
Version:5.1.41 OS:Windows
Assigned to: Paul DuBois CPU Architecture:Any

[4 Feb 2010 16:32] Miguel K
Description:
I'm not sure if this is a bug or a problem with the documentation.

The documentation for LOAD DATA INFILE says:
"Interpret occurrences of tab, newline, or “\” preceded by “\” as literal characters that are part of field values."

It says nothing about replacing '\b', or other simple letters after a backslash.

-------------

Using LOAD DATA INFILE to import text that has \b in it causes corruption of the text loaded in the database.

In Query Browser, the \b is replaced by ASCII 08, which is the backspace character.

This text
'a\ble'
is transformed to ASCII binary:
61 08 6b 65

To avoid this problem, either the original text must be transformed (\b to \\b) or the escape character changed. 

How to repeat:
create a text file with these three lines:
one
a\ble
t\wo

CREATE TABLE  testbackslash (`word` varchar(20) NOT NULL) ENGINE=MyISAM;

LOAD DATA INFILE 'C:/temp/test.txt'
INTO TABLE testbackslash
LINES TERMINATED BY '\r\n';

SELECT * FROM testbackslash;

Suggested fix:
Either warn us in the LOAD DATA INFILE documentation that any occurrence of \b (and I'm not sure what other characters) will not result in '\b' in the database or change the functionality.
[4 Feb 2010 17:25] Valeriy Kravchuk
Our manual (http://dev.mysql.com/doc/refman/5.1/en/load-data.html) clearly describe \b as a special case:

"Some two-character sequences that are exceptions, where the first character is the escape character. These sequences are shown in the following table (using “\” for the escape character). The rules for NULL  handling are described later in this section.
\0 	An ASCII NUL (0x00) character
\b 	A backspace character"
...

Please, check.
[4 Feb 2010 17:59] Miguel K
Thanks.  Sorry I overlooked it.

Maybe we can keep others from being confused by editing this line in the summary further up the page:
"Interpret occurrences of tab, newline, or “\” preceded by “\” as literal characters that are part of field values. "

This suggested replacement isn't that good, but maybe something like:
“\” before tab, newline and some letters will have special meaning

By leaving it ambiguous like my new sentence, people will know to keep looking for more info elsewhere.  I saw the current text and took the information literally to mean that \b was not a special case.

Cheers
[4 Feb 2010 18:16] Valeriy Kravchuk
OK. Maybe our Documentation team will invent some better text to warn users about other special character sequences starting with \.
[5 Feb 2010 17:32] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Updated text:

Interpret characters preceded by the escape character "\" as escape
sequences. For example, "\t", "\n", and "\\" signify tab, newline,
and backslash, respectively. See the discussion of FIELDS ESCAPED BY
later for the full list of escape sequences.