Bug #36750 LOAD XML doesn't understand new line (feed) characters in multi line text fields
Submitted: 16 May 2008 8:18 Modified: 5 Aug 2009 8:46
Reporter: John Ericson Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: XML functions Severity:S2 (Serious)
Version:6.0.4-alpha,5.4 OS:Any (Win XP SP2 English, Linux)
Assigned to: Alexey Botchkov CPU Architecture:Any
Tags: import, Line Break, line feed, load, load xml, new line, TEXT FIELD, XML

[16 May 2008 8:18] John Ericson
Description:
When I try to import an XML-file that I earlier exported using mysqldump --xml the line feeds in multi line text fields get replaced with 3 white spaces instead of line breaks. This means that a restore from XML won't recover the data in the same state as it was when exported.

If I compare the binary data in the text field I can see that the new line character "0D 0A" has been replaced with "20 20 20".

Here is the complete string in binary form of the original text:
6C 69 6E 65 31 0D 0A 6C 69 6E 65 32 0D 0A 6C 69 6E 65 33

Here is the new one after LOAD XML:
6C 69 6e 65 31 20 20 20 6C 69 6E 65 32 20 20 20 6C 69 6E 65 33

Versions:
mysql.exe      Ver 14.14 Distrib 6.0.4-alpha, for Win32 (ia32)
mysqldump.exe  Ver 10.13 Distrib 6.0.4-alpha, for Win32 (ia32)

How to repeat:
I have developed a minimal scenario to prove the bug. It consists of:
1) Creating a new database with a table.
2) Inserting one line of data.
3) Export as XML.
4) Removing the data line.
5) Importing the XML.
6) See bug!

-- 1) Creating a new database with a table.
create database `test_of_xml_import`;

use `test_of_xml_import`;

create table `table1` (
  `id` int(11) not null,
  `text` text,
  primary key (`id`)
) engine=MyISAM default charset=latin1;

-- 2) Inserting one line of data.
insert into `table1` values (1, 'line1\r\nline2\r\nline3');

-- In MySQL Query Browser you can see the line feeds. If you right click the text field and chose
-- "View Field in Popup Editor" you can also see how it looks on multi lines.
select * from `table1`;

3) Export as XML.
"c:\Program Files\MySQL\MySQL Server 6.0\bin\mysqldump.exe" --xml -u root -p test_of_xml_import > test_of_xml_import.xml

4) Removing the data line.

5) Importing the XML.
-- How to load XML data into already existing structure.
-- XML file should be in the same directory.
"c:\Program Files\MySQL\MySQL Server 6.0\bin\mysql.exe" -D test_of_xml_import -uroot -p -e "load xml local infile 'test_of_xml_import.xml' into table table1"

-- When I load the data the \r\n (line feeds) gets removed, instead I get 3 white spaces.
-- Binary wise the text field looks like this, after load XML:
-- 6C 69 6e 65 31 20 20 20 6C 69 6E 65 32 20 20 20 6C 69 6E 65 33 

-- It should look like this (as it did when I inserted the line above):
-- 6C 69 6E 65 31 0D 0A 6C 69 6E 65 32 0D 0A 6C 69 6E 65 33

Suggested fix:
Probably have a look at how the code interprets new lines.
[16 May 2008 8:20] John Ericson
Script for creating test database and instructions for exporting and import XML. Same as in bug text

Attachment: test_of_xml_import.sql (application/octet-stream, text), 1.23 KiB.

[16 May 2008 19:33] Sveta Smirnova
Thank you for the report.

Verified as described.
[30 May 2008 8:24] Sveta Smirnova
Workaround: manually edit resulting XML file.
[30 May 2008 8:35] John Ericson
Could you describe how a valid newline is written in the XML-file? I did some attempts at converting the newline characters but didn't managed to get them to actually break lines in the database.
[31 May 2008 13:47] Sveta Smirnova
Thank you for the feedback. Seems I was wrong and nothing works.
[2 Jun 2008 18:48] Sveta Smirnova
Workaround which did not worK: edit XML file.

Workaround which works:

mysql> update table1 set text = replace(text, '  ', '\r\n');
[3 Jun 2008 7:51] John Ericson
That would probably work! Although replacing '  ' is a bit risky as it might be used in other circumstances. But it could work well if you replace the newlines in a text editor with something like '<MyOwnNewLineThingy />' before you restore the database. Then you could replace this tag with the valid newline without risking replacing unintended whitespaces.
[18 Jun 2009 13:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76569

3372 Alexey Botchkov	2009-06-18
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
          Reading values of the XML tags, we turn linefeeds and tabs into spaces with the
          my_tospace() call.
      
      per-file comments:
        mysql-test/r/loadxml.result
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
           test result completed
      
        mysql-test/t/loadxml.test
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
           test result added
      
        sql/sql_load.cc
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
           add original characters to the 'value', not distorted with the my_tospace()
[8 Jul 2009 10:37] Alexander Barkov
http://lists.mysql.com/commits/76569 is Ok to push.
[30 Jul 2009 12:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79638

2836 Alexey Botchkov	2009-07-30
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
                Reading values of the XML tags, we turn linefeeds and tabs into spaces with the
                my_tospace() call.
            
      per-file comments:
        mysql-test/r/loadxml.result
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
           test result completed
      
        mysql-test/t/loadxml.test
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
           test result added
      
        sql/sql_load.cc
      Bug#36750      LOAD XML doesn't understand new line (feed) characters in multi line text fields
           add original characters to the 'value', not distorted with the my_tospace()
[30 Jul 2009 13:07] Alexey Botchkov
pushed into azalea tree
[4 Aug 2009 19:50] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090804194615-h40sa098mx4z49qg) (version source revid:iggy@mysql.com-20090731204544-7nio1afvg0dmzs7g) (merge vers: 5.4.4-alpha) (pib:11)
[5 Aug 2009 8:46] Jon Stephens
Documented in the 5.4.4 changelog as follows:

        Using LOAD XML to import an XML file created with mysqldump
        --xml did not work correctly because linefeeds in the content of
        the original XML were converted into spaces (that is, the
        sequence 0D 0A within XML elements was rewritten as 20 20 20).
[12 Aug 2009 22:56] Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[15 Aug 2009 2:12] Paul DuBois
Ignore previous comment about 5.4.2.