Bug #40814 CSV engine does not parse \X characters when they occur in unquoted fields
Submitted: 18 Nov 2008 9:48
Modified: 7 Mar 2010 18:19
Reporter: V Venkateswaran
Status: Closed
Category: MySQL Server: CSV
Severity: S3 (Non-critical)
Version: 5.1, 6.0 bzr
OS: Any
Assigned to:
CPU Architecture: Any

[18 Nov 2008 9:48] V Venkateswaran
Description:
When the .CSV file for a table in the CSV engine contains
\X escape sequences (such as \n) in unquoted fields, e.g.

2,naraya\nan

the \n is not interpreted as a newline (it is, however, interpreted as a
newline in a quoted field).

How to repeat:
Consider the following .CSV file contents

1,"naraya\nan"
2,naraya\nan

for the following table definition

mysql> show create table t_csv;
+-------+------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                               |
+-------+------------------------------------------------------------------------------------------------------------+
| t_csv | CREATE TABLE `t_csv` (
  `i` int(11) NOT NULL,
  `n` char(50) NOT NULL
) ENGINE=CSV DEFAULT CHARSET=latin1 | 
+-------+------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The first row, whose second field is enclosed in quotes, and the
second row, whose second field is not, produce different output:

mysql> select * from t_csv;
+---+------------+
| i | n          |
+---+------------+
| 1 | naraya
an  | 
| 2 | naraya\nan | 
+---+------------+
2 rows in set (0.01 sec)

Suggested fix:
ha_tina::find_current_row contains the logic that tokenizes a row
from the .CSV file into its field values.

The branch that handles unquoted field values simply copies the field
value and does none of the escape processing that is done for quoted
fields.

The relevant code is below. Note that it copies every character into
the field buffer until it reaches the , that marks the end of the
current field.

    else
    {
      for(; curr_offset < end_offset; curr_offset++)
      {
        curr_char= file_buff->get_value(curr_offset);
        if (curr_char == ',')
        {
          curr_offset++;       // Skip the ,
          break;
        }
        buffer.append(curr_char);
      }
    }

This branch should be changed to match the logic for the quoted fields.
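
As a rough illustration of the suggested change, here is a minimal standalone sketch of an unquoted-field reader that decodes \X sequences while copying. It is not the committed patch and does not use the ha_tina.cc buffer classes. The function name read_unquoted_field, the use of std::string, and the set of recognised escapes (\n, \r, \\ and \") are all assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of ha_tina.cc: reads one unquoted field
// from `row` starting at `pos`, decoding backslash escapes on the way.
// On return, `pos` points just past the field's trailing comma, or at
// the end of the row if this was the last field.
std::string read_unquoted_field(const std::string& row, std::size_t& pos)
{
    std::string field;
    while (pos < row.size()) {
        char c = row[pos];
        if (c == ',') {
            ++pos;  // skip the field separator
            break;
        }
        if (c == '\\' && pos + 1 < row.size()) {
            char next = row[pos + 1];
            pos += 2;  // consume the backslash and the escaped character
            switch (next) {
                case 'n':  field.push_back('\n'); break;
                case 'r':  field.push_back('\r'); break;
                case '\\': field.push_back('\\'); break;
                case '"':  field.push_back('"');  break;
                default:   // unknown escape (assumed policy): keep both characters
                    field.push_back('\\');
                    field.push_back(next);
                    break;
            }
            continue;
        }
        field.push_back(c);  // ordinary character
        ++pos;
    }
    return field;
}
```

Called twice on the row 2,naraya\nan from this report, the sketch yields the field 2 and then a field in which \n has become a real newline, which is the behaviour already seen for the quoted row.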
[19 Nov 2008 7:15] Sveta Smirnova
Thank you for the report.

How did you insert an unquoted field into the CSV table?
[19 Nov 2008 7:41] V Venkateswaran
Thank you for taking a look at this bug

I used a hard-coded .CSV file

There is a similar issue in 5.0:

http://bugs.mysql.com/bug.php?id=39616

The two issues are not the same, because 39616 does
not deal with \X characters.
[19 Nov 2008 9:44] Sveta Smirnova
Thank you for the feedback.

Verified as described. Workaround: include quotes manually.
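
For hand-written .CSV files, the quoting workaround could be automated along these lines. This is only a sketch: quote_unquoted_fields is a hypothetical helper, not a MySQL utility. It assumes that unquoted fields contain no " characters and that quoting every field, numeric ones included, is acceptable for the table in question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: wraps every unquoted field of one CSV line in
// double quotes and leaves already-quoted fields untouched.
// Assumes unquoted fields contain no '"' characters.
std::string quote_unquoted_fields(const std::string& line)
{
    std::string out;
    std::size_t pos = 0;
    for (;;) {
        if (pos < line.size() && line[pos] == '"') {
            // Already quoted: copy through the closing quote unchanged.
            out.push_back(line[pos++]);
            while (pos < line.size()) {
                char c = line[pos++];
                out.push_back(c);
                if (c == '\\' && pos < line.size())
                    out.push_back(line[pos++]);  // keep the escaped character
                else if (c == '"')
                    break;                       // closing quote
            }
        } else {
            // Unquoted: copy up to the next comma and add the quotes.
            out.push_back('"');
            while (pos < line.size() && line[pos] != ',')
                out.push_back(line[pos++]);
            out.push_back('"');
        }
        if (pos < line.size() && line[pos] == ',') {
            out.push_back(',');
            ++pos;
        } else {
            // End of line, or unexpected text after a closing quote:
            // pass whatever remains through unchanged.
            out.append(line, pos, std::string::npos);
            break;
        }
    }
    return out;
}
```

Applied to the row 2,naraya\nan, the sketch produces "2","naraya\nan".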
[6 Dec 2008 16:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60812

2774 V Narayanan	2008-12-06
      Bug#40814 CSV engine does not parse \X characters when they occur in unquoted fields
      
      When a .CSV file for a table in the CSV engine contains
      \X characters as part of unquoted fields, e.g.
      
      2,naraya\nan
      
      \n is not interpreted as a new line (it is however interpreted as a
      newline in a quoted field).
      
      The old algorithm copied the entire value for an unquoted field without
      parsing the \X characters.
      
      The new algorithm adds the capability to handle \X characters in the 
      unquoted fields of a .CSV file.
[15 Jan 2009 10:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63333

2967 V Narayanan	2009-01-15
      Bug#40814 CSV engine does not parse \X characters when they occur in unquoted fields
            
      When a .CSV file for a table in the CSV engine contains
      \X characters as part of unquoted fields, e.g.
            
      2,naraya\nan
            
      \n is not interpreted as a new line (it is however interpreted as a
      newline in a quoted field).
            
      The old algorithm copied the entire value for an unquoted field without
      parsing the \X characters.
            
      The new algorithm adds the capability to handle \X characters in the 
      unquoted fields of a .CSV file.
[20 Jan 2009 19:01] Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:timothy.smith@sun.com-20090116165151-xtp5e4z6qsmxyvy0) (merge vers: 6.0.10-alpha) (pib:6)
[2 Feb 2009 14:43] Tony Bedford
An entry was added to the 6.0.10 changelog:

The CSV engine did not parse '\X' characters when they occurred in unquoted fields.
[3 Dec 2009 11:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92662

2926 V Narayanan	2009-12-03
      Bug#40814 CSV engine does not parse \X characters when they occur in unquoted fields
          
      When a .CSV file for a table in the CSV engine contains
      \X characters as part of unquoted fields, e.g.
          
      2,naraya\nan
          
      \n is not interpreted as a new line (it is however interpreted as a
      newline in a quoted field).
          
      The old algorithm copied the entire value for an unquoted field without
      parsing the \X characters.
          
      The new algorithm adds the capability to handle \X characters in the 
      unquoted fields of a .CSV file.
     @ mysql-test/r/csv.result
        Bug#40814 CSV engine does not parse \X characters when they occur in unquoted fields
        
        Contains additional test output corresponding to the new 
        tests added.
     @ mysql-test/t/csv.test
        Bug#40814 CSV engine does not parse \X characters when they occur in unquoted fields
        
        Contains additional tests for testing the behaviour of the CSV 
        storage engine when the fields are not enclosed in quotes and
        contain \X characters.
     @ storage/csv/ha_tina.cc
        Bug#40814 CSV engine does not parse \X characters when they occur in unquoted fields
        
        Changes the parsing logic of the rows in a CSV file, to parse
        \X characters that might be present in the unquoted fields.
[11 Dec 2009 6:13] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091211061044-0i9oqtbk80oy5sqi) (version source revid:alik@sun.com-20091211060707-u1xhd0oob0k2ss7z) (merge vers: 6.0.14-alpha) (pib:13)
[11 Dec 2009 6:14] Bugs System
Pushed into 5.6.0-beta (revid:alik@sun.com-20091211060338-g18x2pr4yj260ara) (version source revid:svoj@sun.com-20091209110334-svuqinej60i1rw0m) (merge vers: 5.6.0-beta) (pib:13)
[11 Dec 2009 19:44] Paul DuBois
Noted in 5.6.0 changelog.

Already fixed in 6.0.x.
[6 Mar 2010 11:09] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20091211201717-03qf8ckwiw0np80p) (merge vers: 5.6.0-beta) (pib:16)
[7 Mar 2010 18:19] Paul DuBois
Moved 5.6.0 changelog entry to 5.5.3.