Bug #38091 LOAD DATA LOCAL INFILE using source column references loads sequential columns
Submitted: 14 Jul 2008 5:14 Modified: 7 Aug 2008 4:19
Reporter: Rusty Weise Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:5.1.23 OS:Windows (Vista Enterprise)
Assigned to: CPU Architecture:Any

[14 Jul 2008 5:14] Rusty Weise
Description:
I am attempting to do a load of a csv, with optional quotes, and a header row in the general form:

"col1","col2","col3 #","column #4","col5","column six status","col/field #7","col8","col9","col10","column 11"..."column int thirty five"

I want to load into a table with identically matching column names for columns: 1, 2, 3, 4, 6, 8, 11, and 35

Note that some columns have reserved names and spaces.

I have attempted the following query:

LOAD DATA LOCAL INFILE 'c:/mysql/data/mydatabase/mydata.csv' INTO
TABLE mydatabase.mytable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES (`col1`,`col2`,`col3 #`,`column #4`,`column number six`,`col8`,`column 11`,`column in thirty five`);

and receive a number of warnings reading that the data in row 1, 2, 3...n has been truncated, and the columns which have actually been loaded are the concurrent columns in the csv, not the specified columns in the query.

I have since attempted to simply create the column names as well formed identifiers (eg column1, column2, column3,...,column35) and yet I still get sequential loads of data, not the specified columns in the query. 

I believe I have done due diligence on this bug, and it appears to be a regression as other users in the usenet group comp.databases.mysql, and on the MySQL forums report that it works for them using version 5.0.x and older.

How to repeat:
1. Create a table which matches a subset of colums as specified in a CSV
2. form a query using "LOAD DATA LOCAL INFILE" with the expected, sparse column references
3. execute the query and note the resultant set of values contains all sequential columns in the CSV attempted to be loaded into the table

It does not appear to matter if back-ticks are used to qualify reserved words and/or spaces, and it does not appear to work as documented if well-formed source and destination columns are used.
[14 Jul 2008 9:02] Susanne Ebrecht
Many thanks for reporting a bug.

Did you set sql_mode=ANSI_QUOTES before executing the query?

If not, please give it a try and let us know if this will solve your problem.
[17 Jul 2008 12:58] Sveta Smirnova
Thank you for the feedback.

Please provide output of SHOW CREATE TABLE mytable and first 5 lines from your text file (would be better if you attach them to the report as a separate file).
[19 Jul 2008 20:35] Rusty Weise
source data

Attachment: mydatabase.csv (application/vnd.ms-excel, text), 4.47 KiB.

[19 Jul 2008 20:36] Rusty Weise
steps performed to create db, create table, set sql_mode, load data, and resultant set

Attachment: bug38091.txt (text/plain), 1.50 KiB.

[19 Jul 2008 20:39] Rusty Weise
Note: in the source data, the number of the column in which the data is stored is embedded in the data:

col#1,column #2,stat col #3,TableColumn4,column5,column6,col/field/number7,col8,col9,col10,col 11,col12,col 13
1,stat col2,300,41234,this is a var char 550,"varchar,610","another, varchar, 750",boring8,boring9,boring10,1100,more chars with spaces 12,char13
2,"stat,col2",333,41234,this is a var char 551,"varchar,611","another, varchar, 751",boring8,boring9,boring10,1101,more chars with spaces 12,char13
[21 Jul 2008 21:06] Sveta Smirnova
Thank you for the feedback.

If I understood you correctly you want 1 row from the attached csv to be loades as:

mysql> select * from mytable limit 1\G   
*************************** 1. row ***************************
            col#1: 1
        column #2: stat col2
 status column #3: 300
     TableColumn4: 0
          column5: NULL
          column6: NULL
col/field/number7: boring9
             col8: NULL
             col9: NULL
            col10: boring10
           col 11: 1100
            col12: more chars with spaces 12
           col 13: char13
1 row in set (0.00 sec)

But you get results like:

mysql> select * from mytable limit 1\G*************************** 1. row ***************************
            col#1: 1
        column #2: stat col2
 status column #3: 300
     TableColumn4: 41234
          column5: NULL
          column6: NULL
col/field/number7: this is a var char 550
             col8: NULL
             col9: NULL
            col10: varchar,610
           col 11: 0
            col12: boring8
           col 13: boring9
1 row in set (0.01 sec)

If this is your concern this is not a bug: see http://dev.mysql.com/doc/refman/5.1/en/load-data.html for details how mysqld uses list of columns.

If I understood you in a wrong way, please provide expected and real results of the query.
[22 Jul 2008 5:27] Rusty Weise
Oh, bummer, I blew it on the table create by including all the columns of the source data *doah*.

You can see that the source data has 13 columns.  The table has only 9, and they are referenced by column name in the LOAD DATA LOCAL INFILE query.  Of _course_ it will stuff NULLs in the missing data, and it appears that the column-referenced LOAD worked as designed in my previous example.  Here, I want to do an ETL from 13 fields to 9.

I will attach a file with the _correct_ process of creating the toy db, table, performing the query, and resultant set.  You will notice that columns 1 - 9 are used in sequential fashion, as opposed to the columns that I specified in the query, 1-4,7,and 10-13.
[22 Jul 2008 5:28] Rusty Weise
Create db, table, populate wtih LOAD DATA query, and query for resultant set.

Attachment: bug38091_correct_table.txt (text/plain), 1.28 KiB.

[31 Jul 2008 19:39] Sveta Smirnova
Thank you for the feedback.

You have lines like following:

1, 'stat col2', 300, 41234, 'this is a var char 550', 'varchar,610', 0, 'boring8', 'boring9'

Notice fields are really terminated by ", " (comma and space) and not "," (comma), so mysqld ignores quotes inside fields.
[7 Aug 2008 4:19] Rusty Weise
I continue to be unclear, and considering this is now marked as "Not a Bug", I don't expect to achieve resolution to the original problem statement.  The data, to which you are referring, is simply a copy of row data from the MySQL Query Browser resultset when I did a, "select * from mytable limit 5"; the source data is well formatted with comma delimited values with optional double-quotes.

In as clear of a description as I am able to muster, the problem statement is:

My source data is a comma separated value (csv) file with optional double-quote field data modifiers and is comprised of 50 unique columns.  The column names are defined with identifiers that contain spaces, reserved characters, and reserved words.

My destination table is comprised of a subset of those 50 columns; some of which contains special characters, spaces, and reserved words; but match the source column identifiers verbatim.  The order of appearance of the columns in the table does not match the order of appearance in the source csv.

Because the order of the source column data is not predictable, I wish to use the "LOAD DATA LOCAL INFILE" command and specify the column names in the source table, which match those in the destination table _identically_ in order to update the destination table by column name reference.

The "LOAD DATA LOCAL INFILE" command appears to only load the data in the cells in order of presentation in the source csv.  That is to say, if there were 5 columns in the source data and I specified the name of the odd columns to load, I would instead only get the first through third column data regardless of the name of the columns.

I don't want to be redundant any more than I already have here, so please refer to comment #0 where I explain the source data table design, and the desired goals.  Please, explain how I can be any clearer as to what the root of the problem is and how I might work around this problem using native MySQL commands.