| Bug #10510 | LOAD DATA INFILE and Skipping Columns | ||
|---|---|---|---|
| Submitted: | 10 May 2005 13:52 | Modified: | 18 Jul 2005 18:25 |
| Reporter: | Dr. No | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
| Version: | 4.1 | OS: | Windows (Windows) |
| Assigned to: | Paul DuBois | CPU Architecture: | Any |
[10 May 2005 14:03]
MySQL Verification Team
According the Manual: http://dev.mysql.com/doc/mysql/en/load-data.html By default, when no column list is provided at the end of the LOAD DATA INFILE statement, input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list: mysql> LOAD DATA INFILE 'persondata.txt' -> INTO TABLE persondata (col1,col2,...);
[10 May 2005 14:11]
Dr. No
Well, when entering this I followed a suggestion from Lachlan Mulcahy from MySql AB. See http://forums.mysql.com/read.php?10,25180,25316#msg-25316 This was a feature request and I really believe that this functionality is missing and probably many users are struggling with this. Although one can leave out columns at the end of an input file it is not possible to leave out intermediate columns. When dealing with frequent uploads of Gigabytes of data it is not really an option to create temporary files in order to strip out the not needed columns or to create temporary tables with all columns. It would mean endless I/O operation on the server. I really think you should reopen this issue.
[10 May 2005 14:23]
Guilhem Bichot
Hello! In fact, what you are asking for already exists in MySQL 5.0, just the manual needs being updated (it's a fresh development). I'll get the doc updated and paste the documentation URL here.
[5 Jun 2005 12:06]
Dr. No
Hi There, I am eager to try this new feature out. Any hint on the syntax? Dr.No
[7 Jun 2005 14:35]
Dmitry Lenev
Hi!
Extended syntax for LOAD DATA looks like:
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[FIELDS
[TERMINATED BY '\t']
[[OPTIONALLY] ENCLOSED BY '']
[ESCAPED BY '\\' ]
]
[LINES
[STARTING BY '']
[TERMINATED BY '\n']
]
[IGNORE number LINES]
[(col_name_or_user_var,...)]
[ SET col_name = expr, ... ]
1) Now user may specify user variables and column names in target list (and not only column names as it was before).
2) Optional SET clause was added. One can use this clause to specify values for columns not mentioned in target list using expressions which can both user variables and columns from target list.
Each time LOAD DATA reads line from input file, it splits line into fields according to its options and stores field value either in user variable or directly in column of target table, depending which of them we have in corresponding position in target list. After that expressions in SET clause are evaluated and their values assigned to appropriate columns. Then filled row is inserted into table.
So statement from your example shoul look like:
LOAD DATA LOCAL INFILE 'myfile.dat'
INTO TABLE T_MYDATA
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\r\n' STARTING BY ''
IGNORE 1 LINES
(SOURCE_ID, SATT_ID, @dummy_var, ATT_DIR,
NUM_ATT, ATT_SEQ, ATT_TYPE, ATT_VAL);
[18 Jul 2005 18:25]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant product(s). Additional info: The new capability in 5.0.3 for using user variables and the SET clause allows columns to be skipped. There is an example in the LOAD DATA section now.

Description: It is currently not possible to skip intermediate columns from an input file using the LOAD DATA INFILE syntax. Many input files have more columns than the table one wants to fill. Writing temporary files in order to strip out the unneded columns isn't a good option especially when dealing with frequent uploads of gigabytes. The request is to add functionality to the LOAD DATA INFILE command in order to leave out intermediate columns. How to repeat: LOAD DATA LOCAL INFILE 'myfile.dat' INTO TABLE T_MYDATA FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\r\n' STARTING BY '' IGNORE 1 LINES (SOURCE_ID,SATT_ID, , ATT_DIR,NUM_ATT,ATT_SEQ,ATT_TYPE ,ATT_VAL); In this example the third column should be skipped but this will result in an error since there is no target column specified for the third column. Suggested fix: Add functionality to ignore columns. The command syntax need not to change. An empty string in between two commas would simply mean that this column will be skipped.