Bug #3632 | load data local ... lines starting by = inserts bad data | ||
---|---|---|---|
Submitted: | 3 May 2004 5:35 | Modified: | 27 Jan 2012 19:51 |
Reporter: | dave mausner | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: DML | Severity: | S4 (Feature request) |
Version: | 5.1 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[3 May 2004 5:35]
dave mausner
[3 May 2004 5:51]
dave mausner
test data for #3632, load data ... starting by
Attachment: bug.zip (application/x-zip-compressed, text), 105.06 KiB.
[4 May 2004 21:52]
Michael Widenius
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php Additional info: I took a quick look a the test case and found some errors in it: The test data was terminated with \r\n, not by \n, which is the default line terminator. Another problem was that LINES STARTING BY xxx means that the MySQL will assume the lines start is at the next occurence of xxx. In the test you used 01 and 02 as LINES STARTING BY, but as the data also included 01 and 02 MySQL did read in some data from these lines. Fix: Change that lines starts with X1 and X2 instead of 01 and 02 Use the command: LOAD DATA LOCAL INFILE 'bug.dat' INTO TABLE mm FIELDS TERMINATED BY ',' ENCLOSED BY 0x10 LINES STARTING BY 'X1' TERMINATED BY "\r\n" (CD,CS,ED,EM,F2,F3,FN,L2,L3,LN,LT,LV,MA,MC,PH,SD,ST,ZC); To read in the data. I will update the MySQL manual to make this the LINES STARTING BY option more clear. Regards, Monty
[6 May 2004 15:37]
dave mausner
The exact text from the mySQL manual is: "If all the lines you want to read in has a common prefix that you want to skip, you can use LINES STARTING BY prefix_string for this. In other words, the defaults cause LOAD DATA INFILE to act as follows when reading input: · Look for line boundaries at newlines. · If LINES STARTING BY prefix is used, read until prefix is found and start reading at character after prefix. If line doesn't include prefix it will be skipped." The key phrase is "...all the lines you want to read in [have] a common prefix that you want to skip..." A prefix is defined as something that comes before everything else. When saying that lines have a prefix, you are saying that the prefix occurs before everything else on the line. I question whether the code should be changed to usefully represent the doc, or the doc changed to suit a less-useful coding. In the real world, we find that records may have record codes in fixed positions, generally (but not always) at the start of the record -- a prefix. however, other LOAD DATA solutions include a means to specify the exact position of that code. It should NOT be left to chance to find the "prefix" just anywhere in the record, and it is unrealistic to assert that one should change the data to suit mySQL's behavior. Here are my suggestions: (1) let LINES STARTING BY 'X' continue to mean "line contains 'X' anywhere and data prior to 'X' in the record is skipped" and document this as such. (2) add LINES STARTING BY 'X' POSITION N which means "line contains 'X' beginning at character position N relative to the beginning of the record. thanks. if you suggest it, i will submit this idea in any formal manner.
[14 Apr 2006 19:30]
dave mausner
Please convert to feature request.
[4 Feb 2009 14:21]
Susanne Ebrecht
Many thanks for writing a feature request. Our development will discuss this.
[27 Jan 2012 19:51]
dave mausner
Ping. Checking to see if this feature request can be considered. Review: allow LINES STARTED BY to have an additional syntax to specify fixed-position prefix.