Bug #68494 | load data infile handles empty field being in 1st line inconsistently | ||
---|---|---|---|
Submitted: | 26 Feb 2013 9:17 | Modified: | 8 Apr 2013 15:11 |
Reporter: | Balna Borju | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
Version: | 5.5.17 | OS: | Windows (win7 home premium) |
Assigned to: | Paul DuBois | CPU Architecture: | Any |
Tags: | 1st, first, infile, line, load |
[26 Feb 2013 9:17]
Balna Borju
[26 Feb 2013 9:21]
Balna Borju
please ignore the site specific leading path in the sample sql script
[26 Feb 2013 10:56]
Peter Laursen
On 5.5.30 everything seems OK to me: DROP TABLE IF EXISTS fa; CREATE TABLE fa ( c01 INT, c02 INT, c03 INT ); /* file in.csv reads: "22372";"9653";"10577" "95352";"";"115184" */ SET sql_mode = ''; LOAD DATA INFILE 'c:\\in.csv' INTO TABLE fa FIELDS TERMINATED BY ';' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES; SHOW WARNINGS; /* Level Code Message ------- ------ ------------------------------------------------------- Warning 1366 Incorrect integer value: '' for column 'c02' at row 1 */ SELECT * FROM fa; /* c01 c02 c03 ------ ------ -------- 95352 0 115184 */ LOAD DATA INFILE 'c:\\in.csv' INTO TABLE fa FIELDS TERMINATED BY ';' ENCLOSED BY '"' LINES TERMINATED BY '\r\n'; SHOW WARNINGS; /* Level Code Message ------- ------ ------------------------------------------------------- Warning 1366 Incorrect integer value: '' for column 'c02' at row 2 */ SELECT * FROM fa; /* c01 c02 c03 ------ ------ -------- 95352 0 115184 22372 9653 10577 95352 0 115184 */ TRUNCATE TABLE fa; /* file in2.csv reads: 22372;9653;10577 95352;;115184 */ LOAD DATA INFILE 'c:\\in2.csv' INTO TABLE fa FIELDS TERMINATED BY ';' LINES TERMINATED BY '\r\n' IGNORE 1 LINES; SELECT * FROM fa; /* c01 c02 c03 ------ ------ -------- 95352 0 115184 */ TRUNCATE TABLE fa; SET sql_mode = 'strict_trans_tables'; LOAD DATA INFILE 'c:\\in2.csv' INTO TABLE fa FIELDS TERMINATED BY ';' LINES TERMINATED BY '\r\n' IGNORE 1 LINES; -- Error Code: 1366 -- Incorrect integer value: '' for column 'c02' at row 1 SELECT * FROM fa; -- empty set If I should point to anything, I think that the quoted passage from the docs should mention that the behavior in this respect depends on the sql_mode.
[26 Feb 2013 15:13]
Balna Borju
The 3rd mentioned test case produces error#1366 on 5.5.17, so it seems to be corrected since then. My main concern was that the ordering of the input lines influenced the outcome of the LOAD DATA INFILE, depending on whether the first line contained an empty field or not.
[28 Feb 2013 19:03]
Sveta Smirnova
Thank you for the report. I was able to see warning (error 1366), but it did not depend from position of a line with empty field in the file. In my point of view this is correct behavior, described at http://dev.mysql.com/doc/refman/5.5/en/load-data.html: "If an input line has too few fields, the table columns for which input fields are missing are set to their default values. Default value assignment is described in Section 11.5, “Data Type Default Values”. " Default value for INT data type is 0, but if you insert 0 to an integer field which has no explicit DEFAULT in its definition, warning should be generated. At the same time text right below quoted is confusing: ----<q>---- An empty field value is interpreted differently than if the field value is missing: For string types, the column is set to the empty string. For numeric types, the column is set to 0. For date and time types, the column is set to the appropriate “zero” value for the type. See Section 11.3, “Date and Time Types”. ----</q>---- It is absolutely not clear how differ "empty field value" and "missed field value". How csv file should look like in this case? So this bug verified as documentation bug.
[8 Apr 2013 15:04]
Paul DuBois
Here's what's happening: Column position doesn't matter; this is not specific to the first column. IGNORE doesn't matter either. The behavior described in the manual is what occurs by default. If sql_mode is set to a more restrictive value, some warnings become errors. For example, if you set sql_mode='TRADITIONAL', you will see the errors reported. With sql_mode='', they are just warnings. Note that if LOAD DATA LOCAL is used, even with a restrictive sql_mode value, the errors will be treated as warnings because the server has no way to stop file transmission from the client. I'll check into clarifying this in the manual.
[8 Apr 2013 15:11]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products. Added text: Treatment of empty or incorrect field values differs from that just described if the SQL mode is set to a restrictive value. For example, if sql_mode='TRADITIONAL, conversion of an empty value or a value such as 'x' for a numeric column results in an error, not conversion to 0. (With LOCAL, warnings rather than errors occur, even with a restrictive sql_mode value, because the server has no way to stop transmission of the file in the middle of the operation.)
[18 Jul 2014 0:21]
Alex Huth
I am seeing the same inconsistent first-line behavior as described in this bug report, running MySQL server version 5.6.12-enterprise-commercial-advanced. Please note that the previous few comments are inaccurate in several respects: 1. The inconsistent behavior is specific to the ROW position, not the COLUMN position. 2. While the IGNORE statement does not matter, what is being discussed here is the IGNORE number LINES option available later in the LOAD DATA syntax, and it definitely does matter. 3. The problem is not with documentation or sql_mode, the problem is that identical statements executed with identical settings are behaving inconsistently with respect to warning/error 1366 (as I will explain). In Workbench, I ran three successive LOAD DATA statements to load CSV data, chunked into three 10M-row files. The first two statements succeeded, while the third failed. I literally copy-and-pasted these statements, changing only one character of the file name between them, and executed them as a set, with sql_mode unchanged throughout. The first two statements showed millions of 1366 warnings but succeeded; the first warnings occur at rows 73 and 97, respectively. However, the third statement failed with Error 1366, occurring at row 1. The only significant difference between these files is at which row the first incorrect decimal value is encountered. (In this case, the incorrect value was an empty string.) In each statement, I used IGNORE 1 LINES to skip a header row in the CSV file. Each of the three files contains that single header row. When I changed the third statement from IGNORE 1 LINES to IGNORE 2 LINES, it no longer failed with Error 1366, but executed successfully with millions of 1366 warnings (the first occurring at row 59), just as in the first two cases. I then tried to manually INSERT the row that I had skipped from the third file in order to get the LOAD DATA statement to run. This failed with Error 1366. I then added a row of dummy data (with id 999999999, for easy deletion later) before my skipped row in the INSERT statement, like this: INSERT INTO tbl VALUES ('999999999', '0', '0'), ('12345', '', ''); With the successful result: 2 row(s) affected, 2 warning(s): 1366 Incorrect decimal value: '' for column 'intptlat10' at row 2 1366 Incorrect decimal value: '' for column 'intptlon10' at row 2 Records: 2 Duplicates: 0 Warnings: 2 So this inconsistent behavior affects not only LOAD DATA, but also INSERT. I tried to replicate this behavior in a fresh table and found that the inconsistency affects MyISAM tables but not InnoDB tables. I tunneled into the server machine (running RHEL 6.3) to run the following statements, in case the inconsistency was related to Workbench: --- mysql> create table t (id int primary key, a decimal) engine=myisam; Query OK, 0 rows affected (0.01 sec) mysql> insert into t values (1, ''); ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 1 mysql> insert into t values (1, ''), (2, 0); ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 1 mysql> insert into t values (1, 0), (2, ''); Query OK, 2 rows affected, 1 warning (0.00 sec) Records: 2 Duplicates: 0 Warnings: 1 mysql> create table t2 (id int primary key, a decimal) engine=innodb; Query OK, 0 rows affected (0.16 sec) mysql> insert into t2 values (1, ''); ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 1 mysql> insert into t2 values (1, 0), (2, ''); ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 2 In summary, this is a bug because the success or failure of a legitimate INSERT or LOAD DATA statement on a MyISAM table is not expected to have anything to do with the order of the rows.
[18 Jul 2014 21:31]
Alex Huth
After further investigation, it looks this is inconsistent by design, with the relevant section of the documentation here: http://dev.mysql.com/doc/refman/5.6/en/sql-mode.html#sql-mode-important "STRICT_TRANS_TABLES If a value could not be inserted as given into a transactional table, abort the statement. For a nontransactional table, abort the statement if the value occurs in a single-row statement or the first row of a multiple-row statement. More details are given later in this section."
[24 Aug 2014 5:35]
Daniel Erry
something else to note is the inconsistent results that takes place on a transforms. table definition: CREATE TABLE `incomebycustomerdetail` ( `RowNumber` int(11) DEFAULT NULL, `Text` varchar(255) DEFAULT NULL, `Blank` varchar(255) DEFAULT NULL, `RowData` varchar(255) DEFAULT NULL, `RowType` varchar(255) DEFAULT NULL, `TxnType` varchar(255) DEFAULT NULL, `Date` date DEFAULT NULL, `RefNumber` varchar(255) DEFAULT NULL, `Account` varchar(255) DEFAULT NULL, `Amount` decimal(19,4) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8; input data: Example data from file Please note: 1) these two rows are identical expect the ID 2) Several Fields are blank 3) The very last field is defined as a nullable decimal field RowNumber|Text|Blank|RowData|RowType|TxnType|Date|RefNumber|Account|Amount 1|AAE0207121|||TextRow||||| 2|AAE0207121|||TextRow||||| Load Statement: truncate table QuickbooksReplication.IncomeByCustomerDetail; SET sql_mode = ''; LOAD DATA INFILE 'C:/Load/QBLIncomeByCustomerDetail.csv' INTO TABLE QuickbooksReplication.IncomeByCustomerDetail FIELDS TERMINATED BY '|' lines terminated by '\n' IGNORE 1 LINES ( @RowNumber , @Text , @Blank , @RowData , @RowType , @TxnType , @Date , @RefNumber , @Account , @Amount) set `RowNumber`=case when @RowNumber = '' then null else @RowNumber end ,`Text`=case when @Text = '' then null else @Text end ,`Blank`=case when @Blank = '' then null else @Blank end ,`RowData`=case when @RowData = '' then null else @RowData end ,`RowType`=case when @RowType = '' then null else @RowType end ,`TxnType`=case when @TxnType = '' then null else @TxnType end ,`Date`=case when @Date = '' then null else @Date end ,`RefNumber`=case when @RefNumber = '' then null else @RefNumber end ,`Account`=case when @Account = '' then null else @Account end ,`Amount`=case when @Amount = '' then null else @Amount end result: # RowNumber, Text, Blank, RowData, RowType, TxnType, Date, RefNumber, Account, Amount '1', 'AAE0207121', NULL, NULL, 'TextRow', NULL, NULL, NULL, NULL, '0.0000' '2', 'AAE0207121', NULL, NULL, 'TextRow', NULL, NULL, NULL, NULL, NULL The Amount Fields are different. One is NULL as expected the other is a 0 which is not expected. I am inserting identical records into the database and both records are going through the same transform before being put into the table but I am getting two different results depending on if the record is in the first row or the second row. I have read the documentation about how the first row is treated differently and how the default value is plugged if the field is missing on the first row. What makes this bad is the fact that this takes place before my transform takes place which results in inconsistent results depending on the row.