Bug #68494 load data infile handles empty field being in 1st line inconsistently
Submitted: 26 Feb 2013 9:17 Modified: 8 Apr 2013 15:11
Reporter: Balna Borju Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.5.17 OS:Windows (win7 home premium)
Assigned to: Paul DuBois CPU Architecture:Any
Tags: 1st, first, infile, line, load

[26 Feb 2013 9:17] Balna Borju
Description:
The manual says about LOAD DATA INFILE:
"An empty field value is interpreted differently than if the field value is missing: ... For numeric types, the column is set to 0."

This is right generally, but if the empty field is in the first line of the input text file, and it belongs to an INT type field, the following error occures:

ERROR 1366 (HY000) at line 10: Incorrect integer value: '' for column 'c02' at row 1

This is aware of the IGNORE x LINES statement, ie. if you ignore the first x lines, and the x+1th line contains an empty field, the error also occures.

I did not try it with other field types, and it seems that it does not depend on the used charset and collation.

How to repeat:
create the following input file called fa.csv:

22372;9653;10577
95352;;115184

the following script causes the mentioned error, if you uncomment the "ignore.." line, whereas it runs correctly, if it does not ignore the first line:

use test2;

DROP TABLE IF EXISTS fa;
CREATE  TABLE fa (
  c01 INT,
  c02 INT,
  c03 INT
);

LOAD DATA INFILE '/munka/Zsuzsi/allomasok_2013febr/proba/fa.csv'
    INTO TABLE fa
    FIELDS TERMINATED BY ';'
#    IGNORE 1 LINES
    ;
[26 Feb 2013 9:21] Balna Borju
please ignore the site specific leading path in the sample sql script
[26 Feb 2013 10:56] Peter Laursen
On 5.5.30 everything seems OK to me:

DROP TABLE IF EXISTS fa;
CREATE  TABLE fa (
  c01 INT,
  c02 INT,
  c03 INT
);

/* file in.csv reads:

"22372";"9653";"10577"
"95352";"";"115184"
*/

SET sql_mode = '';

LOAD DATA INFILE 'c:\\in.csv'
    INTO TABLE fa
    FIELDS TERMINATED BY ';' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' 
    IGNORE 1 LINES;
    
SHOW WARNINGS;
/*
Level      Code  Message                                                
-------  ------  -------------------------------------------------------
Warning    1366  Incorrect integer value: '' for column 'c02' at row 1
*/

SELECT * FROM fa;  
/*
   c01     c02     c03  
------  ------  --------
 95352       0    115184
*/
    
LOAD DATA INFILE 'c:\\in.csv'
    INTO TABLE fa
    FIELDS TERMINATED BY ';' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
    
SHOW WARNINGS;
/*
Level      Code  Message                                                
-------  ------  -------------------------------------------------------
Warning    1366  Incorrect integer value: '' for column 'c02' at row 2  
*/
    
SELECT * FROM fa;
/*
   c01     c02     c03  
------  ------  --------
 95352       0    115184
 22372    9653     10577
 95352       0    115184
 */
 
TRUNCATE TABLE fa;
 
 /* file in2.csv reads:

22372;9653;10577
95352;;115184
*/

LOAD DATA INFILE 'c:\\in2.csv'
    INTO TABLE fa
    FIELDS TERMINATED BY ';' LINES TERMINATED BY '\r\n' 
    IGNORE 1 LINES;

SELECT * FROM fa;
/*
   c01     c02     c03  
------  ------  --------
 95352       0    115184
*/

TRUNCATE TABLE fa;

SET sql_mode = 'strict_trans_tables';

LOAD DATA INFILE 'c:\\in2.csv'
    INTO TABLE fa
    FIELDS TERMINATED BY ';' LINES TERMINATED BY '\r\n' 
    IGNORE 1 LINES;
-- Error Code: 1366
-- Incorrect integer value: '' for column 'c02' at row 1 

SELECT * FROM fa;
-- empty set

If I should point to anything, I think that the quoted passage from the docs should mention that the behavior in this respect depends on the sql_mode.
[26 Feb 2013 15:13] Balna Borju
The 3rd mentioned test case produces error#1366 on 5.5.17, so it seems to be corrected since then.

My main concern was that the ordering of the input lines influenced the outcome of the LOAD DATA INFILE, depending on whether the first line contained an empty field or not.
[28 Feb 2013 19:03] Sveta Smirnova
Thank you for the report.

I was able to see warning (error 1366), but it did not depend from position of a line with empty field in the file.

In my point of view this is correct behavior, described at http://dev.mysql.com/doc/refman/5.5/en/load-data.html: "If an input line has too few fields, the table columns for which input fields are missing are set to their default values. Default value assignment is described in Section 11.5, “Data Type Default Values”. "

Default value for INT data type is 0, but if you insert 0 to an integer field which has no explicit DEFAULT in its definition, warning should be generated.

At the same time text right below quoted is confusing:

----<q>----
An empty field value is interpreted differently than if the field value is missing:

    For string types, the column is set to the empty string.

    For numeric types, the column is set to 0.

    For date and time types, the column is set to the appropriate “zero” value for the type. See Section 11.3, “Date and Time Types”. 

----</q>----

It is absolutely not clear how differ "empty field value" and "missed field value". How csv file should look like in this case?

So this bug verified as documentation bug.
[8 Apr 2013 15:04] Paul DuBois
Here's what's happening:

Column position doesn't matter; this is not specific to the first column.

IGNORE doesn't matter either.

The behavior described in the manual is what occurs by default. If sql_mode is set to a more restrictive value, some warnings become errors. For example, if you set sql_mode='TRADITIONAL', you will see the errors reported. With sql_mode='', they are just warnings.

Note that if LOAD DATA LOCAL is used, even with a restrictive sql_mode value, the errors will be treated as warnings because the server has no way to stop file transmission from the client.

I'll check into clarifying this in the manual.
[8 Apr 2013 15:11] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

Added text:

Treatment of empty or incorrect field values differs from that just
described if the SQL mode is set to a restrictive value. For example,
if sql_mode='TRADITIONAL, conversion of an empty value or a value
such as 'x' for a numeric column results in an error, not conversion
to 0. (With LOCAL, warnings rather than errors occur, even with a
restrictive sql_mode value, because the server has no way to stop
transmission of the file in the middle of the operation.)
[18 Jul 2014 0:21] Alex Huth
I am seeing the same inconsistent first-line behavior as described in this bug report, running MySQL server version 5.6.12-enterprise-commercial-advanced.

Please note that the previous few comments are inaccurate in several respects:

1. The inconsistent behavior is specific to the ROW position, not the COLUMN position.
2. While the IGNORE statement does not matter, what is being discussed here is the IGNORE number LINES option available later in the LOAD DATA syntax, and it definitely does matter.
3. The problem is not with documentation or sql_mode, the problem is that identical statements executed with identical settings are behaving inconsistently with respect to warning/error 1366 (as I will explain).

In Workbench, I ran three successive LOAD DATA statements to load CSV data, chunked into three 10M-row files. The first two statements succeeded, while the third failed. I literally copy-and-pasted these statements, changing only one character of the file name between them, and executed them as a set, with sql_mode unchanged throughout.

The first two statements showed millions of 1366 warnings but succeeded; the first warnings occur at rows 73 and 97, respectively. However, the third statement failed with Error 1366, occurring at row 1.

The only significant difference between these files is at which row the first incorrect decimal value is encountered. (In this case, the incorrect value was an empty string.)

In each statement, I used IGNORE 1 LINES to skip a header row in the CSV file. Each of the three files contains that single header row. When I changed the third statement from IGNORE 1 LINES to IGNORE 2 LINES, it no longer failed with Error 1366, but executed successfully with millions of 1366 warnings (the first occurring at row 59), just as in the first two cases.

I then tried to manually INSERT the row that I had skipped from the third file in order to get the LOAD DATA statement to run. This failed with Error 1366.

I then added a row of dummy data (with id 999999999, for easy deletion later) before my skipped row in the INSERT statement, like this:

INSERT INTO tbl VALUES ('999999999', '0', '0'), ('12345', '', '');

With the successful result:

2 row(s) affected, 2 warning(s): 1366 Incorrect decimal value: '' for column 'intptlat10' at row 2 1366 Incorrect decimal value: '' for column 'intptlon10' at row 2 Records: 2  Duplicates: 0  Warnings: 2

So this inconsistent behavior affects not only LOAD DATA, but also INSERT. I tried to replicate this behavior in a fresh table and found that the inconsistency affects MyISAM tables but not InnoDB tables. 

I tunneled into the server machine (running RHEL 6.3) to run the following statements, in case the inconsistency was related to Workbench:

---

mysql> create table t (id int primary key, a decimal) engine=myisam;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t values (1, '');
ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 1
mysql> insert into t values (1, ''), (2, 0);
ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 1
mysql> insert into t values (1, 0), (2, '');
Query OK, 2 rows affected, 1 warning (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 1

mysql> create table t2 (id int primary key, a decimal) engine=innodb;
Query OK, 0 rows affected (0.16 sec)

mysql> insert into t2 values (1, '');
ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 1
mysql> insert into t2 values (1, 0), (2, '');
ERROR 1366 (HY000): Incorrect decimal value: '' for column 'a' at row 2

In summary, this is a bug because the success or failure of a legitimate INSERT or LOAD DATA statement on a MyISAM table is not expected to have anything to do with the order of the rows.
[18 Jul 2014 21:31] Alex Huth
After further investigation, it looks this is inconsistent by design, with the relevant section of the documentation here:

http://dev.mysql.com/doc/refman/5.6/en/sql-mode.html#sql-mode-important

"STRICT_TRANS_TABLES

If a value could not be inserted as given into a transactional table, abort the statement. For a nontransactional table, abort the statement if the value occurs in a single-row statement or the first row of a multiple-row statement. More details are given later in this section."
[24 Aug 2014 5:35] Daniel Erry
something else to note is the inconsistent results that takes place on a transforms.

table definition: 
CREATE TABLE `incomebycustomerdetail` (
  `RowNumber` int(11) DEFAULT NULL,
  `Text` varchar(255) DEFAULT NULL,
  `Blank` varchar(255) DEFAULT NULL,
  `RowData` varchar(255) DEFAULT NULL,
  `RowType` varchar(255) DEFAULT NULL,
  `TxnType` varchar(255) DEFAULT NULL,
  `Date` date DEFAULT NULL,
  `RefNumber` varchar(255) DEFAULT NULL,
  `Account` varchar(255) DEFAULT NULL,
  `Amount` decimal(19,4) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

input data:

Example data from file 
Please note:
1) these two rows are identical expect the ID
2) Several Fields are blank
3) The very last field is defined as a nullable decimal field

RowNumber|Text|Blank|RowData|RowType|TxnType|Date|RefNumber|Account|Amount
1|AAE0207121|||TextRow|||||
2|AAE0207121|||TextRow|||||

Load Statement:

truncate table QuickbooksReplication.IncomeByCustomerDetail;
SET sql_mode = '';
LOAD DATA INFILE 'C:/Load/QBLIncomeByCustomerDetail.csv' INTO TABLE QuickbooksReplication.IncomeByCustomerDetail
FIELDS TERMINATED BY '|'
lines terminated by '\n' 
IGNORE 1 LINES
( @RowNumber
, @Text
, @Blank
, @RowData
, @RowType
, @TxnType
, @Date
, @RefNumber
, @Account
, @Amount)
set
`RowNumber`=case when @RowNumber = '' then null else @RowNumber end
,`Text`=case when @Text = '' then null else @Text end
,`Blank`=case when @Blank = '' then null else @Blank end
,`RowData`=case when @RowData = '' then null else @RowData end
,`RowType`=case when @RowType = '' then null else @RowType end
,`TxnType`=case when @TxnType = '' then null else @TxnType end
,`Date`=case when @Date = '' then null else @Date end
,`RefNumber`=case when @RefNumber = '' then null else @RefNumber end
,`Account`=case when @Account = '' then null else @Account end
,`Amount`=case when @Amount = '' then null else @Amount end

result:
# RowNumber, Text, Blank, RowData, RowType, TxnType, Date, RefNumber, Account, Amount
'1', 'AAE0207121', NULL, NULL, 'TextRow', NULL, NULL, NULL, NULL, '0.0000'
'2', 'AAE0207121', NULL, NULL, 'TextRow', NULL, NULL, NULL, NULL, NULL

The Amount Fields are different. One is NULL as expected the other is a 0 which is not expected. I am inserting identical records into the database and both records are going through the same transform before being put into the table but I am getting two different results depending on if the record is in the first row or the second row. 

I have read the documentation about how the first row is treated differently and how the default value is plugged if the field is missing on the first row. What makes this bad is the fact that this takes place before my transform takes place which results in inconsistent results depending on the row.