MySQL Bugs: #23212: allow missing data = NULL in LOAD DATA INFILE

Bug #23212	allow missing data = NULL in LOAD DATA INFILE
Submitted:	12 Oct 2006 12:15	Modified:	7 Nov 2006 16:12
Reporter:	Rob Blick	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S4 (Feature request)
Version:	5.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Currently, LOAD DATA INFILE interprets missing fields according to the following (from the manual):

An empty field value is interpreted differently than if the field value is missing:

    * For string types, the column is set to the empty string.
    * For numeric types, the column is set to 0.
    * For date and time types, the column is set to the appropriate “zero” value for the type. See Section 11.3, “Date and Time Types”.

These are the same values that result if you assign an empty string explicitly to a string, numeric, or date or time type explicitly in an INSERT or UPDATE statement.

It would be very helpful if one could instruct LOAD DATA INFILE to consider missing fields as "NULL."  For example, say a table is defined as CREATE TABLE A (a int not null primary key, b int default null); and we have the following CSV data:

1,2
2,0
3,3
4,
5,3

Loading that data with LOAD DATA INFILE populates the table as follows:

1,2
2,0
3,3
4,0 (with a warning)
5,3

In some cases, it would be helpful to consider a missing value to really be a missing value (i.e., NULL), rather than converting it to 0 or empty string, which may also be represented in the CSV.

How to repeat:
N/A - this is a feature request.  Current behavior is described in the docs.

Suggested fix:
Provide an option to LOAD DATA INFILE to allow missing fields to be loaded as NULL.

Thank you for a reasonable feature request.

Rob,

Here is what I found in:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html

--- Cut Here ---

Handling of NULL values varies according to the FIELDS and LINES  options in use:

For the default FIELDS and LINES values, NULL is written as a field value of \N for output, and a field value of \N is read as NULL for input (assuming that the ESCAPED BY character is ‘\’).

If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value. This differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL'.

If FIELDS ESCAPED BY is empty, NULL is written as the word NULL.

With fixed-row format (which is used when FIELDS TERMINATED BY and FIELDS ENCLOSED BY are both empty), NULL is written as an empty string. Note that this causes both NULL values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.

--- Cut Here ---

So, I think this has already solved part of your problem. I have already tried typing "\N" in a CSV file and it is regconized as NULL correctly. I don't know whether "\N" would be regconized as "DEFAULT" as well for "NOT NULL" columns.

I think the command should regconize something like "\D" for default values.
I suggest "\D" should work like this:

- If the column is "NOT NULL" and has no "DEFAULT" constraint, use the MySQL datatype default (ie, 0 for numbers, empty string for strings, first enum value for enums, etc.)

- If the column is "NULL" and has no "DEFAULT" constraint, use NULL

- If the column has a "DEFAULT" constraint, use that value

Thank you.

Kenji-
     Yes, I'm aware of using \N.  However, tools that generate CSV and tab-delimited files don't add \N, and rather than having to go through such files and add \N, it would be easier if MySQL treated missing data as null.  This is a feature request, not a bug, after all.

Hi,

I agree with Rob. Being able to import NULL values directly is almost a must have. Replacing empty fields in the source file is somewhat cumbersome...(-:

thanks

I agree with the last posts. I'm currently trying to load data from Sybase into mysql (bcp export) with no possibility to add \n for null values.

Would be great if there would be some way to specify a way for treating null values in the LOAD DATA INFILE syntax.

//jacob

I found a workaround for this problem by setting empty fields to null if they are empty in the bcp-file: 

load data infile 'import.bcp' into table imports (@id) set id = nullif(@id,'');

A simple solution is to use escaped by '\\' in the export statement.

Example export command:

select * from Test into outfile "/tmp/Testexport.csv" fields terminated by ',' enclosed by '"' escaped by '\\' lines terminated by '\r\n';

Example import command:
load data infile '/tmp/Testexport.csv' into table Test2 fields terminated by ',' enclosed by '"' lines terminated by '\r\n';

All your NULL fields will be preserved, both in the exported .csv file and in the table you import the .csv file into.

Note that NULL fields will be represented by \N in the .csv file.

Hiya guys, is anything happening with this feature? I am trying to import data bcp'ed from MSSQL and got mouthful of NULLs which get imported as empty strings. Please help! :)

this is definetely an issue to be resolved,
for large data sets one is now obliged to preprocess the file in some other program, and replace all empty fields by NULL.

Reading a large file into mysql to be able to process it is my main reason to load the file, so preprocessing it by another program looks silly.

hope some developer recognizes this issue,
tnx for the good work,
LH

Please bump this up. Being unable to import CSV files that were generated by any source other than another MySQL database is just silly.

The workarounds suggested here only work if you are exporting the data yourself (and can thus convert it into MySQL's odd, proprietary requirements), or if you type out a conversion function for every column that could receive a NULL value.

MySQL has grown into a mature DB that's still digestionally delayed. It would be great if we could feed MySQL big boy data without having to premasticate it first.

I'm not sure if this truly counts as a feature request.  It seems to me to be more of a fundamental problem that needs to be fixed.  Simply allowing some verbiage in the LOAD DATA syntax to map something (a blank string, a single space, some other character) to a NULL value (or any other value) would fix it and many other import problems.

I have data that is stored in 84,550 rows and 641 columns from a vendor.  Needing to use other software to look through each of the 54,196,550 values and replace " " with "\N" takes approximately 4 hours.  The LOAD DATA statement happily replaces " " with 0 and loads all of the data in about 5 minutes.

Greetings:

I would like to add my voice to those who advocate for being able to import "null" values when data is missing from CSV files.  Here's why:

When calculating averages, it makes a difference whether a value is zero or null: nulls can be ignored in the calculation, but zeros cannot.  By forcing the missing data to be zero, it is now impossible to calculate accurate averages.  Sales valued at "Zero Dollars" are something quite different from "No Sales."  

Adapting the data by hand is not an option: I just loaded a CSV file with close to one million rows - and over 65,000 of them are afflicted with this issue, generating almost 2 million warnings.  Ouch :)  Since I don't yet know whether these errors are meaningful, I guess I'll load the CSV file into MS Access and connect to it via ODBC and find out ... 

P.S.: in case anyone else is trying to hunt down these warnings: 
"| Warning | 1265 | Data truncated for column ...." 
check for missing data in those columns ....

Yes HELP! on this! Someone posted here mentioning to add a command when exporting the file, but that assumes we control the export.  I have a CSV file with 2 million plus rows we import every month and we have to use a 3rd party program to clean up the empty strings after import.  Need a default setting to handle this...and not have us have to specify what to do for each field as that too would be a lot.

I agree that this is a serious bug and not a feature request, since it makes it impossible to import data into nullable fields correctly, if you cannot preprocess the files (for example, you do not have file permissions, or the files are very large so preprocessing adds a lot of time, or the files are designed to be fed to multiple database vendors).

As others have mentioned, if you're importing into a nullable int column, you want to make sure you preserve the distinction between 0 and NULL.

+1 for the very good reasons given by other commenters

concur with all the other commenters - current state of affairs is heinous, a better way to load nulls is very desirable

This problem is near and dear to my heart.  I'm going to write a simple preprocessor in awk that will change all empty fields to "\N", but I'm one of the lucky ones where the load data is rarely more than a few hundred thousand rows.  So far, I've managed to avoid any preprocessing, and I hate having to do it.  The data is coming from a bunch of Netezza systems, and with these monster boxes, you never know when you're going to get a bajillion rows.  (Worst is that I'm going to have to preprocess every file from every box.)

So, the "Triage" field says "Needs Triage: D5 Feature request".  Is this going to ever be triaged?

Mike Steele

Please, Please, PLEASE elevate this to Major Bug status.  I have a 2.1M-record csv dump from an Ingres RDBMS that barely fits into Access, but at least it loads.  All fields in the CSV a(including values and dates) are double-quoted if they are not null, and not quoted if they're empty.  There are no NULL strings or values in the csv.  After six hours of various manipulations, I finally had recast the various INT and DECIMAL fields to VARCHAR in the receiving table to avoid the inevitable "ERROR 1366" when MySQL encountered an "empty" value field.  I came to MySQL because Access doesn't have the horsepower to handle tables of this size, but at least Access is flexible(?) enough to recognize and accommodate implicit null values in the input file.

I believe that this should be upgraded to a bug. Converting empty values to a 0 double is often the wrong behavior, and probably leads to unintentional data corruption for a lot of people.

Its absurd that such a basic request which is rather a bug and not feature is pending for all these years and people keep suffering.
May be its time to move away from MySql and embrace other open source databases like MariaDB.

I am having the same problem. It is anoying to preproccess giant tables (+1GB) to remove null values. 

This is a must have feature!

The command LOAD DATA should have a parameter like MISSING VALUE = EXPRESSION;

I was looking for null handling while loading a data file and came here searching for the solution

This is ridiculous and if there is no way to handle this, my God! How are we to handle empty values in the file? :-(

Does anyone have an alternate way to handle this?
Thanks a lot for the help in advance

I find it amazing that for over SIX YEARS, there was no implementation of this MUST HAVE feature after the initial request. Elsewhere I always hear people say how great and fast open source development goes, etc...

Is this going to be implemented in this century?

Yeah, this definitely needs to be bumped in priority. Importing CSV is a very common need, not being able to do so properly limits MySQL's usefulness.

Yet another vote for "more like a bug", and "wow, 8 years of ignorance now!". ;)

Also, the 

"Suggested fix: 
Provide an option to LOAD DATA INFILE to allow missing fields to be loaded as NULL." 

should not be necessary. Just setting a DB field to "DEFAULT NULL" should imply the right thing! A missing value in the CSV quite obviously calls for a default, if one is specified (and an error, if not). Empty strings should just be quoted.

(Loading CSV files where missing values do mean empty strings, is already possible: just leave the delimiter empty.)

(Sorry, please ignore my previous comment: I was too eager to submit it, and didn't realize that if empty strings are quoted, then everything is quoted, which may not always be desirable. So, the original description in the feature request is still probably the best way to handle it.)

I agree this is a very severe problem, which cannot be emphasised strongly enough. It's a huge barrier to the usage of MySql.
I'm not sure whoever imagined that missing = a value except for fields where nulls are disallowed. It should be the default with an override option to make it assume null (both of which should be relatively easy to implement?) and if possible the ability to choose it per field (a to-do for the future).
It's been so overdue for years - can anyone suggest a reason why it should be hard to implement...? I'm sure we would all chip in for a reward for whoever will do it!!
david

I'd just like to chip in and say the data loading tools for MySQL are pretty poor in general - they always seem to stop short of reasonable functionality. Given the Import Data Wizard in MySQL Workbench can't handle CSV files where the field separator is a *comma* (a deranged decision), LOAD DATA INFILE is pretty much the only way to import CSV data without risking wrecking my data doing a bunch of find/replace operations in a text editor. Having the LOAD DATA INFILE tool make assumptions about my  data that I can't correct is just silly.

Being able to load data sets correctly and with the minimum of fuss is essential for testing database applications.

+1 on this topic. It would be nice for MySQL to sort this out on the database end but will have to code some find/replace or regex on the csv before passing to the DB I think.

Are you freakin serious that two subsequent delimiters are not understood by MySQL loader as null, and that this ridiculous bug exists since ten years ago? Wow, that's a god level of incompetence, well done.

2019 and still not fixed. It's unacceptable that for a 70 fields long csv you have to specify 70 times "FIELD = NULLIF(@field, '')".

2019 and still not fixed. It's unacceptable that for a 70 fields long csv you have to specify 70 times "FIELD = NULLIF(@field, '')".