Bug #37681 CSV Engine needs nullable columns
Submitted: 26 Jun 2008 23:24 Modified: 1 Jul 2008 4:57
Reporter: Duane Hitz Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: CSV Severity:S4 (Feature request)
Version:5.1+ OS:Any
Assigned to: CPU Architecture:Any
Tags: csv, null

[26 Jun 2008 23:24] Duane Hitz
Description:
The CSV engine appears to be a really useful feature, however, the inability to have nullable columns makes the CSV engine somewhat useless for real-world applications.

Some examples of real-world applications might be:

- Using almost any existing CSV file (I'm hard presed to remember an actual CSV file that didn't have null values),
- Quickly importing data sets provided by external vendors,
- Exporting user-provided Microsoft Excel files to CSV to integrate user-provided data into analytics,
- Quickly exporting data (i.e. insert into select from) if any of the tables queried have null values.

Without the ability to have nullable columns, it severely limits any possible applications for the CSV engine.

I am curious as to the technical issues that required this constraint.

How to repeat:
N/A

Suggested fix:
Add the ability to have nullable columns in CSV tables.
[27 Jun 2008 17:03] Susanne Ebrecht
Many thanks for writing a feature request.

There is no official standard for CVS. Means there are no rules for NULL in CVS. 
Also there is no NULL in ASCII.
[27 Jun 2008 17:30] Duane Hitz
There actually is a null defined in ASCII - it's a decimal, hex and octal value of zero (and in ISO 646 and in Unicode).

Regardless, since that really cannot be (and never is) represented that way in a CSV file, it would seem to me that a length of zero in a CSV file would be equate to a null. 

Since everything in CSV is normally ASCII text (though there is technically no limitation), a null string (at least in C) is one that is null terminated (ASCII 0) at its first character... which is in every way equivilant to having zero length. That's pretty much how any zero-length string has to be represented in memory - so it's not a stretch to call that NULL.

In other words, any string with a length of zero has an undefined, unknown or no value - all valid and reasonable definitions of NULL.

Of course, that is all pre-type conversion - which presumably would have to happen regardless of whether there were a check for zero length/nullness.
[30 Jun 2008 10:03] Susanne Ebrecht
ASCII NUL is not the same as NULL.
[1 Jul 2008 4:57] Duane Hitz
If, as you say, there is no standard for CSV, then why has MySQL chosen to artificially enforce a contrived one while at the same time rendering what could be a very useful feature practically useless?

The lack of a standard would imply that MySQL could do whatever it wants with impunity in order to satisfy its customers needs.

This same thing is a problem with LOAD DATA INFILE as well... in that blank fields in a delimited file just simply break the load requiring, for example:

mycol = IF(length(@mycol) = 0,null,@mycol)

...which to me is an a kludged solution to address an overly obvious shortcoming in MySQL.

Intuitively, one should expect zero-length fields to be treated as nulls in a bulk load utility (as they are in Oracle, etc.).  

I think that intuitively, one should expect a CSV database engine to handle zero-length fields as nulls as well.

If you were designing requirements for features aimed at meeting your end users' needs, of course there could be no possible answer but this.

Is there some other issue that I'm not contemplating here?

It would seem that this is a choice between adhering to some principle of architectural purity versus making a feature useful to your customers... and MySQL has apparently decided against the latter.