MySQL Bugs: #99522: Support column prefix index of size zero (NULL index)

Bug #99522	Support column prefix index of size zero (NULL index)
Submitted:	12 May 2020 7:26	Modified:	13 May 2020 7:21
Reporter:	Vladimir Lasky	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DDL	Severity:	S4 (Feature request)
Version:	8.0.20	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	INDEX

Description:
In many MySQL applications, we want our query to efficiently retrieve rows from a table where a minority of values in a column are set to NULL or NOT NULL.

Let's imagine a table named order_details used to track food deliveries and we want to retrieve the details of orders that have not been delivered yet, that is, delivered_time is set to NULL.

SELECT order_details from orders where delivered_time IS NULL;

Usually, we would create an ordinary index on delivered_time, i.e.
CREATE INDEX on orders (delivered_time);
An ordinary index is wasteful of disk space and memory if our queries never have conditions that check the actual value of delivered_time

If the column data type were CHAR, VARCHAR, BINARY, and VARBINARY, BLOB or TEXT we could reduce the index space requirements by creating a prefix index using only the first character.

CREATE INDEX on orders (delivered_time(1));
This wastes less space, but is still suboptimal and cannot be used for numerical data types or date & time data types.

How to repeat:
N/A

Suggested fix:
MySQL should be enhanced to allow prefix indexes of size zero to be created. The index purely records whether the value in the column is NULL or NOT NULL, i.e.

CREATE INDEX on orders (delivered_time(0));

This zero-sized column prefix index would then be able to be create for almost all MySQL data types including numerical data types, date & time data types and even spatial data types, in addition to the currently supported ones (CHAR, VARCHAR, BINARY, and VARBINARY, BLOB or TEXT).

One can index generated columns to do this.   Check if it helps?
-- ------------

-- drop table if exists t;
create table t(a varchar(60))engine=innodb;

insert into t select uuid();
insert into t select uuid() from t;
insert into t select uuid() from t a,t;
insert into t select uuid() from t a,t;
insert into t select uuid() from t a,t;
insert into t select uuid() from t a,t limit 1000000;

update t set a=null where a like '%33%';
analyze table t;
select count(*), sum(isnull(a)),sum(not isnull(a)) from t \G
explain select count(*) from t where isnull(a);
flush status;
select count(*) from t where isnull(a);
show status like 'handler%';

-- vs an indexed generated column

alter table t add a_n tinyint generated always as(isnull(a));
alter table t add key n(a_n);
analyze table t;
explain select count(*) from t where a_n=1;
flush status;
select count(*) from t where a_n=1;
show status like 'handler%';

select version();
-- ------------

Hello Shane,

Thanks for taking the time to comment.

Yes, your approach with the indexed generated column is a partial workaround, but it is not always an option and doesn't provide the same benefits.

1. Generated columns are only supported when using the InnoDB storage engine.

Many MySQL users use other storage engines like MyISAM, TokuDB and MyRocks to meet specific needs.

My businesses are relying on TokuDB.

2. Queries with WHERE clauses that check for IS NULL or NOT NULL would have to be rewritten to use the ISNULL() function.

It is considered good practice to avoid creating WHERE clauses with column names inside MySQL functions as this usually prevents MySQL's optimizer from functioning.

Obviously, when you have indexed generated columns populated using function(s) applied to a column, that is an 'exception to the rule', but I would prefer to avoid relying on such exceptions except unless there is no other alternative.

The use case I describe is very common. The syntax I propose for the NULL index (a column prefix of size zero) is more intuitive and simpler for most people to deploy.

3. I expect the development effort required to support this to be small.