Bug #109484 Is null optimization limitation documentation unclear
Submitted: 26 Dec 2022 21:36 Modified: 30 Dec 2022 13:56
Reporter: Tim Nielens Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: documentation, null, optimization

[26 Dec 2022 21:36] Tim Nielens
Description:
I have issues understanding the limitation of the is NULL optimization from the documentation at https://dev.mysql.com/doc/refman/5.7/en/is-null-optimization.html .

One minor point on the following part. Is it two separate indexes on the two columns, or a composite index? I'll assume composite index.
> Some examples of queries that are optimized, assuming that there is an index on columns a and b of table t2: 

My main issue is with the following part at the end on limitations of the optimization: 
> The optimization can handle only one IS NULL level. In the following query, MySQL uses key lookups only on the expression (t1.a=t2.a AND t2.a IS NULL) and is not able to use the key part on b:
> SELECT * FROM t1, t2
>   WHERE (t1.a=t2.a AND t2.a IS NULL)
>   OR (t1.b=t2.b AND t2.b IS NULL);

I don't understand how this example is relevant, specifically the `OR (t1.b=t2.b AND t2.b IS NULL)` part. From the doc of composite indexes, "MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index.". I consider `OR (t1.b=t2.b AND t2.b IS NULL)` unable to make use of the index because that subclause doesn't constrain `t2.a`. I don't understand why it's used to explain a limitation of the is NULL optimization. What does "level" refer to in "The optimization can handle only one IS NULL level.".

A query that would make more sense to me for illustrating the limitation is this one: 

SELECT * FROM t1, t2
  WHERE (t1.a=t2.a OR t2.a IS NULL)
  AND (t1.b=t2.b OR t2.b IS NULL);

I've tried so, but the `ref_or_null` join type is still reported by `EXPLAIN`. Therefore I don't understand the limitation documented. A complete example is available in the following section.

I've been searching a bit on the topic but couldn't find much except at
https://stackoverflow.com/questions/289001/does-mysql-index-null-values . But that post doesn't help much on the is NULL optimization.

Am I missing something important from the doc? If not, maybe the doc could elaborate a bit on the limitation.

How to repeat:
CREATE TABLE t1 (
    id         INT AUTO_INCREMENT NOT NULL,
    a  CHAR(30),
    b CHAR(30),
    PRIMARY KEY (id)
);

CREATE TABLE t2 (
    id         INT AUTO_INCREMENT NOT NULL,
    a  CHAR(30),
    b CHAR(30),
    PRIMARY KEY (id),
    INDEX name (a,b)
);

insert into t1 (a, b) values ('Calamity', 'Jane'), ('The Kid', 'Billy'), ('James', 'Jesse'), ('Jack', NULL) 

insert into t2 (a, b) values ('Calamity', 'Jane'), ('The Kid', 'Billy'), ('James', 'Jesse'), ('Jack', NULL) 
  
explain SELECT * FROM t1, t2
  WHERE (t1.a=t2.a OR t2.a IS NULL)
  AND (t1.b=t2.b OR t2.b IS NULL);
[30 Dec 2022 13:56] MySQL Verification Team
We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not the result of a bug.

For details on getting support for MySQL products see http://www.mysql.com/support/
You can also check our forums (free) at http://forums.mysql.com/

Thank you for your interest in MySQL.