MySQL Bugs: #15146: Large tables joined on a null key should be faster

Bug #15146	Large tables joined on a null key should be faster
Submitted:	22 Nov 2005 18:46	Modified:	1 Jul 2006 1:23
Reporter:	Kevin Fries	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	4.1.13-nt	OS:	Any (All)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
I have two large table, and I have the primary key of a record in the first table.  I want to select back an equijoin of the two tables based on a column (xId) common to both, but not the primary key of either.   The xId column is indexed in both tables. I'm only looking for one record at a time from the first table, based on the primary key value I know.  

If I specify a primary key value where the xId value is NOT NULL, then the join is lightning fast (.02 seconds) on a system with around 1 million records in each table.  But, if I select a record that has NULL for the xId, then the query takes as much ad 5 seconds to come back.  

Seems to be the join should be faster in the case of a NULL.

How to repeat:
The following script generates data, analyzes the table, and demonstrates two queries.  I believe both should run very quickly.

drop table TestX;
drop table TestY;
drop table TestY2;
drop table TestZ;

create table TestX (xId int not null auto_increment, name varchar(50), Primary Key pk_xid (xId)) type=innodb;
  
insert into TestX(name) values ('something');  

create table TestY (yId int not null auto_increment, details varchar(200), xId int,
     index idx_xId(xId), primary key pk_yId (yId)) type=innodb;
  
create table TestY2 (details varchar(200), xId int, index idx2_xId(xId));
     
insert into TestY(details, xId) values ( 'A record with an xId', 1);
insert into TestY(details, xId) values ( 'A record with NO xId', null);
insert into TestY2 select details, xId from TestY;

/** bulk up a bunmch of rows in Y.. **/
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;
insert into TestY2 select details, xId from TestY;
insert into TestY (details, xId) select details, xId from TestY2;

create table TestZ (xId int, yId int, index idx_z_xId (xId)) type=innodb;  
insert into TestZ (xId, yId) values (1, 1 );
insert into TestZ (yId) select yid from TestY where xid IS NULL;

analyze table TestZ;
analyze table TestY;

select * from TestY where yId < 3;

explain SELECT TestZ.* 
FROM TestY , TestZ 
WHERE TestY.xId = TestZ.xId 
 AND TestY.yId = 2;

explain SELECT TestZ.* 
FROM TestY , TestZ 
WHERE TestY.xId = TestZ.xId 
 AND TestY.yId = 1;
 
/** this takes much too long. Is it because we're scanning the TestZ table ? Why would we even look at it? **/
SELECT TestZ.* 
FROM TestY , TestZ 
WHERE TestY.xId = TestZ.xId 
 AND TestY.yId = 2;

/** this runs quite fast. **/
SELECT TestZ.* 
FROM TestY , TestZ 
WHERE TestY.xId = TestZ.xId 
 AND TestY.yId = 1;

Suggested fix:
Before performing the join between TestY and TestZ, the join key should be verified.  It is not possible to find any rows in the TestZ table where TestY's NULL  equals TestZ's NULL.

Thanks for your bug report.  I was able to reproduce this problem in 5.0.15 under FC4:

mysql> select * from TestY where yId < 3;
+-----+----------------------+------+
| yId | details              | xId  |
+-----+----------------------+------+
|   1 | A record with an xId |    1 |
|   2 | A record with NO xId | NULL |
+-----+----------------------+------+
2 rows in set (0.00 sec)

mysql> explain SELECT TestZ.* 
    -> FROM TestY , TestZ 
    -> WHERE TestY.xId = TestZ.xId 
    ->  AND TestY.yId = 2;
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
| id | select_type | table | type  | possible_keys   | key       | key_len | ref   | rows  | Extra       |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
|  1 | SIMPLE      | TestY | const | PRIMARY,idx_xId | PRIMARY   | 4       | const |     1 |             |
|  1 | SIMPLE      | TestZ | ref   | idx_z_xId       | idx_z_xId | 5       | const | 51497 | Using where |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
2 rows in set (0.00 sec)

mysql> explain SELECT TestZ.* 
    -> FROM TestY , TestZ 
    -> WHERE TestY.xId = TestZ.xId 
    ->  AND TestY.yId = 1;
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
| id | select_type | table | type  | possible_keys   | key       | key_len | ref   | rows  | Extra       |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
|  1 | SIMPLE      | TestY | const | PRIMARY,idx_xId | PRIMARY   | 4       | const |     1 |             |
|  1 | SIMPLE      | TestZ | ref   | idx_z_xId       | idx_z_xId | 5       | const | 51497 | Using where |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
2 rows in set (0.00 sec)

mysql> SELECT TestZ.* 
    -> FROM TestY , TestZ 
    -> WHERE TestY.xId = TestZ.xId 
    ->  AND TestY.yId = 2;
Empty set (4.30 sec)

mysql> SELECT TestZ.* 
    -> FROM TestY , TestZ 
    -> WHERE TestY.xId = TestZ.xId 
    ->  AND TestY.yId = 1;
+------+------+
| xId  | yId  |
+------+------+
|    1 |    1 |
+------+------+
1 row in set (0.01 sec)

mysql>

In 4.1.21, I get:

SELECT TestZ.* FROM TestY , TestZ  WHERE TestY.xId = TestZ.xId AND TestY.yId = 2;
Operations:
  Handler_read_key,  2 ops

SELECT TestZ.* FROM TestY , TestZ 
WHERE TestY.xId = TestZ.xId AND TestY.yId = 1;
Operation:
  Handler_read_key  4 ops
  Handler_read_next 1 op

i.e. it is already executing fast. 
EXPLAIN outputs:

EXPLAIN SELECT TestZ.*  FROM TestY , TestZ  WHERE TestY.xId = TestZ.xId   AND TestY.yId = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TestY
         type: const
possible_keys: PRIMARY,idx_xId
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: TestZ
         type: ref
possible_keys: idx_z_xId
          key: idx_z_xId
      key_len: 5
          ref: const
         rows: 30152
        Extra: Using where
2 rows in set (0.00 sec)

EXPLAIN SELECT TestZ.*  FROM TestY , TestZ  WHERE TestY.xId = TestZ.xId   AND TestY.yId = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TestY
         type: const
possible_keys: PRIMARY,idx_xId
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: TestZ
         type: ref
possible_keys: idx_z_xId
          key: idx_z_xId
      key_len: 5
          ref: const
         rows: 30152
        Extra: Using where
2 rows in set (0.00 sec)

In 4.1.13a-debug:
SELECT TestZ.* FROM TestY , TestZ  WHERE TestY.xId = TestZ.xId AND TestY.yId = 2;
Operations:
  Handler_read_key      | 4      |
  Handler_read_next     | 514229 |

BUG#12144

This deficiency has been fixed by fix for BUG#12144.