| Bug #15146 | Large tables joined on a null key should be faster | ||
|---|---|---|---|
| Submitted: | 22 Nov 2005 18:46 | Modified: | 1 Jul 2006 1:23 |
| Reporter: | Kevin Fries | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 4.1.13-nt | OS: | Any (All) |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
[22 Nov 2005 19:29]
Jorge del Conde
Thanks for your bug report. I was able to reproduce this problem in 5.0.15 under FC4:
mysql> select * from TestY where yId < 3;
+-----+----------------------+------+
| yId | details | xId |
+-----+----------------------+------+
| 1 | A record with an xId | 1 |
| 2 | A record with NO xId | NULL |
+-----+----------------------+------+
2 rows in set (0.00 sec)
mysql> explain SELECT TestZ.*
-> FROM TestY , TestZ
-> WHERE TestY.xId = TestZ.xId
-> AND TestY.yId = 2;
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
| 1 | SIMPLE | TestY | const | PRIMARY,idx_xId | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | TestZ | ref | idx_z_xId | idx_z_xId | 5 | const | 51497 | Using where |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
2 rows in set (0.00 sec)
mysql> explain SELECT TestZ.*
-> FROM TestY , TestZ
-> WHERE TestY.xId = TestZ.xId
-> AND TestY.yId = 1;
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
| 1 | SIMPLE | TestY | const | PRIMARY,idx_xId | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | TestZ | ref | idx_z_xId | idx_z_xId | 5 | const | 51497 | Using where |
+----+-------------+-------+-------+-----------------+-----------+---------+-------+-------+-------------+
2 rows in set (0.00 sec)
mysql> SELECT TestZ.*
-> FROM TestY , TestZ
-> WHERE TestY.xId = TestZ.xId
-> AND TestY.yId = 2;
Empty set (4.30 sec)
mysql> SELECT TestZ.*
-> FROM TestY , TestZ
-> WHERE TestY.xId = TestZ.xId
-> AND TestY.yId = 1;
+------+------+
| xId | yId |
+------+------+
| 1 | 1 |
+------+------+
1 row in set (0.01 sec)
mysql>
[30 Jun 2006 23:31]
Sergey Petrunya
In 4.1.21, I get:
SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 2;
Operations:
Handler_read_key, 2 ops
SELECT TestZ.* FROM TestY , TestZ
WHERE TestY.xId = TestZ.xId AND TestY.yId = 1;
Operation:
Handler_read_key 4 ops
Handler_read_next 1 op
i.e. it is already executing fast.
EXPLAIN outputs:
EXPLAIN SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TestY
type: const
possible_keys: PRIMARY,idx_xId
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: TestZ
type: ref
possible_keys: idx_z_xId
key: idx_z_xId
key_len: 5
ref: const
rows: 30152
Extra: Using where
2 rows in set (0.00 sec)
EXPLAIN SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: TestY
type: const
possible_keys: PRIMARY,idx_xId
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: TestZ
type: ref
possible_keys: idx_z_xId
key: idx_z_xId
key_len: 5
ref: const
rows: 30152
Extra: Using where
2 rows in set (0.00 sec)
[1 Jul 2006 1:10]
Sergey Petrunya
In 4.1.13a-debug: SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 2; Operations: Handler_read_key | 4 | Handler_read_next | 514229 |
[1 Jul 2006 1:22]
Sergey Petrunya
BUG#12144
[1 Jul 2006 1:23]
Sergey Petrunya
This deficiency has been fixed by fix for BUG#12144.

Description: I have two large table, and I have the primary key of a record in the first table. I want to select back an equijoin of the two tables based on a column (xId) common to both, but not the primary key of either. The xId column is indexed in both tables. I'm only looking for one record at a time from the first table, based on the primary key value I know. If I specify a primary key value where the xId value is NOT NULL, then the join is lightning fast (.02 seconds) on a system with around 1 million records in each table. But, if I select a record that has NULL for the xId, then the query takes as much ad 5 seconds to come back. Seems to be the join should be faster in the case of a NULL. How to repeat: The following script generates data, analyzes the table, and demonstrates two queries. I believe both should run very quickly. drop table TestX; drop table TestY; drop table TestY2; drop table TestZ; create table TestX (xId int not null auto_increment, name varchar(50), Primary Key pk_xid (xId)) type=innodb; insert into TestX(name) values ('something'); create table TestY (yId int not null auto_increment, details varchar(200), xId int, index idx_xId(xId), primary key pk_yId (yId)) type=innodb; create table TestY2 (details varchar(200), xId int, index idx2_xId(xId)); insert into TestY(details, xId) values ( 'A record with an xId', 1); insert into TestY(details, xId) values ( 'A record with NO xId', null); insert into TestY2 select details, xId from TestY; /** bulk up a bunmch of rows in Y.. **/ insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; insert into TestY2 select details, xId from TestY; insert into TestY (details, xId) select details, xId from TestY2; create table TestZ (xId int, yId int, index idx_z_xId (xId)) type=innodb; insert into TestZ (xId, yId) values (1, 1 ); insert into TestZ (yId) select yid from TestY where xid IS NULL; analyze table TestZ; analyze table TestY; select * from TestY where yId < 3; explain SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 2; explain SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 1; /** this takes much too long. Is it because we're scanning the TestZ table ? Why would we even look at it? **/ SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 2; /** this runs quite fast. **/ SELECT TestZ.* FROM TestY , TestZ WHERE TestY.xId = TestZ.xId AND TestY.yId = 1; Suggested fix: Before performing the join between TestY and TestZ, the join key should be verified. It is not possible to find any rows in the TestZ table where TestY's NULL equals TestZ's NULL.