Bug #28188 'not exists' optimization for outer joins does not work anymore
Submitted: 1 May 2007 23:56 Modified: 10 May 2007 17:49
Reporter: Igor Babaev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.42-BK, 5.0.18-BK OS:Any
Assigned to: Igor Babaev CPU Architecture:Any

[1 May 2007 23:56] Igor Babaev
Description:
If an outer join query with a WHERE condition contains a conjunctive predicate of   the form IS NULL <non-nullable inner table column> so called 'not exists' optimization can be applied.
With this optimization the nested loop retrieval of the inner table rows is stopped as soon as the first row from the inner table that meets the outer join ON condition is found.   
This optimization worked in 4.1 but does not work anymore in 5.0 starting from version 5.0.6.

This is a regression bug.

How to repeat:
Create the following tables:

CREATE TABLE t1 (id int PRIMARY KEY, a varchar(8));
CREATE TABLE t2 (id int NOT NULL, b int NOT NULL, INDEX idx(id));

Populate them:

INSERT INTO t1 VALUES
  (1,'aaaaaaa'), (5,'eeeeeee'), (4,'ddddddd'), (2,'bbbbbbb'), (3,'ccccccc');
INSERT INTO t2 VALUES
  (3,10), (2,20), (5,30), (3,20), (5,10), (3,40), (3,30), (2,10), (2,40);

Then run the following commands:

flush status;
SELECT t1.id, a FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.b IS NULL;
show status like 'Handler_read%';

The optput for 4.1:

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 4.1.23-debug |
+--------------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT t1.id, a FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.b IS NULL;
+----+---------+
| id | a       |
+----+---------+
|  1 | aaaaaaa |
|  4 | ddddddd |
+----+---------+
2 rows in set (0.00 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 5     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 6     |
+-----------------------+-------+
6 rows in set (0.00 sec)

The optput for 5.0:

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 5.0.42-debug |
+--------------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT t1.id, a FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.b IS NULL;
+----+---------+
| id | a       |
+----+---------+
|  1 | aaaaaaa |
|  4 | ddddddd |
+----+---------+
2 rows in set (0.00 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 5     |
| Handler_read_next     | 9     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 6     |
+-----------------------+-------+
6 rows in set (0.00 sec)

The values of 'Handler_read_next' demonstrate a serious performancendegradation.
[2 May 2007 6:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/25883

ChangeSet@1.2469, 2007-05-01 23:34:14-07:00, igor@olga.mysql.com +3 -0
  Fixed bug #28188: performance degradation for outer join queries to which
  'not exists' optimization is applied.
  
  In fact 'not exists' optimization did not work anymore after the patch
  introducing the evaluate_join_record function had been applied.
  
  Corrected the evaluate_join_record function to respect the 'not_exists'
  optimization.
[7 May 2007 18:14] Bugs System
Pushed into 5.1.18-beta
[8 May 2007 0:28] Paul DuBois
Noted in 5.1.18 changelog.

Setting report to Patch Queued pending push into 5.0.x.
[10 May 2007 6:51] Bugs System
Pushed into 5.0.42
[10 May 2007 17:45] Paul DuBois
Noted in 5.0.42 changelog.