Bug #28188 'not exists' optimization for outer joins does not work anymore
Submitted: 2 May 2007 1:56 Modified: 10 May 2007 19:49
Reporter: Igor Babaev
Status: Closed
Category:Server: Optimizer Severity:S5 (Performance)
Version:5.0.42-BK, 5.0.18-BK OS:Any
Assigned to: Bugs System Target Version:

[2 May 2007 1:56] Igor Babaev
Description:
If an outer join query with a WHERE condition contains a conjunctive predicate of   the
form IS NULL <non-nullable inner table column> so called 'not exists' optimization can be
applied.
With this optimization the nested loop retrieval of the inner table rows is stopped as
soon as the first row from the inner table that meets the outer join ON condition is
found.   
This optimization worked in 4.1 but does not work anymore in 5.0 starting from version
5.0.6.

This is a regression bug.

How to repeat:
Create the following tables:

CREATE TABLE t1 (id int PRIMARY KEY, a varchar(8));
CREATE TABLE t2 (id int NOT NULL, b int NOT NULL, INDEX idx(id));

Populate them:

INSERT INTO t1 VALUES
  (1,'aaaaaaa'), (5,'eeeeeee'), (4,'ddddddd'), (2,'bbbbbbb'), (3,'ccccccc');
INSERT INTO t2 VALUES
  (3,10), (2,20), (5,30), (3,20), (5,10), (3,40), (3,30), (2,10), (2,40);

Then run the following commands:

flush status;
SELECT t1.id, a FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.b IS NULL;
show status like 'Handler_read%';

The optput for 4.1:

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 4.1.23-debug |
+--------------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT t1.id, a FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.b IS NULL;
+----+---------+
| id | a       |
+----+---------+
|  1 | aaaaaaa |
|  4 | ddddddd |
+----+---------+
2 rows in set (0.00 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 5     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 6     |
+-----------------------+-------+
6 rows in set (0.00 sec)

The optput for 5.0:

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 5.0.42-debug |
+--------------+
1 row in set (0.00 sec)

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT t1.id, a FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.b IS NULL;
+----+---------+
| id | a       |
+----+---------+
|  1 | aaaaaaa |
|  4 | ddddddd |
+----+---------+
2 rows in set (0.00 sec)

mysql> show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 5     |
| Handler_read_next     | 9     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 6     |
+-----------------------+-------+
6 rows in set (0.00 sec)

The values of 'Handler_read_next' demonstrate a serious performancendegradation.
[2 May 2007 8:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/25883

ChangeSet@1.2469, 2007-05-01 23:34:14-07:00, igor@olga.mysql.com +3 -0
  Fixed bug #28188: performance degradation for outer join queries to which
  'not exists' optimization is applied.
  
  In fact 'not exists' optimization did not work anymore after the patch
  introducing the evaluate_join_record function had been applied.
  
  Corrected the evaluate_join_record function to respect the 'not_exists'
  optimization.
[7 May 2007 20:14] Bugs System
Pushed into 5.1.18-beta
[8 May 2007 2:28] Paul DuBois
Noted in 5.1.18 changelog.

Setting report to Patch Queued pending push into 5.0.x.
[10 May 2007 8:51] Bugs System
Pushed into 5.0.42
[10 May 2007 19:45] Paul DuBois
Noted in 5.0.42 changelog.