Bug #9622 Optimizer makes join index mistake
Submitted: 4 Apr 2005 21:31 Modified: 30 Sep 13:43
Reporter: Matthew Lord
Status: Verified
Category:Server: Optimizer Severity:S2 (Serious)
Version:5.0,4,1,5.0 OS:Any (*)
Assigned to: Target Version:
Tags: bfsm_2007_11_15
Triage: Triaged: D2 (Serious) / R5 (Severe) / E5 (Major)

[4 Apr 2005 21:31] Matthew Lord
Description:
Here is the query in question:

SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON c.sessionid =
i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1 AND
(i.status_id NOT IN (1,8) OR i.status_id IS NULL);

The optimizer chooses not to use the sessionid index and thus causes the incidents table
to be
scanned for each record in clicktrack.  If you add  force index(incidents$sessionid) the
explain
output looks worse but it actually finishes almost immediatly.

The sessionid column in incidents contains only NULLs.  If you do this:
update incidents set sessionid = "*55wNSzh" limit 1;
analyze table incidents;
You'll see that the optimizer now chooses to use the index correctly.

Now if you do this:
update incidents set sessionid = NULL where sessionid IS NOT NULL;
analyze table incidents;

You'll see that the optimizer continues to choose the correct path.  We need to find
out why it isn't initially.

How to repeat:
create database csc5002;
use csc5002;
system gunzip /tmp/foo2.dump.gz; /* location provided */
source /tmp/foo2.dump;

desc SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON
c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1
AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G
# you can actually try and execute this command if you have a lot of time :)

desc SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i force
index(incidents$sessionid) ON c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05
02:42:13' AND c.interface_id = 1 AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G
SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i force
index(incidents$sessionid) ON c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05
02:42:13' AND c.interface_id = 1 AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL);

update incidents set sessionid = "*55wNSzh" limit 1;
analyse table incidents;
desc SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON
c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1
AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G
SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON c.sessionid =
i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1 AND
(i.status_id NOT IN (1,8) OR i.status_id IS NULL);

#now notice that if we go back to the original data set the optimizer continues choosing
correctly
update incidents set sessionid = NULL where sessionid IS NOT NULL;
analyze table incidents;
desc SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON
c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1
AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G
SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON c.sessionid =
i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1 AND
(i.status_id NOT IN (1,8) OR i.status_id IS NULL);
[24 Apr 2005 23:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24265
[13 May 2005 22:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/24882
[16 May 2005 21:17] Sergey Petrunya
Note for the docs team: 
The bug was that ALTER TABLE ... ENABLE INDEXES statistics collection code treated all
NULL values as equal (that is NULL == NULL), while ANALYZE TABLE statistics collection
code treated NULL values as inequal (NULL != NULL).
We've fixed it - now all statistics collection code treats NULLs as inequal.
[16 May 2005 21:39] Paul DuBois
Noted in 4.1.13 changelog.
[16 May 2005 21:39] Sergey Petrunya
A slightly corrected note for the docs team: 
The bug was that for MyISAM tables, ALTER TABLE ... ENABLE INDEXES 
and bulk insert statistics collection code treated all NULL values as equal 
(NULL == NULL), while ANALYZE TABLE statistics collection code treated
NULL values as inequal (NULL != NULL).

We've fixed it - now all statistics collection code treats NULLs as inequal.
[19 Oct 2005 0:07] Matthew Lord
If you use this mysqld option:
myisam_stats_method=nulls_equal

The test fails again as it did prior to this patch.
[20 Oct 2005 17:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/31277
[20 Oct 2005 17:49] Sergey Petrunya
The patch introduces a new statistics collection method "nulls_ignored".
With this method, all queries run fast:

TESTCASE         nulls_equal nulls_unequal nulls_ignored 
BUG9622           >60 sec     0.08 sec       0.08 sec
BUG12232-case1    0.00 sec    1.42 sec       0.01 sec
BUG12232-case2    0.03 sec    0.98 sec       0.01 sec
BUG12232-case3    1.70 sec   47.32 sec       1.67 sec
[21 Oct 2005 5:03] Sergey Petrunya
A patch with initial review feedback:
http://lists.mysql.com/internals/31284
[23 Oct 2005 0:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/31355
[23 Oct 2005 8:24] Sergey Petrunya
Fix pushed into 4.1.16 tree. Documentation will be soon provided
[27 Oct 2005 22:54] Sergey Petrunya
PROBLEM WITH NULLS_IGNORED METHOD
=================================

Consider an index defined as KEY(kp1, kp2), where all values of kp1 are not
nulls, but a good number of values of kp2 are NULLs. Graphically this can be
depicted as follows:

  kp1    kp2  
+-----+-------+
|     |       |
|  A  |   C   |
|     |       |
+-----+-------+
|     |       |
|  B  |  NULL |
|     |       |
+-----+-------+

Areas A, B, C contain non-NULL values here.

In current nulls_ignored method (both spec and implementation):

rec_per_key[0] is an estimate on how many records one will get if he chooses
some value of kp1 from region A or region B.

rec_per_key[1] is an estimate on how many records one will get if he chooses
some value of kp1 from A region, and value of kp2 from region C.

So, 
for rec_per_key[0] we assume that kp1 is in A
for rec_per_key[1]  -  that kp1 is in (A + B).  (***)

This can cause undesirable effects. Suppose the table is populated as
follows:
area A: AVG(value_group_size) = $big_value
area B: AVG(value_group_size) = $small_value
area C: Let each tuple in combined {A|C} area have {kp1 == kp2}, so
        AVG(value_group_size) = $big_value

Then:
rec_per_key[1] = AVG(value_group_size in AC area ) = $big_value
rec_per_key[0] = (somewhere between $small_value and $big_value);

i.e. 

rec_per_key[0] < rec_per_key[1]

i.e.
E(#records(ref(kp1,kp12))) > E(#records(ref(kp1)))

which is obviously not true. I state that the problem lies in (***).

Proposed fix
============

Make different assumptions about the domain that the "probe tuple" belongs
to.

Let's start from 2-part key. Here an index values diagram (Vxy are
!NULLS):

   kp1    kp2
+------+-------+
|      |       |
| NULL |  V2B  |
|      |       |
+------+-------+
|      |       |
| V1A  |  V2A  |
|      |       |
+------+-------+
|      |       |
| V1B  |  NULL |
|      |       |
+------+-------+
|      |       |
| NULL |  NULL |
|      |       |
+------+-------+

We'll assume that the "probe tuple" is {x,y} where 

  x -  random value from combined area V1A+V1B
  y -  random value from combined area V2A+V2B

Note that we include values like {x from V1B, y from V2B}.

Then new rec_per_key values (denote them as rpc2) are:

rpc2[0] = AVG(#records equal to a probe value from V1A+V2A) = 
 = rec_per_key[0], no changes here.

rpc2[1] = P(x in V1A AND y in V2A) * 
          AVG(#records equal to a probe tuple in {a in V1A, b in V2A})
          + 
          P(x not in V1A AND y not in V2A) * AVG(...) // AVG=0 here.
 
The first AVG(...) is the same as rec_per_key[0] in nulls_ignored method,
i.e.

rpc2[1] = P(x in V1A AND y in V2A) * #rec_per_key[1] = 

= {assume independence} =  P(x in V1A)*P(y in V2A) * #rec_per_key[1].

 P(x in V1A) = (#tuples that have kp1 NOT NULL and kp2 NOT NULL) /
               (#tuples that have kp1 NOT NULL and kp2 IS NULL)

 P(y in V2A) is calculated analogously.
[2 Oct 2007 23:41] Sergey Petrunya
See also BUG#30423