| Bug #9622 | Optimizer makes join index mistake | ||
|---|---|---|---|
| Submitted: | 4 Apr 2005 19:31 | Modified: | 24 Jul 2012 14:48 |
| Reporter: | Matthew Lord | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
| Version: | 4.1, 5.0 | OS: | Any (*) |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | bfsm_2007_11_15 | ||
[4 Apr 2005 19:31]
Matthew Lord
[24 Apr 2005 21:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24265
[13 May 2005 20:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24882
[16 May 2005 19:17]
Sergey Petrunya
Note for the docs team: The bug was that ALTER TABLE ... ENABLE INDEXES statistics collection code treated all NULL values as equal (that is NULL == NULL), while ANALYZE TABLE statistics collection code treated NULL values as inequal (NULL != NULL). We've fixed it - now all statistics collection code treats NULLs as inequal.
[16 May 2005 19:39]
Paul DuBois
Noted in 4.1.13 changelog.
[16 May 2005 19:39]
Sergey Petrunya
A slightly corrected note for the docs team: The bug was that for MyISAM tables, ALTER TABLE ... ENABLE INDEXES and bulk insert statistics collection code treated all NULL values as equal (NULL == NULL), while ANALYZE TABLE statistics collection code treated NULL values as inequal (NULL != NULL). We've fixed it - now all statistics collection code treats NULLs as inequal.
[18 Oct 2005 22:07]
Matthew Lord
If you use this mysqld option: myisam_stats_method=nulls_equal The test fails again as it did prior to this patch.
[20 Oct 2005 15:47]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/31277
[20 Oct 2005 15:49]
Sergey Petrunya
The patch introduces a new statistics collection method "nulls_ignored". With this method, all queries run fast: TESTCASE nulls_equal nulls_unequal nulls_ignored BUG9622 >60 sec 0.08 sec 0.08 sec BUG12232-case1 0.00 sec 1.42 sec 0.01 sec BUG12232-case2 0.03 sec 0.98 sec 0.01 sec BUG12232-case3 1.70 sec 47.32 sec 1.67 sec
[21 Oct 2005 3:03]
Sergey Petrunya
A patch with initial review feedback: http://lists.mysql.com/internals/31284
[22 Oct 2005 22:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/31355
[23 Oct 2005 6:24]
Sergey Petrunya
Fix pushed into 4.1.16 tree. Documentation will be soon provided
[27 Oct 2005 20:54]
Sergey Petrunya
PROBLEM WITH NULLS_IGNORED METHOD
=================================
Consider an index defined as KEY(kp1, kp2), where all values of kp1 are not
nulls, but a good number of values of kp2 are NULLs. Graphically this can be
depicted as follows:
kp1 kp2
+-----+-------+
| | |
| A | C |
| | |
+-----+-------+
| | |
| B | NULL |
| | |
+-----+-------+
Areas A, B, C contain non-NULL values here.
In current nulls_ignored method (both spec and implementation):
rec_per_key[0] is an estimate on how many records one will get if he chooses
some value of kp1 from region A or region B.
rec_per_key[1] is an estimate on how many records one will get if he chooses
some value of kp1 from A region, and value of kp2 from region C.
So,
for rec_per_key[0] we assume that kp1 is in A
for rec_per_key[1] - that kp1 is in (A + B). (***)
This can cause undesirable effects. Suppose the table is populated as
follows:
area A: AVG(value_group_size) = $big_value
area B: AVG(value_group_size) = $small_value
area C: Let each tuple in combined {A|C} area have {kp1 == kp2}, so
AVG(value_group_size) = $big_value
Then:
rec_per_key[1] = AVG(value_group_size in AC area ) = $big_value
rec_per_key[0] = (somewhere between $small_value and $big_value);
i.e.
rec_per_key[0] < rec_per_key[1]
i.e.
E(#records(ref(kp1,kp12))) > E(#records(ref(kp1)))
which is obviously not true. I state that the problem lies in (***).
Proposed fix
============
Make different assumptions about the domain that the "probe tuple" belongs
to.
Let's start from 2-part key. Here an index values diagram (Vxy are
!NULLS):
kp1 kp2
+------+-------+
| | |
| NULL | V2B |
| | |
+------+-------+
| | |
| V1A | V2A |
| | |
+------+-------+
| | |
| V1B | NULL |
| | |
+------+-------+
| | |
| NULL | NULL |
| | |
+------+-------+
We'll assume that the "probe tuple" is {x,y} where
x - random value from combined area V1A+V1B
y - random value from combined area V2A+V2B
Note that we include values like {x from V1B, y from V2B}.
Then new rec_per_key values (denote them as rpc2) are:
rpc2[0] = AVG(#records equal to a probe value from V1A+V2A) =
= rec_per_key[0], no changes here.
rpc2[1] = P(x in V1A AND y in V2A) *
AVG(#records equal to a probe tuple in {a in V1A, b in V2A})
+
P(x not in V1A AND y not in V2A) * AVG(...) // AVG=0 here.
The first AVG(...) is the same as rec_per_key[0] in nulls_ignored method,
i.e.
rpc2[1] = P(x in V1A AND y in V2A) * #rec_per_key[1] =
= {assume independence} = P(x in V1A)*P(y in V2A) * #rec_per_key[1].
P(x in V1A) = (#tuples that have kp1 NOT NULL and kp2 NOT NULL) /
(#tuples that have kp1 NOT NULL and kp2 IS NULL)
P(y in V2A) is calculated analogously.
[2 Oct 2007 21:41]
Sergey Petrunya
See also BUG#30423
[27 Jun 2012 14:46]
Valeriy Kravchuk
With 5.1.64 I do not see any problem with original test case:
macbook-pro:5.1 openxs$ bin/mysql -uroot test <~/Downloads/foo2.dump
macbook-pro:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.64-debug Source distribution
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> explain SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1 AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: c
type: range
possible_keys: clicktrack$timestamp
key: clicktrack$timestamp
key_len: 8
ref: NULL
rows: 956
Extra: Using where; Using temporary
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: i
type: ref
possible_keys: incidents$sessionid
key: incidents$sessionid
key_len: 14
ref: test.c.sessionid
rows: 1
Extra: Using where; Distinct
2 rows in set (0.04 sec)
The same good plan is used for all values of myisam_stats_method:
mysql> set session myisam_stats_method=nulls_ignored;Query OK, 0 rows affected (0.00 sec)
mysql> explain SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1 AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: c
type: range
possible_keys: clicktrack$timestamp
key: clicktrack$timestamp
key_len: 8
ref: NULL
rows: 956
Extra: Using where; Using temporary
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: i
type: ref
possible_keys: incidents$sessionid
key: incidents$sessionid
key_len: 14
ref: test.c.sessionid
rows: 1
Extra: Using where; Distinct
2 rows in set (0.00 sec)
Probably this bug should just be properly documented and closed, with any remaining problems discussed elsewhere.
[24 Jul 2012 14:48]
Paul DuBois
Noted in 4.1.16 changelog as well.
