Bug #9622 | Optimizer makes join index mistake | ||
---|---|---|---|
Submitted: | 4 Apr 2005 19:31 | Modified: | 24 Jul 2012 14:48 |
Reporter: | Matthew Lord | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 4.1, 5.0 | OS: | Any (*) |
Assigned to: | CPU Architecture: | Any | |
Tags: | bfsm_2007_11_15 |
[4 Apr 2005 19:31]
Matthew Lord
[24 Apr 2005 21:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24265
[13 May 2005 20:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/24882
[16 May 2005 19:17]
Sergey Petrunya
Note for the docs team: The bug was that ALTER TABLE ... ENABLE INDEXES statistics collection code treated all NULL values as equal (that is NULL == NULL), while ANALYZE TABLE statistics collection code treated NULL values as inequal (NULL != NULL). We've fixed it - now all statistics collection code treats NULLs as inequal.
[16 May 2005 19:39]
Paul DuBois
Noted in 4.1.13 changelog.
[16 May 2005 19:39]
Sergey Petrunya
A slightly corrected note for the docs team: The bug was that for MyISAM tables, ALTER TABLE ... ENABLE INDEXES and bulk insert statistics collection code treated all NULL values as equal (NULL == NULL), while ANALYZE TABLE statistics collection code treated NULL values as inequal (NULL != NULL). We've fixed it - now all statistics collection code treats NULLs as inequal.
[18 Oct 2005 22:07]
Matthew Lord
If you use this mysqld option: myisam_stats_method=nulls_equal The test fails again as it did prior to this patch.
[20 Oct 2005 15:47]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/31277
[20 Oct 2005 15:49]
Sergey Petrunya
The patch introduces a new statistics collection method "nulls_ignored". With this method, all queries run fast: TESTCASE nulls_equal nulls_unequal nulls_ignored BUG9622 >60 sec 0.08 sec 0.08 sec BUG12232-case1 0.00 sec 1.42 sec 0.01 sec BUG12232-case2 0.03 sec 0.98 sec 0.01 sec BUG12232-case3 1.70 sec 47.32 sec 1.67 sec
[21 Oct 2005 3:03]
Sergey Petrunya
A patch with initial review feedback: http://lists.mysql.com/internals/31284
[22 Oct 2005 22:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/31355
[23 Oct 2005 6:24]
Sergey Petrunya
Fix pushed into 4.1.16 tree. Documentation will be soon provided
[27 Oct 2005 20:54]
Sergey Petrunya
PROBLEM WITH NULLS_IGNORED METHOD ================================= Consider an index defined as KEY(kp1, kp2), where all values of kp1 are not nulls, but a good number of values of kp2 are NULLs. Graphically this can be depicted as follows: kp1 kp2 +-----+-------+ | | | | A | C | | | | +-----+-------+ | | | | B | NULL | | | | +-----+-------+ Areas A, B, C contain non-NULL values here. In current nulls_ignored method (both spec and implementation): rec_per_key[0] is an estimate on how many records one will get if he chooses some value of kp1 from region A or region B. rec_per_key[1] is an estimate on how many records one will get if he chooses some value of kp1 from A region, and value of kp2 from region C. So, for rec_per_key[0] we assume that kp1 is in A for rec_per_key[1] - that kp1 is in (A + B). (***) This can cause undesirable effects. Suppose the table is populated as follows: area A: AVG(value_group_size) = $big_value area B: AVG(value_group_size) = $small_value area C: Let each tuple in combined {A|C} area have {kp1 == kp2}, so AVG(value_group_size) = $big_value Then: rec_per_key[1] = AVG(value_group_size in AC area ) = $big_value rec_per_key[0] = (somewhere between $small_value and $big_value); i.e. rec_per_key[0] < rec_per_key[1] i.e. E(#records(ref(kp1,kp12))) > E(#records(ref(kp1))) which is obviously not true. I state that the problem lies in (***). Proposed fix ============ Make different assumptions about the domain that the "probe tuple" belongs to. Let's start from 2-part key. Here an index values diagram (Vxy are !NULLS): kp1 kp2 +------+-------+ | | | | NULL | V2B | | | | +------+-------+ | | | | V1A | V2A | | | | +------+-------+ | | | | V1B | NULL | | | | +------+-------+ | | | | NULL | NULL | | | | +------+-------+ We'll assume that the "probe tuple" is {x,y} where x - random value from combined area V1A+V1B y - random value from combined area V2A+V2B Note that we include values like {x from V1B, y from V2B}. Then new rec_per_key values (denote them as rpc2) are: rpc2[0] = AVG(#records equal to a probe value from V1A+V2A) = = rec_per_key[0], no changes here. rpc2[1] = P(x in V1A AND y in V2A) * AVG(#records equal to a probe tuple in {a in V1A, b in V2A}) + P(x not in V1A AND y not in V2A) * AVG(...) // AVG=0 here. The first AVG(...) is the same as rec_per_key[0] in nulls_ignored method, i.e. rpc2[1] = P(x in V1A AND y in V2A) * #rec_per_key[1] = = {assume independence} = P(x in V1A)*P(y in V2A) * #rec_per_key[1]. P(x in V1A) = (#tuples that have kp1 NOT NULL and kp2 NOT NULL) / (#tuples that have kp1 NOT NULL and kp2 IS NULL) P(y in V2A) is calculated analogously.
[2 Oct 2007 21:41]
Sergey Petrunya
See also BUG#30423
[27 Jun 2012 14:46]
Valeriy Kravchuk
With 5.1.64 I do not see any problem with original test case: macbook-pro:5.1 openxs$ bin/mysql -uroot test <~/Downloads/foo2.dump macbook-pro:5.1 openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.1.64-debug Source distribution Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> explain SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1 AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: c type: range possible_keys: clicktrack$timestamp key: clicktrack$timestamp key_len: 8 ref: NULL rows: 956 Extra: Using where; Using temporary *************************** 2. row *************************** id: 1 select_type: SIMPLE table: i type: ref possible_keys: incidents$sessionid key: incidents$sessionid key_len: 14 ref: test.c.sessionid rows: 1 Extra: Using where; Distinct 2 rows in set (0.04 sec) The same good plan is used for all values of myisam_stats_method: mysql> set session myisam_stats_method=nulls_ignored;Query OK, 0 rows affected (0.00 sec) mysql> explain SELECT DISTINCT c.sessionid FROM clicktrack c LEFT OUTER JOIN incidents i ON c.sessionid = i.sessionid WHERE c.timestamp < '2005-03-05 02:42:13' AND c.interface_id = 1 AND (i.status_id NOT IN (1,8) OR i.status_id IS NULL)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: c type: range possible_keys: clicktrack$timestamp key: clicktrack$timestamp key_len: 8 ref: NULL rows: 956 Extra: Using where; Using temporary *************************** 2. row *************************** id: 1 select_type: SIMPLE table: i type: ref possible_keys: incidents$sessionid key: incidents$sessionid key_len: 14 ref: test.c.sessionid rows: 1 Extra: Using where; Distinct 2 rows in set (0.00 sec) Probably this bug should just be properly documented and closed, with any remaining problems discussed elsewhere.
[24 Jul 2012 14:48]
Paul DuBois
Noted in 4.1.16 changelog as well.