Bug #45640 | optimizer bug produces wrong results | ||
---|---|---|---|
Submitted: | 21 Jun 2009 14:35 | Modified: | 20 Jun 2010 22:28 |
Reporter: | Matthew Lord | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
Version: | 5.0.84-bzr, 5.1.35, 5.1.36-bzr, 5.4.4-bzr | OS: | Any |
Assigned to: | Gleb Shchepa | CPU Architecture: | Any |
[21 Jun 2009 14:35]
Matthew Lord
[21 Jun 2009 16:58]
Valeriy Kravchuk
Thank you for the bug report. Verified just as described (there should be distinct val3 in the last query): valeriy-kravchuks-macbook-pro:5.1 openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 5 Server version: 5.1.36-debug-log Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> drop table if exists DIM_test; Query OK, 0 rows affected (0.00 sec) mysql> create table DIM_test (id int unsigned auto_increment primary key, name varchar(128), -> unique (name)) engine=myisam; Query OK, 0 rows affected (0.07 sec) mysql> insert into DIM_test VALUES -> (NULL, 'abcdefgh'), -> (NULL, 'abcdefg'), -> (NULL, 'abcdefgk'), -> (NULL, 'abcdefgl'), -> (NULL,'abcdefgm'), -> (NULL, 'abcdefgn'); Query OK, 6 rows affected (0.01 sec) Records: 6 Duplicates: 0 Warnings: 0 mysql> drop table if exists test; Query OK, 0 rows affected (0.00 sec) mysql> create table test (name varchar(128), val1 int unsigned, val2 int unsigned, val3 int -> unsigned, index(name(16))) engine=myisam; Query OK, 0 rows affected (0.14 sec) mysql> insert into test VALUES -> (NULL, 48, 12, 44454322), -> ('abcdefgh',48, 12, 24333422), -> ('abcdefgh',48, 12, 24333422), -> ('abcdefgk', 32, 12, 4422422), -> ('abcdefgm', 48, 12, 999248), -> (NULL, 48, 12, 828290), -> (NULL, 48, 12, 828290), -> (NULL, 48, 12, 442242), -> ('abcdefgh', 48, 12, 88892), -> (NULL, 48, 12, 30), -> (NULL, 48, 12, 442244), -> ('abcdefgk', 48, 12, 88892); Query OK, 12 rows affected (0.00 sec) Records: 12 Duplicates: 0 Warnings: 0 mysql> select * from DIM_test; +----+----------+ | id | name | +----+----------+ | 1 | abcdefgh | | 2 | abcdefg | | 3 | abcdefgk | | 4 | abcdefgl | | 5 | abcdefgm | | 6 | abcdefgn | +----+----------+ 6 rows in set (0.00 sec) mysql> select * from test; +----------+------+------+----------+ | name | val1 | val2 | val3 | +----------+------+------+----------+ | NULL | 48 | 12 | 44454322 | | abcdefgh | 48 | 12 | 24333422 | | abcdefgh | 48 | 12 | 24333422 | | abcdefgk | 32 | 12 | 4422422 | | abcdefgm | 48 | 12 | 999248 | | NULL | 48 | 12 | 828290 | | NULL | 48 | 12 | 828290 | | NULL | 48 | 12 | 442242 | | abcdefgh | 48 | 12 | 88892 | | NULL | 48 | 12 | 30 | | NULL | 48 | 12 | 442244 | | abcdefgk | 48 | 12 | 88892 | +----------+------+------+----------+ 12 rows in set (0.00 sec) mysql> select if(isnull(name), 0, name) n, val1, val2, count(*) -> from test -> group by n, val1, val2; +----------+------+------+----------+ | n | val1 | val2 | count(*) | +----------+------+------+----------+ | 0 | 48 | 12 | 6 | | abcdefgh | 48 | 12 | 3 | | abcdefgk | 32 | 12 | 1 | | abcdefgk | 48 | 12 | 1 | | abcdefgm | 48 | 12 | 1 | +----------+------+------+----------+ 5 rows in set (0.01 sec) mysql> select if(isnull(name), 0, name) n, val1, val2, count(distinct val3) -> from test -> group by n, val1, val2; +----------+------+------+----------------------+ | n | val1 | val2 | count(distinct val3) | +----------+------+------+----------------------+ | 0 | 48 | 12 | 5 | | abcdefgh | 48 | 12 | 2 | | abcdefgk | 32 | 12 | 1 | | abcdefgk | 48 | 12 | 1 | | abcdefgm | 48 | 12 | 1 | +----------+------+------+----------------------+ 5 rows in set (0.00 sec) mysql> select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, -> val1, val2, count(*) -> from test -> group by n, val1, val2; +------+------+------+----------+ | n | val1 | val2 | count(*) | +------+------+------+----------+ | 0 | 48 | 12 | 6 | | 1 | 48 | 12 | 3 | | 3 | 32 | 12 | 1 | | 3 | 48 | 12 | 1 | | 5 | 48 | 12 | 1 | +------+------+------+----------+ 5 rows in set (0.01 sec) mysql> select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(distinct val3) from test group by n, val1, val2; +------+------+------+----------------------+ | n | val1 | val2 | count(distinct val3) | +------+------+------+----------------------+ | 3 | 32 | 12 | 1 | | 3 | 48 | 12 | 3 | | 0 | 48 | 12 | 5 | +------+------+------+----------------------+ 3 rows in set (0.01 sec) The result above is incorrect. Actually, distinct matters: mysql> select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(val3) from test group by n, val1, val2; +------+------+------+-------------+ | n | val1 | val2 | count(val3) | +------+------+------+-------------+ | 0 | 48 | 12 | 6 | | 1 | 48 | 12 | 3 | | 3 | 32 | 12 | 1 | | 3 | 48 | 12 | 1 | | 5 | 48 | 12 | 1 | +------+------+------+-------------+ 5 rows in set (0.01 sec) Also EXPLAIN shows the reason: mysql> explain select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(distinct val3) from test group by n, val1, val2\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 12 Extra: Using filesort *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: DIM_test type: ref possible_keys: name key: name key_len: 131 ref: func rows: 2 Extra: Using where 2 rows in set (0.00 sec) For the query with correct results temporary table is used: mysql> explain select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(val3) from test group by n, val1, val2\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 12 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: DIM_test type: ref possible_keys: name key: name key_len: 131 ref: func rows: 2 Extra: Using where 2 rows in set (0.00 sec)
[9 Nov 2009 20:16]
Gleb Shchepa
Also see duplicate bug #38217.
[9 Nov 2009 20:29]
Gleb Shchepa
Simplified test case: create table t1 (a int, b int); insert into t1 values (4, 40), (1, 10), (2, 20), (2, 20), (3, 30); select (select t1.a) aa, count(distinct b) from t1 group by aa; aa count(distinct b) 4 1 1 1 2 1 2 2 As you can see, output data consists of 1) incorrect (duplicated groups, lost group) and 2) unordered data. Output without DISTINCT: select (select t1.a) aa, count(b) from t1 group by aa; aa count(b) 1 1 2 2 3 1 4 1
[10 Nov 2009 9:38]
Gleb Shchepa
An intermediate patch: it solves the problem for simple GROUP BY expressions ("BY aa, ...", where "aa" is a subselect alias)
Attachment: 45640.patch (text/x-patch), 4.15 KiB.
[11 Nov 2009 22:07]
Gleb Shchepa
The sufficiency of the fix (see previous commentary with a patch) depends on the fix for the bug #48703: the find_order_in_list function doesn't do deep analysis of GROUP BY expressions but analyses/substitutesd the topmost items only, so inner items are unchecked and unchanged.
[14 Nov 2009 14:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/90437 3201 Gleb Shchepa 2009-11-14 Bug #45640: optimizer bug produces wrong results Grouping by a subquery in a query with a distinct aggregate function lead to a wrong result (wrong and unordered grouping values). There are two related problems: 1) The query like this: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa returned wrong result, because the outer reference "t1.a" in the subquery was substituted with the Item_ref item. The Item_ref item obtains data from the result_field object that refreshes once after the end of each group. This data is not applicable to filesort since filesort() doesn't care about groups (and doesn't update result_field objects with copy_fields() and so on). Also that data is not applicable to group separation algorithm: end_send_group() checks every record with test_if_group_changed() that evaluates Item_ref items, but it refreshes those Item_ref-s only after the end of group, that is a vicious circle and the grouped column values in the output are shifted. Fix: if a) we grouping by a subquery and b) that subquery has outer references to FROM list of the grouping query, then we substitute these outer references with Item_direct_ref like references under aggregate functions: Item_direct_ref obtains data directly from the current record. 2) The query with a non-trivial grouping expression like: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa+0 also returned wrong result, since JOIN::exec() substitutes references to top-level aliases in SELECT list with Item_copy caching items. Item_copy items have same refreshing policy as Item_ref items, so the whole groping expression with Item_copy inside returns wrong result in filesort() and end_send_group(). Fix: include aliased items into GROUP BY item tree instead of Item_ref references to them. @ mysql-test/r/group_by.result Test case for bug #45640 @ mysql-test/t/group_by.test Test case for bug #45640 @ sql/item.cc Bug #45640: optimizer bug produces wrong results - Item_field::fix_fields() has been modified to resolve aliases in GROUP BY item trees into aliased items instead of Item_ref items. - Item_outer_ref::walk() has been overloaded to implement Item_outer_ref tree search algorithm. @ sql/item.h Bug #45640: optimizer bug produces wrong results Item::find_outer_ref_processor() helper has been introduced and Item_outer_ref::walk() has been overloaded to implement Item_outer_ref tree search algorithm. @ sql/mysql_priv.h Bug #45640: optimizer bug produces wrong results fix_inner_refs() has been modified to accept group_list parameter. @ sql/sql_lex.cc Bug #45640: optimizer bug produces wrong results Initialization of st_select_lex::group_fix_field has been added. @ sql/sql_lex.h Bug #45640: optimizer bug produces wrong results The st_select_lex::group_fix_field field has been introduced to control alias resolution in Itef_fied::fix_fields. @ sql/sql_select.cc Bug #45640: optimizer bug produces wrong results - The fix_inner_refs function has been modified to treat subquery outer references like outer fields under aggregate functions, if they are included in GROUP BY item tree. - The find_order_in_list function has been modified to fix Item_field alias fields included in the GROUP BY item trees in a special manner.
[24 Dec 2009 9:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95667 3295 Gleb Shchepa 2009-12-24 Bug #45640: optimizer bug produces wrong results Grouping by a subquery in a query with a distinct aggregate function lead to a wrong result (wrong and unordered grouping values). There are two related problems: 1) The query like this: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa returned wrong result, because the outer reference "t1.a" in the subquery was substituted with the Item_ref item. The Item_ref item obtains data from the result_field object that refreshes once after the end of each group. This data is not applicable to filesort since filesort() doesn't care about groups (and doesn't update result_field objects with copy_fields() and so on). Also that data is not applicable to group separation algorithm: end_send_group() checks every record with test_if_group_changed() that evaluates Item_ref items, but it refreshes those Item_ref-s only after the end of group, that is a vicious circle and the grouped column values in the output are shifted. Fix: if a) we grouping by a subquery and b) that subquery has outer references to FROM list of the grouping query, then we substitute these outer references with Item_direct_ref like references under aggregate functions: Item_direct_ref obtains data directly from the current record. 2) The query with a non-trivial grouping expression like: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa+0 also returned wrong result, since JOIN::exec() substitutes references to top-level aliases in SELECT list with Item_copy caching items. Item_copy items have same refreshing policy as Item_ref items, so the whole groping expression with Item_copy inside returns wrong result in filesort() and end_send_group(). Fix: include aliased items into GROUP BY item tree instead of Item_ref references to them. @ mysql-test/r/group_by.result Test case for bug #45640 @ mysql-test/t/group_by.test Test case for bug #45640 @ sql/item.cc Bug #45640: optimizer bug produces wrong results Item_field::fix_fields() has been modified to resolve aliases in GROUP BY item trees into aliased items instead of Item_ref items. @ sql/item.h Bug #45640: optimizer bug produces wrong results - Item::find_item_processor() has been introduced. - Item_ref::walk() has been modified to apply processors to itself too (not only to referenced item). @ sql/mysql_priv.h Bug #45640: optimizer bug produces wrong results fix_inner_refs() has been modified to accept group_list parameter. @ sql/sql_lex.cc Bug #45640: optimizer bug produces wrong results Initialization of st_select_lex::group_fix_field has been added. @ sql/sql_lex.h Bug #45640: optimizer bug produces wrong results The st_select_lex::group_fix_field field has been introduced to control alias resolution in Itef_fied::fix_fields. @ sql/sql_select.cc Bug #45640: optimizer bug produces wrong results - The fix_inner_refs function has been modified to treat subquery outer references like outer fields under aggregate functions, if they are included in GROUP BY item tree. - The find_order_in_list function has been modified to fix Item_field alias fields included in the GROUP BY item trees in a special manner.
[6 Feb 2010 22:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/99546 3370 Gleb Shchepa 2010-02-06 Bug #45640: optimizer bug produces wrong results Grouping by a subquery in a query with a distinct aggregate function lead to a wrong result (wrong and unordered grouping values). There are two related problems: 1) The query like this: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa returned wrong result, because the outer reference "t1.a" in the subquery was substituted with the Item_ref item. The Item_ref item obtains data from the result_field object that refreshes once after the end of each group. This data is not applicable to filesort since filesort() doesn't care about groups (and doesn't update result_field objects with copy_fields() and so on). Also that data is not applicable to group separation algorithm: end_send_group() checks every record with test_if_group_changed() that evaluates Item_ref items, but it refreshes those Item_ref-s only after the end of group, that is a vicious circle and the grouped column values in the output are shifted. Fix: if a) we grouping by a subquery and b) that subquery has outer references to FROM list of the grouping query, then we substitute these outer references with Item_direct_ref like references under aggregate functions: Item_direct_ref obtains data directly from the current record. 2) The query with a non-trivial grouping expression like: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa+0 also returned wrong result, since JOIN::exec() substitutes references to top-level aliases in SELECT list with Item_copy caching items. Item_copy items have same refreshing policy as Item_ref items, so the whole groping expression with Item_copy inside returns wrong result in filesort() and end_send_group(). Fix: include aliased items into GROUP BY item tree instead of Item_ref references to them. @ mysql-test/r/group_by.result Test case for bug #45640 @ mysql-test/t/group_by.test Test case for bug #45640 @ sql/item.cc Bug #45640: optimizer bug produces wrong results Item_field::fix_fields() has been modified to resolve aliases in GROUP BY item trees into aliased items instead of Item_ref items. @ sql/item.h Bug #45640: optimizer bug produces wrong results - Item::find_item_processor() has been introduced. - Item_ref::walk() has been modified to apply processors to itself too (not only to referenced item). @ sql/mysql_priv.h Bug #45640: optimizer bug produces wrong results fix_inner_refs() has been modified to accept group_list parameter. @ sql/sql_lex.cc Bug #45640: optimizer bug produces wrong results Initialization of st_select_lex::group_fix_field has been added. @ sql/sql_lex.h Bug #45640: optimizer bug produces wrong results The st_select_lex::group_fix_field field has been introduced to control alias resolution in Itef_fied::fix_fields. @ sql/sql_select.cc Bug #45640: optimizer bug produces wrong results - The fix_inner_refs function has been modified to treat subquery outer references like outer fields under aggregate functions, if they are included in GROUP BY item tree. - The find_order_in_list function has been modified to fix Item_field alias fields included in the GROUP BY item trees in a special manner.
[1 Mar 2010 8:46]
Bugs System
Pushed into 5.1.45 (revid:joro@sun.com-20100301083827-xnimmrjg6bh33o1o) (version source revid:joerg@mysql.com-20100212173307-ph563zr4wmoklgwd) (merge vers: 5.1.45) (pib:16)
[2 Mar 2010 14:34]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100302142746-u1gxdf5yk2bjrq3e) (version source revid:alik@sun.com-20100225090938-2j5ybqoau570mytu) (merge vers: 6.0.14-alpha) (pib:16)
[2 Mar 2010 14:39]
Bugs System
Pushed into 5.5.3-m2 (revid:alik@sun.com-20100302072233-t3uqgjzdukt1pyhe) (version source revid:alexey.kopytov@sun.com-20100209075938-mmcnyf6w631ozc45) (merge vers: 5.5.2-m2) (pib:16)
[2 Mar 2010 14:44]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100302072432-k8xvfkgcggkwgi94) (version source revid:alik@sun.com-20100224135227-rcqs9pe9b2in80pf) (pib:16)
[16 Mar 2010 0:25]
Paul DuBois
Noted in 5.1.45, 5.5.3, 6.0.14 changelogs. Grouping by a subquery in a query with a DISTINCT aggregate function led to incorrect and unordered grouping values.
[17 Jun 2010 12:07]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:54]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609140708-52rvuyq4q500sxkq) (merge vers: 5.1.45-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:35]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)