| Bug #45640 | optimizer bug produces wrong results | ||
|---|---|---|---|
| Submitted: | 21 Jun 2009 14:35 | Modified: | 20 Jun 2010 22:28 |
| Reporter: | Matthew Lord | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
| Version: | 5.0.84-bzr, 5.1.35, 5.1.36-bzr, 5.4.4-bzr | OS: | Any |
| Assigned to: | Gleb Shchepa | CPU Architecture: | Any |
[21 Jun 2009 16:58]
Valeriy Kravchuk
Thank you for the bug report. Verified just as described (there should be distinct val3 in the last query):
valeriy-kravchuks-macbook-pro:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.1.36-debug-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> drop table if exists DIM_test;
Query OK, 0 rows affected (0.00 sec)
mysql> create table DIM_test (id int unsigned auto_increment primary key, name varchar(128),
-> unique (name)) engine=myisam;
Query OK, 0 rows affected (0.07 sec)
mysql> insert into DIM_test VALUES
-> (NULL, 'abcdefgh'),
-> (NULL, 'abcdefg'),
-> (NULL, 'abcdefgk'),
-> (NULL, 'abcdefgl'),
-> (NULL,'abcdefgm'),
-> (NULL, 'abcdefgn');
Query OK, 6 rows affected (0.01 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> drop table if exists test;
Query OK, 0 rows affected (0.00 sec)
mysql> create table test (name varchar(128), val1 int unsigned, val2 int unsigned, val3 int
-> unsigned, index(name(16))) engine=myisam;
Query OK, 0 rows affected (0.14 sec)
mysql> insert into test VALUES
-> (NULL, 48, 12, 44454322),
-> ('abcdefgh',48, 12, 24333422),
-> ('abcdefgh',48, 12, 24333422),
-> ('abcdefgk', 32, 12, 4422422),
-> ('abcdefgm', 48, 12, 999248),
-> (NULL, 48, 12, 828290),
-> (NULL, 48, 12, 828290),
-> (NULL, 48, 12, 442242),
-> ('abcdefgh', 48, 12, 88892),
-> (NULL, 48, 12, 30),
-> (NULL, 48, 12, 442244),
-> ('abcdefgk', 48, 12, 88892);
Query OK, 12 rows affected (0.00 sec)
Records: 12 Duplicates: 0 Warnings: 0
mysql> select * from DIM_test;
+----+----------+
| id | name |
+----+----------+
| 1 | abcdefgh |
| 2 | abcdefg |
| 3 | abcdefgk |
| 4 | abcdefgl |
| 5 | abcdefgm |
| 6 | abcdefgn |
+----+----------+
6 rows in set (0.00 sec)
mysql> select * from test;
+----------+------+------+----------+
| name | val1 | val2 | val3 |
+----------+------+------+----------+
| NULL | 48 | 12 | 44454322 |
| abcdefgh | 48 | 12 | 24333422 |
| abcdefgh | 48 | 12 | 24333422 |
| abcdefgk | 32 | 12 | 4422422 |
| abcdefgm | 48 | 12 | 999248 |
| NULL | 48 | 12 | 828290 |
| NULL | 48 | 12 | 828290 |
| NULL | 48 | 12 | 442242 |
| abcdefgh | 48 | 12 | 88892 |
| NULL | 48 | 12 | 30 |
| NULL | 48 | 12 | 442244 |
| abcdefgk | 48 | 12 | 88892 |
+----------+------+------+----------+
12 rows in set (0.00 sec)
mysql> select if(isnull(name), 0, name) n, val1, val2, count(*)
-> from test
-> group by n, val1, val2;
+----------+------+------+----------+
| n | val1 | val2 | count(*) |
+----------+------+------+----------+
| 0 | 48 | 12 | 6 |
| abcdefgh | 48 | 12 | 3 |
| abcdefgk | 32 | 12 | 1 |
| abcdefgk | 48 | 12 | 1 |
| abcdefgm | 48 | 12 | 1 |
+----------+------+------+----------+
5 rows in set (0.01 sec)
mysql> select if(isnull(name), 0, name) n, val1, val2, count(distinct val3)
-> from test
-> group by n, val1, val2;
+----------+------+------+----------------------+
| n | val1 | val2 | count(distinct val3) |
+----------+------+------+----------------------+
| 0 | 48 | 12 | 5 |
| abcdefgh | 48 | 12 | 2 |
| abcdefgk | 32 | 12 | 1 |
| abcdefgk | 48 | 12 | 1 |
| abcdefgm | 48 | 12 | 1 |
+----------+------+------+----------------------+
5 rows in set (0.00 sec)
mysql> select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n,
-> val1, val2, count(*)
-> from test
-> group by n, val1, val2;
+------+------+------+----------+
| n | val1 | val2 | count(*) |
+------+------+------+----------+
| 0 | 48 | 12 | 6 |
| 1 | 48 | 12 | 3 |
| 3 | 32 | 12 | 1 |
| 3 | 48 | 12 | 1 |
| 5 | 48 | 12 | 1 |
+------+------+------+----------+
5 rows in set (0.01 sec)
mysql> select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(distinct val3) from test group by n, val1, val2;
+------+------+------+----------------------+
| n | val1 | val2 | count(distinct val3) |
+------+------+------+----------------------+
| 3 | 32 | 12 | 1 |
| 3 | 48 | 12 | 3 |
| 0 | 48 | 12 | 5 |
+------+------+------+----------------------+
3 rows in set (0.01 sec)
The result above is incorrect. Actually, distinct matters:
mysql> select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(val3) from test group by n, val1, val2;
+------+------+------+-------------+
| n | val1 | val2 | count(val3) |
+------+------+------+-------------+
| 0 | 48 | 12 | 6 |
| 1 | 48 | 12 | 3 |
| 3 | 32 | 12 | 1 |
| 3 | 48 | 12 | 1 |
| 5 | 48 | 12 | 1 |
+------+------+------+-------------+
5 rows in set (0.01 sec)
Also EXPLAIN shows the reason:
mysql> explain select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(distinct val3) from test group by n, val1, val2\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: test
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 12
Extra: Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: DIM_test
type: ref
possible_keys: name
key: name
key_len: 131
ref: func
rows: 2
Extra: Using where
2 rows in set (0.00 sec)
For the query with correct results temporary table is used:
mysql> explain select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(val3) from test group by n, val1, val2\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: test
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 12
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: DIM_test
type: ref
possible_keys: name
key: name
key_len: 131
ref: func
rows: 2
Extra: Using where
2 rows in set (0.00 sec)
[9 Nov 2009 20:16]
Gleb Shchepa
Also see duplicate bug #38217.
[9 Nov 2009 20:29]
Gleb Shchepa
Simplified test case: create table t1 (a int, b int); insert into t1 values (4, 40), (1, 10), (2, 20), (2, 20), (3, 30); select (select t1.a) aa, count(distinct b) from t1 group by aa; aa count(distinct b) 4 1 1 1 2 1 2 2 As you can see, output data consists of 1) incorrect (duplicated groups, lost group) and 2) unordered data. Output without DISTINCT: select (select t1.a) aa, count(b) from t1 group by aa; aa count(b) 1 1 2 2 3 1 4 1
[10 Nov 2009 9:38]
Gleb Shchepa
An intermediate patch: it solves the problem for simple GROUP BY expressions ("BY aa, ...", where "aa" is a subselect alias)
Attachment: 45640.patch (text/x-patch), 4.15 KiB.
[11 Nov 2009 22:07]
Gleb Shchepa
The sufficiency of the fix (see previous commentary with a patch) depends on the fix for the bug #48703: the find_order_in_list function doesn't do deep analysis of GROUP BY expressions but analyses/substitutesd the topmost items only, so inner items are unchecked and unchanged.
[14 Nov 2009 14:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/90437 3201 Gleb Shchepa 2009-11-14 Bug #45640: optimizer bug produces wrong results Grouping by a subquery in a query with a distinct aggregate function lead to a wrong result (wrong and unordered grouping values). There are two related problems: 1) The query like this: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa returned wrong result, because the outer reference "t1.a" in the subquery was substituted with the Item_ref item. The Item_ref item obtains data from the result_field object that refreshes once after the end of each group. This data is not applicable to filesort since filesort() doesn't care about groups (and doesn't update result_field objects with copy_fields() and so on). Also that data is not applicable to group separation algorithm: end_send_group() checks every record with test_if_group_changed() that evaluates Item_ref items, but it refreshes those Item_ref-s only after the end of group, that is a vicious circle and the grouped column values in the output are shifted. Fix: if a) we grouping by a subquery and b) that subquery has outer references to FROM list of the grouping query, then we substitute these outer references with Item_direct_ref like references under aggregate functions: Item_direct_ref obtains data directly from the current record. 2) The query with a non-trivial grouping expression like: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa+0 also returned wrong result, since JOIN::exec() substitutes references to top-level aliases in SELECT list with Item_copy caching items. Item_copy items have same refreshing policy as Item_ref items, so the whole groping expression with Item_copy inside returns wrong result in filesort() and end_send_group(). Fix: include aliased items into GROUP BY item tree instead of Item_ref references to them. @ mysql-test/r/group_by.result Test case for bug #45640 @ mysql-test/t/group_by.test Test case for bug #45640 @ sql/item.cc Bug #45640: optimizer bug produces wrong results - Item_field::fix_fields() has been modified to resolve aliases in GROUP BY item trees into aliased items instead of Item_ref items. - Item_outer_ref::walk() has been overloaded to implement Item_outer_ref tree search algorithm. @ sql/item.h Bug #45640: optimizer bug produces wrong results Item::find_outer_ref_processor() helper has been introduced and Item_outer_ref::walk() has been overloaded to implement Item_outer_ref tree search algorithm. @ sql/mysql_priv.h Bug #45640: optimizer bug produces wrong results fix_inner_refs() has been modified to accept group_list parameter. @ sql/sql_lex.cc Bug #45640: optimizer bug produces wrong results Initialization of st_select_lex::group_fix_field has been added. @ sql/sql_lex.h Bug #45640: optimizer bug produces wrong results The st_select_lex::group_fix_field field has been introduced to control alias resolution in Itef_fied::fix_fields. @ sql/sql_select.cc Bug #45640: optimizer bug produces wrong results - The fix_inner_refs function has been modified to treat subquery outer references like outer fields under aggregate functions, if they are included in GROUP BY item tree. - The find_order_in_list function has been modified to fix Item_field alias fields included in the GROUP BY item trees in a special manner.
[24 Dec 2009 9:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95667 3295 Gleb Shchepa 2009-12-24 Bug #45640: optimizer bug produces wrong results Grouping by a subquery in a query with a distinct aggregate function lead to a wrong result (wrong and unordered grouping values). There are two related problems: 1) The query like this: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa returned wrong result, because the outer reference "t1.a" in the subquery was substituted with the Item_ref item. The Item_ref item obtains data from the result_field object that refreshes once after the end of each group. This data is not applicable to filesort since filesort() doesn't care about groups (and doesn't update result_field objects with copy_fields() and so on). Also that data is not applicable to group separation algorithm: end_send_group() checks every record with test_if_group_changed() that evaluates Item_ref items, but it refreshes those Item_ref-s only after the end of group, that is a vicious circle and the grouped column values in the output are shifted. Fix: if a) we grouping by a subquery and b) that subquery has outer references to FROM list of the grouping query, then we substitute these outer references with Item_direct_ref like references under aggregate functions: Item_direct_ref obtains data directly from the current record. 2) The query with a non-trivial grouping expression like: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa+0 also returned wrong result, since JOIN::exec() substitutes references to top-level aliases in SELECT list with Item_copy caching items. Item_copy items have same refreshing policy as Item_ref items, so the whole groping expression with Item_copy inside returns wrong result in filesort() and end_send_group(). Fix: include aliased items into GROUP BY item tree instead of Item_ref references to them. @ mysql-test/r/group_by.result Test case for bug #45640 @ mysql-test/t/group_by.test Test case for bug #45640 @ sql/item.cc Bug #45640: optimizer bug produces wrong results Item_field::fix_fields() has been modified to resolve aliases in GROUP BY item trees into aliased items instead of Item_ref items. @ sql/item.h Bug #45640: optimizer bug produces wrong results - Item::find_item_processor() has been introduced. - Item_ref::walk() has been modified to apply processors to itself too (not only to referenced item). @ sql/mysql_priv.h Bug #45640: optimizer bug produces wrong results fix_inner_refs() has been modified to accept group_list parameter. @ sql/sql_lex.cc Bug #45640: optimizer bug produces wrong results Initialization of st_select_lex::group_fix_field has been added. @ sql/sql_lex.h Bug #45640: optimizer bug produces wrong results The st_select_lex::group_fix_field field has been introduced to control alias resolution in Itef_fied::fix_fields. @ sql/sql_select.cc Bug #45640: optimizer bug produces wrong results - The fix_inner_refs function has been modified to treat subquery outer references like outer fields under aggregate functions, if they are included in GROUP BY item tree. - The find_order_in_list function has been modified to fix Item_field alias fields included in the GROUP BY item trees in a special manner.
[6 Feb 2010 22:00]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/99546 3370 Gleb Shchepa 2010-02-06 Bug #45640: optimizer bug produces wrong results Grouping by a subquery in a query with a distinct aggregate function lead to a wrong result (wrong and unordered grouping values). There are two related problems: 1) The query like this: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa returned wrong result, because the outer reference "t1.a" in the subquery was substituted with the Item_ref item. The Item_ref item obtains data from the result_field object that refreshes once after the end of each group. This data is not applicable to filesort since filesort() doesn't care about groups (and doesn't update result_field objects with copy_fields() and so on). Also that data is not applicable to group separation algorithm: end_send_group() checks every record with test_if_group_changed() that evaluates Item_ref items, but it refreshes those Item_ref-s only after the end of group, that is a vicious circle and the grouped column values in the output are shifted. Fix: if a) we grouping by a subquery and b) that subquery has outer references to FROM list of the grouping query, then we substitute these outer references with Item_direct_ref like references under aggregate functions: Item_direct_ref obtains data directly from the current record. 2) The query with a non-trivial grouping expression like: SELECT (SELECT t1.a) aa, COUNT(DISTINCT b) c FROM t1 GROUP BY aa+0 also returned wrong result, since JOIN::exec() substitutes references to top-level aliases in SELECT list with Item_copy caching items. Item_copy items have same refreshing policy as Item_ref items, so the whole groping expression with Item_copy inside returns wrong result in filesort() and end_send_group(). Fix: include aliased items into GROUP BY item tree instead of Item_ref references to them. @ mysql-test/r/group_by.result Test case for bug #45640 @ mysql-test/t/group_by.test Test case for bug #45640 @ sql/item.cc Bug #45640: optimizer bug produces wrong results Item_field::fix_fields() has been modified to resolve aliases in GROUP BY item trees into aliased items instead of Item_ref items. @ sql/item.h Bug #45640: optimizer bug produces wrong results - Item::find_item_processor() has been introduced. - Item_ref::walk() has been modified to apply processors to itself too (not only to referenced item). @ sql/mysql_priv.h Bug #45640: optimizer bug produces wrong results fix_inner_refs() has been modified to accept group_list parameter. @ sql/sql_lex.cc Bug #45640: optimizer bug produces wrong results Initialization of st_select_lex::group_fix_field has been added. @ sql/sql_lex.h Bug #45640: optimizer bug produces wrong results The st_select_lex::group_fix_field field has been introduced to control alias resolution in Itef_fied::fix_fields. @ sql/sql_select.cc Bug #45640: optimizer bug produces wrong results - The fix_inner_refs function has been modified to treat subquery outer references like outer fields under aggregate functions, if they are included in GROUP BY item tree. - The find_order_in_list function has been modified to fix Item_field alias fields included in the GROUP BY item trees in a special manner.
[1 Mar 2010 8:46]
Bugs System
Pushed into 5.1.45 (revid:joro@sun.com-20100301083827-xnimmrjg6bh33o1o) (version source revid:joerg@mysql.com-20100212173307-ph563zr4wmoklgwd) (merge vers: 5.1.45) (pib:16)
[2 Mar 2010 14:34]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100302142746-u1gxdf5yk2bjrq3e) (version source revid:alik@sun.com-20100225090938-2j5ybqoau570mytu) (merge vers: 6.0.14-alpha) (pib:16)
[2 Mar 2010 14:39]
Bugs System
Pushed into 5.5.3-m2 (revid:alik@sun.com-20100302072233-t3uqgjzdukt1pyhe) (version source revid:alexey.kopytov@sun.com-20100209075938-mmcnyf6w631ozc45) (merge vers: 5.5.2-m2) (pib:16)
[2 Mar 2010 14:44]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100302072432-k8xvfkgcggkwgi94) (version source revid:alik@sun.com-20100224135227-rcqs9pe9b2in80pf) (pib:16)
[16 Mar 2010 0:25]
Paul DuBois
Noted in 5.1.45, 5.5.3, 6.0.14 changelogs. Grouping by a subquery in a query with a DISTINCT aggregate function led to incorrect and unordered grouping values.
[17 Jun 2010 12:07]
Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:54]
Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609140708-52rvuyq4q500sxkq) (merge vers: 5.1.45-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:35]
Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Description: An optimizer bug appears to be affecting the results of a GROUP BY based on the count() call used. How to repeat: drop table if exists DIM_test; create table DIM_test (id int unsigned auto_increment primary key, name varchar(128), unique (name)) engine=myisam; insert into DIM_test VALUES (NULL, 'abcdefgh'), (NULL, 'abcdefg'), (NULL, 'abcdefgk'), (NULL, 'abcdefgl'), (NULL,'abcdefgm'), (NULL, 'abcdefgn'); drop table if exists test; create table test (name varchar(128), val1 int unsigned, val2 int unsigned, val3 int unsigned, index(name(16))) engine=myisam; insert into test VALUES (NULL, 48, 12, 44454322), ('abcdefgh',48, 12, 24333422), ('abcdefgh',48, 12, 24333422), ('abcdefgk', 32, 12, 4422422), ('abcdefgm', 48, 12, 999248), (NULL, 48, 12, 828290), (NULL, 48, 12, 828290), (NULL, 48, 12, 442242), ('abcdefgh', 48, 12, 88892), (NULL, 48, 12, 30), (NULL, 48, 12, 442244), ('abcdefgk', 48, 12, 88892); select * from DIM_test; select * from test; select if(isnull(name), 0, name) n, val1, val2, count(*) from test group by n, val1, val2; select if(isnull(name), 0, name) n, val1, val2, count(distinct val3) from test group by n, val1, val2; select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(*) from test group by n, val1, val2; select if(isnull(name), 0, (select id from DIM_test where DIM_test.name=test.name)) n, val1, val2, count(distinct val2) from test group by n, val1, val2;