Bug #50539 | GROUP BY + COUNT(DISTINCT ) on an int column shows results off by one | ||
---|---|---|---|
Submitted: | 22 Jan 2010 10:36 | Modified: | 7 Mar 2010 1:12 |
Reporter: | Benjamin Schuster-Böckler | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 5.5.0, 5.5.2-m2, 6.0.14-bzr | OS: | Any (64bit) |
Assigned to: | Evgeny Potemkin | CPU Architecture: | Any |
Tags: | regression |
[22 Jan 2010 10:36]
Benjamin Schuster-Böckler
[22 Jan 2010 11:18]
Valeriy Kravchuk
Verified with recent trunk tree from bzr: openxs@suse:/home2/openxs/dbs/trunk> bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.5.2-m2-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> load data infile '~/melanoma_snps.red' into table `melanoma_snps` ; Query OK, 4985 rows affected (0.31 sec) Records: 4985 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT COUNT(DISTINCT position) -> FROM melanoma_snps -> WHERE chromosome = 1; +--------------------------+ | COUNT(DISTINCT position) | +--------------------------+ | 278 | +--------------------------+ 1 row in set (0.05 sec) mysql> SELECT chromosome, COUNT(DISTINCT position) -> FROM melanoma_snps -> GROUP BY chromosome -> ; +------------+--------------------------+ | chromosome | COUNT(DISTINCT position) | +------------+--------------------------+ | 2 | 278 | | 3 | 471 | | 4 | 404 | | 5 | 438 | | 6 | 209 | | 7 | 335 | | 8 | 339 | | 9 | 357 | | 10 | 205 | | 11 | 179 | | 12 | 144 | | 13 | 274 | | 14 | 232 | | 15 | 113 | | 16 | 90 | | 17 | 114 | | 18 | 111 | | 19 | 139 | | 20 | 87 | | 21 | 112 | | 22 | 81 | | 23 | 74 | | 23 | 199 | +------------+--------------------------+ 23 rows in set (0.11 sec) This is a regression bug, as 5.0.90 shows: openxs@suse:/home2/openxs/dbs/5.0> bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.0.90-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> CREATE TABLE `melanoma_snps` ( `chromosome` int(11) NOT NULL DEFAULT '0', `position` int(10) unsigned NOT NULL DEFAULT '0', `reference` char(1) DEFAULT NULL, `mutant` char(1) NOT NULL DEFAULT '', `zygosity` enum('het','hom') DEFAULT NULL, `effect` enum('Intergenic','Intronic','Missense','Noncoding_RNA','Nonsense','Silent','Splice','UTR') DEFAULT NULL, `validation` enum('"REAL, STATUS UNKNOWN"','COSMIC','N/A','SOMATIC','UNCONFIRMED') DEFAULT NULL, PRIMARY KEY (`chromosome`,`position`,`mutant`), KEY `k1` (`mutant`), KEY `k2` (`reference`), KEY `k3` (`zygosity`), KEY `k4` (`position`), KEY `k5` (`chromosome`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; Query OK, 0 rows affected (0.08 sec) mysql> load data infile '~/melanoma_snps.red' into table `melanoma_snps` ; Query OK, 4985 rows affected (0.22 sec) Records: 4985 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT chromosome, COUNT(DISTINCT position) FROM melanoma_snps GROUP BY chromosome; +------------+--------------------------+ | chromosome | COUNT(DISTINCT position) | +------------+--------------------------+ | 1 | 278 | | 2 | 471 | | 3 | 404 | | 4 | 438 | | 5 | 209 | | 6 | 335 | | 7 | 339 | | 8 | 357 | | 9 | 205 | | 10 | 179 | | 11 | 144 | | 12 | 274 | | 13 | 232 | | 14 | 113 | | 15 | 90 | | 16 | 114 | | 17 | 111 | | 18 | 139 | | 19 | 87 | | 20 | 112 | | 21 | 81 | | 22 | 74 | | 23 | 199 | +------------+--------------------------+ 23 rows in set (0.04 sec)
[22 Jan 2010 11:19]
Valeriy Kravchuk
5.1.44 is also NOT affected.
[27 Jan 2010 12:16]
Manyi Lu
Valeriy, could you please check whether this bug can be reproduced in 6.0-codebase-bugfixing or the trunk? Thanks, Manyi
[27 Jan 2010 13:30]
Valeriy Kravchuk
Bug is verified on trunk tree. Do you mean next-mr? Cloning mysql-6.0-codebase-bugfixing now...
[27 Jan 2010 14:27]
Valeriy Kravchuk
mysql-6.0-codebase is also affected: 77-52-24-143:6.0-codebase openxs$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 6.0.14-alpha-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> load data infile '~/Downloads/melanoma_snps.red' into table `melanoma_snps` ; Query OK, 4985 rows affected (0.18 sec) Records: 4985 Deleted: 0 Skipped: 0 Warnings: 0 mysql> SELECT COUNT(DISTINCT position) -> FROM melanoma_snps -> WHERE chromosome = 1; +--------------------------+ | COUNT(DISTINCT position) | +--------------------------+ | 278 | +--------------------------+ 1 row in set (0.00 sec) mysql> SELECT chromosome, COUNT(DISTINCT position) -> FROM melanoma_snps -> GROUP BY chromosome; +------------+--------------------------+ | chromosome | COUNT(DISTINCT position) | +------------+--------------------------+ | 2 | 278 | | 3 | 471 | | 4 | 404 | | 5 | 438 | | 6 | 209 | | 7 | 335 | | 8 | 339 | | 9 | 357 | | 10 | 205 | | 11 | 179 | | 12 | 144 | | 13 | 274 | | 14 | 232 | | 15 | 113 | | 16 | 90 | | 17 | 114 | | 18 | 111 | | 19 | 139 | | 20 | 87 | | 21 | 112 | | 22 | 81 | | 23 | 74 | | 23 | 199 | +------------+--------------------------+ 23 rows in set (0.09 sec)
[8 Feb 2010 16:04]
Patrick Crews
From initial tests against this bug, not all of the int column values are being displayed at +1 to true value: mysql> insert into melanoma_snps values (1,2,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.06 sec) mysql> insert into melanoma_snps values (2,2,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (3,2,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (2,2,'a','a','het','',''); ERROR 1062 (23000): Duplicate entry '2-2-a' for key 'PRIMARY' mysql> insert into melanoma_snps values (2,3,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (1,3,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (1,1,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (1,5,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> SELECT chromosome, COUNT(DISTINCT position) -> FROM melanoma_snps -> GROUP BY chromosome -> ; +------------+--------------------------+ | chromosome | COUNT(DISTINCT position) | +------------+--------------------------+ | 2 | 4 | | 3 | 2 | | 3 | 1 | +------------+--------------------------+ 3 rows in set (0.04 sec) Note that the row for chromosome 3 (last row) shows the correct chromosome value. Adding additional rows after this did not cause the int value to increment.
[8 Feb 2010 16:09]
Patrick Crews
NOTE: The last value of chromosome is not incremented. We can see that the value 23 is repeated 2x in the original bug report. So, it seems that everything but MAX(int_column) is incremented while MAX(int_column remains as it should be) mysql> insert into melanoma_snps values (1,1,'a','a','het','','');Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (1,2,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (1,3,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (1,4,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> insert into melanoma_snps values (2,1,'a','a','het','',''); Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> SELECT chromosome, COUNT(DISTINCT position) FROM melanoma_snps GROUP BY chromosome; +------------+--------------------------+ | chromosome | COUNT(DISTINCT position) | +------------+--------------------------+ | 2 | 4 | | 2 | 1 | +------------+--------------------------+ 2 rows in set (0.00 sec) mysql> insert into melanoma_snps values (3,1,'a','a','het','','');Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> SELECT chromosome, COUNT(DISTINCT position) FROM melanoma_snps GROUP BY chromosome; +------------+--------------------------+ | chromosome | COUNT(DISTINCT position) | +------------+--------------------------+ | 2 | 4 | | 3 | 1 | | 3 | 1 | +------------+--------------------------+ 3 rows in set (0.00 sec) mysql> insert into melanoma_snps values (4,1,'a','a','het','','');Query OK, 1 row affected, 2 warnings (0.00 sec) mysql> SELECT chromosome, COUNT(DISTINCT position) FROM melanoma_snps GROUP BY chromosome; +------------+--------------------------+ | chromosome | COUNT(DISTINCT position) | +------------+--------------------------+ | 2 | 4 | | 3 | 1 | | 4 | 1 | | 4 | 1 | +------------+--------------------------+ 4 rows in set (0.00 sec)
[8 Feb 2010 18:44]
Patrick Crews
This bug also needs a multi-part key (see the PRIMARY KEY statement) on the table, and the SELECT statement needs to have an aggregate + DISTINCT (without DISTINCT appears fine) on the 2nd part of the key with a GROUP BY on the first part of the key.
[10 Feb 2010 13:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/99821 2991 Evgeny Potemkin 2010-02-10 Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. Loose index scan is used to find MIN/MAX values using appropriate index and thus allow to avoid grouping. For each found row it updates non-aggregated fields with values from row with found MIN/MAX value. WL#3220 allows to use loose index scan to speed up queries with aggregate functions with distinct. In such cases it doesn't calculate the result of aggregate functions but simply jumps to a next distinct value. This scenario wasn't taken into account by the QUICK_GROUP_MIN_MAX_SELECT::get_next and it was updating non aggregated fields to values from current row which in this case might belong to the next group. The QUICK_GROUP_MIN_MAX_SELECT::get_next now updates non-aggregated fields only if it's not used to speedup aggregate function with distinct. @ mysql-test/r/group_min_max.result Added a test case for the bug#50539. @ mysql-test/t/group_min_max.test Added a test case for the bug#50539. @ sql/opt_range.cc Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. The QUICK_GROUP_MIN_MAX_SELECT::get_next now updates non-aggregated fields only if it's not used to speedup aggregate function with distinct.
[11 Feb 2010 16:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/99976 2991 Evgeny Potemkin 2010-02-11 Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. Loose index scan is used to find MIN/MAX values using appropriate index and thus allow to avoid grouping. For each found row it updates non-aggregated fields with values from row with found MIN/MAX value. Without loose index scan non-aggregated fields are copied by end_send_group function. With loose index scan there is no need in end_send_group and end_send is used instead. Non-aggregated fields still need to be copied and this was wrongly implemented in QUICK_GROUP_MIN_MAX_SELECT::get_next. WL#3220 added a case when loose index scan can be used with end_send_group to optimize calculation of aggregate functions with distinct. In this case the row found by QUICK_GROUP_MIN_MAX_SELECT::get_next might belong to a next group and copying it will produce wrong result. Update of non-aggregated fields is moved to the end_send function from QUICK_GROUP_MIN_MAX_SELECT::get_next. @ mysql-test/r/group_min_max.result Added a test case for the bug#50539. @ mysql-test/t/group_min_max.test Added a test case for the bug#50539. @ sql/opt_range.cc Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. Update of non-aggregated fields is moved to the end_send function from QUICK_GROUP_MIN_MAX_SELECT::get_next. @ sql/sql_select.cc Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. Update of non-aggregated fields is moved to the end_send function from QUICK_GROUP_MIN_MAX_SELECT::get_next.
[11 Feb 2010 17:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/99986 2996 Evgeny Potemkin 2010-02-11 [merge] Auto-merged fix for the bug#50539.
[12 Feb 2010 8:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/100081 2943 Evgeny Potemkin 2010-02-12 Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. Loose index scan is used to find MIN/MAX values using appropriate index and thus allow to avoid grouping. For each found row it updates non-aggregated fields with values from row with found MIN/MAX value. Without loose index scan non-aggregated fields are copied by end_send_group function. With loose index scan there is no need in end_send_group and end_send is used instead. Non-aggregated fields still need to be copied and this was wrongly implemented in QUICK_GROUP_MIN_MAX_SELECT::get_next. WL#3220 added a case when loose index scan can be used with end_send_group to optimize calculation of aggregate functions with distinct. In this case the row found by QUICK_GROUP_MIN_MAX_SELECT::get_next might belong to a next group and copying it will produce wrong result. Update of non-aggregated fields is moved to the end_send function from QUICK_GROUP_MIN_MAX_SELECT::get_next. @ mysql-test/r/group_min_max.result Added a test case for the bug#50539. @ mysql-test/t/group_min_max.test Added a test case for the bug#50539. @ sql/opt_range.cc Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. Update of non-aggregated fields is moved to the end_send function from QUICK_GROUP_MIN_MAX_SELECT::get_next. @ sql/sql_select.cc Bug#50539: Wrong result when loose index scan is used for an aggregate function with distinct. Update of non-aggregated fields is moved to the end_send function from QUICK_GROUP_MIN_MAX_SELECT::get_next.
[12 Feb 2010 9:42]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/100086 2999 Evgeny Potemkin 2010-02-12 [merge] Auto-merged fix for the bug#50539.
[12 Feb 2010 9:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/100090 3883 Evgeny Potemkin 2010-02-12 [merge] Auto-merged fix for bug#50539.
[12 Feb 2010 17:40]
Bugs System
Pushed into 5.5.2-m2 (revid:joerg@mysql.com-20100212164100-jnurxdw5z88m472s) (version source revid:joerg@mysql.com-20100212164100-jnurxdw5z88m472s) (merge vers: 5.5.2-m2) (pib:16)
[13 Feb 2010 8:36]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100213083436-9pesg4h55w1mekxc) (version source revid:alik@sun.com-20100212100039-eh6atbvijcm71eyb) (merge vers: 6.0.14-alpha) (pib:16)
[13 Feb 2010 8:39]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100213083327-cee4ao3jpg33eggv) (version source revid:alik@sun.com-20100212095912-k3fklaqrxzzls9cd) (pib:16)
[25 Feb 2010 0:36]
Paul DuBois
Noted in 5.5.2, 6.0.14 changelogs. Use of loose index scan optimization for an aggregate function with DISTINCT (for example, COUNT(DISTINCT)) could produce incorrect results. Setting report to Need Merge pending push of Celosia into release tree.
[6 Mar 2010 11:06]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100213160132-nx1vlocxuta76txh) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 1:12]
Paul DuBois
Already fixed in earlier 5.5.x.