Bug #69638 | Wrong results when running a SELECT that includes a HAVING based on a function | ||
---|---|---|---|
Submitted: | 1 Jul 2013 18:04 | Modified: | 17 Apr 2015 14:58 |
Reporter: | Roger Esteban | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 5.6.12 | OS: | Linux (CentOS) |
Assigned to: | CPU Architecture: | Any | |
Tags: | FUNCTION, GROUP BY, having, regression, SELECT |
[1 Jul 2013 18:04]
Roger Esteban
[1 Jul 2013 18:09]
Roger Esteban
mysqldump of the sample database to reproduce the bug
Attachment: bug_having-dump.sql (application/octet-stream, text), 35.06 KiB.
[4 Jul 2013 11:12]
MySQL Verification Team
Hello Roger, Thank you for the bug report and the test case. Verified as described. Thanks, Umesh
[4 Jul 2013 11:28]
MySQL Verification Team
5.6.12 ======== How to repeat: source bug_having-dump.sql; mysql> SELECT tid, (rand()*100) AS r FROM bad group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +------+--------------------+ | tid | r | +------+--------------------+ | 1528 | 88.64830666095791 | | 1635 | 47.58917507490998 | | 1634 | 30.503785266078808 | | 1503 | 30.4803796396408 | | 1636 | 29.660286688860772 | | 1524 | 21.281236709357458 | +------+--------------------+ 6 rows in set (0.06 sec) mysql> SELECT tid, (rand()*100) AS r FROM good group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +------+--------------------+ | tid | r | +------+--------------------+ | 1489 | 18.16299689762573 | | 1465 | 17.354056069025823 | | 1467 | 15.09892094423894 | | 1271 | 8.159987542927254 | | 1430 | 6.693891255831244 | | 424 | 5.661855177638917 | | 1536 | 5.430176859190852 | | 1511 | 5.292637371730653 | | 1472 | 2.4737454973848028 | | 1528 | 1.0738670835959419 | +------+--------------------+ 10 rows in set (0.01 sec) mysql> desc bad; +-------+---------------------+------+-----+-------------------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------------------+------+-----+-------------------+-------+ | mid | bigint(20) unsigned | NO | | NULL | | | tid | bigint(20) unsigned | NO | MUL | NULL | | | date | timestamp | NO | | CURRENT_TIMESTAMP | | +-------+---------------------+------+-----+-------------------+-------+ 3 rows in set (0.00 sec) mysql> desc good; +-------+---------------------+------+-----+-------------------+-------+ | Field | Type | Null | Key | Default | Extra | +-------+---------------------+------+-----+-------------------+-------+ | mid | bigint(20) unsigned | NO | | NULL | | | tid | bigint(20) unsigned | NO | | NULL | | | date | timestamp | NO | | CURRENT_TIMESTAMP | | +-------+---------------------+------+-----+-------------------+-------+ 3 rows in set (0.00 sec) mysql> show create table bad\G *************************** 1. row *************************** Table: bad Create Table: CREATE TABLE `bad` ( `mid` bigint(20) unsigned NOT NULL, `tid` bigint(20) unsigned NOT NULL, `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, KEY `bad_tid` (`tid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.00 sec) mysql> show create table good\G *************************** 1. row *************************** Table: good Create Table: CREATE TABLE `good` ( `mid` bigint(20) unsigned NOT NULL, `tid` bigint(20) unsigned NOT NULL, `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.00 sec) mysql> truncate bad; Query OK, 0 rows affected (0.02 sec) mysql> insert into bad select * from good; Query OK, 451 rows affected (0.01 sec) Records: 451 Duplicates: 0 Warnings: 0 mysql> SELECT tid, (rand()*100) AS r FROM bad group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +------+--------------------+ | tid | r | +------+--------------------+ | 1323 | 87.94111878419399 | | 1063 | 86.05972936941285 | | 1465 | 81.74778668372686 | | 1536 | 80.05105394874798 | | 1653 | 64.17301610500833 | | 1467 | 62.57447485120452 | | 188 | 57.59990835338823 | | 694 | 57.11679976164996 | | 1636 | 55.80786918830859 | | 1498 | 52.03379909697342 | | 1486 | 47.34417791230993 | | 463 | 34.6296714941316 | | 1420 | 32.680558816232306 | | 1556 | 5.774004203988243 | +------+--------------------+ 14 rows in set (0.00 sec) mysql> SELECT tid, (rand()*100) AS r FROM good group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +------+--------------------+ | tid | r | +------+--------------------+ | 1528 | 18.836151732910565 | | 1525 | 16.457368448802615 | | 1508 | 9.260958069247154 | | 1430 | 7.453798789022303 | | 1452 | 4.779329621027531 | | 316 | 3.657315488585565 | | 188 | 3.263545411884362 | | 1489 | 1.5885871849847837 | | 1498 | 1.336359792702235 | +------+--------------------+ 9 rows in set (0.00 sec) mysql> explain SELECT tid, (rand()*100) AS r FROM bad group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+ | 1 | SIMPLE | bad | index | bad_tid | bad_tid | 8 | NULL | 451 | Using index; Using temporary; Using filesort | +----+-------------+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+ 1 row in set (0.00 sec) mysql> explain SELECT tid, (rand()*100) AS r FROM good group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | 1 | SIMPLE | good | ALL | NULL | NULL | NULL | NULL | 451 | Using temporary; Using filesort | +----+-------------+-------+------+---------------+------+---------+------+------+---------------------------------+ 1 row in set (0.00 sec)
[4 Jul 2013 11:33]
MySQL Verification Team
// Works fine with 5.5.32(GA) mysql> select version(); +------------+ | version() | +------------+ | 5.5.32-log | +------------+ 1 row in set (0.00 sec) mysql> SELECT tid, (rand()*100) AS r FROM bad group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +------+--------------------+ | tid | r | +------+--------------------+ | 1635 | 19.966749772398497 | | 1323 | 17.266090509729544 | | 1167 | 16.22010107731456 | | 1508 | 14.62252877151829 | | 972 | 11.49069602740062 | | 694 | 10.35648454945207 | | 1120 | 8.08822839342824 | | 316 | 7.063135883829682 | | 186 | 3.3950373562006626 | | 1634 | 2.6699658508132824 | +------+--------------------+ 10 rows in set (0.00 sec) mysql> SELECT tid, (rand()*100) AS r FROM good group by tid HAVING r < 20 ORDER BY r DESC LIMIT 15; +------+--------------------+ | tid | r | +------+--------------------+ | 1635 | 19.732079114515408 | | 1659 | 19.70466610016736 | | 1120 | 18.334571102945667 | | 1486 | 18.008704500280977 | | 1498 | 8.918220371863079 | | 274 | 6.555110408510184 | | 186 | 5.460767453034192 | | 1259 | 3.2911210351540903 | | 1230 | 0.136731751390483 | +------+--------------------+ 9 rows in set (0.00 sec)
[4 Jul 2013 14:12]
Roger Esteban
Thanks Umesh! Roger.
[3 Dec 2013 5:49]
Wai Wong
Is this fixed? It is a bit surprised that this is not yet fixed for 5 months. We found similar problems recently, in which some records, though exist, cannot be found, while in other cases, more than matching records are returned.
[17 Dec 2013 8:59]
Tor Didriksen
suggested rewrite: select tid, r from (select tid, (rand()*100) as r from bad) as x where r < 20 group by tid order by r desc limit 15;
[19 Dec 2013 3:44]
Wai Wong
In our case, we have many different sql's depending on user inputs, so it is not possible to rewrite them all. Moreover, for the same query with different parameters, the result may or may not have problem. So this "rewrite" approach will not work for our case.
[17 Apr 2015 14:58]
Paul DuBois
Noted in 5.6.25, 5.7.8, 5.8.0 changelogs. Queries that included a HAVING clause based on nondeterministic functions could produce incorrect results.
[24 Jun 2015 4:34]
Laurynas Biveinis
commit f289aeeef0743508ff87211084453b3b88a6d017 Author: Mithun C Y <mithun.c.y@oracle.com> Date: Wed Apr 15 15:08:25 2015 +0530 Bug #17055185 : WRONG RESULTS WHEN RUNNING A SELECT THAT INCLUDES A HAVING BASED ON A FUNCTION. ISSUE: ------ In end_write and end_write_group, Item_refs in having conditions should refer to tmp_table->record[0]. But join->ref_ptrs is not set to tmp_table->ref_array because of this Item_refs in having is referring to Items before tmp table. This lead to re-evaluation expressions in having. Non-deterministic functions like rand() on re-evaluation produces different results. If having refers to one such expression in select list and it re-evaluate them instead of referring to the saved results through tmp_table->ref_array, can result in inconsistent results. A similar issue in end_send where we need to refer to grouping ref_array. Example: In loose index scan where having refers to non-deterministic expression can result in inconsistent results. Solution: --------- If there is no select distinct, convert having condition to temp table conditions. With this Item_ref in having will be converted to Item_field of temp field. Since temp table already stores the result of group items, new Item_fields will refer to these stored results now. NOTE: ----- 1. We cannot set the join->ref_ptrs to tmp table ref_array as in end_send_group, because this have impact on PREPARE EXEC statements, reason is in JOIN::optimize we still do resolution under the flag "first_optimization", here we try to save select_lex->prep_where for next execution. Setting ref_ptrs can make outer ref in where clause to point to temp table items, so prep_where will point to some temp table items, which will be freed after execution, for next execution of same statement prep_where points to freed items, then de-referencing same!!!. This issue is solved with WL7082 in 5.7 2. If distinct is there I cannot move the having after applying distinct reason is if having items of having condition are not part of distinct then distinct key may remove rows witch satisfy having and keep which do not satisfy having. Thus me may loose some valid rows before applying having. 3. If plan is loose index scan we do not fix it in 5.6 because to fix this in end_send we need to set join->ref_ptrs to items3. As in point 1 doing any such things in 5.6 will lead to crash for PS. Already we have similar issue for end_send_group. so avoiding it. commit db31cfcca0cad19f00ad9b463523111434f48fed Author: Mithun C Y <mithun.c.y@oracle.com> Date: Tue Apr 21 10:23:16 2015 +0530 Bug #17055185 : WRONG RESULTS WHEN RUNNING A SELECT THAT INCLUDES A HAVING BASED ON A FUNCTION. Post push fix, corrected the test case to make result order irrelavant. commit 19f766bf3531add8eddfecad811cdaf84145292b Author: Mithun C Y <mithun.c.y@oracle.com> Date: Wed Apr 29 16:29:49 2015 +0530 Bug #17055185 : WRONG RESULTS WHEN RUNNING A SELECT THAT INCLUDES A HAVING BASED ON A FUNCTION. Post push fix, corrected the test case the results are ordered by column u duplicate values in u results in change in order of final projections of column t. Corrected the same by making values of column unique.