Bug #45224 SELECT col_1 WHERE column(s) IN (subquery) AND condition GROUP BY col_n fails
Submitted: 31 May 2009 22:34 Modified: 1 Jun 2009 10:28
Reporter: Patrick Crews Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.4 OS:Any
Assigned to: CPU Architecture:Any
Tags: GROUP BY, IN, Optimizer, RQG, subquery

[31 May 2009 22:34] Patrick Crews
Description:
Queries of the pattern:
SELECT col_1 FROM TABLE WHERE <column_list> IN (subquery) AND <condition> GROUP BY col_n are producing differing values when run against 5.1-bugteam and 5.4 (as noted by the Random Query Generator's subquery_semijoin_nested grammar):

From my observations, the differences are that both versions return the same number of rows, but that the result sets are off by one value (5.1 returns 1,3,5 while 5.4 returns 1,4,5):

# 17:06:25 Query: SELECT  GRANDPARENT1 . `time_nokey` AS G1 FROM CC AS GRANDPARENT1 WHERE GRANDPARENT1 . `varchar_key` IN ( SELECT  PARENT1 . `varchar_nokey` AS P1 FROM CC AS PARENT1 LEFT JOIN B AS PARENT2 ON ( PARENT1 . `pk` <> PARENT2 . `int_key` )   ) AND GRANDPARENT1 . `datetime_key` >= '2007-09-08' GROUP BY GRANDPARENT1 . `int_key` HAVING G1 < 's' ORDER BY GRANDPARENT1 . `varchar_nokey`  failed: result content mismatch between servers.
--- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen3502-server0.dump        2009-05-31 17:06:25.000000000 -0400
+++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen3502-server1.dump        2009-05-31 17:06:25.000000000 -0400
@@ -1,3 +1,3 @@
 00:00:00
+12:27:34
 15:12:24
-15:48:24
# 17:06:27 Simplified query:  SELECT `time_nokey`  FROM CC  WHERE `varchar_key`  IN (  SELECT `varchar_nokey`  FROM CC  )  AND `datetime_key`  >= '2007-09-08' GROUP  BY `int_key`

Can provide further examples as needed.

How to repeat:
Run the random query generator

./runall.pl \
  --basedir1=<path>/mysql-5.1-bugteam \
  --basedir2=<path>/mysql-azalea \
  --grammar=conf/subquery_semijoin_nested.yy \
  --threads=1 \
  --queries=1000 \
  --mysqld2=--init-file=<path>/mysql-test-extra-6.0/mysql-test/gentest/init/no_subquery.sql \
  --validator=ResultsetComparatorSimplify \
  --queries=100000

It should also be possible to create a simplified grammar that focuses on the failure pattern as well.

Suggested fix:
Ensure 5.4 queries are processed correctly.
[1 Jun 2009 10:28] Patrick Crews
Misinterpreted RQG output.
The query pattern noted is not definitive because if one SELECT's on column X and then GROUP BY's on column Y, then the result is undefined, since the Optimizer is free to pick any value for X .

Closing as Not a Bug