Bug #47694 | Setting join_cache_level= 1 produces extra, duplicate results | ||
---|---|---|---|
Submitted: | 29 Sep 2009 1:46 | Modified: | 5 Nov 2009 10:32 |
Reporter: | Patrick Crews | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.1/5.4/6.0 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | duplicate rows, join_cache_level, regression |
[29 Sep 2009 1:46]
Patrick Crews
[30 Sep 2009 8:13]
Philip Stoev
Setting a "regression" tag due to join_cache_level = 1 being the default for the 6.0 codebase.
[9 Oct 2009 11:29]
Jørgen Løland
This problem can also be reproduced in 5.1 by replacing STRAIGHT_JOIN with JOIN. Reduced test case run in 5.1 follows: CREATE TABLE `CC` ( `pk` int(11) NOT NULL AUTO_INCREMENT, `int_nokey` int(11) DEFAULT NULL, `int_key` int(11) DEFAULT NULL, `varchar_key` varchar(1) DEFAULT NULL, PRIMARY KEY (`pk`), KEY `int_key` (`int_key`), KEY `varchar_key` (`varchar_key`,`int_key`) ) ENGINE=MEMORY AUTO_INCREMENT=30 DEFAULT CHARSET=latin1; INSERT INTO `CC` VALUES (3,NULL,0,'z'); INSERT INTO `CC` VALUES (4,3,133,'e'); INSERT INTO `CC` VALUES (5,5,1,'h'); INSERT INTO `CC` VALUES (6,NULL,5,'e'); INSERT INTO `CC` VALUES (7,1,8,'j'); INSERT INTO `CC` VALUES (8,0,6,'e'); INSERT INTO `CC` VALUES (9,210,51,'f'); INSERT INTO `CC` VALUES (10,8,4,'v'); INSERT INTO `CC` VALUES (1,7,7,'x'); INSERT INTO `CC` VALUES (2,NULL,4,'c'); CREATE TABLE `D` ( `pk` int(11) NOT NULL AUTO_INCREMENT, `int_nokey` int(11) DEFAULT NULL, `int_key` int(11) DEFAULT NULL, `varchar_key` varchar(1) DEFAULT NULL, PRIMARY KEY (`pk`), KEY `int_key` (`int_key`), KEY `varchar_key` (`varchar_key`,`int_key`) ) ENGINE=MEMORY AUTO_INCREMENT=101 DEFAULT CHARSET=latin1; INSERT INTO `D` VALUES (4,6,7,'c'); INSERT INTO `D` VALUES (11,9,8,'z'); INSERT INTO `D` VALUES (12,8,8,'h'); INSERT INTO `D` VALUES (13,NULL,8,'q'); INSERT INTO `D` VALUES (16,1,5,'j'); INSERT INTO `D` VALUES (18,6,7,'m'); INSERT INTO `D` VALUES (20,1,4,'e'); SELECT * FROM CC JOIN D ON D.`varchar_key`=CC.`varchar_key` WHERE (D.`pk`, CC.`int_nokey`) NOT IN (SELECT `int_key`, `int_nokey` FROM CC) ORDER BY CC.pk,D.pk; pk int_nokey int_key varchar_key pk int_nokey int_key varchar_key 2 NULL 4 c 4 6 7 c 3 NULL 0 z 11 9 8 z 4 3 133 e 20 1 4 e 5 5 1 h 12 8 8 h 6 NULL 5 e 20 1 4 e 7 1 8 j 16 1 5 j 8 0 6 e 20 1 4 e SELECT * FROM CC STRAIGHT_JOIN D ON D.`varchar_key`=CC.`varchar_key` WHERE (D.`pk`, CC.`int_nokey`) NOT IN (SELECT `int_key`, `int_nokey` FROM CC) ORDER BY CC.pk,D.pk; pk int_nokey int_key varchar_key pk int_nokey int_key varchar_key 4 3 133 e 20 1 4 e 5 5 1 h 12 8 8 h 7 1 8 j 16 1 5 j 8 0 6 e 20 1 4 e
[9 Oct 2009 12:08]
Jørgen Løland
Run with the dataset above, jcl=0 behaves like 5.1 while jcl=1 behaves the other way around: JCL=0: ====== SELECT * FROM CC JOIN D ON D.`varchar_key`=CC.`varchar_key` WHERE (D.`pk`, CC.`int_nokey`) NOT IN (SELECT `int_key`, `int_nokey` FROM CC) ORDER BY CC.pk,D.pk; pk int_nokey int_key varchar_key pk varchar_key 2 NULL 4 c 4 c 3 NULL 0 z 11 z 4 3 133 e 20 e 5 5 1 h 12 h 6 NULL 5 e 20 e 7 1 8 j 16 j 8 0 6 e 20 e SELECT * FROM CC STRAIGHT_JOIN D ON D.`varchar_key`=CC.`varchar_key` WHERE (D.`pk`, CC.`int_nokey`) NOT IN (SELECT `int_key`, `int_nokey` FROM CC) ORDER BY CC.pk,D.pk; pk int_nokey int_key varchar_key pk varchar_key 4 3 133 e 20 e 5 5 1 h 12 h 7 1 8 j 16 j 8 0 6 e 20 e JCL=1: ====== SELECT * FROM CC JOIN D ON D.`varchar_key`=CC.`varchar_key` WHERE (D.`pk`, CC.`int_nokey`) NOT IN (SELECT `int_key`, `int_nokey` FROM CC) ORDER BY CC.pk,D.pk; pk int_nokey int_key varchar_key pk varchar_key 4 3 133 e 20 e 5 5 1 h 12 h 7 1 8 j 16 j 8 0 6 e 20 e SELECT * FROM CC STRAIGHT_JOIN D ON D.`varchar_key`=CC.`varchar_key` WHERE (D.`pk`, CC.`int_nokey`) NOT IN (SELECT `int_key`, `int_nokey` FROM CC) ORDER BY CC.pk,D.pk; pk int_nokey int_key varchar_key pk varchar_key 2 NULL 4 c 4 c 3 NULL 0 z 11 z 4 3 133 e 20 e 5 5 1 h 12 h 6 NULL 5 e 20 e 7 1 8 j 16 j 8 0 6 e 20 e
[9 Oct 2009 12:12]
Jørgen Løland
Analysis: SELECT CC.pk, D.pk, CC.int_nokey FROM CC STRAIGHT_JOIN D ON D.`varchar_key`=CC.`varchar_key` ORDER BY CC.pk,D.pk; pk pk int_nokey 2 4 NULL 3 11 NULL 4 20 3 5 12 5 6 20 NULL 7 16 1 8 20 0 SELECT `int_key`, `int_nokey` FROM CC; int_key int_nokey 0 NULL 133 3 1 5 5 NULL 8 1 6 0 51 210 4 8 7 7 4 NULL For [<x> NOT IN <y>], where either x or y is NULL, the row shall be filtered away as per the standard. Hence, the correct result should be: pk int_nokey int_key varchar_key pk varchar_key 4 3 133 e 20 e 5 5 1 h 12 h 7 1 8 j 16 j 8 0 6 e 20 e
[9 Oct 2009 13:44]
Jørgen Løland
Requesting new triage since this bug is also repeatable on 5.1
[19 Oct 2009 11:20]
Jørgen Løland
SQL 2003 has these relevant sections, defining why the rows with NULL above should not be in the result set: WHERE clause: ------------- General Rules 1) Let T be the result of the preceding <from clause>. 2) The <search condition> is applied to each row of T. The result of the <where clause> is a table of those rows of T for which the result of the <search condition> is True. SEARCH condition: ----------------- Specify a condition that is True, False, or Unknown, depending on the value of a <boolean value expression>. BOOLEAN VALUE expression: ------------------------- * NOT (Unknown) is Unknown.
[20 Oct 2009 9:07]
Jørgen Løland
BUG#48177 is probably the root cause for this bug.
[5 Nov 2009 10:32]
Jørgen Løland
Confirmed fixed by BUG#48177.