| Bug #47694 | Setting join_cache_level= 1 produces extra, duplicate results | ||
|---|---|---|---|
| Submitted: | 29 Sep 2009 1:46 | Modified: | 5 Nov 2009 10:32 |
| Reporter: | Patrick Crews | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 5.1/5.4/6.0 | OS: | Any |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
| Tags: | duplicate rows, join_cache_level, regression | ||
[29 Sep 2009 1:46]
Patrick Crews
[30 Sep 2009 8:13]
Philip Stoev
Setting a "regression" tag due to join_cache_level = 1 being the default for the 6.0 codebase.
[9 Oct 2009 11:29]
Jørgen Løland
This problem can also be reproduced in 5.1 by replacing STRAIGHT_JOIN with JOIN. Reduced test case run in 5.1 follows:
CREATE TABLE `CC` (
`pk` int(11) NOT NULL AUTO_INCREMENT,
`int_nokey` int(11) DEFAULT NULL,
`int_key` int(11) DEFAULT NULL,
`varchar_key` varchar(1) DEFAULT NULL,
PRIMARY KEY (`pk`),
KEY `int_key` (`int_key`),
KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MEMORY AUTO_INCREMENT=30 DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (3,NULL,0,'z');
INSERT INTO `CC` VALUES (4,3,133,'e');
INSERT INTO `CC` VALUES (5,5,1,'h');
INSERT INTO `CC` VALUES (6,NULL,5,'e');
INSERT INTO `CC` VALUES (7,1,8,'j');
INSERT INTO `CC` VALUES (8,0,6,'e');
INSERT INTO `CC` VALUES (9,210,51,'f');
INSERT INTO `CC` VALUES (10,8,4,'v');
INSERT INTO `CC` VALUES (1,7,7,'x');
INSERT INTO `CC` VALUES (2,NULL,4,'c');
CREATE TABLE `D` (
`pk` int(11) NOT NULL AUTO_INCREMENT,
`int_nokey` int(11) DEFAULT NULL,
`int_key` int(11) DEFAULT NULL,
`varchar_key` varchar(1) DEFAULT NULL,
PRIMARY KEY (`pk`),
KEY `int_key` (`int_key`),
KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MEMORY AUTO_INCREMENT=101 DEFAULT CHARSET=latin1;
INSERT INTO `D` VALUES (4,6,7,'c');
INSERT INTO `D` VALUES (11,9,8,'z');
INSERT INTO `D` VALUES (12,8,8,'h');
INSERT INTO `D` VALUES (13,NULL,8,'q');
INSERT INTO `D` VALUES (16,1,5,'j');
INSERT INTO `D` VALUES (18,6,7,'m');
INSERT INTO `D` VALUES (20,1,4,'e');
SELECT *
FROM CC JOIN D
ON D.`varchar_key`=CC.`varchar_key`
WHERE (D.`pk`, CC.`int_nokey`)
NOT IN (SELECT `int_key`, `int_nokey`
FROM CC)
ORDER BY CC.pk,D.pk;
pk int_nokey int_key varchar_key pk int_nokey int_key varchar_key
2 NULL 4 c 4 6 7 c
3 NULL 0 z 11 9 8 z
4 3 133 e 20 1 4 e
5 5 1 h 12 8 8 h
6 NULL 5 e 20 1 4 e
7 1 8 j 16 1 5 j
8 0 6 e 20 1 4 e
SELECT *
FROM CC STRAIGHT_JOIN D
ON D.`varchar_key`=CC.`varchar_key`
WHERE (D.`pk`, CC.`int_nokey`)
NOT IN (SELECT `int_key`, `int_nokey`
FROM CC)
ORDER BY CC.pk,D.pk;
pk int_nokey int_key varchar_key pk int_nokey int_key varchar_key
4 3 133 e 20 1 4 e
5 5 1 h 12 8 8 h
7 1 8 j 16 1 5 j
8 0 6 e 20 1 4 e
[9 Oct 2009 12:08]
Jørgen Løland
Run with the dataset above, jcl=0 behaves like 5.1 while jcl=1 behaves the other way around:
JCL=0:
======
SELECT *
FROM CC JOIN D
ON D.`varchar_key`=CC.`varchar_key`
WHERE (D.`pk`, CC.`int_nokey`)
NOT IN (SELECT `int_key`, `int_nokey`
FROM CC)
ORDER BY CC.pk,D.pk;
pk int_nokey int_key varchar_key pk varchar_key
2 NULL 4 c 4 c
3 NULL 0 z 11 z
4 3 133 e 20 e
5 5 1 h 12 h
6 NULL 5 e 20 e
7 1 8 j 16 j
8 0 6 e 20 e
SELECT *
FROM CC STRAIGHT_JOIN D
ON D.`varchar_key`=CC.`varchar_key`
WHERE (D.`pk`, CC.`int_nokey`)
NOT IN (SELECT `int_key`, `int_nokey`
FROM CC)
ORDER BY CC.pk,D.pk;
pk int_nokey int_key varchar_key pk varchar_key
4 3 133 e 20 e
5 5 1 h 12 h
7 1 8 j 16 j
8 0 6 e 20 e
JCL=1:
======
SELECT *
FROM CC JOIN D
ON D.`varchar_key`=CC.`varchar_key`
WHERE (D.`pk`, CC.`int_nokey`)
NOT IN (SELECT `int_key`, `int_nokey`
FROM CC)
ORDER BY CC.pk,D.pk;
pk int_nokey int_key varchar_key pk varchar_key
4 3 133 e 20 e
5 5 1 h 12 h
7 1 8 j 16 j
8 0 6 e 20 e
SELECT *
FROM CC STRAIGHT_JOIN D
ON D.`varchar_key`=CC.`varchar_key`
WHERE (D.`pk`, CC.`int_nokey`)
NOT IN (SELECT `int_key`, `int_nokey`
FROM CC)
ORDER BY CC.pk,D.pk;
pk int_nokey int_key varchar_key pk varchar_key
2 NULL 4 c 4 c
3 NULL 0 z 11 z
4 3 133 e 20 e
5 5 1 h 12 h
6 NULL 5 e 20 e
7 1 8 j 16 j
8 0 6 e 20 e
[9 Oct 2009 12:12]
Jørgen Løland
Analysis: SELECT CC.pk, D.pk, CC.int_nokey FROM CC STRAIGHT_JOIN D ON D.`varchar_key`=CC.`varchar_key` ORDER BY CC.pk,D.pk; pk pk int_nokey 2 4 NULL 3 11 NULL 4 20 3 5 12 5 6 20 NULL 7 16 1 8 20 0 SELECT `int_key`, `int_nokey` FROM CC; int_key int_nokey 0 NULL 133 3 1 5 5 NULL 8 1 6 0 51 210 4 8 7 7 4 NULL For [<x> NOT IN <y>], where either x or y is NULL, the row shall be filtered away as per the standard. Hence, the correct result should be: pk int_nokey int_key varchar_key pk varchar_key 4 3 133 e 20 e 5 5 1 h 12 h 7 1 8 j 16 j 8 0 6 e 20 e
[9 Oct 2009 13:44]
Jørgen Løland
Requesting new triage since this bug is also repeatable on 5.1
[19 Oct 2009 11:20]
Jørgen Løland
SQL 2003 has these relevant sections, defining why the rows with NULL above should not be in the result set: WHERE clause: ------------- General Rules 1) Let T be the result of the preceding <from clause>. 2) The <search condition> is applied to each row of T. The result of the <where clause> is a table of those rows of T for which the result of the <search condition> is True. SEARCH condition: ----------------- Specify a condition that is True, False, or Unknown, depending on the value of a <boolean value expression>. BOOLEAN VALUE expression: ------------------------- * NOT (Unknown) is Unknown.
[20 Oct 2009 9:07]
Jørgen Løland
BUG#48177 is probably the root cause for this bug.
[5 Nov 2009 10:32]
Jørgen Løland
Confirmed fixed by BUG#48177.
