Bug #20128 outer JOIN + GROUP BY: wrong result
Submitted: 29 May 2006 15:55 Modified: 30 Jun 2006 11:14
Reporter: Matthias Leich Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0 OS:
Assigned to: Igor Babaev CPU Architecture:Any

[29 May 2006 15:55] Matthias Leich
Description:
Testcase derived from select.test:

CREATE TABLE t1 ( a INT NOT NULL, b INT NOT NULL);
INSERT INTO t1 VALUES (1,2);
CREATE TABLE t2 ( a INT NOT NULL, b INT NOT NULL );
INSERT INTO t2 VALUES ( 1,10), (1,11), (1,2);
# Where is the record with t2.b = 11 ?
SELECT t2.a, t2.b AS d FROM t2 LEFT JOIN
t1 ON t2.a = t1.a AND t2.b = t1.b GROUP BY t1.a, t1.b;
a	d
1	10
1	2
Comparable testcase using two statements:
CREATE TABLE t3 AS SELECT t2.a, t2.b FROM t2 LEFT JOIN
t1 ON t2.a = t1.a AND t2.b = t1.b;
SELECT a,b FROM t3;
a	b
1	10
1	11
1	2
SELECT a,b AS d FROM t3 GROUP BY a, b;
a	d
1	2
1	10
1	11

My environment:
   - PC - Intel Pentium M (x86-32Bit) - with Linux(SuSE 9.3)
   - MySQL compiled from source
         Version 5.0 ChangeSet@1.2109.1.2, 2006-05-05
         version 5.1 ChangeSet@1.2182, 2006-05-24

This is probably a duplicate of already known JOIN bugs,
but I assume my testcase is one of the most simple.

How to repeat:
Please use my attached testscript ml009.test
  copy it to mysql-test/t
  echo "Dummy" > r/ml009.result   # Produce a dummy file with 
                                                   # expected results
  ./mysql-test-run ml009
[29 May 2006 15:56] Matthias Leich
testcase

Attachment: ml009.test (application/test, text), 1.49 KiB.

[30 Jun 2006 3:56] Igor Babaev
As we can see the result set for the query:

SELECT t1.a, t1.b, t2.a, t2.b FROM t2 LEFT JOIN t1 ON t2.a=t1.a AND t2.b=t1.b

contains two groups for values of (t1.a,t1.b)

mysql> SELECT t1.a, t1.b, t2.a, t2.b FROM t2 LEFT JOIN t1 ON t2.a=t1.a AND t2.b=t1.b;

+------+------+---+----+
| a    | b    | a | b  |
+------+------+---+----+
| NULL | NULL | 1 | 10 |
| NULL | NULL | 1 | 11 |
|    1 |    2 | 1 |  2 |
+------+------+---+----+

That's why the following result is absolutely correct:

mysql> SELECT t1.a, t1.b, t2.a, t2.b FROM t2 LEFT JOIN t1 ON t2.a=t1.a AND t2.b=t1.b GROUP BY t1.a, t2.a;
+------+------+---+----+
| a    | b    | a | b  |
+------+------+---+----+
| NULL | NULL | 1 | 10 |
|    1 |    2 | 1 |  2 |
+------+------+---+----+

So I don't see any bug here.
[30 Jun 2006 11:14] Matthias Leich
Igor is right, it is indeed no bug and my "comparable testcase using 
two statements" does not fit.
The statement is:
   SELECT t2.a, t2.b AS d FROM t2 LEFT JOIN
   t1 ON t2.a = t1.a AND t2.b = t1.b GROUP BY t1.a, t1.b;
The important point is, that the columns within the select column list
are neither embedded in aggregat functions like SUM nor columns of the
group by column list. BTW: This is not allowed in standard SQL.
The manual says about this situation:
2.10.3. GROUP BY and HAVING with Hidden Fields
MySQL extends the use of GROUP BY so that you can use non-aggregated
columns or calculations in the SELECT list that do not appear in the 
GROUP BY clause.  ...
Do not use this feature if the columns you omit from the GROUP BY part
are not constant in the group. The server is free to return any value
from the group, so the results are indeterminate unless all values 
are the same.
   SELECT t1.a, t1.b, t2.a, t2.b FROM t2 LEFT JOIN t1 ON t2.a=t1.a AND
   t2.b=t1.b;
   +------+------+---+----+
   | a    | b    | a | b  |
   +------+------+---+----+
   | NULL | NULL | 1 | 10 |
   | NULL | NULL | 1 | 11 |
   |    1 |    2 | 1 |  2 |
   +------+------+---+----+
   We have two t1.a, t1.b groups.
      NULL NULL
         1    2
   The server is must return exact one row per group and
   he is allowed to return any value from the group.
   This means for
    SELECT t2.a, t2.b AS d FROM t2 LEFT JOIN
    t1 ON t2.a = t1.a AND t2.b = t1.b GROUP BY t1.a, t1.b;
   +---+----+
   | a | b  |
   +---+----+
   | 1 | 10 |
   | 1 |  2 |
   +---+----+
   and also
   +---+----+
   | a | b  |
   +---+----+
   | 1 | 11 |
   | 1 |  2 |
   +---+----+
   are correct.