MySQL Bugs: #6489: Named columns (USING) join returns redundant columns

Bug #6489	Named columns (USING) join returns redundant columns
Submitted:	7 Nov 2004 23:18	Modified:	17 Sep 2005 14:19
Reporter:	Troels Arvin	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	4.1.7	OS:	Linux (Linux)
Assigned to:	Timour Katchaounov	CPU Architecture:	Any

Description:
MySQL supports "USING" as one of the ways to qualify a join ("named columns join" in SQL-standard language):
"SELECT ... FROM tab1 JOIN tab2 USING (somecol)"
The nice thing about USING is that it can make some queries more terse.

According to [1], columns joined by way of USING should only appear once in the result set. That's also how I interpret SQL:2003's somewhat convoluted specification. And it's also how Oracle and PostgreSQL behave.

However, MySQL includes the common column from both of the joined tables into the result set. This takes away some of the usefulness of the USING join variant because one still has to explicitly qualify the resulting column(s) when referring to it, e.g. in WHERE expressions.

References:
1. http://books.elsevier.com/mk/default.asp?isbn=1558604561

How to repeat:
Setup - two tables:

mysql> SELECT * FROM tab1;
+---------+
| somecol |
+---------+
| a       |
| b       |
| c       |
| d       |
+---------+
                                                                                
mysql> SELECT * FROM tab2;
+---------+
| somecol |
+---------+
| b       |
| c       |
| d       |
| e       |
+---------+

Now, a quick way to join them:

mysql> SELECT * FROM tab1 JOIN tab2 USING (somecol);
+---------+---------+
| somecol | somecol |
+---------+---------+
| b       | b       |
| c       | c       |
| d       | d       |
+---------+---------+

The expected output:
+---------+
| somecol |
+---------+
| b       |
| c       |
| d       |
+---------+

The following query also fails in MySQL:
mysql> SELECT somecol FROM tab1 JOIN tab2 USING(somecol) WHERE somecol='b';
ERROR 1052 (23000): Column 'somecol' in field list is ambiguous

I tested this on 5.0.0-alpha-max-debug and found the same error,
specifically using INNER JOIN.

Per the SQL Standard, the result of an INNER JOIN should have
a column list that consists of all specified columns of the joined 
tables, except that the common columns are coalesced into a 
single column that has the same attributes (e.g. name, data type) 
as the columns from which it is derived. The result is that the join 
result set should contain only one column for each set of common
columns. This is not happening; instead, the INNER JOIN result
contains duplicate columns for each set of common columns.

How to repeat:
mysql> CREATE TABLE GROUPS2 (EMPNUM SMALLINT, GRP INT);
Query OK, 0 rows affected (0.12 sec)

mysql> CREATE TABLE NAMES2 (EMPNUM INT, NAME CHAR(5));
Query OK, 0 rows affected (0.08 sec)

mysql> insert into GROUPS2 VALUES(1,1);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO NAMES2 VALUES(1,'BOB');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM NAMES2 INNER JOIN GROUPS2 USING (EMPNUM);
+---------+-------+--------+------+
| EMPNUM | NAME | EMPNUM | GRP  |
+---------+-------+--------+------+
|      1     | BOB     |      1     |    1  |
+---------+-------+--------+------+
-- This is the incorrect result. The common column EMPNUM should
be coalesced into a single EMPNUM column, so the result should be:
+--------+--------+-----+
| EMPNUM | NAME |  GRP  |
+--------+--------+-----+
|      1    | BOB     |     1  |
+--------+--------+-----+

One other item relevant to this bug: The following
CREATE TABLE T1 (F1 BIGINT, T1F2 BIGINT, F3 BIGINT);
CREATE TABLE T2 (F1 BIGINT, T2F2 BIGINT, F3 BIGINT);

SELECT * FROM T1 INNER JOIN T2 USING(F3,F1);
-- should return a result set with columns in this order:
F1 F3 T1F2 T2F2

The result column order must be:
Part1: The coalesced columns of the <join column list>,
         sorted according to the column order of the first table
         (in this case, T1).
Part2: The other columns of the first table, in the order in
          which they were defined in CREATE TABLE.
Part3: The other columns of the second table, in the order in
          which they were defined in CREATE TABLE.

MySQL does not do this, though; currently we return:
F1 T1F2 F3 F1 T2F2 F3

So even after we correct the duplicate column bug, we'll still 
have a bug regarding the proper column order, which should
be fixed at the same time.

As I read SQL:2003 (section 7.7), elimination (coalescing) of common columns only pertains to natural joins and USING-joins. I don't see anything indicating that basic INNER joins are subject to elimination of common columns.

Mr Arvin is correct, the rules apply specifically for INNER and OUTER joins 
built with a USING clause. Here is a partial listing of the join syntax in 
SQL:2003:

7.7 <joined table>
<qualified join> ::=
<table reference> [ <join type> ] JOIN <table reference> <join specification>

<join specification> ::= <join condition> | <named columns join>

<named columns join> ::= USING <left paren> <join column list> <right paren>

<join type> ::= INNER | <outer join type> [ OUTER ]

Thus: an INNER or OUTER join with USING is a <qualified join> that includes a 
<join specification>. In SQL:2003 parlance, this is an example of a <join specification> that contains a <named columns join>. 

The SQL:2003 rules say:
7) If NATURAL is specified or if a <join specification> immediately containing a <named columns join> is specified, then:
...
b) If a <named columns join> is specified, then every <column name> in the <join column list> shall be equivalent to the <field name> of exactly one field of RT1 and the <field name> of exactly one field of RT2. Let common column name be the name of such a column. Let corresponding join columns refer to the columns identified in the <join column list>.
c) Let C1 and C2 be a pair of corresponding join columns of RT1 and RT2, respectively. C1 and C2 shall be comparable. ...
d) If there is at least one corresponding join column, then let SLCC be a <select list> of <derived column>s of the form COALESCE ( TA.C, TB.C ) AS C
for every column C that is a corresponding join column, taken in order of their ordinal positions in RT1.
...

In short, an INNER or OUTER join with USING should have a column list that consists of all specified columns of the joined tables, except that the common columns are coalesced into a single column.

An earlier rule says:
6) If neither NATURAL is specified nor a <join specification> immediately containing a <named columns join> is specified, then the descriptors of the columns of the result of the <joined table> are the same as the descriptors of the columns of CP, with the possible exception of the nullability characteristics of the columns.

In other words, an INNER or OUTER join with ON should return a complete set of columns (no coalescing) from the joined tables.

In view of what the standard prescribes for the order of columns, this order is also violated
when executing NATURAL join. Currently NATURAL join returns all columns from the left table
in their table order, and all remaining non-duplicate columns from the right table, also
according to their table order.

Timour Katchaounov: Exactly how does that violate the standard? Isn't it OK that duplicate columns are removed but non-duplicate columns are not? By the way: See also bug #6495.

Troels Arvin: The problem with NATURAL join is that currently it concatenates the
columns of the first table with all remaining non-duplicate columns of the second
table. The resulting columns are ok, but they are in the wrong order. According
to Trudy's comment above, the column order must be:
<common columns><non-dupl. left table cols><non-dupl. right table cols>
while now it is:
<left table columns><non-dupl. right table columns>.
Basically the rule for NATURAL join and USING is the same, so what I mean above
is that currenly they both share the same sub-problem - that of column order.

Thanks, Timour. Now I see what you mean, and I believe you are right.

ChangeSet
  1.2176 05/02/17 11:26:00 timour@mysql.com +16 -0
  Fixes for BUG#6489 and BUG#6136.
  
  This fix makes the processing of the USING clause consistent with that
  of the NATURAL JOIN clause. In addition the fix also makes NATURAL/USING OUTER JOINS
  consistent with their non-outerjon counterparts.

Notice that this fix doesn't change column order as prescribed by the standard, becuase
the standard order is inconsistent with the current NATURAL/USING join semantics. The
reason is that in 4.1 we perform such joins pairwise (and we disregard braces), and
correspondingly we remove duplicate columns only among pairs of relations. Thus for
more than 2-way join we may still have duplicate column names in the result, so the
standard reordering rule doesn't make sense.

Correcting this bug in v. 5.0 requires a different approach which is part
of a bigger ongoing task for ANSI compliant processing of
NATURAL/UNSING joins. In order to keep both versions (4.1 and 5.0)
in sync, the patch for this bug will be pushed once we are ready with
the related task for v. 5.0.

For MySQL 5.0, the bug has been fixed; fix for 4.1 remains
to be implemented.

It's fixed in 5.0 only (the fix is too big to backport)

Noted in 5.0.12 changelog.