Bug #6489 | Named columns (USING) join returns redundant columns | ||
---|---|---|---|
Submitted: | 7 Nov 2004 23:18 | Modified: | 17 Sep 2005 14:19 |
Reporter: | Troels Arvin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.1.7 | OS: | Linux (Linux) |
Assigned to: | Timour Katchaounov | CPU Architecture: | Any |
[7 Nov 2004 23:18]
Troels Arvin
[11 Nov 2004 16:38]
Trudy Pelzer
I tested this on 5.0.0-alpha-max-debug and found the same error, specifically using INNER JOIN. Per the SQL Standard, the result of an INNER JOIN should have a column list that consists of all specified columns of the joined tables, except that the common columns are coalesced into a single column that has the same attributes (e.g. name, data type) as the columns from which it is derived. The result is that the join result set should contain only one column for each set of common columns. This is not happening; instead, the INNER JOIN result contains duplicate columns for each set of common columns. How to repeat: mysql> CREATE TABLE GROUPS2 (EMPNUM SMALLINT, GRP INT); Query OK, 0 rows affected (0.12 sec) mysql> CREATE TABLE NAMES2 (EMPNUM INT, NAME CHAR(5)); Query OK, 0 rows affected (0.08 sec) mysql> insert into GROUPS2 VALUES(1,1); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO NAMES2 VALUES(1,'BOB'); Query OK, 1 row affected (0.00 sec) mysql> SELECT * FROM NAMES2 INNER JOIN GROUPS2 USING (EMPNUM); +---------+-------+--------+------+ | EMPNUM | NAME | EMPNUM | GRP | +---------+-------+--------+------+ | 1 | BOB | 1 | 1 | +---------+-------+--------+------+ -- This is the incorrect result. The common column EMPNUM should be coalesced into a single EMPNUM column, so the result should be: +--------+--------+-----+ | EMPNUM | NAME | GRP | +--------+--------+-----+ | 1 | BOB | 1 | +--------+--------+-----+
[11 Nov 2004 17:07]
Trudy Pelzer
One other item relevant to this bug: The following CREATE TABLE T1 (F1 BIGINT, T1F2 BIGINT, F3 BIGINT); CREATE TABLE T2 (F1 BIGINT, T2F2 BIGINT, F3 BIGINT); SELECT * FROM T1 INNER JOIN T2 USING(F3,F1); -- should return a result set with columns in this order: F1 F3 T1F2 T2F2 The result column order must be: Part1: The coalesced columns of the <join column list>, sorted according to the column order of the first table (in this case, T1). Part2: The other columns of the first table, in the order in which they were defined in CREATE TABLE. Part3: The other columns of the second table, in the order in which they were defined in CREATE TABLE. MySQL does not do this, though; currently we return: F1 T1F2 F3 F1 T2F2 F3 So even after we correct the duplicate column bug, we'll still have a bug regarding the proper column order, which should be fixed at the same time.
[11 Nov 2004 21:43]
Troels Arvin
As I read SQL:2003 (section 7.7), elimination (coalescing) of common columns only pertains to natural joins and USING-joins. I don't see anything indicating that basic INNER joins are subject to elimination of common columns.
[19 Nov 2004 15:35]
Trudy Pelzer
Mr Arvin is correct, the rules apply specifically for INNER and OUTER joins built with a USING clause. Here is a partial listing of the join syntax in SQL:2003: 7.7 <joined table> <qualified join> ::= <table reference> [ <join type> ] JOIN <table reference> <join specification> <join specification> ::= <join condition> | <named columns join> <named columns join> ::= USING <left paren> <join column list> <right paren> <join type> ::= INNER | <outer join type> [ OUTER ] Thus: an INNER or OUTER join with USING is a <qualified join> that includes a <join specification>. In SQL:2003 parlance, this is an example of a <join specification> that contains a <named columns join>. The SQL:2003 rules say: 7) If NATURAL is specified or if a <join specification> immediately containing a <named columns join> is specified, then: ... b) If a <named columns join> is specified, then every <column name> in the <join column list> shall be equivalent to the <field name> of exactly one field of RT1 and the <field name> of exactly one field of RT2. Let common column name be the name of such a column. Let corresponding join columns refer to the columns identified in the <join column list>. c) Let C1 and C2 be a pair of corresponding join columns of RT1 and RT2, respectively. C1 and C2 shall be comparable. ... d) If there is at least one corresponding join column, then let SLCC be a <select list> of <derived column>s of the form COALESCE ( TA.C, TB.C ) AS C for every column C that is a corresponding join column, taken in order of their ordinal positions in RT1. ... In short, an INNER or OUTER join with USING should have a column list that consists of all specified columns of the joined tables, except that the common columns are coalesced into a single column. An earlier rule says: 6) If neither NATURAL is specified nor a <join specification> immediately containing a <named columns join> is specified, then the descriptors of the columns of the result of the <joined table> are the same as the descriptors of the columns of CP, with the possible exception of the nullability characteristics of the columns. In other words, an INNER or OUTER join with ON should return a complete set of columns (no coalescing) from the joined tables.
[24 Jan 2005 8:06]
Timour Katchaounov
In view of what the standard prescribes for the order of columns, this order is also violated when executing NATURAL join. Currently NATURAL join returns all columns from the left table in their table order, and all remaining non-duplicate columns from the right table, also according to their table order.
[24 Jan 2005 8:21]
Troels Arvin
Timour Katchaounov: Exactly how does that violate the standard? Isn't it OK that duplicate columns are removed but non-duplicate columns are not? By the way: See also bug #6495.
[25 Jan 2005 7:14]
Timour Katchaounov
Troels Arvin: The problem with NATURAL join is that currently it concatenates the columns of the first table with all remaining non-duplicate columns of the second table. The resulting columns are ok, but they are in the wrong order. According to Trudy's comment above, the column order must be: <common columns><non-dupl. left table cols><non-dupl. right table cols> while now it is: <left table columns><non-dupl. right table columns>. Basically the rule for NATURAL join and USING is the same, so what I mean above is that currenly they both share the same sub-problem - that of column order.
[25 Jan 2005 9:04]
Troels Arvin
Thanks, Timour. Now I see what you mean, and I believe you are right.
[17 Feb 2005 9:37]
Timour Katchaounov
ChangeSet 1.2176 05/02/17 11:26:00 timour@mysql.com +16 -0 Fixes for BUG#6489 and BUG#6136. This fix makes the processing of the USING clause consistent with that of the NATURAL JOIN clause. In addition the fix also makes NATURAL/USING OUTER JOINS consistent with their non-outerjon counterparts. Notice that this fix doesn't change column order as prescribed by the standard, becuase the standard order is inconsistent with the current NATURAL/USING join semantics. The reason is that in 4.1 we perform such joins pairwise (and we disregard braces), and correspondingly we remove duplicate columns only among pairs of relations. Thus for more than 2-way join we may still have duplicate column names in the result, so the standard reordering rule doesn't make sense.
[14 Mar 2005 11:51]
Timour Katchaounov
Correcting this bug in v. 5.0 requires a different approach which is part of a bigger ongoing task for ANSI compliant processing of NATURAL/UNSING joins. In order to keep both versions (4.1 and 5.0) in sync, the patch for this bug will be pushed once we are ready with the related task for v. 5.0.
[29 Aug 2005 15:23]
Paul DuBois
For MySQL 5.0, the bug has been fixed; fix for 4.1 remains to be implemented.
[14 Sep 2005 8:53]
Sergei Golubchik
It's fixed in 5.0 only (the fix is too big to backport)
[17 Sep 2005 14:19]
Paul DuBois
Noted in 5.0.12 changelog.