Bug #73332 ambiguous column error in order by query
Submitted: 19 Jul 2014 18:58 Modified: 6 Aug 2014 15:44
Reporter: Ryan Brothers Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Parser Severity:S3 (Non-critical)
Version:5.6.19 OS:Any
Assigned to: CPU Architecture:Any

[19 Jul 2014 18:58] Ryan Brothers
Description:
The following query returns an error "ERROR 1052 (23000): Column 'host' in order clause is ambiguous":

SELECT 1 AS host
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
ORDER BY LENGTH(host)

If I change it to "ORDER BY host", it runs successfully.

Is there a reason why the above query is an error?  I realize it's related to ordering by a derived field plus using 2 tables that both have a field name called host.

Since it orders by the derived field without the function call, should it use the derived field in the function call too?

Thanks for your help.

How to repeat:
See above example query.
[20 Jul 2014 11:21] Peter Laursen
Further observations:

1)
SELECT 1 AS HOST
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
ORDER BY LENGTH(`host`);
-- same error

2)
SELECT 1 AS HOST
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
ORDER BY LENGTH('host')
-- success

3)
SELECT 1 AS USER
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
-- same error

4)
SELECT 1 AS db
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
ORDER BY LENGTH(db);
-- success

Now, documentation states that an alias may be quoted as an identifier or as a string.  Using string-quotes lets the server recognize that the ORDER BY references the alias and not a column (because string-quotes excludes the possiblity of an identifier). 

But even I think it is very inconsitent that "ORDER BY LENGTH(host)" and "ORDER BY host" do not behave identically.

Also I find it extremely weird that 3 and 4) in my examples above return different. "user" and "host" are not reserved words according to http://dev.mysql.com/doc/refman/5.6/en/reserved-words.html - but they still seem to trigger another behaviour in such queries that "non-keyword"-words (such as "db" in case).

I have checked with all versions since 5.0.96 that it is the same in them all.

-- Peter
-- not a MySQL/Oracle person
[20 Jul 2014 11:23] Peter Laursen
My case 3) got truncated.  Here is the right one.

SELECT 1 AS USER
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
ORDER BY LENGTH(USER);
[20 Jul 2014 11:39] Peter Laursen
Adn finally - using an nqoted reserved word retuns a (1064) syntax error as expected.

Like here

SELECT 1 AS JOIN
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
ORDER BY LENGTH(JOIN);

and 

SELECT 1 AS JOIN
FROM mysql.user
JOIN mysql.db ON (db.host = user.host)
ORDER BY JOIN
[21 Jul 2014 10:57] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

When you run "ORDER BY host" alias is used and when you run "ORDER BY LENGTH(host)" MySQL tries to use either db.host or user.host which is ambiguous. You can confirm this if use one of columns which exists only in one of tables and EXPLAIN EXTENDED:

mysql> explain extended SELECT 1 AS password FROM mysql.user JOIN mysql.db ON (db.host = user.host) ORDER BY length(password);
+----+-------------+-------+-------+---------------+---------+---------+---------------+------+----------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra                                        |
+----+-------------+-------+-------+---------------+---------+---------+---------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | db    | index | PRIMARY       | PRIMARY | 420     | NULL          |    2 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | user  | ref   | PRIMARY       | PRIMARY | 180     | mysql.db.Host |    1 |   100.00 | NULL                                         |
+----+-------------+-------+-------+---------------+---------+---------+---------------+------+----------+----------------------------------------------+
2 rows in set, 1 warning (0.00 sec)

Note (Code 1003): /* select#1 */ select 1 AS `password` from `mysql`.`user` join `mysql`.`db` where (`mysql`.`user`.`Host` = `mysql`.`db`.`Host`) order by length(`mysql`.`user`.`Password`)

Compare with:

mysql> explain extended SELECT 1 AS password FROM mysql.user JOIN mysql.db ON (db.host = user.host) ORDER BY password;
+----+-------------+-------+-------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | db    | index | PRIMARY       | PRIMARY | 420     | NULL          |    2 |   100.00 | Using index |
|  1 | SIMPLE      | user  | ref   | PRIMARY       | PRIMARY | 180     | mysql.db.Host |    1 |   100.00 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+---------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

Note (Code 1003): /* select#1 */ select 1 AS `password` from `mysql`.`user` join `mysql`.`db` where (`mysql`.`user`.`Host` = `mysql`.`db`.`Host`) order by `password`
[21 Jul 2014 11:16] Peter Laursen
The conclusion here illustrates IMO unseriously bug reports are sometimes treated by MySQL/Oracle, There is nothing in Sveta's last post that we have not already noticed.

It is easy to *see* ("*aftersee*") the behavior, obviously. But it is not possible to *foresee* it. Is it completely random? Does some rules apply? Where is the parser behavior with such naming conflict documented? and why does it matter if the alias is wrapped inside a function or not?

People should avoid such alias, of course.  But still it reveals the primitivity of the MySQL parser in some respects that it had from early days and that have not been improved over 10+ years. 

IMO this should be verified so that it will be considered in a future parser rewrite aiming at achieving more consistency.
[21 Jul 2014 11:38] Ryan Brothers
Thanks, I see it's because both the db and user tables have a field called host.  Why does MySQL use the alias in "ORDER BY host", but use a field in "ORDER BY LENGTH(host)"?  Why is there different behavior there?  Should it be consistent in those 2 cases?

How could I change my query to order by the length of host, which is a derived field in my query?  I tried "ORDER BY `host`", but it still gives the ambiguous column error.
[21 Jul 2014 11:41] Ryan Brothers
I mean "ORDER BY LENGTH(`host`)" still gives the ambiguous column error.
[21 Jul 2014 11:45] Sveta Smirnova
Peter,

this is not for MySQL who decided how to perform name resolution in this case. See http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt, particularly "6.4  <column reference>" and "6.5 <set function specification>" parts, pay attention for "Outer reference".
[21 Jul 2014 11:47] Peter Laursen
@Ryan .

"ORDER BY LENGTH('host')"(using apostrophes and not backquotes)does not! 

`host` is still ambigious because both identifiers for database objects and aliases may be `backquoted`. But an identifier cannot be 'singlequoted'.
[21 Jul 2014 12:13] Ryan Brothers
Peter,

From my testing, when I do "ORDER BY LENGTH('host')", it's ordering by the length of the string 'host', which is a constant 4.

I just tried the below test which does not give the expected results.  The results are coming back in the incorrect order.

CREATE TABLE test1 (
  host varchar(10) default NULL
);

INSERT INTO test1 VALUES ('aaaa'), ('bbb'), ('cc'), ('d');

-- returns incorrect order
SELECT *
FROM test1
ORDER BY length('host');

-- returns correct order
SELECT *
FROM test1
ORDER BY length(host);
[22 Jul 2014 13:16] Peter Laursen
OK! I did not try with data!
[22 Jul 2014 13:19] Peter Laursen
@Ryan .. if you want more comments/considerations from MySQL supporters,  I think you will have to change status from 'Not a Bug' to 'active'.
[6 Aug 2014 11:57] Hartmut Holzgraefe
The real problem seems to be that in the ORDER BY syntax:

  ORDER BY {col_name | expr | position} [ASC | DESC], ...

Later on in the "SELECT Syntax" page it says:

  Columns selected for output can be referred to 
  in ORDER BY and GROUP BY clauses using column names, 
  column aliases, or column positions.

So the "col_name" in the ORDER BY syntax line can
be either a real column name or an alias (and later
on it is defined that aliases take precedence)

But when using an "expr" the general syntax rules
for expressions apply, and these do not know about
column aliases. 

To make the original query work as expected the
whole expression parser would need to be modified
to be context sensitive whenever seeing something
that looks like a column name, and only if the
expression is found in a ORDER BY, GROUP BY or
HAVING context it may also consider select column
aliases as alternatives to consider ...

This might be possible, but will probably only 
lead to other ambiguity side effects, and I 
wouldn't be surprised if that behavior was in
violation with the standard, too.

All in all I'd say this is a documentation issue
as the manual doesn't state clearly enough that
only direct column alias names are supported
in the group by list, but that their use in
expressions is not ...
[6 Aug 2014 13:32] Ryan Brothers
Hartmut - thanks for your detailed reply.  It looks like it is possible to reference an alias in an ORDER BY expr, but the results aren't correct:

SELECT RAND() AS abc
FROM mysql.user
ORDER BY ABS(abc)

This query is a valid query, but the data isn't being sorted correctly.  Do you know what it's ordering by?
[6 Aug 2014 13:40] Peter Laursen
Good question really!
[6 Aug 2014 15:44] Ryan Brothers
From further testing, the above query with RAND() seems to be a separate, unrelated issue.

To summarize what I'm seeing:

1) SELECT 'abc' AS user FROM mysql.user ORDER BY user;
2) SELECT 'abc' AS test1 FROM mysql.user ORDER BY LENGTH(test1);
3) SELECT 'abc' AS user FROM mysql.user ORDER BY LENGTH(user);

Query 1 sorts by the alias user.
Query 2 sorts by the alias test1.
Query 3 sorts by the column name user.
[6 Aug 2014 16:22] Bear Limvere
This behaviour is very inconsistent and makes using the sort function much harder that it should be. Here is my example:

CREATE TABLE test (n INT(11), val VARCHAR(32));

INSERT INTO test
SELECT n, CONCAT('a', ROUND(n * RAND(), 4)) AS rand_n
FROM common_schema.numbers;

SELECT
	n,
	val AS alias,
	n AS val
FROM test
ORDER BY UPPER(val)
LIMIT 5;
+------+---------+------+
| n    | alias   | val  |
+------+---------+------+
|    0 | a0.0000 |    0 |
|  480 | a0.0091 |  480 |
|  250 | a0.1293 |  250 |
|    1 | a0.2261 |    1 |
|   11 | a0.2617 |   11 |
+------+---------+------+
5 rows in set (0.01 sec)

SELECT
	n,
	val AS alias,
	n AS val
FROM test
ORDER BY val
LIMIT 5;
+------+---------+------+
| n    | alias   | val  |
+------+---------+------+
|    0 | a0.0000 |    0 |
|    1 | a0.2261 |    1 |
|    2 | a0.5227 |    2 |
|    3 | a1.8849 |    3 |
|    4 | a1.4299 |    4 |
+------+---------+------+