Bug #27963 ORDER BY ignored when SELECTing from INFORMATION_SCHEMA
Submitted: 19 Apr 2007 20:48 Modified: 2 May 2007 13:15
Reporter: Jay Pipes Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Information schema Severity:S2 (Serious)
Version:5.0.42-debug-log, 5.1 OS:Linux (2.6.17-11)
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: I_S, regression

[19 Apr 2007 20:48] Jay Pipes
Description:
When SELECTing from INFORMATION_SCHEMA tables, ORDER BY clause is no longer honoured.

How to repeat:
First of all, test this on a server with some databases in it, not on a blank test database...

Then, run something like the following:

SELECT
  t.TABLE_SCHEMA
 , t.TABLE_NAME
 , s.INDEX_NAME
 , s.COLUMN_NAME
 , s.SEQ_IN_INDEX
 , (
   SELECT MAX(SEQ_IN_INDEX)
   FROM INFORMATION_SCHEMA.STATISTICS s2
   WHERE s.TABLE_SCHEMA = s2.TABLE_SCHEMA
   AND s.TABLE_NAME = s2.TABLE_NAME
   AND s.INDEX_NAME = s2.INDEX_NAME
  ) AS "COLS_IN_INDEX"
 , s.CARDINALITY AS "CARD"
 , t.TABLE_ROWS AS "ROWS"
 , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS "SEL %"
FROM INFORMATION_SCHEMA.STATISTICS s
 INNER JOIN INFORMATION_SCHEMA.TABLES t
  ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND s.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_SCHEMA != 'mysql'
AND t.TABLE_ROWS > 10
AND s.CARDINALITY IS NOT NULL
AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00
ORDER BY "SEL %", TABLE_SCHEMA, TABLE_NAME
LIMIT 10;

Or this, which should be the same result but uses a derived table:

SELECT
  t.TABLE_SCHEMA
 , t.TABLE_NAME
 , s.INDEX_NAME
 , s.COLUMN_NAME
 , s.SEQ_IN_INDEX
 , s2.max_columns AS "COLS_IN_INDEX"
 , s.CARDINALITY AS "CARD"
 , t.TABLE_ROWS AS "ROWS"
 , ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS "SEL %"
FROM INFORMATION_SCHEMA.STATISTICS s
 INNER JOIN INFORMATION_SCHEMA.TABLES t
  ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND s.TABLE_NAME = t.TABLE_NAME
 INNER JOIN (
  SELECT 
     TABLE_SCHEMA
   , TABLE_NAME
   , INDEX_NAME
   , MAX(SEQ_IN_INDEX) AS max_columns
  FROM INFORMATION_SCHEMA.STATISTICS
  WHERE TABLE_SCHEMA != 'mysql'
  GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
 ) AS s2
 ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA
 AND s.TABLE_NAME = s2.TABLE_NAME
 AND s.INDEX_NAME = s2.INDEX_NAME
WHERE t.TABLE_SCHEMA != 'mysql'
AND t.TABLE_ROWS > 10
AND s.CARDINALITY IS NOT NULL
AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00
ORDER BY "SEL %", TABLE_SCHEMA, TABLE_NAME 
LIMIT 10;

Running both statements on my local server shows very different output AND EXPLAIN outputs, with WILDLY different performance (check out the execution times in the attached results file)

Suggested fix:
AFAIK, this bug appeared between 5.0-24 and 5.0.42, so run some tests on the code changes to find where the regression entered into the code base.
[19 Apr 2007 20:49] Jay Pipes
Results file from 5.0.42 showing no oRDER BY honoured

Attachment: bug27963.txt (text/plain), 10.80 KiB.

[20 Apr 2007 8:06] Sveta Smirnova
Thank you for the report.

Please provide dump of your information_schema database. We can examine if this affects only Information Schema or every similar query.
[20 Apr 2007 22:29] Sveta Smirnova
test case

Attachment: bug27963.test (application/octet-stream, text), 61.53 KiB.

[20 Apr 2007 22:36] Sveta Smirnova
Thank you for the report.

Verified as described using attached test case.

Introduced in 5.0.33. Version 5.0.27 is not affected.
[2 May 2007 13:15] Georgi Kodinov
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

The core of the problem described here can be demonstrated by the following commands :
CREATE TABLE t1 (a INT); INSERT INTO t1 VALUES (2),(1);
SELECT a as "SEL" FROM t1 ORDER BY "SEL";
SET SQL_MODE="ANSI_QUOTES";
SELECT a as "SEL" FROM t1 ORDER BY "SEL";

The docs say (about quotes) :
 - "The identifier quote character is the backtick (“`”):"
 - "If the ANSI_QUOTES SQL mode is enabled, it is also allowable to quote identifiers within double quotes"
 - "A string is a sequence of bytes or characters, enclosed within either single quote (“'”) or double quote (“"”) characters."
 - "If the ANSI_QUOTES SQL mode is enabled, string literals can be quoted only within single quotes because a string quoted within double quotes is interpreted as an identifier."

Taking the above into account it follows that the "SEL" in the above query is a string in non-ANSI mode (the default) and an identifier in ANSI mode.

There was a bug in MySQL (bug #14019) that it was not checking the type of the reference and hence treating strings (and others) as column references to the equally named columns. This was fixed in in 4.1.22/5.0.30/5.1.13