Bug #32400 Complex SELECT query returns correct result only on some occasions
Submitted: 14 Nov 2007 22:49 Modified: 14 Jan 2008 19:24
Reporter: huji huji Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0.41/5.0BK/5.1BK OS:Any
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: bad result, complex, nested, SELECT

[14 Nov 2007 22:49] huji huji
Description:
The structure of the table I quried is like this (table name is "revision"):

Field           Type                 Null     Key     Default     Extra    
 --------------  -------------------  -------  ------  ----------  -------- 
...
 rev_page        int(8) unsigned      NO               0                    
...
 rev_timestamp   varchar(14)          NO                                    
...

For each rev_page, there are several records, which differ in their rev_timestamp. I could simply get the minimum value of rev_timestamp for each rev_page like this:

    select rev_page as pg1, min(rev_timestamp) as m1 from revision
    group by pg1

Then, I tried to get the "second minimum" value of rev_timestamp for each rev_page, like this:

   select rev_page as pg2, min(rev_timestamp) as m2 from revision
   where rev_timestamp >
   (select min(rev_timestamp) from revision where rev_page = pg2)
   group by pg2

This is where the bug appeared. The query returns zero records on most occasions (and it only takes part of a second for it to execute, and it never gets an error message). However, if you re-execute the query over and over again, on some occasions it takes a few seconds, and returns correct results like this:

 pg2     m2             
 ------  -------------- 
 0       20070430155538 
 1       20060113175304 
 2       20060113175851 
 3       20060113175520 
 4       20060113175919 
 5       20060113180021 
 6       20060113180645 
 7       20060113203013 
...

How to repeat:
Create a table like this:

create table temp(
rev_page int(8) unsigned,
rev_timestamp varchar(14)
);

Then import the data from the file I've uploaded to the following link:

http://pastebin.com/pastebin.php?dl=f168cd947

It is 33KB in size, and contains 1939 lines of data. 

After importing this into the temp table, create an index:

create index temp_index
on temp(rev_timestamp);

finally, try this minimal query:

select rev_page as pg2, min(rev_timestamp) as m2 from temp
where rev_timestamp > (select min(rev_timestamp) from temp where rev_page = pg2)
and rev_page < 4
group by pg2
order by pg2 asc;

Expected output is as follows:

 pg2     m2             
 ------  -------------- 
 0       20070430155538 
 1       20060113175304 
 2       20060113175851 
 3       20060730182601 

If you get this:

 pg2     m2    
 ------  ----- 

 0 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] 

then you are seeing what I'm reporting as a bug here.
[15 Nov 2007 2:46] MySQL Verification Team
Thank you for the bug report. Server 5.0 and 5.1 affected, server 4.1 gives
expected result:

[miguel@skybr 5.1]$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.23-rc-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create index temp_index
    -> on temp(rev_timestamp);
Query OK, 1939 rows affected (0.03 sec)
Records: 1939  Duplicates: 0  Warnings: 0

mysql> select rev_page as pg2, min(rev_timestamp) as m2 from temp
    -> where rev_timestamp > (select min(rev_timestamp) from temp where rev_page = pg2)
    -> and rev_page < 4
    -> group by pg2
    -> order by pg2 asc;
Empty set (6.22 sec)

mysql> 

[miguel@skybr 4.1]$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.24-debug

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create index temp_index
    -> on temp(rev_timestamp);
Query OK, 1939 rows affected (0.01 sec)
Records: 1939  Duplicates: 0  Warnings: 0

mysql> select rev_page as pg2, min(rev_timestamp) as m2 from temp
    -> where rev_timestamp > (select min(rev_timestamp) from temp where rev_page = pg2)
    -> and rev_page < 4
    -> group by pg2
    -> order by pg2 asc;
+------+----------------+
| pg2  | m2             |
+------+----------------+
|    0 | 20070430155538 |
|    1 | 20060113175304 |
|    2 | 20060113175851 |
|    3 | 20060730182601 |
+------+----------------+
4 rows in set (3.07 sec)

mysql>
[15 Nov 2007 2:50] MySQL Verification Team
Data file to populate table

Attachment: bug32400.zip (application/zip, text), 8.28 KiB.

[15 Nov 2007 2:53] MySQL Verification Team
create table temp(
rev_page int(8) unsigned,
rev_timestamp varchar(14)
);

load data infile 'f168cd947.txt' into table temp;
create index temp_index on temp(rev_timestamp);

select rev_page as pg2, min(rev_timestamp) as m2 from temp
where rev_timestamp > (select min(rev_timestamp) from temp where rev_page = pg2)
and rev_page < 4
group by pg2
order by pg2 asc;
[20 Nov 2007 17:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/38157

ChangeSet@1.2575, 2007-11-20 19:18:21+02:00, gkodinov@magare.gmz +11 -0
  Bug #32400: Complex SELECT query returns correct result 
   only on some occasions
  
  Referencing an element from the SELECT list in a WHERE 
  clause is not permitted. The namespace of the WHERE
  clause is the table columns only. This was not enforced
  correctly when resolving outer references in sub-queries.
  
  Fixed by not allowing references to aliases in a 
  sub-query in WHERE.
[20 Nov 2007 20:02] huji huji
Thanks for your reply here.

If I don't get things wrong here, the solution for this problem has been not to allow running a query like what I posted above. In this case, I would like to know if there is an alternative way for getting the "second minimum" value, like what I described above.

Please advise.
[22 Nov 2007 16:04] Georgi Kodinov
Hi,

What's prohibited is the use of aliases in WHERE. Here's how to rewrite your query for the second minimum without aliases in WHERE:
select rev_page pg2, min(rev_timestamp) as m2 from temp temp_outer
where rev_timestamp > (select min(rev_timestamp) from temp where rev_page = temp_outer.rev_page)
and rev_page < 4
group by pg2
order by pg2 asc;
[22 Nov 2007 19:58] huji huji
Thank you Georgi, but the command you offered here returns "nothing" again :( Use the same temp table provided above. I think even preventing usage of aliases in WHERE hasn't solved the issue completely.
[23 Nov 2007 13:49] Georgi Kodinov
Please try with a newer version. 

With my fix applied I'm getting:

create table temp(
rev_page int(8) unsigned,
rev_timestamp varchar(14)
)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
load data infile 'f168cd947.txt' into table temp
--------------

Query OK, 1939 rows affected, 1938 warnings (0.01 sec)
Records: 1939  Deleted: 0  Skipped: 0  Warnings: 1938

--------------
create index temp_index on temp(rev_timestamp)
--------------

Query OK, 1939 rows affected (0.01 sec)
Records: 1939  Duplicates: 0  Warnings: 0

--------------
select rev_page as pg2, min(rev_timestamp) as m2 from temp outr
where rev_timestamp > (select min(rev_timestamp) from temp where rev_page = outr.rev_page)
and rev_page < 4
group by pg2
order by pg2 asc
--------------

+------+----------------+
| pg2  | m2             |
+------+----------------+
|    0 | 20070430155538 | 
|    1 | 20060113175304 | 
|    2 | 20060113175851 | 
|    3 | 20060730182601 | 
+------+----------------+
4 rows in set (2.56 sec)
[23 Nov 2007 17:29] huji huji
I repeated the process you said, and used the same query. What I get is just like yours, except for the last step (nested select) which returns:

 pg2     m2    
 ------  ----- 

 0 record(s) selected [Fetch MetaData: 0/ms] [Fetch Data: 0/ms] 

 [Executed: 11/23/07 8:58:55 PM GMT+03:30 ] [Execution: 4797/ms]

Ran on Windows XP SP2 and MySQL 5.0.41-community-nt
[23 Nov 2007 17:41] huji huji
I repeated the same procedure on another machine, and it worked:

mysql> create table temp( rev_page int(8) unsigned, rev_timestamp varchar(14) );Query OK, 0 rows affected (0.04 sec)

mysql> load data infile 'f168cd947.txt' into table temp;
Query OK, 1939 rows affected, 1938 warnings (0.09 sec)
Records: 1939  Deleted: 0  Skipped: 0  Warnings: 1938

mysql> create index temp_index on temp(rev_timestamp);
Query OK, 1939 rows affected (0.11 sec)
Records: 1939  Duplicates: 0  Warnings: 0

mysql> select rev_page as pg2, min(rev_timestamp) as m2 from temp outr
    -> where rev_timestamp > (select min(rev_timestamp) from temp where rev_page =
    -> outr.rev_page)
    -> and rev_page < 4
    -> group by pg2
    -> order by pg2 asc;
+------+----------------+
| pg2  | m2             |
+------+----------------+
|    0 | 20070430155538 | 
|    1 | 20060113175304 | 
|    2 | 20060113175851 | 
|    3 | 20060730182601 | 
+------+----------------+
4 rows in set (2.11 sec)

mysql> select version();
+------------------------------+
| version()                    |
+------------------------------+
| 5.0.38-Ubuntu_0ubuntu1.1-log | 
+------------------------------+
1 row in set (0.01 sec)

I'm not sure why it still exists on my Windows MySQL installation. Please also notice that the version of the database differs in Windows and Ubuntu installations.

Seeking your advice,

huji
[14 Dec 2007 8:14] Bugs System
Pushed into 5.0.54
[14 Dec 2007 8:17] Bugs System
Pushed into 5.1.23-rc
[14 Dec 2007 8:21] Bugs System
Pushed into 6.0.5-alpha
[14 Jan 2008 19:24] Jon Stephens
Documented in the 5.0.54, 5.1.23, and 6.0.5 changelogs as follows:

        Referencing within a subquery an alias used in the
        SELECT list of the outer query was incorrectly permitted.