Bug #15102 select distinct returns empty result, select count distinct > 0 (correct)
Submitted: 21 Nov 2005 15:45 Modified: 22 Mar 2006 16:03
Reporter: Torben Frey Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.x OS:Linux (Linux on AMD64)
Assigned to: Georgi Kodinov CPU Architecture:Any

[21 Nov 2005 15:45] Torben Frey
Description:
Hi,
I have a strange problem in tables with millions of rows:

select mycolumn from mytable where mycolumn = 17;
17
17
17
17

returns some identical rows.
Trying to make these rows unique (in this case I'd only get ONE result) I do

select distinct mycolumn from mytable where mycolumn = 17;

returns - no, not the one single unique row containing value - but an empty result set, which is definitely wrong. SOME identical result rows will shrink to ONE after a distinct, not to NONE.

The complete strange thing is, putting a count() around the distinct, I get exactly the number I'd expect: 1

select count( distinct mycolumn ) from mytable where mycolumn = 17;
1

This only happens in tables with millions of rows - once I delete most of the other rows, it is working.

I tried this with mysql-5.0.12beta, 5.0.15, 5.0.16, 5.0.17nightly20051120.
I tried it with different tables, databases...all the same. Table types are MyISAM. This error occurs when I have millions of rows in a table :-(

I would be happy if you could help me.

Thanks a lot,
Torben

How to repeat:
use mydb

mysql> select mycolumn from mytable where mycolumn=17;
+----------+
| mycolumn |
+----------+
|       17 |
|       17 |
|       17 |
|       17 |
|       17 |
+----------+
5 rows in set (0.00 sec)

mysql> select distinct mycolumn from mytable where mycolumn=17;
Empty set (0.00 sec)

mysql> select count(distinct mycolumn) from mytable where mycolumn=17;
+--------------------------+
| count(distinct mycolumn) |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.00 sec)

It happens with 11399741 rows:
mysql> select count(mycolumn) from mytable;
+-----------------+
| count(mycolumn) |
+-----------------+
|        11399741 |
+-----------------+
1 row in set (0.00 sec)

After I delete all rows with mycolumn > 17 it is working:
mysql> select count(mycolumn) from mytable;
+-----------------+
| count(mycolumn) |
+-----------------+
|               9 |
+-----------------+
1 row in set (0.00 sec)

mysql> select distinct mycolumn from mytable where mycolumn=17;
+----------+
| mycolumn |
+----------+
|       17 |
+----------+
1 row in set (0.00 sec)

Suggested fix:
Downgrade to 4.1.x - or don't use distinct.
[21 Nov 2005 17:17] Valeriy Kravchuk
Thank you for a problem report. Please, send the SHOW CREATE TABLE mytable results and specify the exact version used (5.0.x, x=?). 

Is it MySAM table? Have you analyzed/checked it before testing? - that is what I want to figure out first.
[21 Nov 2005 17:52] Torben Frey
Hey Valeriy,

thanks for the fast reply. I discovered the bug on 5.0.15, but it is reproducable on all 5.0.x versions we have running here, I tried 5.0.12beta, 5.0.15, 5.0.16, 5.0.17nightly20051120.

No, I didn't analyze the table before testing because I can reproduce it on various servers with different tables - so they would ALL be corrupt :-)

Here's the output:

mysql> show create table mytable;
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                |
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mytable | CREATE TABLE `mytable` (
  `mycolumn` int(11) NOT NULL default '0',
  `mycolumn2` varchar(16) collate latin1_general_ci NOT NULL default '',
  PRIMARY KEY  (`mycolumn`,`mycolumn2`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci |
+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
[21 Nov 2005 18:03] Valeriy Kravchuk
Sorry, but there should be something special in your data. Simplest test case works OK for me on 5.0.17-BK:

mysql> CREATE TABLE `mytable` (
    ->   `mycolumn` int(11) NOT NULL default '0',
    ->   `mycolumn2` varchar(16) collate latin1_general_ci NOT NULL default '',
    ->   PRIMARY KEY  (`mycolumn`,`mycolumn2`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
Query OK, 0 rows affected (0,00 sec)

mysql> insert into mytable values(1, 'I am 1');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable values(2, 'I am 2');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable values(3, 'I am 3');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable values(4, 'I am 4');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable values(5, 'I am 5');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable values(6, 'I am 6');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable values(7, 'I am 7');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable select mycolumn + 10, mycolumn2 from mytable;
Query OK, 7 rows affected (0,01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> insert into mytable select mycolumn + 20, mycolumn2 from mytable;
Query OK, 14 rows affected (0,01 sec)
Records: 14  Duplicates: 0  Warnings: 0

mysql> insert into mytable values(17, 'I am 17');
Query OK, 1 row affected (0,01 sec)

mysql> insert into mytable values(17, 'I am still 17');
Query OK, 1 row affected (0,00 sec)

mysql> insert into mytable values(17, 'I am also 17');
Query OK, 1 row affected (0,00 sec)

mysql> select mycolumn from mytable where mycolumn = 17;
+----------+
| mycolumn |
+----------+
|       17 |
|       17 |
|       17 |
|       17 |
+----------+
4 rows in set (0,00 sec)

mysql> select distinct mycolumn from mytable where mycolumn = 17;
+----------+
| mycolumn |
+----------+
|       17 |
+----------+
1 row in set (0,00 sec)

mysql> select count(distinct mycolumn) from mytable where mycolumn = 17;
+--------------------------+
| count(distinct mycolumn) |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0,01 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.17    |
+-----------+
1 row in set (0,00 sec)

So, have you deleted some rows before the problem appeared? What is the smallest set of data it is reproduceable on? Have you tried to repeat on other platform (not AMD64 - mine is old Celeron)?
[21 Nov 2005 18:21] Torben Frey
Hi again,

at the moment I do not have a 32-bit machine to install but I will see what I can do. What I can say so far is that it only happens at tables with many millions of entries - the ones I used had 11 mio and 20+ mio entries. The small tables are not affected. I should write a script to insert row after row to find out when the distinct result is empty - hopefully I can do it tomorrow.

Right now I need to run to catch my train, sorry... but thank you for your help so far, I am sure we can debug this to the end :-)

Cu,
Torben
[22 Nov 2005 8:35] MySQL Verification Team
I repeated on 5.0.16 and todays 5.0.17-bk using random numbers. (will upload dump shortly).

Linux fc4.hti.co.za 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 i386 GNU/Linux

mysql> SELECT mycolumn FROM mytable WHERE mycolumn=0;
+----------+
| mycolumn |
+----------+
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
|        0 |
+----------+
89 rows in set (0.00 sec)

mysql> SELECT DISTINCT mycolumn FROM mytable WHERE mycolumn=0;
Empty set (0.00 sec)

mysql> SELECT COUNT(DISTINCT mycolumn) FROM mytable WHERE mycolumn=0;
+--------------------------+
| COUNT(DISTINCT mycolumn) |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.00 sec)

mysql> CHECK TABLE mytable EXTENDED;
+--------------+-------+----------+----------+
| Table        | Op    | Msg_type | Msg_text |
+--------------+-------+----------+----------+
| test.mytable | check | status   | OK       |
+--------------+-------+----------+----------+
1 row in set (2 min 43.08 sec)

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 5.0.17-debug |
+--------------+
1 row in set (0.00 sec)

mysql>

mysql> SELECT COUNT(*) FROM mytable;
+----------+
| COUNT(*) |
+----------+
|  3996169 |
+----------+
1 row in set (0.00 sec)

mysql>
[22 Nov 2005 11:06] Torben Frey
Hi again,

I am glad Shane verified this bug. I am sorry I cannot provide a row count when it starts - seems to depend on the resultset size :-(

Torben
[10 Jan 2006 15:19] scott gusler
Extension to the bug, it's not limited to single value fields

Query:
select accountgroup from customers where accountgroup like 'N4K - MI - IRESA%'

Result:
+-------------------------+
| accountgroup            |
+-------------------------+
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA        |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
| N4K - MI - IRESA - IRTC |
+-------------------------+
24 rows in set (0.00 sec)

Query:
select distinct accountgroup from customers where accountgroup like 'N4K - MI - IRESA%';

Result:
Empty set (0.00 sec)

Query:
 select accountgroup from customers where accountgroup like 'N4K - MI - IRESA%' group by accountgroup;

Result:
Empty set (0.00 sec)

Query:
 select count(accountgroup) from customers where accountgroup like 'N4K - MI - IRESA%' group by accountgroup;

Result:
+---------------------+
| count(accountgroup) |
+---------------------+
|                   9 |
|                  15 |
+---------------------+
2 rows in set (0.00 sec)

Query:
select count(distinct accountgroup) from customers where accountgroup like 'N4K - MI - IRESA%';

Result:
+------------------------------+
| count(distinct accountgroup) |
+------------------------------+
|                            2 |
+------------------------------+

Not sure if it's helpful or not in debugging but I figured it may be useful information
[10 Jan 2006 17:30] scott gusler
a quick fix to this problem, slows query for large datasets but works

select distinct CONCAT(field,'') from table
[22 Mar 2006 16:03] Georgi Kodinov
Tried Shane's dump on :
Linux www.progem.bg 2.6.15-1.1831_FC4 #1 Tue Feb 7 13:37:42 EST 2006 i686 i686 i386 GNU/Linux 
mysqld Ver 5.0.20-debug

The bug seems to have been fixed in the source repository by some other change, as it does not appear anymore (even after restarting the server). 
I get the following results :
mysql> SELECT DISTINCT mycolumn FROM mytable WHERE mycolumn=0;
+----------+
| mycolumn |
+----------+
| 0        |
+----------+
1 row in set (0.01 sec)

mysql> SELECT COUNT(DISTINCT mycolumn) FROM mytable WHERE mycolumn=0;
+--------------------------+
| COUNT(DISTINCT mycolumn) |
+--------------------------+
| 1                        |
+--------------------------+
1 row in set (0.00 sec)

If you are able to reproduce the bug with one of the latest versions,
please change the version on this bug report to the version you
tested and change the status back to "Open". 
If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information
about accessing the source trees is available at
http://www.mysql.com/doc/en/Installing_source_tree.html
[27 Mar 2006 9:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/4184