Bug #14440 CURDATE()/Distinct incorrect results
Submitted: 28 Oct 2005 20:36 Modified: 28 Oct 2005 23:13
Reporter: Chris DiMartino Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.0.15 OS:Linux (linux)
Assigned to: MySQL Verification Team CPU Architecture:Any

[28 Oct 2005 20:36] Chris DiMartino
Description:
When running a query:

SELECT DISTINCT(code) FROM bad_date WHERE CURDATE() BETWEEN start_date AND end_date;

The result set returned is not correct. A smaller result set is returned compared to:

SELECT COUNT(DISTINCT(code)) FROM bad_date WHERE CURDATE() BETWEEN start_date AND end_date;

which is actually returning the proper results.

Upon further investigation, I found interesting results.  The number of records returned by the "distinct only" query is inversely proportional to the number of rows in the entire data set.

For example:

select distinct(code)  |   select count(distinct...)   |   total rows
--------------------------------------------------------------------------------
38813                        |   67015                             |   800000
46290                        |   67015                             |   400000
49119                        |   67015                             |   200000
62165                        |   67015                             |   100000
67015                        |   67015                             |   70000

The problem does not appear to be dependant on the storage engine.

The same problem also happens with the following query:

SELECT code FROM bad_date WHERE curdate() BETWEEN start_date AND end_date GROUP BY code;

Adding a "LIMIT" statement to the broken query repairs the issue in both cases.

How to repeat:
Load dump file from attached file.
Run the above mentioned queries.

Suggested fix:
Repair problem
[28 Oct 2005 20:44] Chris DiMartino
As the problem disappears at the file size limit that I can upload to the server here, I've hosted the file which displays the problem here:  http://chris.asiscan.com/broken_date.dmp.tar.gz

Please download it and gunzip it in order to have a dataset which displays the issue.
[28 Oct 2005 23:13] MySQL Verification Team
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

|| ZZLY |
| ZZLZ |
| ZZNE |
| ZZOO |
| ZZQJ |
| ZZQP |
| ZZQT |
| ZZQV |
| ZZR0 |
| ZZU2 |
| ZZV5 |
| ZZWB |
| ZZXN |
+------+
67015 rows in set (5.12 sec)

mysql> SELECT COUNT(DISTINCT(code)) FROM bad_date WHERE CURDATE() BETWEEN start_date
    -> AND end_date;
+-----------------------+
| COUNT(DISTINCT(code)) |
+-----------------------+
|                 67015 |
+-----------------------+
1 row in set (5.15 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.0.16-debug |
+--------------+
1 row in set (0.00 sec)
[29 Oct 2005 2:50] Chris DiMartino
Is there any chance that the fix of this bug will have any effect on the outcome of this bug: http://bugs.mysql.com/bug.php?id=14360

They seemed to have somewhat similiar properties, and both are biting us hard.
[1 Nov 2005 16:51] Chris DiMartino
I've compiled and tested the current version from source, and the bug still exists, without any apparent change:

SELECT DISTINCT(code) FROM bad_date3 WHERE '2005-10-28' BETWEEN start_date AND end_date;
...
38813 rows in set (1.47 sec)

mysql> SELECT COUNT(DISTINCT(code)) FROM bad_date3 WHERE CURDATE() - interval 4 day BETWEEN start_date AND end_date;
+-----------------------+
| COUNT(DISTINCT(code)) |
+-----------------------+
|                 67015 |
+-----------------------+
1 row in set (0.93 sec)
[1 Nov 2005 16:53] Chris DiMartino
I've compiled and tested the current version from source, and the bug still exists, without any apparent change:

SELECT DISTINCT(code) FROM bad_date3 WHERE '2005-10-28' BETWEEN start_date AND end_date;
...
38813 rows in set (1.47 sec)

mysql> SELECT COUNT(DISTINCT(code)) FROM bad_date3 WHERE CURDATE() - interval 4 day BETWEEN start_date AND end_date;
+-----------------------+
| COUNT(DISTINCT(code)) |
+-----------------------+
|                 67015 |
+-----------------------+
1 row in set (0.93 sec)
[2 Nov 2005 14:25] Chris DiMartino
Is it possible that the fix that you made did not get commited in to the source tree?  I don't see any mention of it in the changelogs nor the "mysql internals" lists, and the problem still exists.
[3 Nov 2005 22:45] Chris DiMartino
Will this bug be re-opened?  I'm tempted to file a duplicate bug just to draw attention to this one as it seems like it is lost in the mix!
[3 Nov 2005 23:06] MySQL Verification Team
I did a test against the latest source from our BK repository and for
this reason I closed as fixed in source. Which source you used for
to compile ?
[3 Nov 2005 23:46] Chris DiMartino
I checked out and built against the source as descibed in http://www.mysql.com/doc/en/Installing_source_tree.html.  Is there a different source tree I should be working with?
[4 Nov 2005 0:09] MySQL Verification Team
You did according the correct instructions, however I am not sure
if you got an updated source. I did the test again, with source
I pulled today:

mysql> SELECT DISTINCT(code) FROM bad_date WHERE CURDATE() BETWEEN start_date AND
    -> end_date;
Empty set (4.83 sec)

mysql> SELECT COUNT(DISTINCT(code)) FROM bad_date WHERE CURDATE() BETWEEN start_date
    -> AND end_date;
+-----------------------+
| COUNT(DISTINCT(code)) |
+-----------------------+
|                     0 |
+-----------------------+
1 row in set (4.92 sec)

Could you please try to clone again the source and do the test.
[4 Nov 2005 16:58] Chris DiMartino
I've updated and recompiled.  The problem still exists:

mysql> select distinct(code) from route where curdate() between start_date and end_date;
| KZU0 |
| KZU8 |
| L0CY |
| L0DG |
| L0DM |
| L0DS |
| L0DW |
| L0G6 |
| L0G9 |
| L0GD |
| L0GE |
| L0GI |
| L0GO |
| L0GP |
| L0GR |
| L0GT |
| L0GU |
| L0H3 |
+------+
19614 rows in set (9.90 sec)

mysql> select count(distinct(code)) from route where curdate() between start_date and end_date;
+-----------------------+
| count(distinct(code)) |
+-----------------------+
|                 83160 |
+-----------------------+
1 row in set (8.46 sec)

This is in my main table that I took the original smaller dataset from.  I'm not sure what is different now, but I am having trouble reproducing the issue in a smaller dataset, yet the issue does still exist in the larger set (6600000 rows).

Any thoughts?