Bug #12139 MyISAM: DISTINCT/GROUP BY return duplicates
Submitted: 24 Jul 2005 14:32 Modified: 26 Aug 2005 8:25
Reporter: Ronald Weiss Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.9-nt OS:Windows (winXP)
Assigned to: CPU Architecture:Any

[24 Jul 2005 14:32] Ronald Weiss
Description:
With MyISAM under some special circumstances (see how-to-repeat), DISTINCT or GROUP BY return duplicate rows. With InnoDB, everything is OK.

Similar like bugs #10207 & #11484 (duplicates), but they are stated to be already fixed in 5.0.9, and this one seems more special.

It seems to be related to order of records in the table.

How to repeat:
mysql> CREATE TABLE xx( x VARCHAR(10) ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO xx VALUES ('a'),('bb'),('a'),('cc');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT DISTINCT x FROM xx;
+------+
| x    |
+------+
| a    |
| bb   |
| a    |
| cc   |
+------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM xx GROUP BY x;
+------+
| x    |
+------+
| a    |
| a    |
| bb   |
| cc   |
+------+
4 rows in set (0.01 sec)
[24 Jul 2005 16:29] MySQL Verification Team
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

c:\mysql\bin>mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 5.0.11-beta-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE TABLE xx( x VARCHAR(10) ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO xx VALUES ('a'),('bb'),('a'),('cc');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT DISTINCT x FROM xx;
+------+
| x    |
+------+
| a    |
| bb   |
| cc   |
+------+
3 rows in set (0.02 sec)

mysql> SELECT * FROM xx GROUP BY x;
+------+
| x    |
+------+
| a    |
| bb   |
| cc   |
+------+
3 rows in set (0.01 sec)
[13 Aug 2005 15:52] Ronald Weiss
This bug is not yet fixed. It still can be seen with non-ascii characters, if table character set is utf8. (For me, utf8 is default, so it is used when no explicit charset is given for the table, and after setting latin1 charset explicitly, the bug is not there.)

So how to repeat (with utf8 as default charset):

mysql> CREATE TABLE xx (x VARCHAR(100)) ENGINE=MyISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO xx values('yyy'),('á'),('xxx'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT DISTINCT * FROM xx;
+------+
| x    |
+------+
| yyy  |
| á    |
| xxx  |
| á    |
+------+
4 rows in set (0.00 sec)

Some more information:

mysql> SHOW variables LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | utf8   |
| character_set_results    | latin1 |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+
6 rows in set (0.00 sec)

mysql> SHOW variables LIKE 'version';
+---------------+----------------+
| Variable_name | Value          |
+---------------+----------------+
| version       | 5.0.11-beta-nt |
+---------------+----------------+
1 row in set (0.00 sec)
[13 Aug 2005 15:57] Ronald Weiss
Well, I forgot to mention some other things, which might help you maybe:

- if only ascii chars are used, everything seems OK
- if the column is VARCHAR(255) instead of VARCHAR(100), it seems OK
- it the other values ('xxx','yyy') are shorter, it seems OK
[26 Aug 2005 8:25] Vasily Kishkin
Sorry...I was not able to reproduce the bug on 5.0.11. 

mysql> CREATE TABLE xx (x VARCHAR(100)) ENGINE=MyISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO xx values('yyy'),('á'),('xxx'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT DISTINCT * FROM xx;
+------+
| x    |
+------+
| yyy  |
| á    |
| xxx  |
+------+
4 rows in set (0.00 sec)
[26 Aug 2005 8:41] Ronald Weiss
Hmm..., I still can reproduce it.
Did your test table use utf8?