Bug #62755 Unexpect "ERROR 1062 (23000): Duplicate entry '...' for key 'group_key'" error
Submitted: 16 Oct 2011 2:59 Modified: 23 Nov 2011 12:51
Reporter: WeiQi Huang Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:5.5.16; 5.1.59 OS:Microsoft Windows (XP SP3)
Assigned to: CPU Architecture:Any
Tags: GROUP BY

[16 Oct 2011 2:59] WeiQi Huang
Description:
'GROUP BY concat_ws(...)' raises an unexpected error in 'SELECT ...' statement when query a comparatively large table. (e.g. ERROR 1062 (23000): Duplicate entry '28524-32769' for key 'group_key'.)

How to repeat:
1. Download GO term DB from "http://archive.geneontology.org/termdb/2011-01-01/go_daily-termdb-tables.tar.gz". 
Import it into local MySQL database 'go', as described in "http://archive.geneontology.org/termdb/2011-01-01/README".

2. Successful query when table is small:

mysql>use test;
Database changed
mysql>CREATE TABLE d (id INT(11) PRIMARY KEY) SELECT DISTINCT term2_id AS id FROM go.graph_path WHERE term1_id=33275 AND distance>=1;
Query OK, 10 rows affected (0.30 sec)
Records: 10  Duplicates: 0  Warnings: 0
mysql>CREATE TABLE a (anc INT(11), term INT(11), dist INT(11)) SELECT DISTINCT g.term1_id AS anc, g.term2_id AS term, g.distance AS dist FROM d JOIN  go.graph_path AS g ON (g.term1_id=d.id);
Query OK, 21 rows affected (0.09 sec)
Records: 21  Duplicates: 0  Warnings: 0
mysql>CREATE TABLE ss (term1 INT(11), term2 INT(11), LCA INT(11), between_dist INT(11)) SELECT a1.term AS term1, a2.term AS term2, a1.anc AS LCA, MIN(a1.dist +
a2.dist) AS between_dist FROM a AS a1 JOIN a AS a2 ON (a1.anc=a2.anc AND a1
.term<a2.term) GROUP BY CONCAT_WS('-', CAST(a1.term AS CHAR), CAST(a2.term AS
CHAR));
Query OK, 22 rows affected (0.11 sec)
Records: 22  Duplicates: 0  Warnings: 0

(Change 'term1_id=33275' to 'term1_id=4498', we will success again on a moderate  large table.)

3. Failure occurs when table is comparatively large(with same statements as above):

mysql>RENAME TABLE d TO d_33275, a TO a_33275, ss TO ss_33275;
Query OK, 0 rows affected (0.03 sec)
mysql>CREATE TABLE d (id INT(11) PRIMARY KEY) SELECT DISTINCT term2_id AS id FROM go.graph_path WHERE term1_id=6534 AND distance>=1;
Query OK, 20091 rows affected (0.88 sec)
Records: 20091  Duplicates: 0  Warnings: 0
mysql>CREATE TABLE a (anc INT(11), term INT(11), dist INT(11)) SELECT DISTINCT g.term1_id AS anc, g.term2_id AS term, g.distance AS dist FROM d JOIN  go.graph_path AS g ON (g.term1_id=d.id);
Query OK, 467970 rows affected (3.58 sec)
Records: 467970  Duplicates: 0  Warnings: 0
(Here, creating some indexes on this table 'a' will increase running speed of the last SQL statement. It takes more than several hours' WAITING before error reported, on my Notebook with T2050@1.60GHz CPU and 2.0G memory).
mysql>CREATE TABLE ss (term1 INT(11), term2 INT(11), LCA INT(11), between_dist INT(11)) SELECT a1.term AS term1, a2.term AS term2, a1.anc AS LCA, MIN(a1.dist +
a2.dist) AS between_dist FROM a AS a1 JOIN a AS a2 ON (a1.anc=a2.anc AND a1
.term<a2.term) GROUP BY CONCAT_WS('-', CAST(a1.term AS CHAR), CAST(a2.term AS
CHAR));
ERROR 1062 (23000): Duplicate entry '28524-32769' for key 'group_key'

(The 'Duplicate entry' may not be '28524-32769' in your test cases, even not the same in two repeats.)

Suggested fix:
N/A
[16 Oct 2011 7:46] Valeriy Kravchuk
Please, send your my.ini file content.
[16 Oct 2011 9:51] WeiQi Huang
'my.ini' requested by Valeriy Kravchuk

Attachment: my.ini (application/octet-stream, text), 8.70 KiB.

[19 Oct 2011 9:02] Valeriy Kravchuk
Looks like a duplicate of Bug #58081. Please, check.
[21 Oct 2011 7:26] WeiQi Huang
agree.
verified in 5.1.59 also
[23 Nov 2011 12:51] Miguel Solorzano
According prior comment duplicate of http://bugs.mysql.com/bug.php?id=58081.