Bug #24985 | UTF8 ENUM primary key on MEMORY using BTREE causes incorrect duplicate entries | ||
---|---|---|---|
Submitted: | 12 Dec 2006 0:21 | Modified: | 5 Apr 2007 19:42 |
Reporter: | Ryan Korczykowski | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Memory storage engine | Severity: | S3 (Non-critical) |
Version: | 5.0.27/4.1BK/5.0BK/5.1BK | OS: | Windows (Windows 2K/Linux) |
Assigned to: | Ingo Strüwing | CPU Architecture: | Any |
Tags: | BTREE, duplicate, enum, Memory, primary key, utf8 |
[12 Dec 2006 0:21]
Ryan Korczykowski
[12 Dec 2006 0:45]
MySQL Verification Team
Thank you for the bug report. Verified as described on Suse Linux: miguel@hegel:~/dbs/5.0> bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.0.32-debug Source distribution Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> CREATE TABLE `espi_lp_hash` ( -> `BusPtid` -> enum('301362','401362','302362','403362','303362','304362','404362', -> '305362','405362','306362','406362','407362', -> '408362','Error') NOT NULL default '301362', -> `Date` date NOT NULL default '0000-00-00', -> `Hour` tinyint(4) NOT NULL default '0', -> `Fraction` double NOT NULL default '0', -> PRIMARY KEY USING BTREE (`BusPtid`,`Date`,`Hour`) -> ) ENGINE=MEMORY DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.10 sec) mysql> INSERT INTO `espi_lp_hash` (`BusPtid`, `Date`, `Hour`, `Fraction`) VALUES -> ('301362', '2004-04-02', 0, 0); Query OK, 1 row affected (0.01 sec) mysql> mysql> INSERT INTO `espi_lp_hash` (`BusPtid`, `Date`, `Hour`, `Fraction`) VALUES -> ('302362', '2004-04-02', 0, 0); ERROR 1062 (23000): Duplicate entry '302362-2004-04-02-0' for key 1 mysql> exit Bye
[23 Mar 2007 8:55]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22734 ChangeSet@1.2609, 2007-03-23 09:54:49+01:00, istruewing@chilla.local +5 -0 Bug#24985 - UTF8 ENUM primary key on MEMORY using BTREE causes incorrect duplicate entries Keys for a BTREE index on an ENUM column of a MEMORY table with character set UTF8 were computed incorrectly. Many different column values got the same key value. Apart of possible performance problems it made unique indexes of this type unusable because it rejected many different values as duplicates. The problem was that multibyte character detection was tried on the internal numeric column value. Many values were not identified as characters. Their key value became blank filled. The solution is modelled after the equivalent behavior of MyISAM. A character set is defined for key segments of text type only. Binary segments get a pseudo binary charset attached. All other types do not get a charset attached.
[26 Mar 2007 5:32]
Alexander Barkov
Hello Ingo, the problem is in this piece code in ha_heap.cc: for (; key_part != key_part_end; key_part++, seg++) { Field *field= key_part->field; if (pos->algorithm == HA_KEY_ALG_BTREE) seg->type= field->key_type(); else { if ((seg->type = field->key_type()) != (int) HA_KEYTYPE_TEXT && seg->type != HA_KEYTYPE_VARTEXT1 && seg->type != HA_KEYTYPE_VARTEXT2 && seg->type != HA_KEYTYPE_VARBINARY1 && seg->type != HA_KEYTYPE_VARBINARY2) seg->type= HA_KEYTYPE_BINARY; } seg->start= (uint) key_part->offset; seg->length= (uint) key_part->length; seg->flag= key_part->key_part_flag; seg->charset= field->charset(); In the case of ENUM (and SET - see below), seg->charset should be set to &my_charset_bin instead of field->charset(). (most likely for both HA_KEY_ALG_BTREE and HA_KEY_ALG_HASH) If you change the last line to this: if (field->real_type() == MYSQL_TYPE_ENUM) set->charset= &my_charset_bin; else seg->charset= field->charset(); it should fix the problem. I think no extra changes in /heap required. Also, after fix, make sure SET is covered in the tests against the same problem: mysql> create table t1 (c1 set('1','2'), unique using btree(c1)) engine=memory default character set utf8; Query OK, 0 rows affected (1.03 sec) mysql> insert into t1 values ('1'); Query OK, 1 row affected (0.00 sec) mysql> insert into t1 values ('2'); ERROR 1062 (23000): Duplicate entry '2' for key 1
[26 Mar 2007 15:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22941 ChangeSet@1.2609, 2007-03-26 17:02:15+02:00, istruewing@chilla.local +3 -0 Bug#24985 - UTF8 ENUM primary key on MEMORY using BTREE causes incorrect duplicate entries Keys for BTREE indexes on ENUM and SET columns of MEMORY tables with character set UTF8 were computed incorrectly. Many different column values got the same key value. Apart of possible performance problems, it made unique indexes of this type unusable because it rejected many different values as duplicates. The problem was that multibyte character detection was tried on the internal numeric column value. Many values were not identified as characters. Their key value became blank filled. Thanks to Alexander Barkov for the patch, which sets the character set of ENUM and SET key segments to the pseudo binary character set.
[26 Mar 2007 17:40]
Alexander Barkov
The patch http://lists.mysql.com/commits/22941 is ok to push.
[27 Mar 2007 8:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22985 ChangeSet@1.2609, 2007-03-27 10:49:48+02:00, istruewing@chilla.local +3 -0 Bug#24985 - UTF8 ENUM primary key on MEMORY using BTREE causes incorrect duplicate entries Keys for BTREE indexes on ENUM and SET columns of MEMORY tables with character set UTF8 were computed incorrectly. Many different column values got the same key value. Apart of possible performance problems, it made unique indexes of this type unusable because it rejected many different values as duplicates. The problem was that multibyte character detection was tried on the internal numeric column value. Many values were not identified as characters. Their key value became blank filled. Thanks to Alexander Barkov and Ramil Kalimullin for the patch, which sets the character set of ENUM and SET key segments to the pseudo binary character set.
[27 Mar 2007 10:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/22994 ChangeSet@1.2483, 2007-03-27 12:39:31+02:00, istruewing@chilla.local +1 -0 Bug#24985 - UTF8 ENUM primary key on MEMORY using BTREE causes incorrect duplicate entries After merge fix
[30 Mar 2007 9:06]
Ingo Strüwing
Queued to 5.1-engines, 5.0-engines, 4.1-engines.
[30 Mar 2007 17:28]
Bugs System
Pushed into 5.1.18-beta
[30 Mar 2007 17:30]
Bugs System
Pushed into 5.0.40
[30 Mar 2007 17:42]
Ingo Strüwing
Pushed to 4.1.23
[30 Mar 2007 18:17]
Bugs System
Pushed into 5.0.40
[5 Apr 2007 19:42]
Paul DuBois
Noted in 4.1.23, 5.0.40, 5.1.18 changelogs. Duplicate entries were not assessed correctly in a MEMORY table with a BTREE primary key on a utf8 ENUM column.