Bug #28878 InnoDB tables with UTF8 character set and indexes cause wrong result for DML
Submitted: 4 Jun 2007 15:35 Modified: 22 Oct 2007 22:03
Reporter: Mark Leith Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.0.42, 4.1.23 OS:Any
Assigned to: Alexey Kopytov CPU Architecture:Any
Tags: bfsm_2007_10_18, regression

[4 Jun 2007 15:35] Mark Leith
Description:
InnoDB tables that use the UTF8 character set cause wrong results for queries that use an index on character based columns. For instance:

mysql> drop table t1;
Query OK, 0 rows affected (0.00 sec)

mysql> create table t1 (i char(2), key (i)) engine = innodb default charset=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values ('uk');
Query OK, 1 row affected (0.01 sec)

mysql> select * from t1 where i = 'uk';
+------+
| i    |
+------+
| uk   | 
+------+
1 row in set (0.00 sec)

mysql> delete from t1 where i = 'uk';
Query OK, 0 rows affected (0.00 sec)

mysql> update t1 set i = 'us' where i = 'uk';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 0  Changed: 0  Warnings: 0

This is a regression, as it works fine on 5.0.40:

[markleith@medusa:~/mysql] $ mysql -u root -P3307 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.0.40-enterprise-gpl MySQL Enterprise Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> drop table if exists t1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> create table t1 (i char(2), key (i)) engine = innodb default charset=utf8;
Query OK, 0 rows affected (0.70 sec)

mysql> insert into t1 values ('uk');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1 where i = 'uk';
+------+
| i    |
+------+
| uk   | 
+------+
1 row in set (0.00 sec)

mysql> delete from t1 where i = 'uk';
Query OK, 1 row affected (0.06 sec)

How to repeat:
/* Failing */

drop table if exists t1;
create table t1 (i char(2), key (i)) engine = innodb default charset=utf8;
insert into t1 values ('uk');
select * from t1 where i = 'uk';
delete from t1 where i = 'uk';
update t1 set i = 'us' where i = 'uk';

/* Working without index */

alter table t1 drop index i;
select * from t1 where i = 'uk';
delete from t1 where i = 'uk';

/* Working with MyISAM */

alter table t1 add index (i);
alter table t1 engine = myisam;
insert into t1 values ('uk');
select * from t1 where i = 'uk';
delete from t1 where i = 'uk';

/* InnoDB without UTF8 Working */
drop table t1;
create table t1 (i char(2), key (i)) engine = innodb;
insert into t1 values ('uk');
select * from t1 where i = 'uk';
delete from t1 where i = 'uk';
[4 Jun 2007 15:43] Mark Leith
Also seems to only affect CHAR columns, as VARCHAR is not affected:

mysql> drop table t1;
Query OK, 0 rows affected (0.28 sec)

mysql> create table t1 (i varchar(2), key (i)) engine = innodb default charset utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values ('uk');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1 where i = 'uk';
+------+
| i    |
+------+
| uk   | 
+------+
1 row in set (0.00 sec)

mysql> delete from t1 where i = 'uk';
Query OK, 1 row affected (0.01 sec)
[4 Jun 2007 16:02] Heikki Tuuri
Right now looking at this critical bug...
[4 Jun 2007 16:02] Heikki Tuuri
Right now looking at this critical bug...
[4 Jun 2007 16:08] Mark Leith
Hi Heikki,

Could this be related to the fix for Bug #20095?

http://bugs.mysql.com/bug.php?id=20095

This seems to be the only bug fix remotely related to this bug within the 5.0.42 changesets. 

Cheers,

Mark
[4 Jun 2007 17:26] Heikki Tuuri
The bug is that MySQL gives the key value padded with 0x00 in the DELETE ... WHERE ...

It should be padded with 0x20 in UTF-8.

DELETE:

(gdb) bt
#0  ha_innobase::index_read (this=0x15cb0c8, buf=0x15cb278 "ýuk    ¥¥¥¥¥¥¥¥¥",
    key_ptr=0x16076e0 "", key_len=7, find_flag=HA_READ_KEY_EXACT)
    at ha_innodb.cc:3778
#1  0x000000000070fc2c in handler::read_range_first (this=0x15cb0c8,
    start_key=0x1607288, end_key=0x1607298, eq_range_arg=true, sorted=false)
    at handler.cc:2636
#2  0x0000000000710060 in handler::read_multi_range_first (this=0x15cb0c8,
    found_range_p=0x44087400, ranges=0x1607288, range_count=1, sorted=false,
    buffer=0x0) at handler.cc:2510
#3  0x00000000006f1795 in QUICK_RANGE_SELECT::get_next (this=0x160ac10)
    at opt_range.cc:6852
#4  0x0000000000709e20 in rr_quick (info=0x440877f0) at records.cc:224
#5  0x00000000006b62bb in mysql_delete (thd=0x1601058, table_list=0x160b330,
    conds=0x160b760, order=0x1602a58, limit=18446744073709551615, options=0,
    reset_auto_increment=false) at sql_delete.cc:234
#6  0x000000000062bce0 in mysql_execute_command (thd=0x1601058)
    at sql_parse.cc:3640
#7  0x0000000000630d47 in mysql_parse (thd=0x1601058,
    inBuf=0x160b268 "delete from t1 where i = 'uk'", length=29,
    found_semicolon=0x44088f10) at sql_parse.cc:6052
#8  0x00000000006336ae in dispatch_command (command=COM_QUERY, thd=0x1601058,
    packet=0x16031d9 "delete from t1 where i = 'uk'", packet_length=30)
    at sql_parse.cc:1806
#9  0x0000000000634e3b in do_command (thd=0x1601058) at sql_parse.cc:1583
#10 0x00000000006361e1 in handle_one_connection (arg=0x1601058)
    at sql_parse.cc:1194
#11 0x00002b535627df1a in start_thread () from /lib/libpthread.so.0
#12 0x00002b5356824602 in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()
(gdb) x/7b key_ptr
0x16076e0:      0x00    0x75    0x6b    0x00    0x00    0x00    0x00
(gdb)

In SELECT it is padded correctly:

(gdb) bt
#0  ha_innobase::index_read (this=0x15cb0c8, buf=0x15cb278 "ýuk    ¥¥¥¥¥¥¥¥¥",
    key_ptr=0x160d240 "", key_len=7, find_flag=HA_READ_KEY_EXACT)
    at ha_innodb.cc:3778
#1  0x0000000000677e6f in join_read_always_key (tab=0x161a7d8)
    at sql_select.cc:11045
#2  0x000000000067999e in sub_select (join=0x160bac0, join_tab=0x161a7d8,
    end_of_records=false) at sql_select.cc:10525
#3  0x0000000000682267 in do_select (join=0x160bac0, fields=0x1602980,
    table=0x0, procedure=0x0) at sql_select.cc:10289
#4  0x000000000069a01e in JOIN::exec (this=0x160bac0) at sql_select.cc:2070
#5  0x0000000000695d06 in mysql_select (thd=0x1601058,
    rref_pointer_array=0x1602aa0, tables=0x160b428, wild_num=1,
    fields=@0x1602980, conds=0x160b868, og_num=0, order=0x0, group=0x0,
    having=0x0, proc_param=0x0, select_options=2156153344, result=0x160baa0,
    unit=0x16024c8, select_lex=0x1602878) at sql_select.cc:2237
#6  0x000000000069a2c9 in handle_select (thd=0x1601058, lex=0x1602438,
    result=0x160baa0, setup_tables_done_option=0) at sql_select.cc:255
#7  0x0000000000628e08 in mysql_execute_command (thd=0x1601058)
    at sql_parse.cc:2658
#8  0x0000000000630d47 in mysql_parse (thd=0x1601058,
    inBuf=0x160b268 "select * from t1 where i = 'uk'", length=31,
    found_semicolon=0x44088f10) at sql_parse.cc:6052
#9  0x00000000006336ae in dispatch_command (command=COM_QUERY, thd=0x1601058,
    packet=0x16031d9 "", packet_length=32) at sql_parse.cc:1806
#10 0x0000000000634e3b in do_command (thd=0x1601058) at sql_parse.cc:1583
#11 0x00000000006361e1 in handle_one_connection (arg=0x1601058)
    at sql_parse.cc:1194
#12 0x00002b535627df1a in start_thread () from /lib/libpthread.so.0
#13 0x00002b5356824602 in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()
(gdb) x/7b key_ptr
0x160d240:      0x00    0x75    0x6b    0x20    0x20    0x20    0x20
(gdb)

Regards,

Heikki
[4 Jun 2007 18:17] Mark Leith
Thanks for the debugging there Heikki, it seems we have now tracked this down to the fix for Bug #13191. 

It looks like we were passing it correctly for an earlier commit:

http://lists.mysql.com/commits/25686

+  if (bytes < length)
+    memset(buff+bytes, 0x20, (length-bytes));

However this was not done in a later commit:

+  if (bytes < length)
+    bzero(buff + bytes, length - bytes);
[6 Jun 2007 10:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28184

ChangeSet@1.2514, 2007-06-06 13:08:49+03:00, gkodinov@magare.gmz +3 -0
  Bug#28878: InnoDB tables with UTF8 character set and indexes cause 
  wrong result for DML
  When making key reference buffers over CHAR fields whitespace (0x32) 
  must be used to fill in the remaining space in the field's buffer.
  This is what Field_string::store() does.
  Fixed Field_string::get_key_image() to do the same.
[7 Jun 2007 7:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28258

ChangeSet@1.2514, 2007-06-07 10:08:44+03:00, gkodinov@magare.gmz +3 -0
  Bug#28878: InnoDB tables with UTF8 character set and indexes cause 
  wrong result for DML
  When making key reference buffers over CHAR fields whitespace (0x20) 
  must be used to fill in the remaining space in the field's buffer.
  This is what Field_string::store() does.
  Fixed Field_string::get_key_image() to do the same.
[14 Jun 2007 18:59] Bugs System
Pushed into 5.1.20-beta
[14 Jun 2007 19:00] Bugs System
Pushed into 5.0.44
[15 Jun 2007 9:21] MC Brown
A note has been added to the 5.1.20 and 5.0.44 changelogs.
[29 Jun 2007 17:29] MySQL Verification Team
Bug: http://bugs.mysql.com/bug.php?id=29449 was marked as duplicate
of this one.
[29 Aug 2007 15:59] Heikki Tuuri
This should also be fixed in 4.1.23 because this is a critical bug.
[29 Aug 2007 16:00] Heikki Tuuri
Dathan reported this from 4.1.23 in this report: http://bugs.mysql.com/bug.php?id=30485
[29 Aug 2007 18:57] Timothy Smith
I'm taking this, to backport the fix to 4.1.
[7 Sep 2007 18:49] Dathan Pattishall
Any chance that this has been pushed into bk yet?
[26 Sep 2007 19:10] Dathan Pattishall
So, any update on when this will be fixed? This is a rather serious data corruption bug.
[4 Oct 2007 10:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/34888

ChangeSet@1.2685, 2007-10-04 14:22:35+04:00, kaa@polly.(none) +3 -0
  Backport of the 5.0 patch to 4.1
  
  Bug#28878: InnoDB tables with UTF8 character set and indexes cause  wrong result for DML
  When making key reference buffers over CHAR fields whitespace (0x20) must be used to fill in the remaining space in the field's buffer. This is what Field_string::store() does. Fixed Field_string::get_key_image() to do the same.
[6 Oct 2007 14:38] MySQL Verification Team
Bug #31395 was marked as a duplicate of this one.
[8 Oct 2007 7:35] Frank Dr. Ullrich
Bug report #31395: there a VARCHAR(255) field is affected whereas this bug relates to CHAR fields only!
[18 Oct 2007 14:53] Sven Neuhaus
Could you give a date for a 4.1 release that fixes this serious bug? Thank you.
[18 Oct 2007 21:34] Bugs System
Pushed into 5.1.23-beta
[18 Oct 2007 21:36] Bugs System
Pushed into 5.0.52
[18 Oct 2007 21:37] Bugs System
Pushed into 4.1.24
[22 Oct 2007 22:03] Paul DuBois
Noted in 4.1.24, 5.0.52, 5.1.23 changelogs.

For InnoDB tables that use the utf8 character set, incorrect results
could occur for DML statements such as DELETE or UPDATE that use an
index on character-based columns.
[24 Oct 2007 6:50] Jon Stephens
Reconciled duplicate changelog entries: bugfix appeared in 4.1.24, 5.0.44, 5.1.20 and is now noted in the corresponding changelogs.