Bug #19960 Inconsistent results when joining InnoDB tables using partial UTF8 indexes
Submitted: 19 May 2006 23:54 Modified: 6 Oct 2006 2:51
Reporter: Kyle Anderson Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.19, 5.0.20/5.0BK/5.1BK OS:MacOS (OSX 10.4, Linux (Debian))
Assigned to: Alexander Barkov CPU Architecture:Any

[19 May 2006 23:54] Kyle Anderson
Description:
Join conditions using partial indexes of UTF8 columns on InnoDB tables incorrectly ignore rows where the length of the actual value is greater than the length of the partial index.

How to repeat:
-------------------
-- Create Tables
-------------------
CREATE TABLE `test1` (
  `colA` int(11) NOT NULL,
  `colB` varchar(255) character set utf8 NOT NULL,
  PRIMARY KEY  (`colA`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `test1` (`colA`, `colB`) VALUES (1, 'foo'), (2, 'foo bar');

CREATE TABLE `test2` (
  `colA` int(11) NOT NULL,
  `colB` varchar(255) character set utf8 NOT NULL,
  KEY `bad`  (`colA`,`colB`(3))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `test2` (`colA`, `colB`) VALUES (1, 'foo'),(2, 'foo bar');

----------------------
-- Queries
----------------------
SELECT * FROM `test2` WHERE `colA` =2 AND `colB` = "foo bar"
# returns [2, foo bar], as expected (using the 'bad' index)

SELECT * 
FROM `test1`
JOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` = `test2`.`colB`
# returns [1, foo, 1, foo] and [2, foo bar, 2, foo bar], as expected (no indexes used)

SELECT * 
FROM `test1`
JOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` = `test2`.`colB`
WHERE `test1`.`colA` < 3
# the where condition causes the optimizer to use the 'bad' index in the join,
# returns only [1, foo, 1, foo], expected [2, foo bar, 2, foo bar] as well!

Suggested fix:
The use of MyISAM tables, alternate character sets, or full indexes prevent the issue.  Using 'ignore index' works as well.
[23 May 2006 17:29] MySQL Verification Team
Thank you for the bug report.

miguel@hegel:~/dbs/5.0> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 5.0.22-debug

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE TABLE `test1` (
    ->   `colA` int(11) NOT NULL,
    ->   `colB` varchar(255) character set utf8 NOT NULL,
    ->   PRIMARY KEY  (`colA`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.01 sec)

mysql> 
mysql> INSERT INTO `test1` (`colA`, `colB`) VALUES (1, 'foo'), (2, 'foo bar');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> CREATE TABLE `test2` (
    ->   `colA` int(11) NOT NULL,
    ->   `colB` varchar(255) character set utf8 NOT NULL,
    ->   KEY `bad`  (`colA`,`colB`(3))
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.01 sec)

mysql> 
mysql> INSERT INTO `test2` (`colA`, `colB`) VALUES (1, 'foo'),(2, 'foo bar');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM `test2` WHERE `colA` =2 AND `colB` = "foo bar";
+------+---------+
| colA | colB    |
+------+---------+
|    2 | foo bar | 
+------+---------+
1 row in set (0.02 sec)

mysql> SELECT * 
    -> FROM `test1`
    -> JOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` =
    -> `test2`.`colB`;
+------+---------+------+---------+
| colA | colB    | colA | colB    |
+------+---------+------+---------+
|    1 | foo     |    1 | foo     | 
|    2 | foo bar |    2 | foo bar | 
+------+---------+------+---------+
2 rows in set (0.01 sec)

mysql> SELECT * 
    -> FROM `test1`
    -> JOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` =
    -> `test2`.`colB`
    -> WHERE `test1`.`colA` < 3;
+------+------+------+------+
| colA | colB | colA | colB |
+------+------+------+------+
|    1 | foo  |    1 | foo  | 
+------+------+------+------+
1 row in set (0.01 sec)

mysql> ALTER TABLE test1 ENGINE=MyISAM;
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE test2 ENGINE=MyISAM;
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> SELECT * 
    -> FROM `test1`
    -> JOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` =
    -> `test2`.`colB`
    -> WHERE `test1`.`colA` < 3;
+------+---------+------+---------+
| colA | colB    | colA | colB    |
+------+---------+------+---------+
|    1 | foo     |    1 | foo     | 
|    2 | foo bar |    2 | foo bar | 
+------+---------+------+---------+
2 rows in set (0.00 sec)
-------------------------------------------------------------------------------------
mysql> SELECT * 
    -> FROM `test1`
    -> JOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` =
    -> `test2`.`colB`
    -> WHERE `test1`.`colA` < 3;
+------+------+------+------+
| colA | colB | colA | colB |
+------+------+------+------+
|    1 | foo  |    1 | foo  | 
+------+------+------+------+
1 row in set (0.01 sec)

mysql> select version();
+-------------------+
| version()         |
+-------------------+
| 5.1.11-beta-debug | 
+-------------------+
1 row in set (0.00 sec)

mysql> 
------------------------------------------------------------------------------------
mysql> SELECT * 
    -> FROM `test1`
    -> JOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` =
    -> `test2`.`colB`
    -> WHERE `test1`.`colA` < 3;
+------+---------+------+---------+
| colA | colB    | colA | colB    |
+------+---------+------+---------+
|    1 | foo     |    1 | foo     |
|    2 | foo bar |    2 | foo bar |
+------+---------+------+---------+
2 rows in set (0.02 sec)

mysql> SELECT version();
+--------------+
| version()    |
+--------------+
| 4.1.19-debug |
+--------------+
1 row in set (0.00 sec)

mysql>
[26 May 2006 13:55] Heikki Tuuri
Hi!

This is probably a bug in MySQL's interpreter. It uses the multi-column index 'bad' in test2, and passes the WHOLE string 'foo bar' as the second field in the key value. Since only the 3 first characters are stored in the index, it should pass just 'foo'.

MySQL does reserve 3 * 3 = 9 bytes for the column prefix, but the bytes after the 3 first characters should not be used.

Regards,

Heikki

Stack trace from 5.0.18:

(gdb) bt
#0  row_sel_convert_mysql_key_to_innobase (tuple=0x42356468, buf=0x899fb4c "",
    buf_len=836, index=0x42359a68, key_ptr=0x89a0b58 "\002", key_len=15,
    trx=0x4234cc68) at row0sel.c:2177
#1  0x082196ff in ha_innobase::index_read (this=0x89807b8, buf=0x899fb4c "",
    key_ptr=0x89a0b58 "\002", key_len=144309068, find_flag=HA_READ_KEY_EXACT)
    at ha_innodb.cc:3976
#2  0x081b87d5 in join_read_always_key (tab=0x89807b8) at sql_select.cc:10150
#3  0x081b7840 in sub_select (join=0x899a370, join_tab=0x89a09f4,
    end_of_records=4) at sql_select.cc:9654
#4  0x081b7993 in evaluate_join_record (join=0x899a370, join_tab=0x89a0890,
    error=4, report_error=0x899fb4c "") at sql_select.cc:9770
#5  0x081b77bd in sub_select (join=0x899a370, join_tab=0x89a0890,
    end_of_records=4) at sql_select.cc:9661
#6  0x081b7392 in do_select (join=0x899a370, fields=0x897ee88, table=0x0,
    procedure=0x0) at sql_select.cc:9418
#7  0x081abf08 in JOIN::exec (this=0x899a370) at sql_select.cc:1721
#8  0x081ac86a in mysql_select (thd=0x897eb98, rref_pointer_array=0x897ef24,
    tables=0x8999670, wild_num=1, fields=@0x899fb4c, conds=0x899a228,
    og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0,
    select_options=2156153344, result=0x899a360, unit=0x897ebe4,
    select_lex=0x897ee04) at sql_select.cc:1885
#9  0x081a89a2 in handle_select (thd=0x897eb98, lex=0x897ebd4,
    result=0x899a360, setup_tables_done_option=0) at sql_select.cc:238
#10 0x08175d1c in mysql_execute_command (thd=0x897eb98) at sql_parse.cc:2494
#11 0x0817e610 in mysql_parse (thd=0x897eb98,
    inBuf=0x89994f0 "SELECT * \nFROM `test1`\nJOIN `test2` ON `test1`.`colA` = `test2`.`colA` AND `test1`.`colB` =\n`test2`.`colB`\nWHERE `test1`.`colA` < 3",
    length=144174036) at sql_parse.cc:5628
#12 0x08174483 in dispatch_command (command=144173976, thd=0x897eb98,
    packet=0x89914c1 "", packet_length=144282864) at sql_parse.cc:1713
#13 0x08173fcd in do_command (thd=0x897eb98) at sql_parse.cc:1514
#14 0x081733a2 in handle_one_connection (arg=0x897eb98) at sql_parse.cc:1158
#15 0x40041b63 in start_thread () from /lib/tls/libpthread.so.0
#16 0x4024b18a in clone () from /lib/tls/libc.so.6
(gdb) x/30c key_ptr
0x89a0b58:      2 '\002'        0 '\0'  0 '\0'  0 '\0'  7 '\a'  0 '\0'  102 'f'111 'o'
0x89a0b60:      111 'o' 32 ' '  98 'b'  97 'a'  114 'r' 0 '\0'  0 '\0'  0 '\0'
0x89a0b68:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
0x89a0b70:      0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'  0 '\0'
(gdb)
[12 Sep 2006 11:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/11759

ChangeSet@1.2236, 2006-09-12 16:01:23+05:00, bar@mysql.com +3 -0
  Bug#19960 Inconsistent results when joining InnoDB tables using partial UTF8 indexes
  Adding a multibyte-aware VARCHAR copying function, to put correct column prefix,
  taking in account number of characters (instead just limiting on number of bytes).
  For example, for a KEY(col(3)) on a UTF8 column when copying the string 'foo bar foo',
  we should put only 3 leftmost characters: 'foo'.
  9 characters were incorrectly put before this fix.
[29 Sep 2006 11:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12830

ChangeSet@1.2282, 2006-09-29 16:15:57+05:00, bar@mysql.com +3 -0
  Bug#19960 Inconsistent results when joining InnoDB tables using partial UTF8 indexes
    Adding a multibyte-aware VARCHAR copying function, to put correct column prefix,
    taking in account number of characters (instead just limiting on number of bytes).
    For example, for a KEY(col(3)) on a UTF8 column when copying the string 'foo bar foo',
    we should put only 3 leftmost characters: 'foo'.
    9 characters were incorrectly put before this fix.
[3 Oct 2006 20:02] Chad MILLER
Available in 5.0.26.
[3 Oct 2006 20:09] Chad MILLER
Available in 5.1.12-beta.
[6 Oct 2006 2:51] Paul DuBois
Noted in 5.0.26, 5.1.12 changelogs.