Bug #3957 | Wildcard character of LIKE pattern-matching doesn't match zero characters. | ||
---|---|---|---|
Submitted: | 1 Jun 2004 20:14 | Modified: | 14 Sep 2004 13:12 |
Reporter: | Shuichi Tamagawa | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 4.1.2-alpha-Max-log | OS: | Linux (Linux) |
Assigned to: | Heikki Tuuri | CPU Architecture: | Any |
[1 Jun 2004 20:14]
Shuichi Tamagawa
[1 Jun 2004 20:18]
Shuichi Tamagawa
Test result
Attachment: like-bug.out (application/octet-stream, text), 586 bytes.
[1 Jun 2004 21:02]
Matthew Lord
I was able to verify this as an innodb specific problem. The verification was done using 4.1.2 on Red Hat 9 2.4.21 #12 SMP. root@localhost:bugs~> select t, hex(t) from bug3957 where t like cast(concat(0xA4A2,'%') as char character set ujis); +------+----------+ | t | hex(t) | +------+----------+ | ¤ | A4A2 | | ¤¤ | A4A2A4A4 | | ¤¤ | A4A2A4A6 | +------+----------+ 3 rows in set (0.00 sec) root@localhost:bugs~> alter table bug3957 type=innodb; Query OK, 10 rows affected, 1 warning (0.14 sec) Records: 10 Duplicates: 0 Warnings: 0 root@localhost:bugs~> select t, hex(t) from bug3957 where t like cast(concat(0xA4A2,'%') as char character set ujis); +------+----------+ | t | hex(t) | +------+----------+ | ¤¤ | A4A2A4A4 | | ¤¤ | A4A2A4A6 | +------+----------+ 2 rows in set (0.00 sec) root@localhost:bugs~>
[5 Jun 2004 5:08]
tony chen
Me too! create table AAA ( A_ID VARCHAR(5) NOT NULL, PRIMARY KEY (A_ID) )TYPE=INNODB DEFAULT CHARSET=gb2312 ; INSERT INTO AAA VALUES ('ABC'); 1. SELECT * FROM AAA WHERE A_ID='ABC' ------------------------- A_ID ------- ABC ------------------------ 2. SELECT * FROM AAA WHERE A_ID LIEK 'ABC%' --------------------------- EMPTY
[7 Jun 2004 11:44]
Alexander Barkov
Right, this happens only with Innobase for non-latin1 character set. Latin1 works fine even in Innodb. I guess this problem appeared due to 4.1.2 change in string comparison. I also found the following: create table t1 (a char(10) character set utf8 not null, key (a)) engine=innodb; insert into t1 values ('b'); select * from t1 where a like _utf8'b%'; drop table t1; This script doesn't return rows. If I set a breakpoint in ha_innodb.cc:2822, i.e. after row_sel_convert_mysql_key_to_innobase() call, and type: set key_buf[0]='a' then this query returns the row: select * from t1 where a like _utf8'b%';
[7 Jun 2004 14:36]
Heikki Tuuri
Hi! The reason apparently is this. When we insert the string 'b' to the table, MySQL pads it with spaces (0x20) at the end. When MySQL does the LIKE 'b%' search, it pads the end of the search string with null bytes (0x00). InnoDB contains the following code where it removes spaces from non-latin1 strings to be compared: " case DATA_VARMYSQL: case DATA_MYSQL: case DATA_BLOB: if (data_type == DATA_BLOB && 0 != (type->prtype & DATA_BINARY_TYPE)) { ut_print_timestamp(stderr); fprintf(stderr, " InnoDB: Error: comparing a binary BLOB with a character set sensitive\n" "InnoDB: comparison!\n"); } /* MySQL does not pad the ends of strings with spaces in a comparison. That would cause a foreign key check to fail for non-latin1 character sets if we have different length columns. To prevent that we remove trailing spaces here before doing the comparison. NOTE that if we in the future map more MySQL types to DATA_MYSQL or DATA_VARMYSQL, we have to change this code. */ while (a_length > 0 && a[a_length - 1] == ' ') { a_length--; } while (b_length > 0 && b[b_length - 1] == ' ') { b_length--; } return(innobase_mysql_cmp( (int)(type->prtype & DATA_MYSQL_TYPE_MASK), (uint)dtype_get_charset_coll(type->prtype), a, a_length, b, b_length)); " The question is how we can fix this without breaking FOREIGN KEY constraint handling? I have to study this more. Regards, Heikki
[6 Jul 2004 23:12]
Shuichi Tamagawa
It still have the problem if the strage engine is bdb. mysql> select version(); +----------------+ | version() | +----------------+ | 4.1.3-beta-log | +----------------+ 1 row in set (0.01 sec) mysql> create table t_like(c char(3), index(c)) engine = bdb; Query OK, 0 rows affected (1.67 sec) mysql> insert into t_like values ('aaa'),('a'),('abc'),('bcd'); Query OK, 4 rows affected (0.62 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> select * from t_like where c like 'a%'; +------+ | c | +------+ | aaa | | abc | +------+ 2 rows in set (0.01 sec) mysql> alter table t_like drop index c; Query OK, 4 rows affected (1.90 sec) Records: 4 Duplicates: 0 Warnings: 0 mysql> select * from t_like where c like 'a%'; +------+ | c | +------+ | aaa | | a | | abc | +------+ 3 rows in set (0.58 sec)
[14 Sep 2004 13:12]
Heikki Tuuri
Hi! This was fixed in 4.1.3. InnoDB started then to use a different comparison function. ha_innodb.cc: " /* Starting from 4.1.3, we use strnncollsp() in comparisons of non-latin1_swedish_ci strings. NOTE that the collation order changes then: 'b\0\0...' is ordered BEFORE 'b ...'. Users having indexes on such data need to rebuild their tables! */ ret = charset->coll->strnncollsp(charset, a, a_length, b, b_length); " Note also: " Important: Starting from MySQL 4.1.3, InnoDB uses the same character set comparison functions as MySQL for non-latin1_swedish_ci character strings that are not BINARY. This changes the sorting order of space and characters < ASCII(32) in those character sets. For latin1_swedish_ci character strings and BINARY strings, InnoDB uses its own pad-spaces-at-end comparison method, which stays unchanged. If you have an InnoDB table created with MySQL 4.1.2 or earlier, with an index on a non-latin1 character set (in the case of 4.1.0 and 4.1.1 with any character set) CHAR/VARCHAR/or TEXT column that is not BINARY but may contain characters < ASCII(32), then you should do ALTER TABLE or OPTIMIZE table on it to regenerate the index, after upgrading to MySQL 4.1.3 or later. " A patch by Jan Lindström will fix similar LIKE 'a%' problems with UTF-8 in connection with column prefix indexes. Regards, Heikki