| Bug #116584 | Query produces different resultset with force index | ||
|---|---|---|---|
| Submitted: | 7 Nov 2024 11:58 | Modified: | 7 Nov 2024 14:01 |
| Reporter: | Brian Yue (OCA) | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
| Version: | 8.0.36, 8.0.40 | OS: | Any (Redhat 7.4) |
| Assigned to: | CPU Architecture: | Any (Intel) | |
| Tags: | charset, like operator, Result Set | ||
[7 Nov 2024 12:46]
MySQL Verification Team
pls ignore my previous comment. Thank you.
[7 Nov 2024 14:01]
MySQL Verification Team
Hello Brian Yue, Thank you for the report and test case. Verified as described. regards, Umesh
[7 Nov 2024 14:01]
MySQL Verification Team
-- 8.0.40
[umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "insert into ecattdat1 values ('817', '8170A100000000213850', ' ', '控制\t');"
[umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "select *from ecattdat1 where ec_full_name like '控制%';"
[umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "select *from ecattdat1 force index (primary) where ec_full_name like '控制%';"
+--------+----------------------+-----------+--------------+
| sd_key | ecatt_key | ec_cer_no | ec_full_name |
+--------+----------------------+-----------+--------------+
| 817 | 8170A100000000213850 | | 控制 |
+--------+----------------------+-----------+--------------+
[umshastr@ellex07]~/binaries/mysql-8.0.40:
[8 Nov 2024 14:02]
Bernt Marius Johnsen
Posted by developer: A simplified variant gives a slightly different, but albeit wrong, behavior: mysql> CREATE TABLE t1 (pk integer, v varchar(10), PRIMARY KEY (pk), KEY idx (v, pk) ) COLLATE=utf8mb4_bin; Query OK, 0 rows affected (0.03 sec) mysql> insert into t1 values (1, '控制\t'); Query OK, 1 row affected (0.00 sec) mysql> select * from t1 where v like '控制%'; +----+---------+ | pk | v | +----+---------+ | 1 | 控制 | +----+---------+ 1 row in set (0.00 sec) mysql> insert into t1 values (2, '控制X'); Query OK, 1 row affected (0.01 sec) mysql> select * from t1 where v like '控制%'; +----+---------+ | pk | v | +----+---------+ | 2 | 控制X | +----+---------+ 1 row in set (0.00 sec) mysql> select * from t1 force index(primary) where v like '控制%'; +----+---------+ | pk | v | +----+---------+ | 1 | 控制 | | 2 | 控制X | +----+---------+ 2 rows in set (0.00 sec)
[9 Nov 2024 15:37]
Mike Wang
I can reproduce this bug with a more simplified case:
--------------------
mysql> create table t1 ( c1 varchar(16), key idx (c1) ) collate=utf8mb4_bin;
Query OK, 0 rows affected (0.06 sec)
mysql> insert into t1 values ('000'), ('111'), ('abc\n');
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from t1 where c1 like 'abc%';
Empty set (0.00 sec)
--------------------
The 'abc\n' record was not matched.
Ignoring the index, 'abc\n' can be matched.
--------------------
mysql> select * from t1 ignore index(idx) where c1 like 'abc%';
+------+
| c1 |
+------+
| abc
|
+------+
1 row in set (0.00 sec)
--------------------
From the explain infomation, a range query was performed then the server missed the record.
--------------------
mysql> explain select * from t1 where c1 like 'abc%';
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | range | idx | idx | 67 | NULL | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
--------------------
A strange but interesting fact is that 'abc\n' is actually less than 'abc' with collation utf8mb4_bin.
--------------------
mysql> select 'abc\n' < 'abc' collate utf8mb4_bin;
+-------------------------------------+
| 'abc\n' < 'abc' collate utf8mb4_bin |
+-------------------------------------+
| 1 |
+-------------------------------------+
1 row in set (0.00 sec)
--------------------
It could be a space padding issue, and this bug is not only affecting collation utf8mb4_bin, but also latin1_bin and other 'PAD SPACE' bin collations.
[10 Nov 2024 6:46]
Mike Wang
A fix with testcase is uploaded at https://github.com/mysql/mysql-server/pull/576 . The commit 0b7ea5204596eae95f20336991eaead8c0a7bf78 is suspicious. The space padding is added at strings/ctype-mb.cc:my_like_range_mb() for collations that have MY_CS_BINSORT or no NO_PAD feature. For NO_PAD collations, the padded spaces will not be counted in during comparison as they are dropped at storage/innobase/rem/rem0cmp.cc:innobase_mysql_cmp(). But for MY_CS_BINSORT collations, the padded spaces are left and cause wrong comparison with '\n' or '\r' whose ordinal is lower than ' '(0x20). My suggestion is removing the (cs->state & MY_CS_BINSORT) condition as it is unnecessary and may cause bugs. Just pad it with min_sort_char (usually \0 for MY_CS_BINSORT).
[18 Nov 2024 15:34]
OCA Admin
Contribution submitted via Github - Bug #116584: Query produces different resultset with force index (*) Contribution by Mike Wang (Github MikeWang000000, mysql-server/pull/576): https://bugs.mysql.com/bug.php?id=116584 I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_2171319168.txt (text/plain), 2.25 KiB.

Description: hello, Recently we find a case that, a query with like operator cannot match expected rows when ICP is used. But if `force index (PRIMARY)` is specified in the statement, the expected row is matched, which is unreasonable. How to repeat: [yxxdb_8036@localhost ~]$ mysql -uroot -p'db1x@NJ+1' -S ~/bin/mysql1.sock test mysql: [Warning] Using a password on the command line interface can be insecure. Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 16 Server version: 8.0.36 Source distribution Copyright (c) 2000, 2024, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> mysql> mysql> mysql> CREATE TABLE `ecattdat1` ( `sd_key` char(3) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '', `ecatt_key` char(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '', `ec_cer_no` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '', `ec_full_name` char(120) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '', PRIMARY KEY (`ecatt_key`), KEY `idx_becdaty5` (`ec_full_name`,`ecatt_key`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; Query OK, 0 rows affected (0.01 sec) mysql> insert into ecattdat1 values ('817', '8170A100000000213850', ' ', '控制\t'); Query OK, 1 row affected (0.00 sec) # The inserted row is not matched mysql> select *from ecattdat1 where ec_full_name like '控制%'; Empty set (0.00 sec) # The inserted row is not matched when force index (primary) mysql> select *from ecattdat1 force index (primary) where ec_full_name like '控制%'; +--------+----------------------+-----------+--------------+ | sd_key | ecatt_key | ec_cer_no | ec_full_name | +--------+----------------------+-----------+--------------+ | 817 | 8170A100000000213850 | | 控制 | +--------+----------------------+-----------+--------------+ 1 row in set (0.00 sec) mysql>