Bug #116584 | Query produces different resultset with force index | ||
---|---|---|---|
Submitted: | 7 Nov 2024 11:58 | Modified: | 7 Nov 2024 14:01 |
Reporter: | Brian Yue (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | 8.0.36, 8.0.40 | OS: | Any (Redhat 7.4) |
Assigned to: | CPU Architecture: | Any (Intel) | |
Tags: | charset, like operator, Result Set |
[7 Nov 2024 11:58]
Brian Yue
[7 Nov 2024 12:46]
MySQL Verification Team
pls ignore my previous comment. Thank you.
[7 Nov 2024 14:01]
MySQL Verification Team
Hello Brian Yue, Thank you for the report and test case. Verified as described. regards, Umesh
[7 Nov 2024 14:01]
MySQL Verification Team
-- 8.0.40 [umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "insert into ecattdat1 values ('817', '8170A100000000213850', ' ', '控制\t');" [umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "select *from ecattdat1 where ec_full_name like '控制%';" [umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "select *from ecattdat1 force index (primary) where ec_full_name like '控制%';" +--------+----------------------+-----------+--------------+ | sd_key | ecatt_key | ec_cer_no | ec_full_name | +--------+----------------------+-----------+--------------+ | 817 | 8170A100000000213850 | | 控制 | +--------+----------------------+-----------+--------------+ [umshastr@ellex07]~/binaries/mysql-8.0.40:
[8 Nov 2024 14:02]
Bernt Marius Johnsen
Posted by developer: A simplified variant gives a slightly different, but albeit wrong, behavior: mysql> CREATE TABLE t1 (pk integer, v varchar(10), PRIMARY KEY (pk), KEY idx (v, pk) ) COLLATE=utf8mb4_bin; Query OK, 0 rows affected (0.03 sec) mysql> insert into t1 values (1, '控制\t'); Query OK, 1 row affected (0.00 sec) mysql> select * from t1 where v like '控制%'; +----+---------+ | pk | v | +----+---------+ | 1 | 控制 | +----+---------+ 1 row in set (0.00 sec) mysql> insert into t1 values (2, '控制X'); Query OK, 1 row affected (0.01 sec) mysql> select * from t1 where v like '控制%'; +----+---------+ | pk | v | +----+---------+ | 2 | 控制X | +----+---------+ 1 row in set (0.00 sec) mysql> select * from t1 force index(primary) where v like '控制%'; +----+---------+ | pk | v | +----+---------+ | 1 | 控制 | | 2 | 控制X | +----+---------+ 2 rows in set (0.00 sec)
[9 Nov 2024 15:37]
Mike Wang
I can reproduce this bug with a more simplified case: -------------------- mysql> create table t1 ( c1 varchar(16), key idx (c1) ) collate=utf8mb4_bin; Query OK, 0 rows affected (0.06 sec) mysql> insert into t1 values ('000'), ('111'), ('abc\n'); Query OK, 3 rows affected (0.01 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> select * from t1 where c1 like 'abc%'; Empty set (0.00 sec) -------------------- The 'abc\n' record was not matched. Ignoring the index, 'abc\n' can be matched. -------------------- mysql> select * from t1 ignore index(idx) where c1 like 'abc%'; +------+ | c1 | +------+ | abc | +------+ 1 row in set (0.00 sec) -------------------- From the explain infomation, a range query was performed then the server missed the record. -------------------- mysql> explain select * from t1 where c1 like 'abc%'; +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+ | 1 | SIMPLE | t1 | NULL | range | idx | idx | 67 | NULL | 1 | 100.00 | Using where; Using index | +----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+ 1 row in set, 1 warning (0.00 sec) -------------------- A strange but interesting fact is that 'abc\n' is actually less than 'abc' with collation utf8mb4_bin. -------------------- mysql> select 'abc\n' < 'abc' collate utf8mb4_bin; +-------------------------------------+ | 'abc\n' < 'abc' collate utf8mb4_bin | +-------------------------------------+ | 1 | +-------------------------------------+ 1 row in set (0.00 sec) -------------------- It could be a space padding issue, and this bug is not only affecting collation utf8mb4_bin, but also latin1_bin and other 'PAD SPACE' bin collations.
[10 Nov 2024 6:46]
Mike Wang
A fix with testcase is uploaded at https://github.com/mysql/mysql-server/pull/576 . The commit 0b7ea5204596eae95f20336991eaead8c0a7bf78 is suspicious. The space padding is added at strings/ctype-mb.cc:my_like_range_mb() for collations that have MY_CS_BINSORT or no NO_PAD feature. For NO_PAD collations, the padded spaces will not be counted in during comparison as they are dropped at storage/innobase/rem/rem0cmp.cc:innobase_mysql_cmp(). But for MY_CS_BINSORT collations, the padded spaces are left and cause wrong comparison with '\n' or '\r' whose ordinal is lower than ' '(0x20). My suggestion is removing the (cs->state & MY_CS_BINSORT) condition as it is unnecessary and may cause bugs. Just pad it with min_sort_char (usually \0 for MY_CS_BINSORT).
[18 Nov 2024 15:34]
OCA Admin
Contribution submitted via Github - Bug #116584: Query produces different resultset with force index (*) Contribution by Mike Wang (Github MikeWang000000, mysql-server/pull/576): https://bugs.mysql.com/bug.php?id=116584 I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: git_patch_2171319168.txt (text/plain), 2.25 KiB.