MySQL Bugs: #116584: Query produces different resultset with force index

Bug #116584	Query produces different resultset with force index
Submitted:	7 Nov 2024 11:58	Modified:	7 Nov 2024 14:01
Reporter:	Brian Yue (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Charsets	Severity:	S2 (Serious)
Version:	8.0.36, 8.0.40	OS:	Any (Redhat 7.4)
Assigned to:		CPU Architecture:	Any (Intel)
Tags:	charset, like operator, Result Set

Description:
hello,
  Recently we find a case that, a query with like operator cannot match expected rows when ICP is used. But if `force index (PRIMARY)` is specified in the statement, the expected row is matched, which is unreasonable.

How to repeat:
[yxxdb_8036@localhost ~]$ mysql -uroot -p'db1x@NJ+1' -S ~/bin/mysql1.sock test
mysql: [Warning] Using a password on the command line interface can be insecure.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 16
Server version: 8.0.36 Source distribution

Copyright (c) 2000, 2024, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
mysql>
mysql>
mysql> CREATE TABLE `ecattdat1` (   `sd_key` char(3) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',   `ecatt_key` char(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',   `ec_cer_no` varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',   `ec_full_name` char(120) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',   PRIMARY KEY (`ecatt_key`),   KEY `idx_becdaty5` (`ec_full_name`,`ecatt_key`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into ecattdat1 values ('817', '8170A100000000213850', '    ', '控制\t');
Query OK, 1 row affected (0.00 sec)

# The inserted row is not matched
mysql> select *from ecattdat1 where ec_full_name like '控制%';
Empty set (0.00 sec)

# The inserted row is not matched when force index (primary)
mysql> select *from ecattdat1 force index (primary) where ec_full_name like '控制%';
+--------+----------------------+-----------+--------------+
| sd_key | ecatt_key            | ec_cer_no | ec_full_name |
+--------+----------------------+-----------+--------------+
| 817    | 8170A100000000213850 |           | 控制              |
+--------+----------------------+-----------+--------------+
1 row in set (0.00 sec)

mysql>

pls ignore my previous comment. Thank you.

Hello Brian Yue,

Thank you for the report and test case.
Verified as described.

regards,
Umesh

-- 8.0.40

[umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "insert into ecattdat1 values ('817', '8170A100000000213850', '    ', '控制\t');"
[umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "select *from ecattdat1 where ec_full_name like '控制%';"
[umshastr@ellex07]~/binaries/mysql-8.0.40: bin/mysql -uroot -S /tmp/mysql.sock test -e "select *from ecattdat1 force index (primary) where ec_full_name like '控制%';"
+--------+----------------------+-----------+--------------+
| sd_key | ecatt_key            | ec_cer_no | ec_full_name |
+--------+----------------------+-----------+--------------+
| 817    | 8170A100000000213850 |           | 控制	        |
+--------+----------------------+-----------+--------------+
[umshastr@ellex07]~/binaries/mysql-8.0.40:

Posted by developer:
 
A simplified variant gives a slightly different, but albeit wrong, behavior:

mysql> CREATE TABLE t1 (pk integer, v varchar(10), PRIMARY KEY (pk),  KEY idx (v, pk) ) COLLATE=utf8mb4_bin;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into t1 values (1, '控制\t');
Query OK, 1 row affected (0.00 sec)

mysql> select * from t1 where v like '控制%';
+----+---------+
| pk | v       |
+----+---------+
|  1 | 控制	   |
+----+---------+
1 row in set (0.00 sec)

mysql> insert into t1 values (2, '控制X');
Query OK, 1 row affected (0.01 sec)

mysql> select * from t1 where v like '控制%';
+----+---------+
| pk | v       |
+----+---------+
|  2 | 控制X   |
+----+---------+
1 row in set (0.00 sec)

mysql> select * from t1 force index(primary) where v like '控制%';
+----+---------+
| pk | v       |
+----+---------+
|  1 | 控制	   |
|  2 | 控制X   |
+----+---------+
2 rows in set (0.00 sec)

I can reproduce this bug with a more simplified case:

--------------------
mysql> create table t1 ( c1 varchar(16), key idx (c1) ) collate=utf8mb4_bin;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t1 values ('000'), ('111'), ('abc\n');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from t1 where c1 like 'abc%';
Empty set (0.00 sec)
--------------------

The 'abc\n' record was not matched.

Ignoring the index, 'abc\n' can be matched.

--------------------
mysql> select * from t1 ignore index(idx) where c1 like 'abc%';
+------+
| c1   |
+------+
| abc
 |
+------+
1 row in set (0.00 sec)
--------------------

From the explain infomation, a range query was performed then the server missed the record.

--------------------
mysql> explain select * from t1 where c1 like 'abc%';
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | t1    | NULL       | range | idx           | idx  | 67      | NULL |    1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
--------------------

A strange but interesting fact is that 'abc\n' is actually less than 'abc' with collation utf8mb4_bin.

--------------------
mysql> select 'abc\n' < 'abc' collate utf8mb4_bin;
+-------------------------------------+
| 'abc\n' < 'abc' collate utf8mb4_bin |
+-------------------------------------+
|                                   1 |
+-------------------------------------+
1 row in set (0.00 sec)
--------------------

It could be a space padding issue, and this bug is not only affecting collation utf8mb4_bin, but also latin1_bin and other 'PAD SPACE' bin collations.

A fix with testcase is uploaded at https://github.com/mysql/mysql-server/pull/576 .

The commit 0b7ea5204596eae95f20336991eaead8c0a7bf78 is suspicious.

The space padding is added at strings/ctype-mb.cc:my_like_range_mb() for collations that have MY_CS_BINSORT or no NO_PAD feature.

For NO_PAD collations, the padded spaces will not be counted in during comparison as they are dropped at storage/innobase/rem/rem0cmp.cc:innobase_mysql_cmp().

But for MY_CS_BINSORT collations, the padded spaces are left and cause wrong comparison with '\n' or '\r' whose ordinal is lower than ' '(0x20).

My suggestion is removing the (cs->state & MY_CS_BINSORT) condition as it is unnecessary and may cause bugs. Just pad it with min_sort_char (usually \0 for MY_CS_BINSORT).

Contribution submitted via Github - Bug #116584: Query produces different resultset with force index (*) Contribution by Mike Wang (Github MikeWang000000, mysql-server/pull/576): https://bugs.mysql.com/bug.php?id=116584 I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: git_patch_2171319168.txt (text/plain), 2.25 KiB.