| Bug #118510 | InnoDB Match Against does not work properly with utf8mb4_0900_as_ci | ||
|---|---|---|---|
| Submitted: | 23 Jun 20:09 | Modified: | 1 Jul 12:42 |
| Reporter: | Rudolf Run-Oracle | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server: FULLTEXT search | Severity: | S2 (Serious) |
| Version: | 8.0.39 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | fulltext, innodb, utf8mb4, utf8mb4_0900_as_ci | ||
[1 Jul 11:51]
MySQL Verification Team
mysql [localhost:8042] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
-> `ID` int NOT NULL AUTO_INCREMENT,
-> `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci DEFAULT NULL,
-> PRIMARY KEY (`ID`),
-> FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
-> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_ci;
Query OK, 0 rows affected (0.45 sec)
mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson')
-> ;
Query OK, 8 rows affected (0.01 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql [localhost:8042] {msandbox} (test) > SELECT
-> IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0) as 'MATCH_LastName',
-> IF(u.lastname LIKE 'John%' > 0, 1, 0) as 'LIKE_lastname',
-> u.lastname
-> FROM FT_TestTable u WHERE
-> MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE)
-> OR
-> u.lastname LIKE 'John%';
+----------------+---------------+----------+
| MATCH_LastName | LIKE_lastname | lastname |
+----------------+---------------+----------+
| 1 | 1 | John |
| 0 | 1 | Johnson |
+----------------+---------------+----------+
2 rows in set (0.01 sec)
mysql [localhost:8042] {msandbox} (test) > select @@version
-> ;
+-----------+
| @@version |
+-----------+
| 8.0.42 |
+-----------+
1 row in set (0.00 sec)
mysql [localhost:8042] {msandbox} (test) > drop table FT_TestTable;
Query OK, 0 rows affected (0.11 sec)
mysql [localhost:8042] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
-> `ID` int NOT NULL AUTO_INCREMENT,
-> `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
-> PRIMARY KEY (`ID`),
-> FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
-> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.46 sec)
mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson')
-> ;
Query OK, 8 rows affected (0.01 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql [localhost:8042] {msandbox} (test) > SELECT
u.lastname
FROM FT_TestTable u WHERE
-> IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0) as 'MATCH_LastName',
-> IF(u.lastname LIKE 'John%' > 0, 1, 0) as 'LIKE_lastname',
-> u.lastname
-> FROM FT_TestTable u WHERE
-> MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE)
-> OR
-> u.lastname LIKE 'John%';
+----------------+---------------+----------+
| MATCH_LastName | LIKE_lastname | lastname |
+----------------+---------------+----------+
| 1 | 1 | John |
| 1 | 1 | Jóhn |
| 1 | 1 | Johnson |
| 1 | 1 | Jóhnson |
+----------------+---------------+----------+
4 rows in set (0.00 sec)
mysql [localhost:8042] {msandbox} (test) >
[1 Jul 11:52]
MySQL Verification Team
Please upgrade. Thank you for using MySQL Server.
[1 Jul 12:20]
MySQL Verification Team
Server version: 9.3.0 MySQL Community Server - GPL
Copyright (c) 2000, 2025, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql [localhost:9300] {msandbox} ((none)) > use test
Database changed
mysql [localhost:9300] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
-> `ID` int NOT NULL AUTO_INCREMENT,
-> `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
-> PRIMARY KEY (`ID`),
-> FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
-> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.853 sec)
mysql [localhost:9300] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson')
-> ;
Query OK, 8 rows affected (0.007 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql [localhost:9300] {msandbox} (test) > SELECT
FROM FT_TestTable u WHERE
-> IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0) as 'MATCH_LastName',
-> IF(u.lastname LIKE 'John%' > 0, 1, 0) as 'LIKE_lastname',
-> u.lastname
-> FROM FT_TestTable u WHERE
-> MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE)
-> OR
-> u.lastname LIKE 'John%';
+----------------+---------------+----------+
| MATCH_LastName | LIKE_lastname | lastname |
+----------------+---------------+----------+
| 1 | 1 | John |
| 1 | 1 | Jóhn |
| 1 | 1 | Johnson |
| 1 | 1 | Jóhnson |
+----------------+---------------+----------+
4 rows in set (0.001 sec)
mysql [localhost:9300] {msandbox} (test) >
[1 Jul 12:42]
MySQL Verification Team
There is def. something not ideal when accent sensitive collation is used. With accent insensitive collation everything works ok but with accent sensitive I have some weird results. I will verify the bug. I suggest you upgrade non the less and see if accent insensitive collation works for you as a workaround. Thank you for the report
[1 Jul 12:43]
MySQL Verification Team
mysql [localhost:8042] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
-> `ID` int NOT NULL AUTO_INCREMENT,
-> `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci DEFAULT NULL,
-> PRIMARY KEY (`ID`),
-> FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
-> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_ci;
Query OK, 0 rows affected (0.44 sec)
mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson');
Query OK, 8 rows affected (0.02 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql [localhost:8042] {msandbox} (test) > select MATCH(LastName) AGAINST('Joh' IN BOOLEAN MODE) FROM FT_TestTable;
+------------------------------------------------+
| MATCH(LastName) AGAINST('Joh' IN BOOLEAN MODE) |
+------------------------------------------------+
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
| 0 |
+------------------------------------------------+
8 rows in set (0.01 sec)
mysql [localhost:8042] {msandbox} (test) > select MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+-------------------------------------------------+
| MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) |
+-------------------------------------------------+
| 0 |
| 0 |
| 0 |
| 0 |
| 0.8155715465545654 |
| 0 |
| 0 |
| 0 |
+-------------------------------------------------+
8 rows in set (0.00 sec)
mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+-------------------------------------------------+
| LastName | MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) |
+---------------+-------------------------------------------------+
| König | 0 |
| Kónig | 0 |
| Königs | 0 |
| Königsberger | 0 |
| John | 0.8155715465545654 |
| Jóhn | 0 |
| Johnson | 0 |
| Jóhnson | 0 |
+---------------+-------------------------------------------------+
8 rows in set (0.00 sec)
mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König | 0 |
| Kónig | 0 |
| Königs | 0 |
| Königsberger | 0 |
| John | 0.8155715465545654 |
| Jóhn | 0 |
| Johnson | 0 |
| Jóhnson | 0 |
+---------------+--------------------------------------------------+
8 rows in set (0.00 sec)
[1 Jul 12:43]
MySQL Verification Team
mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('Johxxx'), ('johan'), ('johhhhhhh');
Query OK, 3 rows affected (0.03 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König | 0 |
| Kónig | 0 |
| Königs | 0 |
| Königsberger | 0 |
| John | 0.3184022605419159 |
| Jóhn | 0 |
| Johnson | 0 |
| Jóhnson | 0 |
| Johxxx | 0 |
| johan | 0.3184022605419159 |
| johhhhhhh | 0.3184022605419159 |
+---------------+--------------------------------------------------+
11 rows in set (0.00 sec)
mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('Johxxx'), ('Johan'), ('Johhhhhhh');
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König | 0 |
| Kónig | 0 |
| Königs | 0 |
| Königsberger | 0 |
| John | 0.19995030760765076 |
| Jóhn | 0 |
| Johnson | 0 |
| Jóhnson | 0 |
| Johxxx | 0 |
| johan | 0.19995030760765076 |
| johhhhhhh | 0.19995030760765076 |
| Johxxx | 0 |
| Johan | 0.19995030760765076 |
| Johhhhhhh | 0.19995030760765076 |
+---------------+--------------------------------------------------+
14 rows in set (0.00 sec)
mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('Johnson'), ('Johnson');
Query OK, 2 rows affected (0.14 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König | 0 |
| Kónig | 0 |
| Königs | 0 |
| Königsberger | 0 |
| John | 0.25517651438713074 |
| Jóhn | 0 |
| Johnson | 0 |
| Jóhnson | 0 |
| Johxxx | 0 |
| johan | 0.25517651438713074 |
| johhhhhhh | 0.25517651438713074 |
| Johxxx | 0 |
| Johan | 0.25517651438713074 |
| Johhhhhhh | 0.25517651438713074 |
| Johnson | 0 |
| Johnson | 0 |
+---------------+--------------------------------------------------+
16 rows in set (0.05 sec)
mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König | 0 |
| Kónig | 0 |
| Königs | 0 |
| Königsberger | 0 |
| John | 0.25517651438713074 |
| Jóhn | 0 |
| Johnson | 0 |
| Jóhnson | 0 |
| Johxxx | 0 |
| johan | 0.25517651438713074 |
| johhhhhhh | 0.25517651438713074 |
| Johxxx | 0 |
| Johan | 0.25517651438713074 |
| Johhhhhhh | 0.25517651438713074 |
| Johnson | 0 |
| Johnson | 0 |
+---------------+--------------------------------------------------+
16 rows in set (0.18 sec)
mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName | MATCH(LastName) AGAINST('+joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König | 0 |
| Kónig | 0 |
| Königs | 0 |
| Königsberger | 0 |
| John | 0.25517651438713074 |
| Jóhn | 0 |
| Johnson | 0 |
| Jóhnson | 0 |
| Johxxx | 0 |
| johan | 0.25517651438713074 |
| johhhhhhh | 0.25517651438713074 |
| Johxxx | 0 |
| Johan | 0.25517651438713074 |
| Johhhhhhh | 0.25517651438713074 |
| Johnson | 0 |
| Johnson | 0 |
+---------------+--------------------------------------------------+
16 rows in set (0.01 sec)
mysql [localhost:8042] {msandbox} (test) >

Description: In some cases when using utf8mb4_0900_as_ci, the match against fulltext search does not return correct results. In the 'How to repeat' section are two name groups 'John' and 'König'. The first time I identified this problem in our DB with 500k+ entries, I was searching with match-against for '+König*' and the DB did not return names like 'Königsberger' - I only got exact hits despite the asterisk. How to repeat: 1. Create Test table CREATE TABLE `FT_TestTable` ( `ID` int NOT NULL AUTO_INCREMENT, `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci DEFAULT NULL, PRIMARY KEY (`ID`), FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`) ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_ci; 2. Inser test data INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson') 3. Query Data (Match+Like in comparison) SELECT IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0) as 'MATCH_LastName', IF(u.lastname LIKE 'John%' > 0, 1, 0) as 'LIKE_lastname', u.lastname FROM FT_TestTable u WHERE MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) OR u.lastname LIKE 'John%' 4. Compare Results: - Match ('+John*') does find 'John' but not 'Johnson' - Like (LIKE 'John%') will find both - Result in CSV structure: MATCH_LastName, LIKE_lastname, lastname 1, 1, John 0, 1, Johnson Suggested fix: Make it work :)