Bug #118510 InnoDB Match Against does not work properly with utf8mb4_0900_as_ci
Submitted: 23 Jun 20:09 Modified: 1 Jul 12:42
Reporter: Rudolf Run-Oracle Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S2 (Serious)
Version:8.0.39 OS:Any
Assigned to: CPU Architecture:Any
Tags: fulltext, innodb, utf8mb4, utf8mb4_0900_as_ci

[23 Jun 20:09] Rudolf Run-Oracle
Description:
In some cases when using utf8mb4_0900_as_ci, the match against fulltext search does not return correct results.

In the 'How to repeat' section are two name groups 'John' and 'König'.
The first time I identified this problem in our DB with 500k+ entries, I was searching with match-against for '+König*' and the DB did not return names like 'Königsberger' - I only got exact hits despite the asterisk.

How to repeat:
1. Create Test table
	CREATE TABLE `FT_TestTable` (
	  `ID` int NOT NULL AUTO_INCREMENT,
	  `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci DEFAULT NULL,
	  PRIMARY KEY (`ID`),
	  FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
	) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_ci;

2. Inser test data
	INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson')

3. Query Data (Match+Like in comparison)
	SELECT 
	IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0)  as 'MATCH_LastName',
	IF(u.lastname LIKE 'John%' > 0, 1, 0)  as 'LIKE_lastname',
	u.lastname
	FROM FT_TestTable u WHERE 
	MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE)
	OR
	u.lastname LIKE 'John%'

4. Compare Results:
	- Match ('+John*') does find 'John' but not 'Johnson'
	- Like (LIKE 'John%') will find both
	- Result in CSV structure:
		MATCH_LastName, LIKE_lastname, lastname
		1, 1, John
		0, 1, Johnson

Suggested fix:
Make it work :)
[1 Jul 11:51] MySQL Verification Team
mysql [localhost:8042] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
    ->   `ID` int NOT NULL AUTO_INCREMENT,
    ->   `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci DEFAULT NULL,
    ->   PRIMARY KEY (`ID`),
    ->   FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
    -> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_ci;
Query OK, 0 rows affected (0.45 sec)

mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson')
    -> ;
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql [localhost:8042] {msandbox} (test) > SELECT
    -> IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0)  as 'MATCH_LastName',
    -> IF(u.lastname LIKE 'John%' > 0, 1, 0)  as 'LIKE_lastname',
    -> u.lastname
    -> FROM FT_TestTable u WHERE
    -> MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE)
    -> OR
    -> u.lastname LIKE 'John%';
+----------------+---------------+----------+
| MATCH_LastName | LIKE_lastname | lastname |
+----------------+---------------+----------+
|              1 |             1 | John     |
|              0 |             1 | Johnson  |
+----------------+---------------+----------+
2 rows in set (0.01 sec)

mysql [localhost:8042] {msandbox} (test) > select @@version
    -> ;
+-----------+
| @@version |
+-----------+
| 8.0.42    |
+-----------+
1 row in set (0.00 sec)

mysql [localhost:8042] {msandbox} (test) > drop table FT_TestTable;
Query OK, 0 rows affected (0.11 sec)

mysql [localhost:8042] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
    ->   `ID` int NOT NULL AUTO_INCREMENT,
    ->   `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
    ->   PRIMARY KEY (`ID`),
    ->   FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
    -> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.46 sec)

mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson')
    -> ;
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql [localhost:8042] {msandbox} (test) > SELECT
        u.lastname
        FROM FT_TestTable u WHERE
    -> IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0)  as 'MATCH_LastName',
    -> IF(u.lastname LIKE 'John%' > 0, 1, 0)  as 'LIKE_lastname',
    -> u.lastname
    -> FROM FT_TestTable u WHERE
    -> MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE)
    -> OR
    -> u.lastname LIKE 'John%';
+----------------+---------------+----------+
| MATCH_LastName | LIKE_lastname | lastname |
+----------------+---------------+----------+
|              1 |             1 | John     |
|              1 |             1 | Jóhn     |
|              1 |             1 | Johnson  |
|              1 |             1 | Jóhnson  |
+----------------+---------------+----------+
4 rows in set (0.00 sec)

mysql [localhost:8042] {msandbox} (test) >
[1 Jul 11:52] MySQL Verification Team
Please upgrade.

Thank you for using MySQL Server.
[1 Jul 12:20] MySQL Verification Team
Server version: 9.3.0 MySQL Community Server - GPL

Copyright (c) 2000, 2025, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost:9300] {msandbox} ((none)) > use test
Database changed
mysql [localhost:9300] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
    ->   `ID` int NOT NULL AUTO_INCREMENT,
    ->   `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
    ->   PRIMARY KEY (`ID`),
    ->   FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
    -> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.853 sec)

mysql [localhost:9300] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson')
    -> ;
Query OK, 8 rows affected (0.007 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql [localhost:9300] {msandbox} (test) > SELECT
        FROM FT_TestTable u WHERE
    -> IF(MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE) > 0, 1, 0)  as 'MATCH_LastName',
    -> IF(u.lastname LIKE 'John%' > 0, 1, 0)  as 'LIKE_lastname',
    -> u.lastname
    -> FROM FT_TestTable u WHERE
    -> MATCH(u.lastname) AGAINST ('+John*' IN BOOLEAN MODE)
    -> OR
    -> u.lastname LIKE 'John%';
+----------------+---------------+----------+
| MATCH_LastName | LIKE_lastname | lastname |
+----------------+---------------+----------+
|              1 |             1 | John     |
|              1 |             1 | Jóhn     |
|              1 |             1 | Johnson  |
|              1 |             1 | Jóhnson  |
+----------------+---------------+----------+
4 rows in set (0.001 sec)

mysql [localhost:9300] {msandbox} (test) >
[1 Jul 12:42] MySQL Verification Team
There is def. something not ideal when accent sensitive collation is used. With accent insensitive collation everything works ok but with accent sensitive I have some weird results. I will verify the bug. I suggest you upgrade non the less and see if accent insensitive collation works for you as a workaround.

Thank you for the report
[1 Jul 12:43] MySQL Verification Team
mysql [localhost:8042] {msandbox} (test) > CREATE TABLE `FT_TestTable` (
    ->   `ID` int NOT NULL AUTO_INCREMENT,
    ->   `LastName` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci DEFAULT NULL,
    ->   PRIMARY KEY (`ID`),
    ->   FULLTEXT KEY `FT_TestTable_LastName_IDX` (`LastName`)
    -> ) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_as_ci;
Query OK, 0 rows affected (0.44 sec)

mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('König'), ('Kónig'), ('Königs'), ('Königsberger'), ('John'), ('Jóhn'), ('Johnson'), ('Jóhnson');
Query OK, 8 rows affected (0.02 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql [localhost:8042] {msandbox} (test) > select MATCH(LastName) AGAINST('Joh' IN BOOLEAN MODE) FROM FT_TestTable;
+------------------------------------------------+
| MATCH(LastName) AGAINST('Joh' IN BOOLEAN MODE) |
+------------------------------------------------+
|                                              0 |
|                                              0 |
|                                              0 |
|                                              0 |
|                                              0 |
|                                              0 |
|                                              0 |
|                                              0 |
+------------------------------------------------+
8 rows in set (0.01 sec)

mysql [localhost:8042] {msandbox} (test) > select MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+-------------------------------------------------+
| MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) |
+-------------------------------------------------+
|                                               0 |
|                                               0 |
|                                               0 |
|                                               0 |
|                              0.8155715465545654 |
|                                               0 |
|                                               0 |
|                                               0 |
+-------------------------------------------------+
8 rows in set (0.00 sec)

mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+-------------------------------------------------+
| LastName      | MATCH(LastName) AGAINST('Joh*' IN BOOLEAN MODE) |
+---------------+-------------------------------------------------+
| König         |                                               0 |
| Kónig         |                                               0 |
| Königs        |                                               0 |
| Königsberger  |                                               0 |
| John          |                              0.8155715465545654 |
| Jóhn          |                                               0 |
| Johnson       |                                               0 |
| Jóhnson       |                                               0 |
+---------------+-------------------------------------------------+
8 rows in set (0.00 sec)

mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName      | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König         |                                                0 |
| Kónig         |                                                0 |
| Königs        |                                                0 |
| Königsberger  |                                                0 |
| John          |                               0.8155715465545654 |
| Jóhn          |                                                0 |
| Johnson       |                                                0 |
| Jóhnson       |                                                0 |
+---------------+--------------------------------------------------+
8 rows in set (0.00 sec)
[1 Jul 12:43] MySQL Verification Team
mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('Johxxx'), ('johan'), ('johhhhhhh');
Query OK, 3 rows affected (0.03 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName      | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König         |                                                0 |
| Kónig         |                                                0 |
| Königs        |                                                0 |
| Königsberger  |                                                0 |
| John          |                               0.3184022605419159 |
| Jóhn          |                                                0 |
| Johnson       |                                                0 |
| Jóhnson       |                                                0 |
| Johxxx        |                                                0 |
| johan         |                               0.3184022605419159 |
| johhhhhhh     |                               0.3184022605419159 |
+---------------+--------------------------------------------------+
11 rows in set (0.00 sec)

mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('Johxxx'), ('Johan'), ('Johhhhhhh');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName      | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König         |                                                0 |
| Kónig         |                                                0 |
| Königs        |                                                0 |
| Königsberger  |                                                0 |
| John          |                              0.19995030760765076 |
| Jóhn          |                                                0 |
| Johnson       |                                                0 |
| Jóhnson       |                                                0 |
| Johxxx        |                                                0 |
| johan         |                              0.19995030760765076 |
| johhhhhhh     |                              0.19995030760765076 |
| Johxxx        |                                                0 |
| Johan         |                              0.19995030760765076 |
| Johhhhhhh     |                              0.19995030760765076 |
+---------------+--------------------------------------------------+
14 rows in set (0.00 sec)

mysql [localhost:8042] {msandbox} (test) > INSERT INTO FT_TestTable (LastName) VALUES ('Johnson'), ('Johnson');
Query OK, 2 rows affected (0.14 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName      | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König         |                                                0 |
| Kónig         |                                                0 |
| Königs        |                                                0 |
| Königsberger  |                                                0 |
| John          |                              0.25517651438713074 |
| Jóhn          |                                                0 |
| Johnson       |                                                0 |
| Jóhnson       |                                                0 |
| Johxxx        |                                                0 |
| johan         |                              0.25517651438713074 |
| johhhhhhh     |                              0.25517651438713074 |
| Johxxx        |                                                0 |
| Johan         |                              0.25517651438713074 |
| Johhhhhhh     |                              0.25517651438713074 |
| Johnson       |                                                0 |
| Johnson       |                                                0 |
+---------------+--------------------------------------------------+
16 rows in set (0.05 sec)

mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName      | MATCH(LastName) AGAINST('+Joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König         |                                                0 |
| Kónig         |                                                0 |
| Königs        |                                                0 |
| Königsberger  |                                                0 |
| John          |                              0.25517651438713074 |
| Jóhn          |                                                0 |
| Johnson       |                                                0 |
| Jóhnson       |                                                0 |
| Johxxx        |                                                0 |
| johan         |                              0.25517651438713074 |
| johhhhhhh     |                              0.25517651438713074 |
| Johxxx        |                                                0 |
| Johan         |                              0.25517651438713074 |
| Johhhhhhh     |                              0.25517651438713074 |
| Johnson       |                                                0 |
| Johnson       |                                                0 |
+---------------+--------------------------------------------------+
16 rows in set (0.18 sec)

mysql [localhost:8042] {msandbox} (test) > select LastName, MATCH(LastName) AGAINST('+joh*' IN BOOLEAN MODE) FROM FT_TestTable;
+---------------+--------------------------------------------------+
| LastName      | MATCH(LastName) AGAINST('+joh*' IN BOOLEAN MODE) |
+---------------+--------------------------------------------------+
| König         |                                                0 |
| Kónig         |                                                0 |
| Königs        |                                                0 |
| Königsberger  |                                                0 |
| John          |                              0.25517651438713074 |
| Jóhn          |                                                0 |
| Johnson       |                                                0 |
| Jóhnson       |                                                0 |
| Johxxx        |                                                0 |
| johan         |                              0.25517651438713074 |
| johhhhhhh     |                              0.25517651438713074 |
| Johxxx        |                                                0 |
| Johan         |                              0.25517651438713074 |
| Johhhhhhh     |                              0.25517651438713074 |
| Johnson       |                                                0 |
| Johnson       |                                                0 |
+---------------+--------------------------------------------------+
16 rows in set (0.01 sec)

mysql [localhost:8042] {msandbox} (test) >