Bug #98427 InnoDB FullText AUX Tables are broken in 8.0
Submitted: 29 Jan 2020 11:59 Modified: 19 Aug 2020 7:44
Reporter: Satya Bodapati (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.0, 8.0.19 OS:Any
Assigned to: CPU Architecture:Any
Tags: fts, innodb

[29 Jan 2020 11:59] Satya Bodapati
Description:
InnoDB Fulltext Index tokens are stored in AUX INDEX tables. The tokens are distributed across AUX INDEX TABLES (_INDEX_*.IBD files).

But starting from 8.0, all tokens are stored always in the first AUX INDEX TABLE (INDEX_1.IBD).

How to repeat:
CREATE TABLE t1 (a VARCHAR(200), b TEXT, FULLTEXT (a,b)) ENGINE = InnoDB charset utf8mb4;

INSERT INTO t1 VALUES('MySQL has now support', 'for full-text search'),                        ('Full-text indexes', 'are called collections'), ('Only MyISAM tables','support collections'), ('Function MATCH ... AGAINST()','is used to do a search'),         ('Full-text search in MySQL', 'implements vector space model');

shutdown the server;

do 
"strings fts_0000000000000425_000000000000009c_index_1.ibd"
(note that the tableid and indexid can change. Ensure there is only fts table with one fts index for easy identification).
This will list all strings and you will find that all tokens are in index_1.ibd

strings fts_0000000000000425_000000000000009c_index_1.ibd
:infimum
supremum
MvZF
{{?x
vs-T
WW5(
"tIO
_<SH
Cl]3
:GRL
[.|?XL
QKi>
&7_!6
(2P,&9&
TzR,
Tlo}
E]lF
infimum
supremum
3against
2called
:collections
6full
4function
/has
6implements
3indexes
1match
1model
2myisam
4mysql
/now
0only
8search
1space
6support
2tables
6text
0used
vector

"strings" on other IBDs (from index_2 to index_6.ibd) will return empty

---------------------------------------
the same operation on 5.7 shows the tokens are distributed across AUX IBDs

 strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_1.ibd 
w*6I
infimum
supremum
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_2.ibd 
MbJ;
infimum
supremum
3against
2called
collections
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_3.ibd 
$/?%	
2+-9y
infimum
supremum	
6full
4function
/has
6implements
!indexes
c-9y
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_4.ibd 
infimum
supremum
1match
1model
2myisam
4mysql
/now
only
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_5.ibd 
infimum
supremum	
8search
1space
6support
2tables
text
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_6.ibd 
infimum
supremum
0used
vector

Suggested fix:
Its seems the FTS index selector relies on innobase_strnxfrm().

Event though both 5.7 and 8.0 uses utf8mb4, the collation is different.
8.0 uses my_charset_utf8mb4_0900_ai_ci
5.7 uses my_charset_utf8mb4_general_ci

innobase_strnxfrm() cannot properly partition/distribute tokens in 8.0 because of  my_charset_utf8mb4_0900_ai_ci

Fix innobase_strnxfrm() to distribute the tokens properly.

Workaround for now is to use latin1 character set. Works well in 8.0 too.
[30 Jan 2020 9:01] MySQL Verification Team
Hello Satya,

Thank you for the report and feedback!

regards,
Umesh
[30 Jan 2020 9:01] MySQL Verification Team
Test results - 8.0.19

Attachment: 98427_8.0.19.results (application/octet-stream, text), 3.29 KiB.

[30 Jan 2020 9:01] MySQL Verification Team
Test results - 5.7.29

Attachment: 98427_5.7.29.results (application/octet-stream, text), 2.71 KiB.

[30 Jan 2020 9:02] MySQL Verification Team
Pls ignore 98434_8.0.19.results
[19 Aug 2020 7:44] Satya Bodapati
We might as well remove all FTS Index partitions if we do not intend to fix this bug. Just keep one FTS_AUX.IBD