Description:
InnoDB Fulltext Index tokens are stored in AUX INDEX tables. The tokens are distributed across AUX INDEX TABLES (_INDEX_*.IBD files).
But starting from 8.0, all tokens are stored always in the first AUX INDEX TABLE (INDEX_1.IBD).
How to repeat:
CREATE TABLE t1 (a VARCHAR(200), b TEXT, FULLTEXT (a,b)) ENGINE = InnoDB charset utf8mb4;
INSERT INTO t1 VALUES('MySQL has now support', 'for full-text search'), ('Full-text indexes', 'are called collections'), ('Only MyISAM tables','support collections'), ('Function MATCH ... AGAINST()','is used to do a search'), ('Full-text search in MySQL', 'implements vector space model');
shutdown the server;
do
"strings fts_0000000000000425_000000000000009c_index_1.ibd"
(note that the tableid and indexid can change. Ensure there is only fts table with one fts index for easy identification).
This will list all strings and you will find that all tokens are in index_1.ibd
strings fts_0000000000000425_000000000000009c_index_1.ibd
:infimum
supremum
MvZF
{{?x
vs-T
WW5(
"tIO
_<SH
Cl]3
:GRL
[.|?XL
QKi>
&7_!6
(2P,&9&
TzR,
Tlo}
E]lF
infimum
supremum
3against
2called
:collections
6full
4function
/has
6implements
3indexes
1match
1model
2myisam
4mysql
/now
0only
8search
1space
6support
2tables
6text
0used
vector
"strings" on other IBDs (from index_2 to index_6.ibd) will return empty
---------------------------------------
the same operation on 5.7 shows the tokens are distributed across AUX IBDs
strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_1.ibd
w*6I
infimum
supremum
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_2.ibd
MbJ;
infimum
supremum
3against
2called
collections
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_3.ibd
$/?%
2+-9y
infimum
supremum
6full
4function
/has
6implements
!indexes
c-9y
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_4.ibd
infimum
supremum
1match
1model
2myisam
4mysql
/now
only
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_5.ibd
infimum
supremum
8search
1space
6support
2tables
text
satya@satya-lenovo-yoga-920:~/WORK/PS-5.7-MAIN/bld/mysql-test$ strings var/mysqld.1/data/test/FTS_0000000000000026_0000000000000033_INDEX_6.ibd
infimum
supremum
0used
vector
Suggested fix:
Its seems the FTS index selector relies on innobase_strnxfrm().
Event though both 5.7 and 8.0 uses utf8mb4, the collation is different.
8.0 uses my_charset_utf8mb4_0900_ai_ci
5.7 uses my_charset_utf8mb4_general_ci
innobase_strnxfrm() cannot properly partition/distribute tokens in 8.0 because of my_charset_utf8mb4_0900_ai_ci
Fix innobase_strnxfrm() to distribute the tokens properly.
Workaround for now is to use latin1 character set. Works well in 8.0 too.