Bug #91567 innodb_ft_enable_stopword doesn't work as intended
Submitted: 7 Jul 2018 16:46 Modified: 31 Jul 2018 2:54
Reporter: David Webb Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:8.0.11 OS:Windows
Assigned to: MySQL Verification Team CPU Architecture:Any

[7 Jul 2018 16:46] David Webb
Description:
When building a fulltext index, if the innodb_ft_enable_stopword option is set to off or 0, then that should stop it using any stopword table. However, I have found that it continues to use the default stopword table unless I also set the variable innodb_ft_server_stopword_table to point to a real table (with nothing in it). So to workaround, I created a table called "stopwords" with a single column called "value" varchar, as required. But if I turn off innodb_ft_enable_stopword then that should stop it. Period. See the description:
https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_ft_enable_sto...

How to repeat:
Set the innodb_ft_enable_stopwords=OFF.
Create a simple innodb table with a fulltext index including the words "To" and "Or" in the table. These should be ignored if the stopwords are disabled.
[26 Jul 2018 9:22] MySQL Verification Team
Hi,

What is your innodb_ft_min_token_size ? Default is 3 so that will ignore "To" irrelevant to the stopwords table.

as you can see turning off the stopword works:

mysql [localhost] {msandbox} (test) > SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a     |
| about |
| an    |
| are   |
| as    |
| at    |
| be    |
| by    |
| com   |
| de    |
| en    |
| for   |
| from  |
| how   |
| i     |
| in    |
| is    |
| it    |
| la    |
| of    |
| on    |
| or    |
| that  |
| the   |
| this  |
| to    |
| was   |
| what  |
| when  |
| where |
| who   |
| will  |
| with  |
| und   |
| the   |
| www   |
+-------+
36 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select @@innodb_ft_min_token_size;
+----------------------------+
| @@innodb_ft_min_token_size |
+----------------------------+
|                          2 |
+----------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select @innodb_ft_min_token_size;
+---------------------------+
| @innodb_ft_min_token_size |
+---------------------------+
| NULL                      |
+---------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select @@innodb_ft_enable_stopword;
+-----------------------------+
| @@innodb_ft_enable_stopword |
+-----------------------------+
|                           0 |
+-----------------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > create table ftt(id int unsigned not null auto_increment primary key, x varchar(100), fulltext key (x))engine = innodb;
Query OK, 0 rows affected (0.58 sec)

mysql [localhost] {msandbox} (test) > insert into ftt (x) values ('to'), ('or'), ('when'), ('the'), ('that'), ('keetah');
Query OK, 6 rows affected (0.05 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > select * from ftt;
+----+--------+
| id | x      |
+----+--------+
|  1 | to     |
|  2 | or     |
|  3 | when   |
|  4 | the    |
|  5 | that   |
|  6 | keetah |
+----+--------+
6 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('keetah');
+----+--------+
| id | x      |
+----+--------+
|  6 | keetah |
+----+--------+
1 row in set (0.02 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('that');
+----+------+
| id | x    |
+----+------+
|  5 | that |
+----+------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('or');
+----+------+
| id | x    |
+----+------+
|  2 | or   |
+----+------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('to');
+----+------+
| id | x    |
+----+------+
|  1 | to   |
+----+------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('to or');
+----+------+
| id | x    |
+----+------+
|  1 | to   |
|  2 | or   |
+----+------+
2 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('to or that');
+----+------+
| id | x    |
+----+------+
|  1 | to   |
|  2 | or   |
|  5 | that |
+----+------+
3 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) >
[26 Jul 2018 10:08] MySQL Verification Team
I hope this clears a behavior for you a bit:

mysql [localhost] {msandbox} (test) > select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.11    |
+-----------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > set innodb_ft_enable_stopword=on;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('to or that');
+----+------+
| id | x    |
+----+------+
|  1 | to   |
|  2 | or   |
|  5 | that |
+----+------+
3 rows in set (0.00 sec)

mysql [localhost] {msandbox} (test) > alter table ftt engine=innodb;
Query OK, 6 rows affected (0.13 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('to or that');
Empty set (0.00 sec)

mysql [localhost] {msandbox} (test) > set innodb_ft_enable_stopword=off;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('to or that');
Empty set (0.01 sec)

mysql [localhost] {msandbox} (test) > alter table ftt engine=innodb;
Query OK, 6 rows affected (0.08 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql [localhost] {msandbox} (test) > select * from ftt where match (x) against ('to or that');
+----+------+
| id | x    |
+----+------+
|  1 | to   |
|  2 | or   |
|  5 | that |
+----+------+
3 rows in set (0.00 sec)
[29 Jul 2018 16:44] David Webb
Thanks for looking at this. I am indeed using innodb_ft_min_token_size=2. This is because in Romanised Chinese names, there are many 2-letter words including surnames such as Ho, Li, Or and To. The last two of those examples are in the default stopword table. In my fulltext searches I am using "IN BOOLEAN MODE" with + to require the specified words.

Now I remember that I set the variable dynamically, because the guide said I could, as in:
SET GLOBAL innodb_ft_enable_stopword=OFF;
(the dynamic column here says "YES"):
https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html

So I have just repeated the test by restarting the server (with the setting in the option file) then adding the index. This time, the search works. 

So I suspect that this is a bug - when setting innodb_ft_enable_stopword=OFF dynamically, it doesn't take effect because the server is still using the default stopword table. I need to do a server restart to make it "forget" (or unload) the stopword table before adding the fulltext index.
[29 Jul 2018 17:52] MySQL Verification Team
Hi,

> Now I remember that I set the variable dynamically, because the guide said I could

It is dynamic, but the variable states if the index will be created, not if the results will contain or not those words.

So if you look at my config, when you recreate the index when the variable is off you can find those strings in the table (as they are created in index) and then the value of the variable is irrelevant. So the innodb_ft_enable_stopword affects next insert/delete/update, not the select.

all best
Bogdan
[30 Jul 2018 2:56] David Webb
You wrote: "It is dynamic, but the variable states if the index will be created, not if the results will contain or not those words."

No, the index will always be created with CREATE FULLTEXT INDEX. The variable innodb_ft_enable_stopword should determine whether or not the index contains words which are in the stopword table. See the description here:

https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_ft_enable_sto...

"Specifies that a set of stopwords is associated with an InnoDB FULLTEXT index at the time the index is created. If the innodb_ft_user_stopword_table option is set, the stopwords are taken from that table. Else, if the innodb_ft_server_stopword_table option is set, the stopwords are taken from that table. Otherwise, a built-in set of default stopwords is used. "

I have discovered that this description is only correct if the server is started with innodb_ft_enable_stopword=OFF, in effect unloading the default stopword table, or if the user dynamically sets innodb_ft_server_stopword_table to an empty table without stopwords. That is, for the dynamic route to work, you need:

SET GLOBAL innodb_ft_enable_stopword=OFF;
SET GLOBAL innodb_ft_server_stopword_table=mydb.mystopwords;
[30 Jul 2018 19:32] MySQL Verification Team
Hi David,

> No, the index will always be created with CREATE FULLTEXT INDEX. 

You are nitpicking and misinterpreting my words. 

Yes "index IS created", when I say "will be created / or not" I'm talking about if the word that's in the stopwords table will be or will not be inserted in to that index.

So as I said, that value is dynamic and the "current value of the variable determines if the word will be added or not". How server is started is irrelevant, weather insert/update/delete will use the stopwords in the index or not depends on the current (dynamic) value of the variable. The result of a select does not depend on the variable at all.

I hope this clears it up, it works as defined, there's no bug.

Thanks
Bogdan
[31 Jul 2018 2:54] David Webb
Thanks Bogdan.

I figured out what the problem is. It's not a bug, but my problem was caused by setting the GLOBAL variable and assuming that this would automatically change the SESSION variable. It doesn't. That is, session variables adopt the extant values of global variables when the session is opened but are not affected by subsequent changes to global variables. 

If I set the GLOBAL variable then start a new session by reconnecting to the DMBS in Workbench, then the session takes the new global value.