Description:
I actually don't know whether this is a bug or not.
I'm filing this bug because I suppose this is not the right behavior.
I suppose there is a misbehavior of the fulltext parser when creating indexes.
Consider a column named title (type tinytext) with a fulltext index and a row of that column: "L'altro delitto"
When executing this query:
SELECT `id` FROM `table` WHERE MATCH (`title`) AGAINST ('+altro* +delitto*' IN BOOLEAN MODE ) >= 2
The table and column collation is latin1_swedish_ci.
This is my <ctype><map> for latin1 collation
<ctype>
<map>
00
20 20 20 20 20 20 20 20 20 28 28 28 28 28 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
48 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
84 84 84 84 84 84 84 84 84 84 10 10 10 10 10 10
10 81 81 81 81 81 81 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 10 10 10 10 10
10 82 82 82 82 82 82 02 02 02 02 02 02 02 02 02
02 02 02 02 02 02 02 02 02 02 02 10 10 10 10 20
10 00 10 02 10 10 10 10 10 10 01 10 01 00 01 00
00 10 10 10 10 10 10 10 10 10 02 10 02 00 02 01
48 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 10 01 01 01 01 01 01 01 02
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
02 02 02 02 02 02 02 10 02 02 02 02 02 02 02 02
</map>
</ctype>
<collation name="latin1_swedish_ci">
<map>
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F
80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F
90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F
A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF
B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF
41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
44 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 59 DE DF
41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49
44 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59 59 DE FF
</map>
</collation>
As you can see, single apostrophe 0x27 is mapped to 0x10, Not-white, not a word.
I've seen that in the file myisam/ftdefs.h at row 26 is defined the single apostrophe ad as a word:
#define misc_word_char(X) ((X)=='\'')
I'm not a C/C++ developer, but i suppose that the single apostrophe is hard coded in the source code of mysql, and that this specific condition overrides the charset <ctype><map>.
This is a great trouble for those languages, such as Italian, where "L'" is like "The" in English, and Italian grammar states that between the apostrophe and the noun must not be any white space.
So "L'altro" is correct, "L' altro" is wrong in Italian.
Fulltext index considers "L'altro" as a single word, which is not wrong, because it is an article "L" plus apostrophe "'" plus noun "altro".
How to repeat:
Crate a table with a fulltext index on the colum "title"
+----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| title | tinytext | NO | MUL | | |
+----------+---------------------+------+-----+---------+----------------+
insert into that table (`id`,`title`)('1','L\'altro delitto')
run the query
mysql> SELECT `id` FROM `table` WHERE MATCH ( `title` ) AGAINST ( '+altro* +delitto*' IN BOOLEAN MODE ) >= 2;
Empty set (0.00 sec)
The query returns an empty result set.
It should return 1 result.
Suggested fix:
Remove the apostrophe as a built in word from the source code.
Consider using <ctype><map> for correct behaviour.