| Bug #8159 | Inconsistent full-text search for diacriticals in MyISAM table | ||
|---|---|---|---|
| Submitted: | 27 Jan 2005 12:21 | Modified: | 4 Feb 2005 12:32 |
| Reporter: | Name Withheld | | |
| Status: | Closed | | |
| Category: | MySQL Server: MyISAM storage engine | Severity: | S3 (Non-critical) |
| Version: | 4.1.9 | OS: | Windows (Windows XP) |
| Assigned to: | Sergei Golubchik | CPU Architecture: | Any |
[27 Jan 2005 12:24]
Name Withheld
MyISAM database with default charset "utf8"
Attachment: utf8_test.zip (application/zip, text), 876 bytes.
[27 Jan 2005 12:25]
Name Withheld
MyISAM database with default charset "latin1"
Attachment: latin1_test.zip (application/zip, text), 891 bytes.
[27 Jan 2005 13:36]
Sergei Golubchik
Also, adding fulltext indexes to the test tables makes the results even more inconsistent.
[4 Feb 2005 12:32]
Sergei Golubchik
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.
If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information
about accessing the source trees is available at
http://www.mysql.com/doc/en/Installing_source_tree.html
Additional info:
Fixed in 4.1.10.

Description:
When using "MATCH ... AGAINST ... IN BOOLEAN MODE" on a MyISAM table, the search appears to be diacritic-insensitive if the server's default charset matches the table's default charset. Otherwise, the search is diacritic-sensitive.

How to repeat:
Two MyISAM databases are attached: "utf8_test" and "latin1_test". They are identical except for their default charset. Each contains one instance of the string "video" and one instance of the string "vidéo".

Run the following search in each database:

SELECT str1 FROM tab1 WHERE MATCH (str1) AGAINST ("video" IN BOOLEAN MODE)

The results are inconsistent:

1) If the server was started with "--default-character-set=utf8", the search in "latin1_test" returns only "video", whereas the search in "utf8_test" returns both "video" and "vidéo".

2) If the server was started without an explicit charset specification, the results are reversed: the search in "latin1_test" returns both "video" and "vidéo", whereas the search in "utf8_test" returns only "video".
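
For readers without the attachments, the setup described above can be approximated with the sketch below. The database names, table name, column name, and search statement come from the report. The column type, the `SET NAMES` line, and the absence of a FULLTEXT index are assumptions, because the attached schemas are not reproduced here.

```sql
-- Hypothetical reconstruction of the attached test databases.
-- Assumption: the client sends statements encoded as utf8.
SET NAMES utf8;

-- Two databases that differ only in their default charset.
CREATE DATABASE utf8_test DEFAULT CHARACTER SET utf8;
CREATE DATABASE latin1_test DEFAULT CHARACTER SET latin1;

-- Assumption: a single VARCHAR column; each table inherits its
-- database's default charset. No FULLTEXT index is created, since
-- boolean-mode searches on MyISAM do not require one.
CREATE TABLE utf8_test.tab1 (str1 VARCHAR(255)) ENGINE=MyISAM;
CREATE TABLE latin1_test.tab1 (str1 VARCHAR(255)) ENGINE=MyISAM;

INSERT INTO utf8_test.tab1 (str1) VALUES ('video'), ('vidéo');
INSERT INTO latin1_test.tab1 (str1) VALUES ('video'), ('vidéo');

-- The search from the report, run against each database.
SELECT str1 FROM utf8_test.tab1
  WHERE MATCH (str1) AGAINST ('video' IN BOOLEAN MODE);
SELECT str1 FROM latin1_test.tab1
  WHERE MATCH (str1) AGAINST ('video' IN BOOLEAN MODE);
```

To compare the two cases in the report, run the two SELECT statements once on a server started with "--default-character-set=utf8" and once on a server started without that option.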