| Bug #19580 | FULLTEXT search produces wrong results on UTF-8 columns | ||
|---|---|---|---|
| Submitted: | 6 May 2006 14:39 | Modified: | 22 Jun 2006 16:16 |
| Reporter: | Steffen Weber | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) |
| Version: | 4.1.18 | OS: | Linux (Gentoo Linux) |
| Assigned to: | Sergey Vojtovich | CPU Architecture: | Any |
[25 May 2006 18:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6883
[29 May 2006 11:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6965
[19 Jun 2006 19:05]
Sergey Vojtovich
Pushed into tree currently marked as 5.1.12.
[22 Jun 2006 14:39]
Sergey Vojtovich
Paul, since there is a need to rebuild index only, REPAIR TABLE tbl_name QUICK is fine.
[22 Jun 2006 16:16]
Paul DuBois
Noted in 5.1.12 changelog, and in the upgrading-to-5.1 section. Incompatible change: For utf8 columns, the full-text parser incorrectly considered several non-word punctuation and whitespace characters as word characters, causing some searches to return incorrect results. The fix involves a change to the full-text parser, so any tables that have FULLTEXT indexes on utf8 columns must be repaired with REPAIR TABLE: REPAIR TABLE tbl_name QUICK;
[13 Jun 2007 8:33]
Sergey Vojtovich
BUG#28861 was marked as duplicate.

Description: MySQL handles the characters „ (HTML „) and “ (HTML “) as if they belong to the word in between them, although they should be treated like a blank by the FULLTEXT index. How to repeat: Execute the following SQL queries: CREATE TABLE `fulltext_utf8` ( `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY , `text` VARCHAR( 255 ) NOT NULL , FULLTEXT (`text`) ) ENGINE = MYISAM DEFAULT CHARSET=utf8; INSERT INTO `fulltext_utf8` VALUES (NULL , '„MySQL“'), (NULL , 'MySQL'); SELECT `id`, `text` FROM `fulltext_utf8` WHERE MATCH (`text`) AGAINST ('MySQL' IN BOOLEAN MODE); +----+-------+ | id | text | +----+-------+ | 2 | MySQL | +----+-------+ 1 row in set (0.00 sec) As you can see the SELECT statement only matches the second row, although it should match both rows. Suggested fix: Treat the mentioned characters as if they were blanks in the FULLTEXT index.