Bug #1935 | Full-text Search doesn't work with HTML-Entities | ||
---|---|---|---|
Submitted: | 24 Nov 2003 13:41 | Modified: | 25 Nov 2003 2:55 |
Reporter: | [ name withheld ] | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 4.0.15 | OS: | Linux (linux) |
Assigned to: | | CPU Architecture: | Any |
[24 Nov 2003 13:41]
[ name withheld ]
[24 Nov 2003 14:42]
Alexander Keremidarski
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php

As the manual explicitly mentions: "The MATCH() function performs a natural language search..."

The TODO list includes these items: support for "always-index words" (any strings the user wants to treat as words, for example "C++", "AS/400", "TCP/IP", etc.), and making the stopword list depend on the language of the data.
[25 Nov 2003 2:55]
Sergei Golubchik
To clarify Alexander's reply a bit - the manual also says: MySQL uses a very simple parser to split text into words. A "word" is any sequence of characters consisting of letters, digits, ', and _. Any "word" that is present in the stopword list or is just too short is ignored.

I also do not agree that ";" is commonly used as part of a word. Mostly it is not. Even in HTML it is NOT part of the word, but part of the HTML entity. We do, however, have on the TODO list a smart HTML parser that properly recognizes HTML entities (and tags, btw).
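To illustrate the word-splitting rule quoted from the manual, here is a minimal Python sketch. It is not MySQL's actual parser: the regular expression is an ASCII-only approximation of "letters, digits, ', and _", the minimum length of 4 is assumed to mirror the default `ft_min_word_len`, and the stopword set and both function names are hypothetical.

```python
import re

# Assumption: ASCII-only approximation of "letters, digits, ', and _".
WORD_RE = re.compile(r"[A-Za-z0-9_']+")

# Assumption: mirrors the default ft_min_word_len of 4.
MIN_WORD_LEN = 4

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "and", "with"}


def split_words(text):
    """Split text into 'words'; every other character acts as a separator."""
    return WORD_RE.findall(text)


def indexable_words(text):
    """Drop words that are too short or that appear in the stopword list."""
    return [
        w.lower()
        for w in split_words(text)
        if len(w) >= MIN_WORD_LEN and w.lower() not in STOPWORDS
    ]


# "&" and ";" are not word characters, so the entity breaks the word apart.
print(split_words("M&uuml;ller"))      # ['M', 'uuml', 'ller']
print(indexable_words("M&uuml;ller"))  # ['uuml', 'ller']
```

Under these assumptions, a search for the word as a user would type it finds nothing, because only the fragments around the entity were indexed. The same splitting explains why strings such as "TCP/IP" would need the "always-index words" support mentioned above: `indexable_words("TCP/IP")` returns an empty list, since both halves are shorter than the minimum length.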