Bug #7671 | exact phrase on FullText search | ||
---|---|---|---|
Submitted: | 5 Jan 2005 0:01 | Modified: | 31 Jan 2005 19:18 |
Reporter: | Ulterior | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
Version: | 4.0.3 beta | OS: | Windows (WinNT) |
Assigned to: | Paul DuBois | CPU Architecture: | Any |
[5 Jan 2005 0:01]
Ulterior
[5 Jan 2005 4:45]
MySQL Verification Team
Your server version is very older, please look the below output: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 to server version: 4.0.23-nt Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> select count(*) from t1 where match(ft_col) against ( '"t12.ace"' in bool ean -> mode ); +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.02 sec) mysql> select count(*) from t1 where match(ft_col) against ( '"t123.ace"' in boo lean -> mode ); +----------+ | count(*) | +----------+ | 1 | +----------+ 1 row in set (0.04 sec) mysql> select count(*) from t1 where match(ft_col) against ( '"t12.ace1"' in boo lean -> mode ); +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.04 sec) mysql> mysql> select * from t1 where match(ft_col) against ( '"t12.ace"' in boolean -> mode ); Empty set (0.00 sec) mysql> select * from t1 where match(ft_col) against ( '"t123.ace"' in boolean -> mode ); +----------+ | ft_col | +----------+ | t123.ace | +----------+ 1 row in set (0.02 sec) mysql> select * from t1 where match(ft_col) against ( '"t12.ace1"' in boolean -> mode ); Empty set (0.01 sec) mysql>
[5 Jan 2005 10:13]
Ulterior
> mysql> select count(*) from t1 where match(ft_col) against ( '"t12.ace"' in boolean -> mode ); +----------+ | count(*) | +----------+ | 0 | +----------+ It should be '1' - exact phrase "t12.ace" is more than 4 chars in length. I must repeat myself and say that - 'In fulltext search using exact phrase no processing of 'exact phrase' should be taken and used as 'it is' when searching. In the current situation mysql parser 'exact phrase' with dot as delimeter, than uses fts on table with 2 words ( and it should not be so ) and then string compares result with exact phrase. Please read again my submision comments on every mysql select I made.
[5 Jan 2005 11:11]
Hartmut Holzgraefe
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php Additional info: This is expected behaviour documented on the manual page quoted below, a fulltext index is always based on and queried against 'words', not general phrases. As the '.' is not a 'word character' it is treated as a word boundary. see http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html : MySQL uses a very simple parser to split text into words. A ``word'' is any sequence of true word characters (letters, digits, and underscores), optionally separated by no more than one sequential `'' character. For example, wasn't is parsed as a single word, but wasn''t is parsed as two words wasn and t. (And then t would be ignored as too short according to the rules following.) Also, single quotes at the ends of words are stripped; only embedded single quotes are retained.
[5 Jan 2005 11:23]
Ulterior
I am sorry, but I'll have to open that bug again :) Did you noticed that in my bug-post I've mentioned 'EXACT PHRASE', that means I used DOUBLE-QUOTES ( \" ) on fulltext search to match EXACT PHRASE. You missunderstood me in that way. I know how mysql behave using AGAINST ( 'some word' ). I point out situation when AGAINST ( '\"some word\"' ) is used or in other words EXACT PHRASE Please re-read my whole posting again and carefully. Thank you
[5 Jan 2005 20:25]
Sergei Golubchik
You are right - this is exactly as phrase search works (split the phrase into words, search for them in the index, substring search in the found records). So to have it working you need the phrase to contain at least one word that is indexed (longer than ft_min_word_len, and not stopword). It looks like the manual is not clear on this, we'll fix it.
[5 Jan 2005 21:28]
Ulterior
I think I'm getting to understand such fts behavior. You cannot perform exact phrase search not splitting it to words, becouse index is build in that way. Well, if it can't be fixed, will have to live with it.
[7 Jan 2005 21:08]
Brian Aker
Hi! One suggestion that I might give (though I don't know if it will completely work for you) is to write a UDF to validate the entire phrase. Have the full text index return values, but before sending them to the client have a UDF parse them for accuracy and use a HAVING to remove bad results.
[31 Jan 2005 19:18]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant product(s).