Bug #21459 | myisam_ftdump gives bad counts for common words | ||
---|---|---|---|
Submitted: | 5 Aug 2006 16:18 | Modified: | 4 Oct 2006 20:29 |
Reporter: | Paul Cosway | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: FULLTEXT search | Severity: | S3 (Non-critical) |
Version: | 5.0.21 | OS: | Linux (linux) |
Assigned to: | Sergey Vojtovich | CPU Architecture: | Any |
[5 Aug 2006 16:18]
Paul Cosway
[10 Aug 2006 7:48]
Sveta Smirnova
Thank you for the report. But 5.0.21 is old enough. Could you please check with current 5.0.24 version?
[18 Aug 2006 11:34]
Paul Cosway
As far as I can tell the source for 5.0.24 is identical to that for 5.0.9 beta (which I used to determine there might be a problem). When was the code for myisam_ftdump.c last changed?
[23 Aug 2006 18:14]
Sveta Smirnova
Paul, you are right: mysqm_ftdump.c last changed in April. But I can not catch the problem using my test data and last BK sources: CREATE TABLE `bug21459` ( `foo` text NOT NULL, FULLTEXT KEY `foo` (`foo`) ) ENGINE=MyISAM; $myisam_ftdump data/test/bug21459 0 Total rows: 500 Total words: 920 Unique words: 4 Longest word: 7 chars (chicken) Median length: 4 Average global weight: 0.200815 Most common word: 309 times, weight: -0.481068 (chicken) $myisam_ftdump -c data/test/bug21459 0 283 -0.2655495 bird 309 -0.4810678 chicken 225 0.2006707 pups 103 1.3492073 tiger $myisam_ftdump -d data/test/bug21459 0 ... 3480 0.9775171 bird 34a8 0.9775171 bird 34bc 0.9775171 bird 0 0.9666505 chicken 28 0.9886308 chicken 58 0.9886308 chicken ... repeats 309 times ... 342c 0.9666505 chicken 3450 0.9886308 chicken 34e4 0.9775171 chicken 0 0.9666505 pups 74 0.9666505 pups 98 0.9666505 pups Test data has filled with help of PHP script: <?php $link = mysql_connect('localhost:3306', 'root'); mysql_select_db('test'); $data = array('tiger', 'cat', 'dog', 'egg', 'chicken', 'cow', 'bird', 'pups',); $quan = count($data) - 1; for ($i =0; $i < 500; $i ++) { $p1 = rand(0, $quan); $p2 = rand(0, $quan); $temp = array_slice($data, $p1 <= $p2 ? $p1 : $p2, $p1 <= $p2 ? $p2 : $p1); mysql_query("insert into bug21459 values('" . mysql_escape_string(implode(' ', $temp)) . "')"); } ?>
[23 Aug 2006 18:16]
Sveta Smirnova
Paul, if you can to provide data I can repeat the issue, please provide it and change status of bug to "Open".
[24 Aug 2006 6:59]
Sergey Vojtovich
A test case for this bug: CREATE TABLE t1 (a TEXT, FULLTEXT(a)); INSERT INTO t1 VALUES ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'), ('w003'),('w003'),('w003');
[24 Aug 2006 7:08]
Sveta Smirnova
Verified on Linux using test case provided by Sergey and last BK sources: ssmirnova@shella ~/mysql5.0b $bin/myisam_ftdump data/test/t1 0 Total rows: 203 Total words: 4 Unique words: 2 Longest word: 4 chars (w003) Median length: 4 Average global weight: 4.753986 Most common word: 3 times, weight: 4.199705 (w003) ssmirnova@shella ~/mysql5.0b $bin/myisam_ftdump -c data/test/t1 0 3 4.1997051 w003 1 5.3082677 w200 ssmirnova@shella ~/mysql5.0b $bin/myisam_ftdump -d data/test/t1 0 fa0 0.9886308 w003 fb4 0.9886308 w003 fc8 0.9886308 w003 c00 => 200 w200
[20 Sep 2006 19:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/12270 ChangeSet@1.2277, 2006-09-20 17:12:37+05:00, svoj@mysql.com +1 -0 BUG#21459 - myisam_ftdump gives bad counts for common words This problem affects myisam_ftdump tool only. For fulltext index positive subkeys means word weight, negative subkeys means number of documents in level 2 fulltext index. Fixed that document counter was not properly updated for keys having level 2 fulltext index. No test case for this bug.
[3 Oct 2006 8:50]
Sergey Vojtovich
Fixed in 5.0.26, 5.1.12.
[4 Oct 2006 20:29]
Paul DuBois
Noted in 5.0.26, 5.1.12 changelogs. mysql_ftdump produced bad counts for common words.