| Bug #21459 | myisam_ftdump gives bad counts for common words | ||
|---|---|---|---|
| Submitted: | 5 Aug 2006 18:18 | Modified: | 4 Oct 2006 22:29 |
| Reporter: | Paul Cosway | ||
| Status: | Closed | ||
| Category: | Server: FTS | Severity: | S3 (Non-critical) |
| Version: | 5.0.21 | OS: | Linux (linux) |
| Assigned to: | Sergey Vojtovich | Target Version: | |
[5 Aug 2006 18:18]
Paul Cosway
[10 Aug 2006 9:48]
Sveta Smirnova
Thank you for the report. However, 5.0.21 is fairly old. Could you please check with the current 5.0.24 version?
[18 Aug 2006 13:34]
Paul Cosway
As far as I can tell the source for 5.0.24 is identical to that for 5.0.9 beta (which I used to determine there might be a problem). When was the code for myisam_ftdump.c last changed?
[23 Aug 2006 20:14]
Sveta Smirnova
Paul, you are right: myisam_ftdump.c was last changed in April. But I cannot reproduce the problem using my test data and the latest BK sources:
CREATE TABLE `bug21459` (
`foo` text NOT NULL,
FULLTEXT KEY `foo` (`foo`)
) ENGINE=MyISAM;
$myisam_ftdump data/test/bug21459 0
Total rows: 500
Total words: 920
Unique words: 4
Longest word: 7 chars (chicken)
Median length: 4
Average global weight: 0.200815
Most common word: 309 times, weight: -0.481068 (chicken)
$myisam_ftdump -c data/test/bug21459 0
283 -0.2655495 bird
309 -0.4810678 chicken
225 0.2006707 pups
103 1.3492073 tiger
$myisam_ftdump -d data/test/bug21459 0
...
3480 0.9775171 bird
34a8 0.9775171 bird
34bc 0.9775171 bird
0 0.9666505 chicken
28 0.9886308 chicken
58 0.9886308 chicken
... repeats 309 times ...
342c 0.9666505 chicken
3450 0.9886308 chicken
34e4 0.9775171 chicken
0 0.9666505 pups
74 0.9666505 pups
98 0.9666505 pups
The test data was generated with the help of this PHP script:
<?php
$link = mysql_connect('localhost:3306', 'root');
mysql_select_db('test');
$data = array('tiger', 'cat', 'dog', 'egg', 'chicken', 'cow', 'bird', 'pups',);
$quan = count($data) - 1;
for ($i =0; $i < 500; $i ++) {
$p1 = rand(0, $quan);
$p2 = rand(0, $quan);
$temp = array_slice($data, $p1 <= $p2 ? $p1 : $p2, $p1 <= $p2 ? $p2 : $p1);
mysql_query("insert into bug21459 values('" . mysql_escape_string(implode(' ', $temp)) .
"')");
}
?>
[23 Aug 2006 20:16]
Sveta Smirnova
Paul, if you can provide data with which I can repeat the issue, please provide it and change the status of the bug to "Open".
[24 Aug 2006 8:59]
Sergey Vojtovich
A test case for this bug:
CREATE TABLE t1 (a TEXT, FULLTEXT(a));
INSERT INTO t1 VALUES
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),('w200'),
('w003'),('w003'),('w003');
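The statement above inserts 200 rows containing 'w200' followed by three rows containing 'w003'. As a rough sketch, the same statement could be generated instead of typed out. The helper name `build_insert` and its parameters are hypothetical and not part of the original test case; values are interpolated without escaping, which is acceptable only for fixed literals like these.

```python
def build_insert(table="t1", common=("w200", 200), rare=("w003", 3)):
    """Build a multi-row INSERT: many copies of one word, then a few of another.

    Hypothetical helper for illustration; defaults mirror the test case above.
    """
    values = [common[0]] * common[1] + [rare[0]] * rare[1]
    rows = ",".join("('{}')".format(v) for v in values)
    return "INSERT INTO {} VALUES {};".format(table, rows)


sql = build_insert()
print(sql[:60] + " ...")
```

The resulting string can be pasted into the mysql client after the CREATE TABLE statement.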
[24 Aug 2006 9:08]
Sveta Smirnova
Verified on Linux using the test case provided by Sergey and the latest BK sources:
ssmirnova@shella ~/mysql5.0b
$bin/myisam_ftdump data/test/t1 0
Total rows: 203
Total words: 4
Unique words: 2
Longest word: 4 chars (w003)
Median length: 4
Average global weight: 4.753986
Most common word: 3 times, weight: 4.199705 (w003)
ssmirnova@shella ~/mysql5.0b
$bin/myisam_ftdump -c data/test/t1 0
3 4.1997051 w003
1 5.3082677 w200
ssmirnova@shella ~/mysql5.0b
$bin/myisam_ftdump -d data/test/t1 0
fa0 0.9886308 w003
fb4 0.9886308 w003
fc8 0.9886308 w003
c00 => 200 w200
[20 Sep 2006 21:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/12270
ChangeSet@1.2277, 2006-09-20 17:12:37+05:00, svoj@mysql.com +1 -0
BUG#21459 - myisam_ftdump gives bad counts for common words
This problem affects myisam_ftdump tool only.
For fulltext index positive subkeys means word weight, negative subkeys means number of documents in level 2 fulltext index.
Fixed that document counter was not properly updated for keys having level 2 fulltext index.
No test case for this bug.
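The changeset comment can be illustrated with a small model of the counting logic. This is only a sketch under stated assumptions, not the code in myisam_ftdump.c: the function `count_documents`, the `(word, subkey)` tuples, and the reading of the `c00 => 200 w200` dump line as a subkey of -200 are all assumptions made for illustration.

```python
def count_documents(entries, fixed=True):
    """Count documents per word from (word, subkey) index entries.

    Assumed model: a positive subkey is a word weight and stands for one
    document; a negative subkey -n stands for n documents held in a
    level 2 index. With fixed=False every entry is counted as a single
    document, which reproduces the bad counts reported in this bug.
    """
    counts = {}
    for word, subkey in entries:
        if subkey < 0 and fixed:
            docs = int(-subkey)   # level 2 entry: add its document count
        else:
            docs = 1              # plain entry: one document
        counts[word] = counts.get(word, 0) + docs
    return counts


# Entries modelled on the "-d" dump from the verified test case.
entries = [
    ("w003", 0.9886308),
    ("w003", 0.9886308),
    ("w003", 0.9886308),
    ("w200", -200),               # assumed encoding of "c00 => 200 w200"
]

buggy = count_documents(entries, fixed=False)
correct = count_documents(entries, fixed=True)
print(buggy, correct)
```

Under this model the unfixed count gives 1 for w200 and a total of 4 words, matching the reported output, while the fixed count gives 200 for w200 and a total of 203, matching the number of rows in the table.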
[3 Oct 2006 10:50]
Sergey Vojtovich
Fixed in 5.0.26, 5.1.12.
[4 Oct 2006 22:29]
Paul DuBois
Noted in 5.0.26, 5.1.12 changelogs. myisam_ftdump produced bad counts for common words.
