Bug #17299 | Inefficient optimizer for fulltext queries. | ||
---|---|---|---|
Submitted: | 10 Feb 2006 12:35 | Modified: | 23 Jan 2014 9:53 |
Reporter: | Andre Timmer | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.0.18 | OS: | Any (All) |
Assigned to: | CPU Architecture: | Any |
[10 Feb 2006 12:35]
Andre Timmer
[27 Apr 2006 16:21]
Eric Jensen
On my dataset, the converse of this is also true: Any boolean query with disjunction (OR's) and a fulltext operation in the where clause is estimated to return all rows in the table, so no index (whether fulltext or any other applicable one) is used. In fact, if two MATCH functions are OR'ed, the fulltext index doesn't even show up in possible_keys (although this can be solved by lumping all the MATCH'es together). For example: select version(); +------------+ | version() | +------------+ | 5.0.17-log | +------------+ 1 row in set (0.00 sec) CREATE TABLE `bigtable` ( `rank` int(10) unsigned NOT NULL auto_increment, `text` char(48) NOT NULL, `frequency` int(10) unsigned NOT NULL, `sumFreq` bigint(20) unsigned default NULL, PRIMARY KEY (`rank`), UNIQUE KEY `textndx` (`text`), FULLTEXT KEY `fulltextndx` (`text`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 DATA DIRECTORY='/vol28/mysql/data/textdb/' INDEX DIRECTORY='/vol32/mysql/data/textdb/' select count(*) from bigtable; +----------+ | count(*) | +----------+ | 4329556 | +----------+ 1 row in set (0.00 sec) explain select * from bigtable where match(text) against ("test" in boolean mode) or text like "car%"; +----+-------------+-------------------------------------------------------------+------+-----------------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------------------------------------------------------+------+-----------------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | bigtable | ALL | textndx,fulltextndx | NULL | NULL | NULL | 4329556 | Using where | +----+-------------+-------------------------------------------------------------+------+-----------------------+------+---------+------+---------+-------------+ 1 row in set (0.00 sec) explain select * from bigtable where match(text) against ("test" in boolean mode) or match(text) against ("car" in boolean mode); +----+-------------+-------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | bigtable | ALL | NULL | NULL | NULL | NULL | 4329556 | Using where | +----+-------------+-------------------------------------------------------------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.00 sec) My text values are not particularly odd, but I could not duplicate this with a small test table, except for possible_keys being NULL...but in this case it uses the normal index despite that: mysql> create table fulltext_and_normal (text_column char(48) not null, unique key text_column_idx (text_column), fulltext key text_column_fulltext_idx (text_column)); Query OK, 0 rows affected (0.05 sec) mysql> insert into fulltext_and_normal values("eric jensen"), ("e. jensen"), ("jensen"); Query OK, 3 rows affected (0.01 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> explain select * from fulltext_and_normal where match(text_column) against ("jensen" in boolean mode) or text_column like "eric%"; +----+-------------+---------------------+-------+------------------------------------------+-----------------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------------+-------+------------------------------------------+-----------------+---------+------+------+--------------------------+ | 1 | SIMPLE | fulltext_and_normal | index | text_column_idx,text_column_fulltext_idx | text_column_idx | 48 | NULL | 3 | Using where; Using index | +----+-------------+---------------------+-------+------------------------------------------+-----------------+---------+------+------+--------------------------+ 1 row in set (0.00 sec) mysql> explain select * from fulltext_and_normal where match(text_column) against ("jensen" in boolean mode) or match(text_column) against ("eric" in boolean mode); +----+-------------+---------------------+-------+---------------+-----------------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------------+-------+---------------+-----------------+---------+------+------+--------------------------+ | 1 | SIMPLE | fulltext_and_normal | index | NULL | text_column_idx | 48 | NULL | 3 | Using where; Using index | +----+-------------+---------------------+-------+---------------+-----------------+---------+------+------+--------------------------+ 1 row in set (0.00 sec) The statistics on bigtable should be fine, as I did a REPAIR TABLE when upgrading from mysql 4 to 5 to fix the indicies. The problem existed in mysql 4 too.
[27 Apr 2006 17:43]
Eric Jensen
For what it's worth, here is the output of myisam_ftdump on that fulltext index of bigtable: Total rows: 4329556 Total words: 14197741 Unique words: 1106119 Longest word: 48 chars (000000000000000000000000000000000000000000000000) Median length: 5 Average global weight: 14.807066 Most common word: 425577 times, weight: 2.216306 (com)
[23 Nov 2006 0:08]
Igor Babaev
This will be fixed not earlier than in 5.2.