Bug #73437 InnoDB Fulltext Search WITH QUERY EXPANSION is very slow
Submitted: 30 Jul 2014 15:19 Modified: 17 Oct 2014 6:30
Reporter: Oli Sennhauser Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.6.19 OS:Linux
Assigned to: CPU Architecture:Any
Tags: fulltext search, innodb, performance, query expansion, slow

[30 Jul 2014 15:19] Oli Sennhauser
Description:
Running InnoDB Fulltext Search Queries with QUERY EXPANSION runs horribly slow compared to with QE.
We could accept factor 2 - 4 slower. But it is factor 100 slower.

From MySQL Doku: "It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search."

How to repeat:
explain
SELECT SQL_CALC_FOUND_ROWS v.ID
     , MATCH (v.title, v.description) AGAINST ("gezielt und getroffen" IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS score
  FROM cameras_previews v
 RIGHT JOIN cameras_general c ON v.VID=c.ID
 WHERE v.active=1
   AND v.isDeleted = 0
   AND v.cover != ''
   AND c.licenseby NOT IN (4, 8865)
   AND v.price > 0
   AND FIND_IN_SET(314, v.excludefromdomains)=0
   AND v.ID != 71378
   AND (c.connector = 314 OR c.connector = 314)
HAVING score > 5
 ORDER BY score DESC
 LIMIT 0, 5
;

+----+-------------+-------+--------+------------------------------------------+-----------+---------+----------------+-------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys                            | key       | key_len | ref            | rows  | Extra                                              |
+----+-------------+-------+--------+------------------------------------------+-----------+---------+----------------+-------+----------------------------------------------------+
|  1 | SIMPLE      | v     | ref    | PRIMARY,VID,price,isDeleted,active,cover | isDeleted | 1       | const          | 36080 | Using index condition; Using where; Using filesort |
|  1 | SIMPLE      | c     | eq_ref | PRIMARY,connector,licenseby              | PRIMARY   | 4       | omnia_v3.v.VID |     1 | Using where                                        |
+----+-------------+-------+--------+------------------------------------------+-----------+---------+----------------+-------+----------------------------------------------------+

+-------+--------------------+
| ID    | score              |
+-------+--------------------+
| 12563 | 1013.9348754882812 |
| 47179 |  994.4049072265625 |
| 15122 |   985.728271484375 |
| 12585 |  943.0606689453125 |
| 68438 |  932.5598754882812 |
+-------+--------------------+
5 rows in set (7.87 sec)

show profile;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000052 |
| Waiting for query cache lock   | 0.000012 |
| init                           | 0.000011 |
| checking query cache for query | 0.000163 |
| checking permissions           | 0.000019 |
| checking permissions           | 0.000014 |
| Opening tables                 | 0.000042 |
| init                           | 0.000089 |
| System lock                    | 0.000026 |
| Waiting for query cache lock   | 0.000010 |
| System lock                    | 0.000050 |
| optimizing                     | 0.000044 |
| statistics                     | 0.000491 |
| preparing                      | 0.000052 |
| FULLTEXT initialization        | 9.332620 |
| Sorting result                 | 0.000057 |
| executing                      | 0.000012 |
| Sending data                   | 0.000021 |
| Creating sort index            | 0.429925 |
| end                            | 0.000084 |
| query end                      | 0.000038 |
| closing tables                 | 0.000062 |
| freeing items                  | 0.024773 |
| logging slow query             | 0.000217 |
| cleaning up                    | 0.000884 |
+--------------------------------+----------+
25 rows in set, 1 warning (0.01 sec)
[30 Jul 2014 15:20] Oli Sennhauser
Without QE it runs in 40 - 80 ms.
[17 Sep 2014 6:30] MySQL Verification Team
Hello Oli,

Thank you for the report and test case.
I'm not seeing much difference with/without QE with the dummy data i.e 1.4s vs 6.2s(had to adjust score to >0 in place of >5). 

mysql> SELECT SQL_CALC_FOUND_ROWS v.ID      , MATCH (v.title, v.description) AGAINST ("gezielt und getroffen" IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS score   FROM cameras_previews v  RIGHT JOIN cameras_general c ON v.VID=c.ID  WHERE v.active=1    AND v.isDeleted = 0    AND v.cover != ''    AND c.licenseby NOT IN (4, 8865)    AND v.price > 0    AND FIND_IN_SET(314, v.excludefromdomains)=0    AND v.ID != 71378    AND (c.connector = 314 OR c.connector = 314) HAVING score > 0  ORDER BY score DESC  LIMIT 0, 5;
+--------+-----------------------+
| ID     | score                 |
+--------+-----------------------+
| 153215 | 0.0001641505805309862 |
|  83270 | 0.0001641505805309862 |
| 181748 | 0.0001641505805309862 |
|  67487 | 0.0001641505805309862 |
|  43340 | 0.0001641505805309862 |
+--------+-----------------------+
5 rows in set (6.21 sec)

mysql> SELECT SQL_CALC_FOUND_ROWS v.ID      , MATCH (v.title, v.description) AGAINST ("gezielt und getroffen" IN NATURAL LANGUAGE MODE) AS score   FROM cameras_previews v  RIGHT JOIN cameras_general c ON v.VID=c.ID  WHERE v.active=1    AND v.isDeleted = 0    AND v.cover != ''    AND c.licenseby NOT IN (4, 8865)    AND v.price > 0    AND FIND_IN_SET(314, v.excludefromdomains)=0    AND v.ID != 71378    AND (c.connector = 314 OR c.connector = 314) HAVING score > 0  ORDER BY score DESC  LIMIT 0, 5;
+--------+------------------------+
| ID     | score                  |
+--------+------------------------+
| 153215 | 0.00004103764513274655 |
|  83270 | 0.00004103764513274655 |
| 181748 | 0.00004103764513274655 |
|  67487 | 0.00004103764513274655 |
|  43340 | 0.00004103764513274655 |
+--------+------------------------+
5 rows in set (1.41 sec)

Could you please provide exact repeatable test case(may be with private note), i.e where you observed factor 100 slower?

Thanks,
Umesh
[18 Oct 2014 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".