Bug #74686 Wrong relevance ranking for InnoDB full text searches under certain conditions
Submitted: 4 Nov 2014 15:29 Modified: 10 Mar 2015 17:52
Reporter: Tim McLaughlin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S2 (Serious)
Version:5.6.21 OS:Linux
Assigned to: CPU Architecture:Any
Tags: full text, fulltext, innodb, unique index

[4 Nov 2014 15:29] Tim McLaughlin
Description:
An InnoDB boolean mode full text search produces the wrong relevancy ranking in some cases. It appears to happen when there's a separate unique index and full text index that includes the same field.

How to repeat:
DROP TABLE IF EXISTS `characters`;

# Create a table with a fulltext and unique index that includes 
# the same column (i.e. first_name)
CREATE TABLE `characters` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(50) NOT NULL,
  `last_name` varchar(50) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx_1` (`first_name`, `last_name`),
  FULLTEXT KEY `idx_2` (`first_name`)
) ENGINE=InnoDB;

INSERT INTO `characters` (`id`, `first_name`, `last_name`) VALUES
(1, 'Bart', 'Simpson'),
(2, 'Homer', 'Simpson'),
(3, 'Marge', 'Simpson'),
(4, 'Lisa', 'Simpson'),
(5, 'Maggie', 'Simpson'),
(6, 'Ned', 'Flanders'),
(7, 'Nelson', 'Muntz');

# Perform a full text search for "Homer". 
# This returns the wrong results; Homer Simpson's score should be > 0
SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters;
+----+------------+-----------+-------+
| id | first_name | last_name | score |
+----+------------+-----------+-------+
|  1 | Bart       | Simpson   |     0 |
|  2 | Homer      | Simpson   |     0 |
|  4 | Lisa       | Simpson   |     0 |
|  5 | Maggie     | Simpson   |     0 |
|  3 | Marge      | Simpson   |     0 |
|  6 | Ned        | Flanders  |     0 |
|  7 | Nelson     | Muntz     |     0 |
+----+------------+-----------+-------+

# Run the same full text search as above except add "ORDER BY id".
# This returns the expected results.
SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters ORDER BY id;
+----+------------+-----------+--------------------+
| id | first_name | last_name | score              |
+----+------------+-----------+--------------------+
|  1 | Bart       | Simpson   |                  0 |
|  2 | Homer      | Simpson   | 0.7141907215118408 |
|  3 | Marge      | Simpson   |                  0 |
|  4 | Lisa       | Simpson   |                  0 |
|  5 | Maggie     | Simpson   |                  0 |
|  6 | Ned        | Flanders  |                  0 |
|  7 | Nelson     | Muntz     |                  0 |
+----+------------+-----------+--------------------+

# Run the same full text search as above except add "ORDER BY score".
# This returns the expected results.
SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters ORDER BY score;
+----+------------+-----------+--------------------+
| id | first_name | last_name | score              |
+----+------------+-----------+--------------------+
|  1 | Bart       | Simpson   |                  0 |
|  3 | Marge      | Simpson   |                  0 |
|  4 | Lisa       | Simpson   |                  0 |
|  5 | Maggie     | Simpson   |                  0 |
|  6 | Ned        | Flanders  |                  0 |
|  7 | Nelson     | Muntz     |                  0 |
|  2 | Homer      | Simpson   | 0.7141907215118408 |
+----+------------+-----------+--------------------+

# Repeat the original full text search
# Wrong results; this time all scores are > 0
SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters;
+----+------------+-----------+--------------------+
| id | first_name | last_name | score              |
+----+------------+-----------+--------------------+
|  1 | Bart       | Simpson   | 0.7141907215118408 |
|  2 | Homer      | Simpson   | 0.7141907215118408 |
|  4 | Lisa       | Simpson   | 0.7141907215118408 |
|  5 | Maggie     | Simpson   | 0.7141907215118408 |
|  3 | Marge      | Simpson   | 0.7141907215118408 |
|  6 | Ned        | Flanders  | 0.7141907215118408 |
|  7 | Nelson     | Muntz     | 0.7141907215118408 |
+----+------------+-----------+--------------------+

# Drop the unique index
ALTER TABLE `characters` DROP INDEX `idx_1`;

# The original full text search returns the expected results
SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters;
+----+------------+-----------+--------------------+
| id | first_name | last_name | score              |
+----+------------+-----------+--------------------+
|  1 | Bart       | Simpson   |                  0 |
|  2 | Homer      | Simpson   | 0.7141907215118408 |
|  3 | Marge      | Simpson   |                  0 |
|  4 | Lisa       | Simpson   |                  0 |
|  5 | Maggie     | Simpson   |                  0 |
|  6 | Ned        | Flanders  |                  0 |
|  7 | Nelson     | Muntz     |                  0 |
+----+------------+-----------+--------------------+

# The full text search with the "ORDER BY score" clause returns
# the expected results as well.
SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters ORDER BY score;
+----+------------+-----------+--------------------+
| id | first_name | last_name | score              |
+----+------------+-----------+--------------------+
|  1 | Bart       | Simpson   |                  0 |
|  3 | Marge      | Simpson   |                  0 |
|  4 | Lisa       | Simpson   |                  0 |
|  5 | Maggie     | Simpson   |                  0 |
|  6 | Ned        | Flanders  |                  0 |
|  7 | Nelson     | Muntz     |                  0 |
|  2 | Homer      | Simpson   | 0.7141907215118408 |
+----+------------+-----------+--------------------+

DROP TABLE IF EXISTS `characters`;
[4 Nov 2014 16:20] Miguel Solorzano
C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --debug-info --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.22-log Source distribution

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > CREATE DATABASE dx;
Query OK, 1 row affected (0.00 sec)

mysql 5.6 > USE dx
Database changed
mysql 5.6 > CREATE TABLE `characters` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `first_name` varchar(50) NOT NULL,
    ->   `last_name` varchar(50) NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   UNIQUE KEY `idx_1` (`first_name`, `last_name`),
    ->   FULLTEXT KEY `idx_2` (`first_name`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.31 sec)

mysql 5.6 > INSERT INTO `characters` (`id`, `first_name`, `last_name`) VALUES
    -> (1, 'Bart', 'Simpson'),
    -> (2, 'Homer', 'Simpson'),
    -> (3, 'Marge', 'Simpson'),
    -> (4, 'Lisa', 'Simpson'),
    -> (5, 'Maggie', 'Simpson'),
    -> (6, 'Ned', 'Flanders'),
    -> (7, 'Nelson', 'Muntz');
Query OK, 7 rows affected (0.03 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql 5.6 > # Perform a full text search for "Homer".
mysql 5.6 > # This returns the wrong results; Homer Simpson's score should be > 0
mysql 5.6 > SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters;
+----+------------+-----------+-------+
| id | first_name | last_name | score |
+----+------------+-----------+-------+
|  1 | Bart       | Simpson   |     0 |
|  2 | Homer      | Simpson   |     0 |
|  4 | Lisa       | Simpson   |     0 |
|  5 | Maggie     | Simpson   |     0 |
|  3 | Marge      | Simpson   |     0 |
|  6 | Ned        | Flanders  |     0 |
|  7 | Nelson     | Muntz     |     0 |
+----+------------+-----------+-------+
7 rows in set (0.02 sec)

mysql 5.6 > ALTER TABLE characters ENGINE=MyISAM;
Query OK, 7 rows affected (0.30 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql 5.6 > SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters;
+----+------------+-----------+-------+
| id | first_name | last_name | score |
+----+------------+-----------+-------+
|  1 | Bart       | Simpson   |     0 |
|  2 | Homer      | Simpson   |     1 |
|  3 | Marge      | Simpson   |     0 |
|  4 | Lisa       | Simpson   |     0 |
|  5 | Maggie     | Simpson   |     0 |
|  6 | Ned        | Flanders  |     0 |
|  7 | Nelson     | Muntz     |     0 |
+----+------------+-----------+-------+
7 rows in set (0.00 sec)
[4 Nov 2014 16:24] Miguel Solorzano
Thank you for the bug report. Not repeatable on 5.7 source server.

C:\dbs>c:\dbs\5.7\bin\mysql -uroot --port=3570 --prompt="mysql 5.7 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.6-m16 Source distribution

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.7 > CREATE DATABASE dx;
Query OK, 1 row affected (0.00 sec)

mysql 5.7 > USE dx
Database changed
mysql 5.7 > CREATE TABLE `characters` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `first_name` varchar(50) NOT NULL,
    ->   `last_name` varchar(50) NOT NULL,
    ->   PRIMARY KEY (`id`),
    ->   UNIQUE KEY `idx_1` (`first_name`, `last_name`),
    ->   FULLTEXT KEY `idx_2` (`first_name`)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (1.72 sec)

mysql 5.7 > INSERT INTO `characters` (`id`, `first_name`, `last_name`) VALUES
    -> (1, 'Bart', 'Simpson'),
    -> (2, 'Homer', 'Simpson'),
    -> (3, 'Marge', 'Simpson'),
    -> (4, 'Lisa', 'Simpson'),
    -> (5, 'Maggie', 'Simpson'),
    -> (6, 'Ned', 'Flanders'),
    -> (7, 'Nelson', 'Muntz');
Query OK, 7 rows affected (0.05 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql 5.7 > # Perform a full text search for "Homer".
mysql 5.7 > # This returns the wrong results; Homer Simpson's score should be > 0
mysql 5.7 > SELECT id, first_name, last_name, MATCH(first_name) AGAINST('Homer' IN BOOLEAN MODE) AS score FROM characters;
+----+------------+-----------+--------------------+
| id | first_name | last_name | score              |
+----+------------+-----------+--------------------+
|  1 | Bart       | Simpson   |                  0 |
|  2 | Homer      | Simpson   | 0.7141907215118408 |
|  3 | Marge      | Simpson   |                  0 |
|  4 | Lisa       | Simpson   |                  0 |
|  5 | Maggie     | Simpson   |                  0 |
|  6 | Ned        | Flanders  |                  0 |
|  7 | Nelson     | Muntz     |                  0 |
+----+------------+-----------+--------------------+
7 rows in set (0.00 sec)
[10 Mar 2015 17:52] Paul Dubois
Noted in 5.6.25, 5.7.7, 5.8.0 changelogs.

For full-text searches, the optimizer could choose an index not
appropriate for producing correct relevancy rankings.
[23 Jun 2015 15:26] Laurynas Biveinis
commit 27362717a24d4563774f2a99caab7bf13546da93
Author: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Date:   Thu Feb 26 15:53:52 2015 +0530

    Bug #19950568: WRONG RELEVANCE RANKING FOR INNODB FULL TEXT SEARCHES UNDER
               CERTAIN CONDITIONS
    
    Problem:
    Optimizer chooses a wrong index to fetch the doc_id which is
    present only in a full-text index
    
    Analysis:
    For the testcase presented, score can be got only when idx2 is
    used. But optimizer checks that idx_1 is covering and sets
    that index to be used. Since doc_id is needed to get score,
    and as doc_id can only be got from idx_2, optimizer
    should not set idx_1 as covering.
    
    Solution:
    This is backport from 5.7. As part of the solution a new field
    is added in table structure and is set when FTS_DOC_ID field
    is present. no_keyread is set to true in this case indicating
    that no key can be set as covering.
[23 Jun 2015 15:32] Laurynas Biveinis
commit 76568a2799764fff566a95edcd3b7c8aee9a075d
Author: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
Date:   Mon Mar 2 13:47:00 2015 +0530

    Bug#19950568 - WRONG RELEVANCE RANKING FOR INNODB FULL TEXT SEARCHES
               UNDER CERTAIN CONDITIONS
    
    Post push fix for test case failure