MySQL Bugs: #57216: InnoDB column prefix index not treated as a covering one

Bug #57216	InnoDB column prefix index not treated as a covering one
Submitted:	4 Oct 2010 12:59	Modified:	8 Oct 2010 20:08
Reporter:	Marko Mäkelä	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.1, 5.6.99 bzr	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	column prefix, innodb

Description:
I am not sure if this is a bug in the optimizer or in the InnoDB build_template() function, or both. This came up when I was trying to test that my Bug #56680 fix does not break anything in column prefix indexes.

When a secondary index on a column prefix is being used, InnoDB is fetching the clustered index record, even though the secondary index record would be covering for the query.

How to repeat:
CREATE TABLE t1(
       a INT AUTO_INCREMENT PRIMARY KEY,
       b CHAR(10),
       c INT,
       INDEX(b(2)))
ENGINE=InnoDB;

INSERT INTO t1 VALUES(1,'ABCDEFGHIJ',1);
COMMIT;
-- prebuilt->index->name="PRIMARY", but this is OK, as this is a table scan
SELECT SUBSTR(b,1,2) FROM t1;

-- this could be translated into LIKE 'ab%', but will be a table scan instead
SELECT SUBSTR(b,1,2) FROM t1 WHERE SUBSTR(b,1,2)='ab';
-- prebuilt->index->name="b", but fetches from the clustered index too
SELECT SUBSTR(b,1,2) FROM t1 WHERE b LIKE 'ab%';
-- this does not need a clustered index lookup
SELECT a,SUBSTR(b,1,2) FROM t1 WHERE b LIKE 'ab%';

Suggested fix:
Translate
WHERE SUBSTR(col,1,N)='some_string_of_length_N'
to
col LIKE 'some_string_of_length_N%'
and make sure that prebuilt->needs_to_access_clustered=FALSE in these cases.

Furthermore, if InnoDB despite the needs_to_access=clustered=FALSE needs to access the clustered index record for a consistent read (fetching an older version), we could request a long enough prefix of externally stored columns if MySQL told us how long a prefix it needs for evaluating the query. That would avoid unnecessary BLOB I/O.

After fixing this bug, create a test case for Bug #56680 that exercises column prefix indexes, and adapt the code if needed. I cannot see any immediate problems there, but it is better to be safe than sorry.

Problem exists in next-mr. I set breakpoints in mysql_parse and row_search_for_mysql, then checked prebuilt->need_to_access_clustered. It was always 1.

Similar situation here.

CREATE TABLE `users` (
  `user_id` bigint(20) NOT NULL DEFAULT '0',
  `gender` binary(1) NOT NULL DEFAULT 'U',
  `age` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `msa_id` smallint(5) unsigned NOT NULL DEFAULT '0',
  `region_id` smallint(5) unsigned NOT NULL DEFAULT '0',
  `last_seen` date NOT NULL,
  `segments` varbinary(1000) DEFAULT NULL,
  PRIMARY KEY (`user_id`),
  KEY `last_seen` (`last_seen`),
  KEY `segments` (`segments`(1))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT

Can't get it to just use the index using either of these methods, searching for rows with blank segments field, tried with blank string or is null...

mysql> explain select users.user_id FROM dare.users FORCE INDEX (`segments`) WHERE users.segments !='';+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | users | range | segments      | segments | 4       | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select users.user_id FROM dare.users FORCE INDEX (`segments`) WHERE users.segments is not null;
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | users | range | segments      | segments | 4       | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
1 row in set (0.00 sec)

this table has about 350 million rows, and that data page lookup makes such a query take hours (without the force index, it does a full table scan, which is awful as at any given time only about 5% of them will have data there).  This should be a very fast query, just pulling 10-15m records directly from the index dictionary.

I would vote for having code to handle Trey's test for NULL.

As a side issue, I wonder if, when doing the table scan, InnoDB might unnecessarily be fetching the entire field (past 767 bytes, into the extension) when checking for NULL?

But I would warn you that implementing Marko's case (such LEFT(), LIKE), can lead to collation errors.  In uft8_unicode_ci, there could be a multi-character (not just multi-byte) sequence that is broken by the prefix, making it impossible to tell where the record really belongs.  A few other non-"_bin" collations also suffer from this complication, so I suggest looking at the collation to decide whether to make any attempt at using the prefix.

At that point, you may as well disallow creating a prefix index on any field with a collation that needs to compare multi-character (not just multi-byte) sequences to determine equality.