MySQL Bugs: #59266: EXPLAIN output does not show full length of clustered indexes

Bug #59266	EXPLAIN output does not show full length of clustered indexes
Submitted:	4 Jan 2011 1:28	Modified:	5 Jan 2011 13:25
Reporter:	Andrew Dalgleish	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
With InnoDB, a secondary index includes the primary key columns.

When one of these indexes is used, the output of EXPLAIN only lists the length of the actual index part, not the length of the primary key columns.

A customer reported the following situation:

We have a table with a primary index on (col1, col2) and an secondary index on (col3).
A query is done which refers to col2 in the WHERE clause:
SELECT COUNT(*) FROM table1 WHERE col2="X";

Initially you would expect a full-table scan, or to use the primary key.
Instead, the optimizer uses the secondary index on col3 as this includes a reference to col2.
This was confusing initially as the output of EXPLAIN only shows the length of col3 for the key_len.

Can the EXPLAIN output be expanded to include this extra length?

How to repeat:
Test script:

-----

DROP TABLE IF EXISTS table1;
CREATE TABLE table1
(
        col1    INT,
        col2    VARCHAR(100),
        col3    INT,
        col4    VARCHAR(100),
        col5    VARCHAR(100),
        col6    VARCHAR(100),
        col7    VARCHAR(100),
        col8    VARCHAR(100),
        col9    VARCHAR(100),
        col10   VARCHAR(100),
        PRIMARY KEY (col1, col2),
        KEY (col3)
) ENGINE=InnoDB;

INSERT INTO table1 (col1, col2, col3) VALUES (1, "A", 1), (1, "B", 2), (1, "C", 3), (1, "D", 4);
INSERT INTO table1 (col1, col2, col3) VALUES (2, "A", 1), (2, "B", 2), (2, "C", 3), (2, "D", 4);

EXPLAIN EXTENDED SELECT COUNT(*) FROM table1 WHERE col2 = "B";
SHOW WARNINGS;

FLUSH STATUS;
SELECT COUNT(*) FROM table1 WHERE col2 = "B";
SHOW STATUS LIKE 'Handler_read_%';

DROP TABLE IF EXISTS table1;
-----

Output is as follows:

+----+-------------+--------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | table1 | index | NULL          | col3 | 5       | NULL |    8 |   100.00 | Using where; Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+--------------------------+

+-------+------+-----------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                 |
+-------+------+-----------------------------------------------------------------------------------------+
| Note  | 1003 | select count(0) AS `COUNT(*)` from `test`.`table1` where (`test`.`table1`.`col2` = 'B') |
+-------+------+-----------------------------------------------------------------------------------------+

+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+

+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 1     |
| Handler_read_key      | 2     |
| Handler_read_next     | 8     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+

If index on col3 is used it is a bug by itself...

Now, what additional information this full length will give us, taking into account that access path is "index", that is, entire index is scanned anyway (see http://dev.mysql.com/doc/refman/5.1/en/explain-output.html#jointype_index)? Just check index definition to see columns it is defined on.

The use of the index on col3 is not a bug; using the secondary index on col3 (which also includes col1 and col2) uses fewer disk IOs than using either the primary index on (col1, col2) or a full table scan.

This is mentioned in the manual:
-----
It is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan. 
-----

The only issue is that the EXPLAIN output doesn't make it clear when the primary columns of a secondary index are used in this way, and doesn't show the full length of the index used.

OK, you are right. For InnoDB table with that many long columns in the table scan of entire primary key means scan of entire big table. Scan of index on col3, where only col3 + two columns from primary key are used, is, indeed, faster.

Anyway, as "index" access path means that entire index is scanned, what is the point to add exact size of index record for this specific case of InnoDB table?

The only point is that the length displayed is not the length actually used.

It is not critical as you can infer it once you understand what is happening, but it would be nice to have it spelled out though.