Bug #104659 Select * reporting less rows_examined than only fields in an index.
Submitted: 18 Aug 2021 22:19 Modified: 14 Mar 16:45
Reporter: Jean-François Gagné Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:8.0.26, 5.7.35 OS:Any
Assigned to: CPU Architecture:Any

[18 Aug 2021 22:19] Jean-François Gagné
Description:
Hi,

I have a weird situation where select * is reporting less rows_examined than only selecting the fields of an index.  The query is not fully optimized by the index: one part of the where-clause is an equality on the first field of the index, and the other part is filtering on the third field of the index, so I would expect all rows matching the first part of the where-clause to be "examined" to perform the 2nd filtering.  This is what I get when I select the fields of the index (38,743 in How to repeat), but when I do select *, rows_examined is much less and equal to the row sent (3 in How to repeat).  This discrepancy in rows_examined does not make sense to me as I do not understand how the select * could be served more than 10,000 times "more efficiently" than selecting only the fields of the index.  See How to repeat for details.

As shown in How to repeat, all of Handler_read_next from statues, rows_examined from slow log and SUM_ROWS_EXAMINED from p_s.events_statements_summary_by_digest are showing this weirdness.

I am reporting this as an InnoDB bug because the global status Innodb_rows_read is also showing this weirdness.

Also shown in How to repeat, the select * is "faster" than the other select by 33%, which I do not explain, but not to the point of doing 10,000 times "less work" as reported by rows_examined.

This bad reporting of rows_examined is misleading as it is making us think that a query is relatively efficient, even though it is not that efficient and would greatly benefit from adding another index.  And as this is impeding query optimisation, I am filing this as a S2 (serious) bug

How to repeat is with 8.0.26, but 5.7.35 is also affected (for 5.7.35, select * is 50% faster than the other), reproduction in 1st comment.

Many thanks for looking into this,

Jean-François Gagné

How to repeat:
# Create a sandbox with the slow log enabled in a table and logging everything.
$ dbdeployer deploy single mysql_8.0.26 -c "slow_query_log = ON" -c "log_output = 'TABLE'" -c "long_query_time = 0"

# Initialize data.
$ ./use <<< "create database test_jfg"
$ zcat ~/tmp/t.sql.gz | ./use test_jfg

# Get in a mysql client skipping hex encoding to be able to select from mysql.slow_log without encoding sql_text.
$ ./use test_jfg --skip-binary-as-hex

# Show the structure of the table we are querying.
mysql [localhost:8026] {msandbox} (test_jfg) > show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `customerId` int unsigned NOT NULL,
  `type` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `requestedAt` bigint unsigned NOT NULL,
  `active` tinyint(1) NOT NULL DEFAULT '1',
  `completedAt` bigint unsigned DEFAULT NULL,
  `c` char(10) DEFAULT NULL,
  PRIMARY KEY (`customerId`,`type`,`requestedAt`),
  KEY `type` (`type`,`customerId`,`active`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

# Get the number of rows matching type = '0-52'.
mysql [localhost:8026] {msandbox} (test_jfg) > select count(*) from t where type = '0-52';
+----------+
| count(*) |
+----------+
|    38743 |
+----------+
1 row in set (0.01 sec)

# Truncate the slow log and performance_schema.events_statements_summary_by_digest for easier querying.
mysql [localhost:8026] {msandbox} (test_jfg) > truncate mysql.slow_log; truncate performance_schema.events_statements_summary_by_digest;
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

# Do a query served by the index, but where we still need to scan many rows because the 2nd field of the index is not in the where-clause but the 3rd is, pager set to avoid printing the rows returned.
mysql [localhost:8026] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select type, customerId, active, requestedAt from t where type = '0-52' and active = 1; pager; show status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
3 rows in set (0.03 sec)

Default pager wasn't set, using stdout.
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 38743 |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.01 sec)

# So we can see that the query above scanned 38743 rows, which is expected with the select count(*) above.  The slow log and p_s are giving us the same thing.
mysql [localhost:8026] {msandbox} (test_jfg) > select rows_sent, rows_examined, sql_text from mysql.slow_log where sql_text like '%where type % and active%';
+-----------+---------------+----------------------------------------------------------------------------------------+
| rows_sent | rows_examined | sql_text                                                                               |
+-----------+---------------+----------------------------------------------------------------------------------------+
|         3 |         38743 | select type, customerId, active, requestedAt from t where type = '0-52' and active = 1 |
+-----------+---------------+----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8026] {msandbox} (test_jfg) > select DIGEST_TEXT, SUM_ROWS_SENT, SUM_ROWS_EXAMINED from performance_schema.events_statements_summary_by_digest WHERE DIGEST_TEXT like '%WHERE TYPE = %';
+--------------------------------------------------------------------------------------------+---------------+-------------------+
| DIGEST_TEXT                                                                                | SUM_ROWS_SENT | SUM_ROWS_EXAMINED |
+--------------------------------------------------------------------------------------------+---------------+-------------------+
| SELECT TYPE , `customerId` , ACTIVE , `requestedAt` FROM `t` WHERE TYPE = ? AND ACTIVE = ? |             3 |             38743 |
+--------------------------------------------------------------------------------------------+---------------+-------------------+
1 row in set (0.00 sec)

# Let's truncate the slow log and p_s table, and redo everything but with select *, and we get a surprising rows_examined.
mysql [localhost:8026] {msandbox} (test_jfg) > truncate mysql.slow_log; truncate performance_schema.events_statements_summary_by_digest; flush status; pager cat > /dev/null; select * from t where type = '0-52' and active = 1; pager; show status like "Handler_read%"; select rows_sent, rows_examined, sql_text from mysql.slow_log where sql_text like '%where type%and active%'; select DIGEST_TEXT, SUM_ROWS_SENT, SUM_ROWS_EXAMINED from performance_schema.events_statements_summary_by_digest WHERE DIGEST_TEXT like '%WHERE TYPE = %';
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
3 rows in set (0.02 sec)

Default pager wasn't set, using stdout.
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 3     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

+-----------+---------------+----------------------------------------------------+
| rows_sent | rows_examined | sql_text                                           |
+-----------+---------------+----------------------------------------------------+
|         3 |             3 | select * from t where type = '0-52' and active = 1 |
+-----------+---------------+----------------------------------------------------+
1 row in set (0.00 sec)

+-------------------------------------------------+---------------+-------------------+
| DIGEST_TEXT                                     | SUM_ROWS_SENT | SUM_ROWS_EXAMINED |
+-------------------------------------------------+---------------+-------------------+
| SELECT * FROM `t` WHERE TYPE = ? AND ACTIVE = ? |             3 |                 3 |
+-------------------------------------------------+---------------+-------------------+
1 row in set (0.00 sec)

# I do not understand how a query with select * can scan less rows than a query returning the fields of a covering index.  Both explains below, the Extra is a little different, but rows are the same.
mysql [localhost:8026] {msandbox} (test_jfg) > explain select type, customerId, active, requestedAt from t where type = '0-52' and active = 1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows  | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+--------------------------+
|  1 | SIMPLE      | t     | NULL       | ref  | type          | type | 22      | const | 35560 |    10.00 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8026] {msandbox} (test_jfg) > explain select * from t where type = '0-52' and active = 1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref   | rows  | filtered | Extra                 |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+-----------------------+
|  1 | SIMPLE      | t     | NULL       | ref  | type          | type | 22      | const | 35560 |    10.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

# If we look at what InnoDB is reporting in Innodb_rows_read, we see results that are consistent with rows_examined above, which makes me think this is an InnoDB bug.
mysql [localhost:8026] {msandbox} (test_jfg) > show global status like 'Innodb_rows_read'; pager cat > /dev/null; select type, customerId, active, requestedAt from t where type = '0-52' and active = 1; pager; show global status like 'Innodb_rows_read';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_rows_read | 77489 |
+------------------+-------+
1 row in set (0.00 sec)

PAGER set to 'cat > /dev/null'
3 rows in set (0.02 sec)

Default pager wasn't set, using stdout.
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Innodb_rows_read | 116232 |
+------------------+--------+
1 row in set (0.00 sec)

mysql [localhost:8026] {msandbox} (test_jfg) > show global status like 'Innodb_rows_read'; pager cat > /dev/null; select * from t where type = '0-52' and active = 1; pager; show global status like 'Innodb_rows_read';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Innodb_rows_read | 116232 |
+------------------+--------+
1 row in set (0.00 sec)

PAGER set to 'cat > /dev/null'
3 rows in set (0.01 sec)

Default pager wasn't set, using stdout.
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| Innodb_rows_read | 116235 |
+------------------+--------+
1 row in set (0.00 sec)

# When comparing the performance of the two queries, it looks like the select * is 33% faster than selecting the fields of the index, which I do not explain, but not to the point of scanning 10,000 less rows (I did the comparison a few times, and the results are all similar).
$ n=1000; time yes "select type, customerId, active, requestedAt from t where type = '0-52' and active = 1;" | head -n $n | ./use test_jfg > /dev/null; time yes "select * from t where type = '0-52' and active = 1;" | head -n $n | ./use test_jfg > /dev/null

real    0m15.739s
user    0m0.081s
sys     0m0.056s

real    0m10.646s
user    0m0.067s
sys     0m0.045s
[18 Aug 2021 22:19] Jean-François Gagné
Reproduction for 5.7.35.

Attachment: 5.7.35.txt (text/plain), 10.32 KiB.

[19 Aug 2021 13:00] MySQL Verification Team
Hello Jean-François,

Verified as described.

regards,
Umesh
[20 Aug 2021 10:24] Øystein Grøvlen
I think this is expected since the reported statistics is based on the number of handler calls made.  For the non-covering index scan (select *), index condition push-down will be used.  Hence, the evaluation of the condition "active = 1" will happen in InnoDB, and the server will only see 3 rows.

I am a bit surprised by the behavior of Innodb_rows_read, though.

I am also a bit surprised that skip scan is not used in 8.0.26 for the query that use a covering index.
[28 Sep 2021 18:47] Jean-François Gagné
After discussion with Øystein in MySQL Community Slack [1], I understand better why the SELECT * is faster than only selecting the fields of the index.  The reason is that the SELECT * is using Index Condition Pushdown (ICP) so the rows are scanned in InnoDB and only the rows matching the where-clause are returned by the Storage Engine (Handler) interface.  But the query not needing to do Primary Key (PK) lookups returns many rows through the Storage Engine (SE) interface (38743) and filtering is done in the Server (rows returned 3).  All this data being shuffled around explains why the query that does not need PK lookups (but returning more data via the SE interface) takes more time than the query doing PK lookups but because of ICP is only returning 3 rows via the SE interface.

[1]: https://mysqlcommunity.slack.com/archives/C9AF2DFCL/p1629464563044200
[28 Sep 2021 18:52] Jean-François Gagné
Screenshot of the discussion with Øystein (for when it will not be accessible because of free Slack limited history).

Attachment: Screen Shot 2021-09-28 at 2.40.01 PM.png (image/png, text), 414.07 KiB.

[14 Mar 16:45] Jean-François Gagné
Related: Bug#68554.

Blog post on the subject: https://jfg-mysql.blogspot.com/2022/03/rows-examined-not-trustworthy-because-index-conditi...