Description:
Hi,
I have a weird situation where select * is reporting less rows_examined than only selecting the fields of an index. The query is not fully optimized by the index: one part of the where-clause is an equality on the first field of the index, and the other part is filtering on the third field of the index, so I would expect all rows matching the first part of the where-clause to be "examined" to perform the 2nd filtering. This is what I get when I select the fields of the index (38,743 in How to repeat), but when I do select *, rows_examined is much less and equal to the row sent (3 in How to repeat). This discrepancy in rows_examined does not make sense to me as I do not understand how the select * could be served more than 10,000 times "more efficiently" than selecting only the fields of the index. See How to repeat for details.
As shown in How to repeat, all of Handler_read_next from statues, rows_examined from slow log and SUM_ROWS_EXAMINED from p_s.events_statements_summary_by_digest are showing this weirdness.
I am reporting this as an InnoDB bug because the global status Innodb_rows_read is also showing this weirdness.
Also shown in How to repeat, the select * is "faster" than the other select by 33%, which I do not explain, but not to the point of doing 10,000 times "less work" as reported by rows_examined.
This bad reporting of rows_examined is misleading as it is making us think that a query is relatively efficient, even though it is not that efficient and would greatly benefit from adding another index. And as this is impeding query optimisation, I am filing this as a S2 (serious) bug
How to repeat is with 8.0.26, but 5.7.35 is also affected (for 5.7.35, select * is 50% faster than the other), reproduction in 1st comment.
Many thanks for looking into this,
Jean-François Gagné
How to repeat:
# Create a sandbox with the slow log enabled in a table and logging everything.
$ dbdeployer deploy single mysql_8.0.26 -c "slow_query_log = ON" -c "log_output = 'TABLE'" -c "long_query_time = 0"
# Initialize data.
$ ./use <<< "create database test_jfg"
$ zcat ~/tmp/t.sql.gz | ./use test_jfg
# Get in a mysql client skipping hex encoding to be able to select from mysql.slow_log without encoding sql_text.
$ ./use test_jfg --skip-binary-as-hex
# Show the structure of the table we are querying.
mysql [localhost:8026] {msandbox} (test_jfg) > show create table t\G
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
`customerId` int unsigned NOT NULL,
`type` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`requestedAt` bigint unsigned NOT NULL,
`active` tinyint(1) NOT NULL DEFAULT '1',
`completedAt` bigint unsigned DEFAULT NULL,
`c` char(10) DEFAULT NULL,
PRIMARY KEY (`customerId`,`type`,`requestedAt`),
KEY `type` (`type`,`customerId`,`active`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
# Get the number of rows matching type = '0-52'.
mysql [localhost:8026] {msandbox} (test_jfg) > select count(*) from t where type = '0-52';
+----------+
| count(*) |
+----------+
| 38743 |
+----------+
1 row in set (0.01 sec)
# Truncate the slow log and performance_schema.events_statements_summary_by_digest for easier querying.
mysql [localhost:8026] {msandbox} (test_jfg) > truncate mysql.slow_log; truncate performance_schema.events_statements_summary_by_digest;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
# Do a query served by the index, but where we still need to scan many rows because the 2nd field of the index is not in the where-clause but the 3rd is, pager set to avoid printing the rows returned.
mysql [localhost:8026] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select type, customerId, active, requestedAt from t where type = '0-52' and active = 1; pager; show status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)
PAGER set to 'cat > /dev/null'
3 rows in set (0.03 sec)
Default pager wasn't set, using stdout.
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 38743 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
7 rows in set (0.01 sec)
# So we can see that the query above scanned 38743 rows, which is expected with the select count(*) above. The slow log and p_s are giving us the same thing.
mysql [localhost:8026] {msandbox} (test_jfg) > select rows_sent, rows_examined, sql_text from mysql.slow_log where sql_text like '%where type % and active%';
+-----------+---------------+----------------------------------------------------------------------------------------+
| rows_sent | rows_examined | sql_text |
+-----------+---------------+----------------------------------------------------------------------------------------+
| 3 | 38743 | select type, customerId, active, requestedAt from t where type = '0-52' and active = 1 |
+-----------+---------------+----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql [localhost:8026] {msandbox} (test_jfg) > select DIGEST_TEXT, SUM_ROWS_SENT, SUM_ROWS_EXAMINED from performance_schema.events_statements_summary_by_digest WHERE DIGEST_TEXT like '%WHERE TYPE = %';
+--------------------------------------------------------------------------------------------+---------------+-------------------+
| DIGEST_TEXT | SUM_ROWS_SENT | SUM_ROWS_EXAMINED |
+--------------------------------------------------------------------------------------------+---------------+-------------------+
| SELECT TYPE , `customerId` , ACTIVE , `requestedAt` FROM `t` WHERE TYPE = ? AND ACTIVE = ? | 3 | 38743 |
+--------------------------------------------------------------------------------------------+---------------+-------------------+
1 row in set (0.00 sec)
# Let's truncate the slow log and p_s table, and redo everything but with select *, and we get a surprising rows_examined.
mysql [localhost:8026] {msandbox} (test_jfg) > truncate mysql.slow_log; truncate performance_schema.events_statements_summary_by_digest; flush status; pager cat > /dev/null; select * from t where type = '0-52' and active = 1; pager; show status like "Handler_read%"; select rows_sent, rows_examined, sql_text from mysql.slow_log where sql_text like '%where type%and active%'; select DIGEST_TEXT, SUM_ROWS_SENT, SUM_ROWS_EXAMINED from performance_schema.events_statements_summary_by_digest WHERE DIGEST_TEXT like '%WHERE TYPE = %';
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
PAGER set to 'cat > /dev/null'
3 rows in set (0.02 sec)
Default pager wasn't set, using stdout.
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 3 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
7 rows in set (0.00 sec)
+-----------+---------------+----------------------------------------------------+
| rows_sent | rows_examined | sql_text |
+-----------+---------------+----------------------------------------------------+
| 3 | 3 | select * from t where type = '0-52' and active = 1 |
+-----------+---------------+----------------------------------------------------+
1 row in set (0.00 sec)
+-------------------------------------------------+---------------+-------------------+
| DIGEST_TEXT | SUM_ROWS_SENT | SUM_ROWS_EXAMINED |
+-------------------------------------------------+---------------+-------------------+
| SELECT * FROM `t` WHERE TYPE = ? AND ACTIVE = ? | 3 | 3 |
+-------------------------------------------------+---------------+-------------------+
1 row in set (0.00 sec)
# I do not understand how a query with select * can scan less rows than a query returning the fields of a covering index. Both explains below, the Extra is a little different, but rows are the same.
mysql [localhost:8026] {msandbox} (test_jfg) > explain select type, customerId, active, requestedAt from t where type = '0-52' and active = 1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+--------------------------+
| 1 | SIMPLE | t | NULL | ref | type | type | 22 | const | 35560 | 10.00 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql [localhost:8026] {msandbox} (test_jfg) > explain select * from t where type = '0-52' and active = 1;
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+-----------------------+
| 1 | SIMPLE | t | NULL | ref | type | type | 22 | const | 35560 | 10.00 | Using index condition |
+----+-------------+-------+------------+------+---------------+------+---------+-------+-------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
# If we look at what InnoDB is reporting in Innodb_rows_read, we see results that are consistent with rows_examined above, which makes me think this is an InnoDB bug.
mysql [localhost:8026] {msandbox} (test_jfg) > show global status like 'Innodb_rows_read'; pager cat > /dev/null; select type, customerId, active, requestedAt from t where type = '0-52' and active = 1; pager; show global status like 'Innodb_rows_read';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| Innodb_rows_read | 77489 |
+------------------+-------+
1 row in set (0.00 sec)
PAGER set to 'cat > /dev/null'
3 rows in set (0.02 sec)
Default pager wasn't set, using stdout.
+------------------+--------+
| Variable_name | Value |
+------------------+--------+
| Innodb_rows_read | 116232 |
+------------------+--------+
1 row in set (0.00 sec)
mysql [localhost:8026] {msandbox} (test_jfg) > show global status like 'Innodb_rows_read'; pager cat > /dev/null; select * from t where type = '0-52' and active = 1; pager; show global status like 'Innodb_rows_read';
+------------------+--------+
| Variable_name | Value |
+------------------+--------+
| Innodb_rows_read | 116232 |
+------------------+--------+
1 row in set (0.00 sec)
PAGER set to 'cat > /dev/null'
3 rows in set (0.01 sec)
Default pager wasn't set, using stdout.
+------------------+--------+
| Variable_name | Value |
+------------------+--------+
| Innodb_rows_read | 116235 |
+------------------+--------+
1 row in set (0.00 sec)
# When comparing the performance of the two queries, it looks like the select * is 33% faster than selecting the fields of the index, which I do not explain, but not to the point of scanning 10,000 less rows (I did the comparison a few times, and the results are all similar).
$ n=1000; time yes "select type, customerId, active, requestedAt from t where type = '0-52' and active = 1;" | head -n $n | ./use test_jfg > /dev/null; time yes "select * from t where type = '0-52' and active = 1;" | head -n $n | ./use test_jfg > /dev/null
real 0m15.739s
user 0m0.081s
sys 0m0.056s
real 0m10.646s
user 0m0.067s
sys 0m0.045s