Bug #111969 The performance of version 8.0 when using count(1) is significantly lower compar
Submitted: 4 Aug 2023 9:50 Modified: 4 Aug 2023 14:59
Reporter: fander chan Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.17+ OS:Any
Assigned to: CPU Architecture:Any

[4 Aug 2023 9:50] fander chan
Description:
The newly added records_from_index() function in version 8.0.17 is responsible for this, as it enforces the use of the primary key index to count the number of records in the table.

Prior to version 8.0.16, for the statement select count(*) from table, if there were secondary indexes in the table, InnoDB would choose to perform a full index scan on a secondary index to get the number of records in the table.

Starting from version 8.0.17, up to the current latest version (8.0.34), if there are secondary indexes in the table, the execution plan output by explain also indicates that it will use the secondary index. However, during actual execution, InnoDB will enforce a full table scan to utilize the parallel scanning capability of the primary key index.

This leads to two problems:

1.The execution plan displayed is inaccurate. The two execution plans are clearly not identical, but they appear the same. This is misleading for us, making us believe that version 8.0.34 is still performing a secondary index scan, when in reality it is performing a parallel primary key scan.

2.Version 8.0.17 and later versions completely disregard the fact that in certain situations, especially when CPU resources are insufficient, a secondary index scan is the fastest execution plan. Yet, it forcefully selects a slower execution plan.

How to repeat:
use 8.0.34:

drop table t0;

CREATE TABLE `t0` (
  `id` int NOT NULL AUTO_INCREMENT,
  `i1` int NOT NULL DEFAULT '0',
  `c1` varchar(300) NOT NULL DEFAULT 'fander',
  `c2` varchar(300) NOT NULL DEFAULT 'fander',
  `c3` varchar(300) NOT NULL DEFAULT 'fander',
  `c4` varchar(300) NOT NULL DEFAULT 'fander',
  `c5` varchar(300) NOT NULL DEFAULT 'fander',
  `c6` varchar(300) NOT NULL DEFAULT 'fander',
  `c7` varchar(300) NOT NULL DEFAULT 'fander',
  PRIMARY KEY (`id`) USING BTREE,
  KEY `idx_i1` (`i1`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

mysql> insert into t0 values(1,0,REPEAT('a', 100),REPEAT('b', 100),REPEAT('c', 100),REPEAT('d', 100),REPEAT('e', 100),REPEAT('f', 100),REPEAT('g', 100));
Query OK, 1 row affected (0.02 sec)

mysql> select * from t0\G
*************************** 1. row ***************************
id: 1
i1: 0
c1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
c2: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
c3: cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c4: dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd
c5: eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
c6: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
c7: gggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg
1 row in set (0.00 sec)

mysql> insert into t0(i1,c1,c2,c3,c4,c5,c6,c7) select i1,c1,c2,c3,c4,c5,c6,c7 from t0;
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0

Repeatedly execute the aforementioned SQL many times, until:
mysql> insert into t0(i1,c1,c2,c3,c4,c5,c6,c7) select i1,c1,c2,c3,c4,c5,c6,c7 from t0;
Query OK, 1048576 rows affected (29.15 sec)
Records: 1048576  Duplicates: 0  Warnings: 0

mysql> select count(1) from t0;
+----------+
| count(1) |
+----------+
|  2097152 |
+----------+
1 row in set (6.72 sec)

Due to the swift memory computation in MySQL, I plan to restart the server and ensure that data is loaded from the disk for calculation.

vi my.cnf
innodb_buffer_pool_load_at_startup=OFF

systemctl restart mysql3306

mysql> explain select count(1) from t0;
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key    | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | t0    | NULL       | index | NULL          | idx_i1 | 4       | NULL | 1992549 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> select count(1) from t0;
+----------+
| count(1) |
+----------+
|  2097152 |
+----------+
1 row in set (7.72 sec)

Simultaneously, I activated the 'top' command to monitor CPU usage. It is evident that the parallel primary key scan feature was utilized, as the CPU usage spiked to 200%.
You can see:
18000 mysql     20   0 3931860   2.2g  18732 S 192.1 58.7   0:16.20 mysqld 

Perform the same steps on MySQL 5.7 as well.
mysql> explain select count(1) from t0;
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key    | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | t0    | NULL       | index | NULL          | idx_i1 | 4       | NULL | 1952164 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.11 sec)

mysql> select count(1) from t0;
+----------+
| count(1) |
+----------+
|  2097152 |
+----------+
1 row in set (0.37 sec)

You will find that while the execution plan is the same, the performance is quite good. This is because it utilized a secondary index scan rather than a parallel primary key scan (as can be seen from the 'top' command). The design of my table structure and the number of CPU cores on my server dictate that the use of a secondary index scan is optimal for my SQL. As 5.7 does not support parallel primary key scanning by default, it is quicker. This resulted in version 8.0 losing in the SQL performance competition.

Suggested fix:
1. Fix the bug to differentiate between these two execution plans when displayed by the 'explain' command.
2. Fix the bug to enable version 8.0 to automatically select the fastest execution plan, rather than rigidly employing a parallel primary key index scan.
[4 Aug 2023 13:04] MySQL Verification Team
Hi Mr. chan,

Thank you for your bug report.

However, there are hundreds of bug reporting a significant performance drop when upgrading from 5.7 to 8.0. There are many, so we chose one. of the most similar bug, to be the original bug of your report:

https://bugs.mysql.com/bug.php?id=108430

Duplicate.
[4 Aug 2023 13:20] fander chan
I am absolutely certain that you have misunderstood my statements. Among the two bugs I pointed out,

One is related to performance. It's not about the difference between 5.7 and 8.0; I just used them as examples for convenience. The actual problem is caused by a feature introduced in version 8.0.17.

The other bug is completely unrelated to performance. I regard it as a display issue and a product defect that misleads users.
[4 Aug 2023 13:23] MySQL Verification Team
HI Mr. chan,

You are correct.

However, that feature is here to stay, so we can not remove it.

Not a bug.
[4 Aug 2023 13:37] fander chan
mysql> explain select count(1) from t0;
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key    | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | t0    | NULL       | index | NULL          | idx_i1 | 4       | NULL | 1992549 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

for my 8.0 case,I think this explain result is best:
mysql> explain select count(1) from t0;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-----------------------+
|  1 | SIMPLE      | t0    | NULL       | index | NULL          | PRIMARY | 4       | NULL | 1992549 |   100.00 | Using index; Parallel |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
[4 Aug 2023 14:33] MySQL Verification Team
Hi Mr. chan,

Both queries return in 0 seconds.

Hence, we do not see what kind of bug it is.

What is it exactly that you want us to do???
[4 Aug 2023 14:59] fander chan
In my tests, whether using the parallel primary key index scan algorithm (the default behavior in 8.0.17+ versions, I think it is hard-coded) or the secondary index scan algorithm (in 8.0.16 or lower versions, including 5.7), their final execution plans are the same as follows.
mysql> explain select count(1) from t0;
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key    | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | t0    | NULL       | index | NULL          | idx_i1 | 4       | NULL | 1992549 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+--------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

To better distinguish, if the execution plan uses a parallel primary key index scanning algorithm, it should look like the following, key=PRIMARY, this would truly represent a full primary index scan in MySQL. I believe this won't mislead users.
mysql> explain select count(1) from t0;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra                 |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-----------------------+
|  1 | SIMPLE      | t0    | NULL       | index | NULL          | PRIMARY | 4       | NULL | 1992549 |   100.00 | Using index; Parallel |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
[4 Aug 2023 15:06] MySQL Verification Team
Thank you, Mr. chan,

But this is not a bug.

The results are correct and performance is similar.

Not a bug.