MySQL Bugs: #107491: count(*) uses PRIMARY key ignoring force index

Bug #107491	count(*) uses PRIMARY key ignoring force index
Submitted:	6 Jun 2022 3:30	Modified:	7 Jun 2022 1:48
Reporter:	Brian Yue (OCA)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S5 (Performance)
Version:	8.0.27	OS:	Any (rhel-7.4)
Assigned to:		CPU Architecture:	Any (x86-64)
Tags:	count star, count(*), primary key

Description:
Hello,

  Recently we found that the new parallel query implementation of `select count(*)` in MySQL8.0 is pretty inefficient when the PRIMARY key of a table is really big, in this case `select count(*)` in MySQL8.0 runs much slower than MySQL5.7 or small versions like MySQL8013. And if we add a where clause for count(*), it may gets faster.

  The problem I'm reporting is similar with bug https://bugs.mysql.com/bug.php?id=103213, but I think it's a different reason. Reason of Bug#103213 is thought as machanism of bufferpool,  but I think is should be reason that `select count(*)` in MySQL8.0.27 uses PRIMARY key to count records ignoring force index:

```
int ha_innobase::records(ha_rows *num_rows) /*!< out: number of rows */
{
  ...
  dict_index_t *index = m_prebuilt->table->first_index();
  ut_ad(index->is_clustered());
  
  /* Count the records in the clustered index */
  ret = row_scan_index_for_mysql(m_prebuilt, index, n_threads, false, &n_rows);
  ...
}

So, when the PRIMARY key is pretty larger than buffer pool (a big table), `select count(*)` in MySQL8.0.27 performs really bad.
For example, the PRIMARY key of table may occupies disk space 100GB, and each secondary key may be very small (e.g. 1GB), if we have a bufferpool of only 2GB, a large amount of disk read will happen force the parallel read count(*) implementation(which is default in recent version of MySQL8.0), but smaller secondary index may be choosed in MySQL5.7 or older versions of MySQL8.0 like MySQL8013.

How to repeat:
(1) Prepare a table witch is named as sbtest1 with 10,000,000 records by sysbench, and the table sbtest1 should occupy disk space about 2.4GB.
(2) Configure the param innodb_buffer_pool_size as 200MB and innodb_buffer_pool_instances as 1
(3) perform the same tests on MySQL5.7 and MySQL8.0.27

【UNDER MYSQL5.7】
mysql> show variables like '%innodb_buffer_poo%';
+-------------------------------------+----------------+
| Variable_name                       | Value          |
+-------------------------------------+----------------+
| innodb_buffer_pool_chunk_size       | 134217728      |
| innodb_buffer_pool_dump_at_shutdown | OFF            |
| innodb_buffer_pool_dump_now         | OFF            |
| innodb_buffer_pool_dump_pct         | 25             |
| innodb_buffer_pool_filename         | ib_buffer_pool |
| innodb_buffer_pool_instances        | 1              |
| innodb_buffer_pool_load_abort       | OFF            |
| innodb_buffer_pool_load_at_startup  | OFF            |
| innodb_buffer_pool_load_now         | OFF            |
| innodb_buffer_pool_size             | 268435456      |
+-------------------------------------+----------------+
10 rows in set (0.00 sec)

mysql> select database_name,table_name,index_name,stat_name,stat_value from mysql.innodb_index_stats where table_name = 'sbtest1' and stat_name = 'n_leaf_pages';
+---------------+------------+------------+--------------+------------+
| database_name | table_name | index_name | stat_name    | stat_value |
+---------------+------------+------------+--------------+------------+
| sbtest        | sbtest1    | PRIMARY    | n_leaf_pages |     133702 |
| sbtest        | sbtest1    | k_1        | n_leaf_pages |       8313 |
+---------------+------------+------------+--------------+------------+
2 rows in set (0.00 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.64 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.64 sec)

【UNDER MYSQL8.0.25】
[yycdb_80@localhost ~]$ mysql -uroot -p'db1x@NJ+1' sbtest
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 8.0.25 Source distribution

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like '%innodb_buffer_poo%';
+-------------------------------------+----------------+
| Variable_name                       | Value          |
+-------------------------------------+----------------+
| innodb_buffer_pool_chunk_size       | 134217728      |
| innodb_buffer_pool_dump_at_shutdown | OFF            |
| innodb_buffer_pool_dump_now         | OFF            |
| innodb_buffer_pool_dump_pct         | 25             |
| innodb_buffer_pool_filename         | ib_buffer_pool |
| innodb_buffer_pool_in_core_file     | ON             |
| innodb_buffer_pool_instances        | 1              |
| innodb_buffer_pool_load_abort       | OFF            |
| innodb_buffer_pool_load_at_startup  | OFF            |
| innodb_buffer_pool_load_now         | OFF            |
| innodb_buffer_pool_size             | 268435456      |
+-------------------------------------+----------------+
11 rows in set (0.01 sec)

mysql> select database_name,table_name,index_name,stat_name,stat_value from mysql.innodb_index_stats where table_name = 'sbtest1' and stat_name = 'n_leaf_pages';
+---------------+------------+------------+--------------+------------+
| database_name | table_name | index_name | stat_name    | stat_value |
+---------------+------------+------------+--------------+------------+
| sbtest        | sbtest1    | PRIMARY    | n_leaf_pages |     137003 |
| sbtest        | sbtest1    | k_1        | n_leaf_pages |       8313 |
+---------------+------------+------------+--------------+------------+
2 rows in set (0.01 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (5.07 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (5.05 sec)

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (5.16 sec)

Suggested fix:
Add a configure to choose where parallel count star implementation is used or not.
Or, design a machanism to choose where to use parallel count star or not automatically

Hi Mr. Yue,

Thank you for your Performance Improvement request.

However, your idea is not sufficiently detailed, so that we could verify the report.

The only way to count all rows in an InnoDB table is to sequentially read all rows or to read it by PRIMARY index. This is due to the fact that InnoDB is a MVCC engine. 

But, you have another option. Information schema will give you instant response regarding number of rows in an InnoDB table.

Hence, we do not see what, exactly, is your performance improvement proposition.

We are waiting on your feedback.

Hello,
  The problem I'm reporting is that `select count(*)` may become much slower after parallel count star is instroduced in MySQL8.0, but it's implemented in InnoDB engine instead of MySQL server layer. 
  The stack of parallel count star in MySQL8.0 is like this:
```
#0  row_mysql_parallel_select_count_star   storage/innobase/row/row0mysql.cc:4435
#1  row_scan_index_for_mysql   storage/innobase/row/row0mysql.cc:4717
#2  ha_innobase::records   storage/innobase/handler/ha_innodb.cc:16365
#3  handler::ha_records   sql/handler.h:5296
#4  get_exact_record_count   sql/sql_executor.cc:4391
#5  UnqualifiedCountIterator::Read   sql/sql_executor.cc:6669
#6  Query_expression::ExecuteIteratorQuery   sql/sql_union.cc:1347
#7  Query_expression::execute   sql/sql_union.cc:1419
#8  Sql_cmd_dml::execute   sql/sql_select.cc:645
```

  But only PRIMARY key can be used by parallel count star:
```
  int ha_innobase::records(ha_rows *num_rows) /*!< out: number of rows */
{
  ...
  dict_index_t *index = m_prebuilt->table->first_index();
  ut_ad(index->is_clustered());
  
  /* Count the records in the clustered index */
  ret = row_scan_index_for_mysql(m_prebuilt, index, n_threads, false, &n_rows);
  ...
}
```

  Although in MySQL5.7 there is no parallel count star and only sequential read of index is performed to count rows,  we can choose to use a secondary index to count rows. It's correct that InnoDB is a MVCC engine, but when the secondary data page is visible to the count_star trx, we only need to access pages in the secondary index and there is no need to find the clustered record in PRIMARY key, thus it could be very fast.
  The stack to count rows in MySQL5.7 is like this:
```
#0  row_search_mvcc    storage/innobase/row/row0sel.cc:4459
#1  ha_innobase::index_read    storage/innobase/handler/ha_innodb.cc:10076
#2  handler::index_read_map   sql/handler.h:5404
#3  handler::ha_index_read_map    sql/handler.cc:3386
#4  handler::read_range_first    sql/handler.cc:7431
#5  ha_innobase::read_range_first    storage/innobase/handler/ha_innodb.cc:10548
#6  handler::multi_range_read_next    sql/handler.cc:6565
#7  handler::ha_multi_range_read_next    sql/handler.cc:6499
#8  QUICK_RANGE_SELECT::get_next    sql/opt_range.cc:10481
#9  IndexRangeScanIterator::Read    sql/records.cc:307
#10 FilterIterator::Read     sql/composite_iterators.cc:87
#11 AggregateIterator::Read    sql/composite_iterators.cc:226
#12 Query_expression::ExecuteIteratorQuery    sql/sql_union.cc:1347
#13 Query_expression::execute    sql/sql_union.cc:1419
#14 in Sql_cmd_dml::execute    sql/sql_select.cc:645
```

  As a result, for the same `select count(*) from sbtest1` SQL, secondary indexes can be used to count rows, but in MySQL8.0 only the primary key can be used to count rows. Thus count(*) in MySQL8.0 need to read much more data pages, leading to a bad performance.

Hi Mr. Yue,

Regarding 5.7 versus 8.0, that is a very well known issue. MySQL 8.0 is relatively slower, due to many new features that have been added.

However, moving row counting from the storage engine to the SQL layer is not a proposition that we could accept. That is because each engine is different and it will stay that way ......

Hello,
  Actually I'm not proposing to move row counting from the storage engine to the SQL layer, and I think the parallel count star introduced in MySQL8.0 is really an useful feature which can speed the count(*) commonly. But due to the parallel count star can use PRIMARY key only, it may perform slower for pretty big tables.
  Facing those cases that MySQL8.0 performs slower, now I have no available method to force the secondary keys to be used, but actually they can be used to count rows. Analyzing the cases that I previously posted, there is a big performance gap. Executing the same count star SQL, MySQL5.7 takes 1.64 sec and MySQL8.0 takes 5.07 sec:
  MySQL5.7:
  mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.64 sec)

  MySQL8.0:
  mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (5.07 sec)

  So I'm proposing that:
  (1) Introduce a session variable or an optimizer hint to control whether the parallel row counting in InnoDB engine should be used, so that we can tune the SQL manually when it's found to be slow.
  (2) As an another option, using secondary indexes to count rows can be considered by InnoDB engine, because actually it should can work: when the row is visible to the count star transaction, increase the row count; when the row is not visible, just find the full record in the clustered index and increase the row count if it's finally found to be visible.

  In my opinion, the second option is prefered if it could be implemented. And I think the issue I'm reporting is really an important issue because we almost always have big tables.

  Thanks for your analysis.

Hi Mr. Yue,

We have already explained why is 8.0 slower than 5.7.

Next, parallel count can be done only with PRIMARY key, due to the amount and speed of locking and due to the organisation of the clustered primary index.

We also do not think that another option is necessary for the parallel counting, since our optimiser does a very decent job of making those decisions. Also, we are working towards reducing the number of options and hints and not towards further increase of the same. 

Thank you for your opinion, though.