Bug #107491 | count(*) uses PRIMARY key ignoring force index | ||
---|---|---|---|
Submitted: | 6 Jun 2022 3:30 | Modified: | 7 Jun 2022 1:48 |
Reporter: | Brian Yue (OCA) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 8.0.27 | OS: | Any (rhel-7.4) |
Assigned to: | CPU Architecture: | Any (x86-64) | |
Tags: | count star, count(*), primary key |
[6 Jun 2022 3:30]
Brian Yue
[6 Jun 2022 12:16]
MySQL Verification Team
Hi Mr. Yue, Thank you for your Performance Improvement request. However, your idea is not sufficiently detailed, so that we could verify the report. The only way to count all rows in an InnoDB table is to sequentially read all rows or to read it by PRIMARY index. This is due to the fact that InnoDB is a MVCC engine. But, you have another option. Information schema will give you instant response regarding number of rows in an InnoDB table. Hence, we do not see what, exactly, is your performance improvement proposition. We are waiting on your feedback.
[6 Jun 2022 12:46]
Brian Yue
Hello, The problem I'm reporting is that `select count(*)` may become much slower after parallel count star is instroduced in MySQL8.0, but it's implemented in InnoDB engine instead of MySQL server layer. The stack of parallel count star in MySQL8.0 is like this: ``` #0 row_mysql_parallel_select_count_star storage/innobase/row/row0mysql.cc:4435 #1 row_scan_index_for_mysql storage/innobase/row/row0mysql.cc:4717 #2 ha_innobase::records storage/innobase/handler/ha_innodb.cc:16365 #3 handler::ha_records sql/handler.h:5296 #4 get_exact_record_count sql/sql_executor.cc:4391 #5 UnqualifiedCountIterator::Read sql/sql_executor.cc:6669 #6 Query_expression::ExecuteIteratorQuery sql/sql_union.cc:1347 #7 Query_expression::execute sql/sql_union.cc:1419 #8 Sql_cmd_dml::execute sql/sql_select.cc:645 ``` But only PRIMARY key can be used by parallel count star: ``` int ha_innobase::records(ha_rows *num_rows) /*!< out: number of rows */ { ... dict_index_t *index = m_prebuilt->table->first_index(); ut_ad(index->is_clustered()); /* Count the records in the clustered index */ ret = row_scan_index_for_mysql(m_prebuilt, index, n_threads, false, &n_rows); ... } ``` Although in MySQL5.7 there is no parallel count star and only sequential read of index is performed to count rows, we can choose to use a secondary index to count rows. It's correct that InnoDB is a MVCC engine, but when the secondary data page is visible to the count_star trx, we only need to access pages in the secondary index and there is no need to find the clustered record in PRIMARY key, thus it could be very fast. The stack to count rows in MySQL5.7 is like this: ``` #0 row_search_mvcc storage/innobase/row/row0sel.cc:4459 #1 ha_innobase::index_read storage/innobase/handler/ha_innodb.cc:10076 #2 handler::index_read_map sql/handler.h:5404 #3 handler::ha_index_read_map sql/handler.cc:3386 #4 handler::read_range_first sql/handler.cc:7431 #5 ha_innobase::read_range_first storage/innobase/handler/ha_innodb.cc:10548 #6 handler::multi_range_read_next sql/handler.cc:6565 #7 handler::ha_multi_range_read_next sql/handler.cc:6499 #8 QUICK_RANGE_SELECT::get_next sql/opt_range.cc:10481 #9 IndexRangeScanIterator::Read sql/records.cc:307 #10 FilterIterator::Read sql/composite_iterators.cc:87 #11 AggregateIterator::Read sql/composite_iterators.cc:226 #12 Query_expression::ExecuteIteratorQuery sql/sql_union.cc:1347 #13 Query_expression::execute sql/sql_union.cc:1419 #14 in Sql_cmd_dml::execute sql/sql_select.cc:645 ``` As a result, for the same `select count(*) from sbtest1` SQL, secondary indexes can be used to count rows, but in MySQL8.0 only the primary key can be used to count rows. Thus count(*) in MySQL8.0 need to read much more data pages, leading to a bad performance.
[6 Jun 2022 15:07]
MySQL Verification Team
Hi Mr. Yue, Regarding 5.7 versus 8.0, that is a very well known issue. MySQL 8.0 is relatively slower, due to many new features that have been added. However, moving row counting from the storage engine to the SQL layer is not a proposition that we could accept. That is because each engine is different and it will stay that way ......
[7 Jun 2022 1:48]
Brian Yue
Hello, Actually I'm not proposing to move row counting from the storage engine to the SQL layer, and I think the parallel count star introduced in MySQL8.0 is really an useful feature which can speed the count(*) commonly. But due to the parallel count star can use PRIMARY key only, it may perform slower for pretty big tables. Facing those cases that MySQL8.0 performs slower, now I have no available method to force the secondary keys to be used, but actually they can be used to count rows. Analyzing the cases that I previously posted, there is a big performance gap. Executing the same count star SQL, MySQL5.7 takes 1.64 sec and MySQL8.0 takes 5.07 sec: MySQL5.7: mysql> select count(*) from sbtest1; +----------+ | count(*) | +----------+ | 10000000 | +----------+ 1 row in set (1.64 sec) MySQL8.0: mysql> select count(*) from sbtest1; +----------+ | count(*) | +----------+ | 10000000 | +----------+ 1 row in set (5.07 sec) So I'm proposing that: (1) Introduce a session variable or an optimizer hint to control whether the parallel row counting in InnoDB engine should be used, so that we can tune the SQL manually when it's found to be slow. (2) As an another option, using secondary indexes to count rows can be considered by InnoDB engine, because actually it should can work: when the row is visible to the count star transaction, increase the row count; when the row is not visible, just find the full record in the clustered index and increase the row count if it's finally found to be visible. In my opinion, the second option is prefered if it could be implemented. And I think the issue I'm reporting is really an important issue because we almost always have big tables. Thanks for your analysis.
[7 Jun 2022 11:26]
MySQL Verification Team
Hi Mr. Yue, We have already explained why is 8.0 slower than 5.7. Next, parallel count can be done only with PRIMARY key, due to the amount and speed of locking and due to the organisation of the clustered primary index. We also do not think that another option is necessary for the parallel counting, since our optimiser does a very decent job of making those decisions. Also, we are working towards reducing the number of options and hints and not towards further increase of the same. Thank you for your opinion, though.