Bug #107700 Performance of Temptable is worse than Memory in GROUP BY scenario
Submitted: 29 Jun 2022 11:18 Modified: 13 Jun 20:13
Reporter: Jiayu Zhou Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:8.0.29 OS:Any
Assigned to: CPU Architecture:Any

[29 Jun 2022 11:18] Jiayu Zhou
Description:
In our test, it's much more slower for TempTable engine to deal with GROUP BY operation than Memory engine.

Here are operational details:

mysql> set session internal_tmp_mem_storage_engine=TempTable;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT `source`.`Company Number Existence` AS `Company Number Existence`, count(*) AS `count` FROM ( SELECT * , CASE  WHEN `companies`.`company_number` IS NULL THEN 'No Company Number' ELSE 'With Company Number' END AS `Company Number Existence` FROM `companies` ) `source` GROUP BY `source`.`Company Number Existence`;
+--------------------------+---------+
| Company Number Existence | count   |
+--------------------------+---------+
| With Company Number      | 3974299 |
+--------------------------+---------+
1 row in set (14.19 sec)

mysql> set session internal_tmp_mem_storage_engine=Memory;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT `source`.`Company Number Existence` AS `Company Number Existence`, count(*) AS `count` FROM ( SELECT * , CASE  WHEN `companies`.`company_number` IS NULL THEN 'No Company Number' ELSE 'With Company Number' END AS `Company Number Existence` FROM `companies` ) `source` GROUP BY `source`.`Company Number Existence`;
+--------------------------+---------+
| Company Number Existence | count   |
+--------------------------+---------+
| With Company Number      | 3974299 |
+--------------------------+---------+
1 row in set (3.84 sec)

How to repeat:
This scenario can be stably reproduced in MySQL from 8.0.21(not certainly sure) to 8.0.29.

1) Create table

CREATE TABLE `companies` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `name_of_business_corporation_english` varchar(255) DEFAULT NULL,
  `registered_address` varchar(255) DEFAULT NULL,
  `date_of_commencement` date DEFAULT NULL,
  `date_of_cessation` date DEFAULT NULL,
  `date_of_expiry` date DEFAULT NULL,
  `certificate_number` varchar(255) DEFAULT NULL,
  `fee_and_levy` int DEFAULT NULL,
  `business_branch_name_english` varchar(255) DEFAULT NULL,
  `business_branch_name_chinese` varchar(255) DEFAULT NULL,
  `date_of_incorporation` date DEFAULT NULL,
  `company_number` varchar(255) DEFAULT NULL,
  `name_of_business_corporation_chinese` varchar(255) DEFAULT NULL,
  `created_at` datetime(6) DEFAULT NULL,
  `updated_at` datetime(6) DEFAULT NULL,
  `created_by_id` int unsigned DEFAULT NULL,
  `updated_by_id` int unsigned DEFAULT NULL,
  `winding_up_mode` varchar(255) DEFAULT NULL,
  `register_of_charges` varchar(255) DEFAULT NULL,
  `important_note` longtext,
  `last_update_by` varchar(255) DEFAULT NULL,
  `nature_of_business` varchar(255) DEFAULT NULL,
  `business_status` varchar(255) DEFAULT NULL,
  `company_category` varchar(255) DEFAULT NULL,
  `company_type` varchar(255) DEFAULT NULL,
  `administrative_division` varchar(255) DEFAULT NULL,
  `remarks` longtext,
  `active_status` varchar(255) DEFAULT NULL,
  `total_issued_shares` bigint DEFAULT NULL,
  `br_number` varchar(255) DEFAULT NULL,
  `status_remarks` varchar(255) DEFAULT NULL,
  `todos` json DEFAULT NULL,
  `country` varchar(255) DEFAULT NULL,
  `org_chart` json DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `companies_company_number_unique` (`company_number`),
  KEY `companies_created_by_id_fk` (`created_by_id`),
  KEY `companies_updated_by_id_fk` (`updated_by_id`),
  KEY `idx_company_number` (`company_number`),
  KEY `idx_name_of_business_corporation_chinese` (`name_of_business_corporation_chinese`),
  KEY `idx_name_of_business_corporation_english` (`name_of_business_corporation_english`),
  KEY `companies_country_foreign` (`country`)
) ENGINE=InnoDB AUTO_INCREMENT=6370143 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

2) Insertion

while :                                                                                                                                                                                                                                   
do
  bin/mysql -uroot -S data/mysql.sock -e "insert into test.companies(company_number, important_note, remarks) values(current_timestamp(6),repeat('a',99), repeat('b',99));"
done

File 'companies.ibd' is about 1.7G in my test.

3) Query

SELECT `source`.`Company Number Existence` AS `Company Number Existence`, count(*) AS `count`
FROM (
	SELECT *
		, CASE 
			WHEN `companies`.`company_number` IS NULL THEN 'No Company Number'
			ELSE 'With Company Number'
		END AS `Company Number Existence`
	FROM `companies`
) `source`
GROUP BY `source`.`Company Number Existence`;

Replace the storage engine and you can see the performace gap.

Suggested fix:
I have probably figured out what's causing the performance degradation.

A bug is fixed in patch:https://github.com/mysql/mysql-server/commit/9d630eea67fa642ab56229c80d9480fe4f6fae70.

Before this bugfix, temptable/header.h:decrement_number_of_used_chunks() will never free the rightmost chunk in block because of miscalculation of offset. 

After bugfix, rightmost chunk is recycled, and block may be released if there is no chunk in it, memory may be returned to OS.This part of logic is in temptable/allocator.h:deallocate(). In temptable/table.cc:update(), a local variable 'Row' is used and will be destroy once update() is over. update() function is called millions of time, which let to countless blocks destuctions and creations.

I think the root of the problem is allocator.h:deallocate(). Temptable should always keep a live block, m_state->current_block shouldn't be released even if there is no chunks in it because there is a high probabiliry that a future operation will use memory.  m_state->current_block should be released only if there is no other block in this allocator, which means this table is destructured.
[30 Jun 2022 9:42] MySQL Verification Team
Hello Jiayu Zhou,

Thank you for the report and detailed analysis.
Verified as described.

regards,
Umesh
[26 Jul 2022 14:06] Jiayu Zhou
Hello,

I have fixed this bug with no more than 30 lines of code. I'd love to contribute the fix code. 

How can I contribute code to the community?

regards,
Jiayu
[8 Aug 2022 22:48] Omer Barnir
Hi Jiayu,

In order to submit a contribution you need to first sign the Oracle Contribution Agreement (OCA). Please review the details and follow the instructions at https://oca.opensource.oracle.com/

Once an OCA is in place you can submit a contribution either by attaching it to the contribution tab of this bug, or via a PR in github (https://github.com/mysql/mysql-server)

Thanks for contributing to MySQL
==Omer
[13 Jun 20:13] Philip Olson
Posted by developer:
 
Fixed as of the MySQL Server 8.0.37, 8.4.1, and 9.1.0 releases, and here's the proposed changelog entry from the documentation team:

SELECT ... GROUP BY queries were at least twice as slow with the
TempTable engine than the Memory engine.

Thank you for the bug report.