Description:
In our test, it's much more slower for TempTable engine to deal with GROUP BY operation than Memory engine.
Here are operational details:
mysql> set session internal_tmp_mem_storage_engine=TempTable;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT `source`.`Company Number Existence` AS `Company Number Existence`, count(*) AS `count` FROM ( SELECT * , CASE WHEN `companies`.`company_number` IS NULL THEN 'No Company Number' ELSE 'With Company Number' END AS `Company Number Existence` FROM `companies` ) `source` GROUP BY `source`.`Company Number Existence`;
+--------------------------+---------+
| Company Number Existence | count |
+--------------------------+---------+
| With Company Number | 3974299 |
+--------------------------+---------+
1 row in set (14.19 sec)
mysql> set session internal_tmp_mem_storage_engine=Memory;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT `source`.`Company Number Existence` AS `Company Number Existence`, count(*) AS `count` FROM ( SELECT * , CASE WHEN `companies`.`company_number` IS NULL THEN 'No Company Number' ELSE 'With Company Number' END AS `Company Number Existence` FROM `companies` ) `source` GROUP BY `source`.`Company Number Existence`;
+--------------------------+---------+
| Company Number Existence | count |
+--------------------------+---------+
| With Company Number | 3974299 |
+--------------------------+---------+
1 row in set (3.84 sec)
How to repeat:
This scenario can be stably reproduced in MySQL from 8.0.21(not certainly sure) to 8.0.29.
1) Create table
CREATE TABLE `companies` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name_of_business_corporation_english` varchar(255) DEFAULT NULL,
`registered_address` varchar(255) DEFAULT NULL,
`date_of_commencement` date DEFAULT NULL,
`date_of_cessation` date DEFAULT NULL,
`date_of_expiry` date DEFAULT NULL,
`certificate_number` varchar(255) DEFAULT NULL,
`fee_and_levy` int DEFAULT NULL,
`business_branch_name_english` varchar(255) DEFAULT NULL,
`business_branch_name_chinese` varchar(255) DEFAULT NULL,
`date_of_incorporation` date DEFAULT NULL,
`company_number` varchar(255) DEFAULT NULL,
`name_of_business_corporation_chinese` varchar(255) DEFAULT NULL,
`created_at` datetime(6) DEFAULT NULL,
`updated_at` datetime(6) DEFAULT NULL,
`created_by_id` int unsigned DEFAULT NULL,
`updated_by_id` int unsigned DEFAULT NULL,
`winding_up_mode` varchar(255) DEFAULT NULL,
`register_of_charges` varchar(255) DEFAULT NULL,
`important_note` longtext,
`last_update_by` varchar(255) DEFAULT NULL,
`nature_of_business` varchar(255) DEFAULT NULL,
`business_status` varchar(255) DEFAULT NULL,
`company_category` varchar(255) DEFAULT NULL,
`company_type` varchar(255) DEFAULT NULL,
`administrative_division` varchar(255) DEFAULT NULL,
`remarks` longtext,
`active_status` varchar(255) DEFAULT NULL,
`total_issued_shares` bigint DEFAULT NULL,
`br_number` varchar(255) DEFAULT NULL,
`status_remarks` varchar(255) DEFAULT NULL,
`todos` json DEFAULT NULL,
`country` varchar(255) DEFAULT NULL,
`org_chart` json DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `companies_company_number_unique` (`company_number`),
KEY `companies_created_by_id_fk` (`created_by_id`),
KEY `companies_updated_by_id_fk` (`updated_by_id`),
KEY `idx_company_number` (`company_number`),
KEY `idx_name_of_business_corporation_chinese` (`name_of_business_corporation_chinese`),
KEY `idx_name_of_business_corporation_english` (`name_of_business_corporation_english`),
KEY `companies_country_foreign` (`country`)
) ENGINE=InnoDB AUTO_INCREMENT=6370143 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
2) Insertion
while :
do
bin/mysql -uroot -S data/mysql.sock -e "insert into test.companies(company_number, important_note, remarks) values(current_timestamp(6),repeat('a',99), repeat('b',99));"
done
File 'companies.ibd' is about 1.7G in my test.
3) Query
SELECT `source`.`Company Number Existence` AS `Company Number Existence`, count(*) AS `count`
FROM (
SELECT *
, CASE
WHEN `companies`.`company_number` IS NULL THEN 'No Company Number'
ELSE 'With Company Number'
END AS `Company Number Existence`
FROM `companies`
) `source`
GROUP BY `source`.`Company Number Existence`;
Replace the storage engine and you can see the performace gap.
Suggested fix:
I have probably figured out what's causing the performance degradation.
A bug is fixed in patch:https://github.com/mysql/mysql-server/commit/9d630eea67fa642ab56229c80d9480fe4f6fae70.
Before this bugfix, temptable/header.h:decrement_number_of_used_chunks() will never free the rightmost chunk in block because of miscalculation of offset.
After bugfix, rightmost chunk is recycled, and block may be released if there is no chunk in it, memory may be returned to OS.This part of logic is in temptable/allocator.h:deallocate(). In temptable/table.cc:update(), a local variable 'Row' is used and will be destroy once update() is over. update() function is called millions of time, which let to countless blocks destuctions and creations.
I think the root of the problem is allocator.h:deallocate(). Temptable should always keep a live block, m_state->current_block shouldn't be released even if there is no chunks in it because there is a high probabiliry that a future operation will use memory. m_state->current_block should be released only if there is no other block in this allocator, which means this table is destructured.