Bug #108913 Table partition maintenance takes very long time to complete
Submitted: 28 Oct 2022 12:43 Modified: 3 Nov 2022 16:30
Reporter: Niranjan R Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.30 OS:Any (Linux)
Assigned to: CPU Architecture:x86
Tags: Drop, partitions, performance issue

[28 Oct 2022 12:43] Niranjan R
Description:
Dropping a table partition takes extremely long time to complete.

== Issue seems to be related to "O_DIRECT" innodb_flush_method. 
When "fsync" is used, the operation is quick.
== The operation spends a lot of time at "committing alter table to storage engine" step.

====== Server Details: ======
++ 8 vCPU
++ 64 GiB of RAM
++ Linux OS

====== DB workload details: ======
No other query is being run on the DB except the queries involved in the testing.

====== Behavior varies based on multiple factors ====== 
== I can see that in a specific case, the operation doesn't even take a second to complete.  --> Very good performance
== I can see that in few cases, it takes few seconds to complete.  --> Bad performance
== I can also see that it takes minutes to complete.   --> very poor performance

How to repeat:
== Preferably use a server with similar CPU and memory configuration.
== Use the attached my.cnf file
== .sql has been attached to the report. Load it on to the DB. It includes creating a DB, table and populating it with required data.
== Wait for the load to complete.

== Once the data has been loaded, you will ideally see one of the partitions in the table being ~18GB.

== Now, we need to drop the partition.

THE AMOUNT OF TIME IT TAKES TO DROP THE PARTITION(operation to complete) DEPENDS ON WHEN WE ACTUALLY DROP THE PARTITION:

~~~~~ DROP THE PARTITION IMMEDIATELY ~~~~~

== As soon as the data is loaded using the SQL given, drop the above partition(which is of ~18GB) using the below command:

alter table big_table drop partition p20221029;

The name of the partition might be different in your case. Make sure to change it in the command accordingly.

== The above operation would take several seconds to complete.

Query OK, 0 rows affected (9.47 sec)
Records: 0  Duplicates: 0  Warnings: 0

~~~~~ DROP THE PARTITION AFTER 30 MINUTES ~~~~~

== Once the data is loaded using the SQL given, leave the server idle for 30 minutes. Drop the partition after 30 minutes.

== The operation still takes several seconds to complete.

Query OK, 0 rows affected (6.49 sec)
Records: 0  Duplicates: 0  Warnings: 0

~~~~~ DROP THE PARTITION AFTER REBOOT ~~~~~

== Once the data is loaded using the SQL given, reboot the server. Once the engine is up, drop the partition immediately.

== The operation takes MINUTES TO COMPLETE.

Query OK, 0 rows affected (1 min 41.53 sec)
Records: 0  Duplicates: 0  Warnings: 0

~~~~~ DROP THE PARTITION POST 30 MINUTES 'AFTER REBOOT' ~~~~~

== Once the data is loaded using the SQL given, reboot the server. Once the engine is up, leave the server idle for 30 minutes. Drop the partition after 30 minutes(AFTER REBOOT).

== The operation completes within a "fraction of a second".

Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0
[2 Nov 2022 14:16] MySQL Verification Team
Hi Mr. Niranjan,

Thank you for your bug report.

However, it is not a bug.

When configuring your OS and MySQL for proper ACID functionality, it will be ALWAYS slower, then when ACID safety is more relaxed.

In short, this is expected behaviour.
[3 Nov 2022 16:30] Niranjan R
Thanks for the update, team.

It still looks concerning here.
Just to summarize, if the server is restarted, eventually after sometime, the operation completes real quick (0.03 seconds).   ---> 4th scenario in the "steps to reproduce".

But, when the server is __NOT__ restarted, it always takes long. And it __NEVER__ completes as quick as above.

I tried waiting even longer in the 2nd scenario (in "steps to reproduce")(left the server idle for longer than __10 hours__ without reboot). But the operation tills take long.  --> Why does this never complete in around 0.03 seconds ?

Even running flush tables, flush logs etc does __NOT__ help.

This gives me an impression that the operation will always be longer if you don't reboot the server.  --> Which is concerning.

What I would call as an expected behavior is that the operation should at least at some point(eventually) complete as quickly as it happens in the 4th scenario, even without a reboot.  --> At least by running something manually like "flush tables". etc.

Can you please check and re-validate?
[8 Nov 2022 16:10] Tim Gustafson
Please note that this behavior is not observed on MySQL 5.7.  MySQL 5.7 performs table partition maintenance under these with reasonable execution times, generally around 1 second or less.

This feels very much like a regression in MySQL 8.0.
[12 Nov 2022 7:16] MySQL Verification Team
Hello Tim,

I checked to see if this is a regression in 8.0 and hence tried on latest GA builds but not seeing any big diff when compared to 5.7 vs 8.0 (even 8.0 fared better). So at least from my tests I can see clearly that there is no regression in 8.0 with provided test case. I'll join the test results shortly. Tu sum-up, I confirmed what Niranjan claimed in the initial report for all the scenarios i.e.

** DROP THE PARTITION IMMEDIATELY
8.0.31 - 4.31 sec
5.7.40 - 6.19 sec

^^ Repeated each steps 3 times, 8.0.31 took 4.30 to 4.50 sec, 5.7.40 took 6.19 to 6.55 sec  to DROP PARTITION

** DROP THE PARTITION AFTER 30 MINUTES
8.0.31 - 4.29 sec
5.7.40 - 6.17 sec

^^ Repeated steps 3 times, 8.0.31 took 4.29 sec to 4.58 sec, 5.7.40 took 6.17 to 6.30 sec  to DROP PARTITION

** DROP THE PARTITION AFTER REBOOT
8.0.31 - 2 min 43.41 sec
5.7.40 - 2 min 58.27 sec

^^ Repeated steps 3 times, 8.0.31 took 2 min  43.41 sec to 2 min 50 sec, 5.7.40 took 2 min 58.27 sec to 3 min 10 sec to DROP PARTITION

** DROP THE PARTITION POST 30 MINUTES 'AFTER REBOOT' 
8.0.31 - 0.02 sec
5.7.40 - 0.31 sec

^^ Repeated steps only once, 8.0.31 took 0.02 sec, 5.7.40 took 0.31 sec took to DROP PARTITION

Could you please try latest GA build 8.0.31? Also, please correct me if I'm missing anything. Thank you.

regards,
Umesh
[12 Nov 2022 7:17] MySQL Verification Team
8.0.31 test results

Attachment: 108913_8.0.31.results (application/octet-stream, text), 33.47 KiB.

[12 Nov 2022 7:18] MySQL Verification Team
5.7.40 test results

Attachment: 108913_5.7.40.results (application/octet-stream, text), 31.71 KiB.

[8 Dec 2022 15:27] Marek Kretkiewicz
Question related to long time execution which increase with server running time.

Can it be related to number of free memory in innodb_buffer_pool_size and what is in this buffer?

We have the same problem, that adding and deleting new/old partitions takes a lot of time, however when we met this problem id=101900 and database was restarted, just few GB were used in buffer, it was executed in ms:

mysql> ALTER TABLE table_name ADD PARTITION (PARTITION p202250 VALUES LESS THAN ('2022-12-12'));
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql>

however when doing it with higher uptime it's always:

mysql> ALTER TABLE table_name ADD PARTITION (PARTITION p202248 VALUES LESS THAN ('2022-11-28'));
Query OK, 0 rows affected (10.57 sec)
Records: 0  Duplicates: 0  Warnings: 0

And what is interesting when doing it in a sequence, 2nd execution is always short (today I have deleted some data, so it was a little bit faster):

mysql> ALTER TABLE table_name ADD PARTITION (PARTITION p202251 VALUES LESS THAN ('2022-12-21'));
Query OK, 0 rows affected (6.20 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE table_name ADD PARTITION (PARTITION p202252 VALUES LESS THAN ('2022-12-26'));
Query OK, 0 rows affected (0.18 sec)
Records: 0  Duplicates: 0  Warnings: 0

and the same is with droping:
mysql> ALTER TABLE table_name DROP PARTITION p202149;
Query OK, 0 rows affected (7.15 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE table_name DROP PARTITION p202150;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0
[24 Jan 2023 12:07] Tim Gustafson
I've done a bit more testing, and we seem to have narrowed this down to innodb_flush_method being set to O_DIRECT, and the database catalog moving from MyISAM (in 5.7) to InnoDB (in 8.0).

We observe a difference of up to three orders of magnitude in the amount of time it takes to do table partition maintenance on a MySQL 8.0 server with innodb_flush_method set to O_DIRECT, and under heavy load.

2023-01-12 17:35:32.1961: Version: 5.7.40-log
2023-01-12 17:35:32.1961: Total Inserts: 331400
2023-01-12 17:35:32.1961: Partition Maintenance Duration:
2023-01-12 17:35:32.1961:   During Setup: 812
2023-01-12 17:35:32.1961:   After 1 Minute(s): 1234
2023-01-12 17:35:32.1961:   After 10 Minute(s): 852
2023-01-12 17:35:32.1961:   After 60 Minute(s): 728

2023-01-12 17:35:32.1962: Version: 8.0.31
2023-01-12 17:35:32.1962: Total Inserts: 368393
2023-01-12 17:35:32.1962: Partition Maintenance Duration:
2023-01-12 17:35:32.1962:   During Setup: 1035
2023-01-12 17:35:32.1962:   After 1 Minute(s): 968
2023-01-12 17:35:32.1962:   After 10 Minute(s): 5492
2023-01-12 17:35:32.1962:   After 60 Minute(s): 65695

I think it's fair to say that it ought to be at least documented somewhere.

I think it's also worth looking at why this takes so long when O_DIRECT is in use, and why the partition maintenance time increases (roughly linearly, it seems to me) with the duration since the last partition maintenance took place.
[24 Jan 2023 12:08] Tim Gustafson
Apologies, I should have said "two orders of magnitude", not three.
[31 Jan 2023 10:32] Marek Kretkiewicz
Regarding this correlation which you can see: time of operation on partition and duration od this operation.

Do you think that it can be simply related with number of data in innodb_buffer_pool ?
Longer time, less data in buffer, more time to load to buffer.

What about innodb_adaptive_hash_index parameter?
In this incident id=101900 there is information that when it's enabled, operations on partitions take 5x more time.
[27 Feb 2023 16:05] Marek Kretkiewicz
For your information we have switched off innodb_adaptive_hash_index parameter and thanks to it decreased time from ~10s to 0.2s when adding new partition.

It's for sure related to case id=101900.