Bug #56332 Performance regression in DROP TABLE performance post 5.0.77
Submitted: 27 Aug 2010 18:17 Modified: 7 Jul 2013 11:39
Reporter: Justin Swanhart Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.0.87, 5.0.91 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: drop table, innodb, performance

[27 Aug 2010 18:17] Justin Swanhart
Description:
This fix apparently creates a performance regression when dropping SMALL tables.

To test:
create 10K tables
drop 10K tables

time on 5.0.77: 2m15s
time on 5.0.91: 4m30s

oprofile reveals that the extra time is spent in buf_LRU_invalidate_tablespace()

5.0.77:
-------------------
time: 2m40.437s

# opreport session:s1 --demangle=smart --symbols /data/5.0.77/libexec/mysqld |head
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % symbol name
8756 37.8851 buf_LRU_invalidate_tablespace
4994 21.6078 safe_mutex_lock
2158 9.3371 srv_release_threads
1236 5.3479 safe_mutex_unlock
524 2.2672 .plt
474 2.0509 mutex_get_waiters
436 1.8865 mutex_test_and_set

5.0.87
-------------------
time: 4m46.589s

# opreport session:s2 --demangle=smart --symbols /data/5.0.87/libexec/mysqld |head
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % symbol name
20778 52.1523 buf_LRU_invalidate_tablespace
5897 14.8013 safe_mutex_lock
2308 5.7930 safe_mutex_unlock
2138 5.3663 srv_release_threads
952 2.3895 mutex_test_and_set
758 1.9026 mutex_get_waiters
705 1.7695 .plt

5.0.91
---------------------
time: 4m37.773s

opreport session:s3 --demangle=smart --symbols /data/5.0.91/libexec/mysqld |head
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
samples % symbol name
15470 46.4732 buf_LRU_invalidate_tablespace
6028 18.1086 safe_mutex_lock
2191 6.5820 safe_mutex_unlock
2162 6.4948 srv_release_threads
851 2.5565 .plt
806 2.4213 mutex_get_waiters
590 1.7724 mutex_test_and_set

How to repeat:
Create 10K tables for the test case:
<?php
echo "use test;\n";
for($i=1;$i<10000;++$i) {
echo "CREATE TABLE `ts_$i` ( `begin_time` int(11) NOT NULL, `end_time` int(11) NOT NULL, `call_count` int(11) NOT NULL, `total_call_time` float NOT NULL, `sum_of_squares` float NOT NULL, `min_call_time` float NOT NULL, `max_call_time` float NOT NULL, `agent_id` int(11) NOT NULL, `metric_id` int(11) NOT NULL, `total_exclusive_time` float NOT NULL, `agent_run_id` int(11) NOT NULL, PRIMARY KEY  (`metric_id`,`agent_id`,`begin_time`,`agent_run_id`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;\n";
}

Drop 10K tables:
<?php
echo "use test;\n";
for($i=1;$i<10000;++$i) {
echo "DROP TABLE `ts_$i`;\n";
}

Suggested fix:
none
[27 Aug 2010 18:19] Justin Swanhart
"This fix" mentioned in the post is supposed to include a link to:
http://bugs.mysql.com/bug.php?id=39939

It looks like changes made for big tables may have negatively impacted drop time for small ones.
[27 Aug 2010 20:41] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior and get identical results for all versions:

5.0.77:
real    0m31.790s
user    0m0.127s
sys     0m0.127s

5.0.91:
$time ../bin/mysql -uroot -h127.0.0.1 -P9306 test <~/blade12/src/bugs/bug56332/drop.sql 

real    0m32.043s
user    0m0.116s
sys     0m0.108s

5.1.50:
real    0m31.021s
user    0m0.111s
sys     0m0.131s

Please send us your configuration file.
[9 Sep 2010 18:24] Justin Swanhart
Please note that the test was done on XFS.  If you do the test on a filesystem that takes a long time to remove data, this may mask the change in performance by moving the bottleneck to the filesystem.
[9 Sep 2010 20:16] Sveta Smirnova
Thank you for the feedback.

I still can not repeat described behavior. Which package do you use? (File name you downloaded)
[9 Sep 2010 21:45] Justin Swanhart
I downloaded the source tarball for each version and built with:
sh BUILD/compile-amd64-debug-max

Are you certain the tables that you are dropping are innodb?  No warnings from the create table script that you ran first?
[19 Sep 2010 12:10] Alexey Kishkin
Hi Justin, I am trying to reproduce it. Any mount options for XFS ?
[19 Oct 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[20 Nov 2010 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[20 Nov 2010 3:18] Justin Swanhart
It has to do with the performance of buf_LRU_... 

On some machines this function performs more slowly than others.  It seems to be a problem on multiple core machines, which means it might be mutex contention on buffer pool mutex.  This would not show up on machine with small number of cores.

I have two different customers that can duplicate this problem on 16 core boxes.
[20 Nov 2010 3:24] Justin Swanhart
The function is called twice as many times - it is clearly visible in the oprofile.  Even if you can't duplicate the slowdown you can clearly see the change in the stack.

Assuming that buf_LRU is expensive, and that on some machines it is more expensive than others, you should investigate what aspects of this function consume the most time (relatively) on your machine.

Maybe try adding concurrency to the drop test - create 40K tables and 4 drop scripts or eight drop scripts if you have 8 cores.  

There has to be some way to make the performance difference visible in your tests.
[3 Feb 2011 23:57] Peter Zaitsev
Hi,

You most likely need to test with large buffer pool which is filled with tables and innodb_file_per_table=1 to repeat it.

I hope Justin compared same buffer pool size completely filled with pages for this one.
[5 Feb 2011 2:35] Justin Swanhart
On the machine I tested this on, I did not need a buffer pool full of pages, just 10K empty tables.

I used O_DIRECT, and innodb_file_per_table, per the my.cnf I attached earlier (which may not be visible to the public).  I tested on XFS and ext3, drop time was not related to the filesystem.  

I compiled both versions of MySQL, started them up with an empty database, created the 10K tables, dropped them and compared performance between the two versions as built from source using the provided scripts in the BUILD directory.

Perhaps it had something to do with the particular machine, but I don't now what would cause it.  However, my observation still stands - the buffer pool invalidation function is called twice as often as it was in previous versions, and performance is about 2x worse than the earlier version due to this increase in the number of calls to this function.
[5 Feb 2011 2:44] Justin Swanhart
The invalidation function and safe_mutex_lock seem to take more time, looking at the stacks I posted before.
[21 Apr 2011 15:36] Davi Arnaut
> sh BUILD/compile-amd64-debug-max
> The invalidation function and safe_mutex_lock seem to take more time [...]

Please, do not use debug-enabled builds for benchmarking.
[21 Apr 2011 15:39] Justin Swanhart
sh BUILD/compile-amd64-max is not a debug build.  I did not use debug builds for benchmarking.

The customer who encountered this issue uses New Relic.  They asked us to investigate decreased drop table performance after we upgraded their database.  They were not using a DEBUG build either!
[21 Apr 2011 15:44] Davi Arnaut
> sh BUILD/compile-amd64-max is not a debug build. 

But that's not what you said. You said: "sh BUILD/compile-amd64-debug-max"

> I did not use debug builds for benchmarking.

Yes, at least in some, you did. Quoting you: "The invalidation function and safe_mutex_lock seem to take more time". Safe mutex is only enabled in debug builds.
[21 Apr 2011 15:45] Davi Arnaut
It does not diminish the importance of the bug report, I'm just pointing out that it is better to avoid debug-enabled builds for benchmarking.
[21 Apr 2011 15:48] Justin Swanhart
Sorry for the confusion.  I tested it on both debug and non-debug.  I only posted the oprofile from the debug version.

I don't have access to this machine anymore so I can't give you oprofile output from the non-debug version.  Regardless, in production, using non-debug builds, the customer encountered a significant drop in DROP TABLE performance after upgrading, as identified by instrumentation (New Relic) in their application.
[29 Jun 2013 15:05] MySQL Verification Team
Looks like this was continued over to:
http://bugs.mysql.com/bug.php?id=69316
[7 Jul 2013 11:39] James Day
Update status to closed since this feature request was done in 5.5.

This won't be fixed in 5.0 or 5.1 but it was fixed in 5.5 as a feature request implemented in that version. The problem was slow dropping of tables in a large and mostly filled buffer pool.

It was temporarily slower in 5.6 than in 5.5 until the fix for bug #69316. Changelog entry from 5 July 2013 for that closed bug is:

-----
Added a changelog entry for 5.6.13, 5.7.2:

"A code regression introduced in MySQL 5.6 negatively impacted "DROP TABLE"
and "ALTER TABLE" performance."
-----

In addition, dropping of compressed tables was speeded up in bug #68022 for versions 5.1.69, 5.5.31, 5.6.11 and that might be of interest to you if this was.

James Day, MySQL Senior Principal Support Engineer, Oracle