Bug #45526 mysqldump unnecessarily(?) slow with many tables
Submitted: 16 Jun 2009 12:03 Modified: 1 Jul 2009 7:23
Reporter: Jan Steemann (OCA) Email Updates:
Status: Verified    Impact on me: None
Category: MySQL Server: mysqldump Command-line Client    Severity: S5 (Performance)
Version: 5.1.33 and before    OS: Linux (2.6.26-1-amd64 #1 SMP, Debian 4)
Assigned to: CPU Architecture:Any
Triage: Triaged: D5 (Feature request)

[16 Jun 2009 12:03] Jan Steemann
Description:
Using mysqldump to dump a database with 150,000 tables is very slow.
From my findings, this is not because of the overall data volume but because of the many "SHOW TABLES LIKE table_name" queries that mysqldump executes internally.

When dumping 10 database tables from a database with 150,000 tables, it takes 25+ seconds even if there is no data in these tables. Dumping the same tables from a database with just the 10 tables completes in 0.0 seconds.

This is because mysqldump executes a SHOW TABLES LIKE table_name query for each table involved in the dump (the code is at 5.1.33/client/mysqldump.c:4174). The first row of the result set is then used as the actual table name.
On a database server with a case-sensitive file system this is unnecessary and a huge performance penalty. I think there is a variable named "lower_case_table_names" which is not taken into account here. mysqldump could query it from the server once at startup and use it when iterating through the table list. This would remove the need to issue all the SHOW TABLES LIKE queries.
Furthermore, if all tables in a database are dumped, mysqldump does a SHOW TABLES at the beginning anyway, so the per-table queries are redundant.

By simply returning old_table_name at the beginning of the get_actual_table_name function, I was able to more than double the throughput of mysqldump. This is of course not an appropriate fix, but it shows the potential speed improvement one can achieve when the table names in the database and on the file system have the same case.
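A minimal sketch of what that quick hack looks like, for illustration only. The real get_actual_table_name() in client/mysqldump.c takes a MYSQL connection and memory-allocation arguments, which are omitted here:

```c
/* Hypothetical sketch of the reporter's quick hack: short-circuit
   get_actual_table_name() so it never issues a SHOW TABLES LIKE
   round trip and simply trusts the name given on the command line.
   Only safe when both the server and the file system treat table
   names case-sensitively. */
static const char *get_actual_table_name(const char *old_table_name)
{
  /* Quick hack: return the given name unchanged instead of looking
     it up on the server. */
  return old_table_name;
}
```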

How to repeat:
Create a database with 150,000 tables (the datadir should be on an ext3 file system) using the following bash command:

for i in `seq 1 150000`; do  mysql -u tester test -e "CREATE TABLE test$i (id INTEGER) ENGINE=MyISAM"; done;

Then run mysqldump for the first 10 tables of that database:

time mysqldump -u tester -f -Q test test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 > test.dump

real    0m23.407s
user    0m0.008s
sys     0m0.012s

Suggested fix:
Modify client/mysqldump.c:get_actual_table_name() so that it doesn't issue a SHOW TABLES LIKE query when it isn't necessary (because the file system is case-sensitive, or by reusing the tables found by mysqldump's initial SHOW TABLES command).
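The suggested fix could be sketched roughly as follows. This is a hypothetical outline, not a patch: the variable name lower_case_table_names comes from the report, the value would be fetched once at startup (e.g. via SELECT @@lower_case_table_names), and show_tables_like() stands in for the server round trip that the real code performs:

```c
/* Hypothetical sketch: skip the per-table SHOW TABLES LIKE round trip
   when the server stores table names case-sensitively
   (lower_case_table_names = 0), and fall back to the old lookup
   otherwise. */
static int lower_case_table_names = 0; /* assumed fetched once at startup */

/* Placeholder for the real server round trip: issue
   SHOW TABLES LIKE 'name' and return the first row of the result set. */
static const char *show_tables_like(const char *name)
{
  return name;
}

static const char *get_actual_table_name(const char *old_table_name)
{
  if (lower_case_table_names == 0)
    return old_table_name;                  /* case-sensitive: name is exact */
  return show_tables_like(old_table_name); /* old behavior otherwise */
}
```

With this shape, a dump of N tables costs at most one extra query at startup instead of one SHOW TABLES LIKE per table.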

This will double the throughput of mysqldump in this setting:

time mysqldump -u tester -f -Q test test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 > tester.dump

real    0m11.822s
user    0m0.016s
sys     0m0.008s
[1 Jul 2009 7:23] Susanne Ebrecht
Verified as described.

Dumping a database with a huge number of tables takes extremely long.