Bug #45526 mysqldump unnecessarily(?) slow with many tables
Submitted: 16 Jun 2009 12:03 Modified: 1 Jul 2009 7:23
Reporter: Jan Steemann (OCA) Email Updates:
Status: Verified    Impact on me: None
Category: MySQL Server: mysqldump Command-line Client    Severity: S5 (Performance)
Version: 5.1.33 and before    OS: Linux (2.6.26-1-amd64 #1 SMP, Debian 4)
Assigned to: CPU Architecture:Any
Triage: Triaged: D5 (Feature request)

[16 Jun 2009 12:03] Jan Steemann
Description:
Using mysqldump to dump a database with 150,000 tables is very slow.
From my findings, this is not because of the overall data volume but because of the many "SHOW TABLES LIKE table_name" queries that mysqldump executes internally.

When dumping 10 database tables from a database with 150,000 tables, it takes 25+ seconds even if there is no data in these tables. Dumping the same tables from a database with just the 10 tables completes in 0.0 seconds.

This is because mysqldump executes a SHOW TABLES LIKE table_name query for each table involved in the dump (the code is at 5.1.33/client/mysqldump.c:4174). The first row of the result set is then used as the actual table name.
On a database server with a case-sensitive file system this is unnecessary and a huge performance penalty. I think there is a variable named "lower_case_table_names" which is not taken into account here. mysqldump could query it from the server once at startup and use it when iterating through the table list. This would remove the need to issue all the SHOW TABLES LIKE queries.
Furthermore, if all tables in a database are dumped, mysqldump does a SHOW TABLES at the beginning anyway, so the per-table queries are redundant.

By simply returning old_table_name at the beginning of the get_actual_table_name function, I was able to more than double the throughput of mysqldump. This is of course not an appropriate fix, but it shows the potential speed improvement one can achieve when the table names in the database and on the file system have the same case.
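A minimal sketch of what that quick hack looks like, for illustration only. The real get_actual_table_name() in client/mysqldump.c takes a MYSQL connection and memory-allocation arguments, which are omitted here:

```c
/* Hypothetical sketch of the reporter's quick hack: short-circuit
   get_actual_table_name() so it never issues a SHOW TABLES LIKE
   round trip and simply trusts the name given on the command line.
   Only safe when both the server and the file system treat table
   names case-sensitively. */
static const char *get_actual_table_name(const char *old_table_name)
{
  /* Quick hack: return the given name unchanged instead of looking
     it up on the server. */
  return old_table_name;
}
```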

How to repeat:
Create a database with 150,000 tables (the datadir should be on an ext3 file system) using the following bash command:

for i in `seq 1 150000`; do  mysql -u tester test -e "CREATE TABLE test$i (id INTEGER) ENGINE=MyISAM"; done;

Then run mysqldump for the first 10 tables of that database:

time mysqldump -u tester -f -Q test test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 > test.dump

real    0m23.407s
user    0m0.008s
sys     0m0.012s

Suggested fix:
Modify client/mysqldump.c:get_actual_table_name() so that it doesn't issue a SHOW TABLES LIKE query when it isn't necessary (because the file system is case-sensitive, or by reusing the tables found by mysqldump's initial SHOW TABLES command).
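The suggested fix could be sketched roughly as follows. This is a hypothetical outline, not a patch: the variable name lower_case_table_names comes from the report, the value would be fetched once at startup (e.g. via SELECT @@lower_case_table_names), and show_tables_like() stands in for the server round trip that the real code performs:

```c
/* Hypothetical sketch: skip the per-table SHOW TABLES LIKE round trip
   when the server stores table names case-sensitively
   (lower_case_table_names = 0), and fall back to the old lookup
   otherwise. */
static int lower_case_table_names = 0; /* assumed fetched once at startup */

/* Placeholder for the real server round trip: issue
   SHOW TABLES LIKE 'name' and return the first row of the result set. */
static const char *show_tables_like(const char *name)
{
  return name;
}

static const char *get_actual_table_name(const char *old_table_name)
{
  if (lower_case_table_names == 0)
    return old_table_name;                  /* case-sensitive: name is exact */
  return show_tables_like(old_table_name); /* old behavior otherwise */
}
```

With this shape, a dump of N tables costs at most one extra query at startup instead of one SHOW TABLES LIKE per table.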

This will double the throughput of mysqldump in this setting:

time mysqldump -u tester -f -Q test test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 > tester.dump

real    0m11.822s
user    0m0.016s
sys     0m0.008s
[1 Jul 2009 7:23] Susanne Ebrecht
Verified as described.

Dumping a database with a huge number of tables takes extremely long.