MySQL Bugs: #22487: mysql performs extremely slow when creating unique indexes on huge table

Bug #22487	mysql performs extremely slow when creating unique indexes on huge table
Submitted:	19 Sep 2006 17:57	Modified:	14 Mar 2007 12:00
Reporter:	Alexander Valyalkin	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: DDL	Severity:	S5 (Performance)
Version:	5.0.26	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	Contribution

Description:
Mysql performs extremely slow when creating unique indexes or primary key on huge table by ALTER TABLE ADD UNIQUE INDEX query. 

I know, that key_buffer_size parameter can improve mysql performance in this case for relative small tables, but become useless when unique index size become bigger than available RAM size.

Nowadays widespreaded computers and operating systems have 4G limit of userspace memory, while table sizes constantly become bigger. So the problem becomes more and more real every day.

It seems that bug #9544 has the same origin.

How to repeat:
1) create table without indexes and fill it by large amount of data (more than 10G)
2) add unique index or primary key to this table by ALTER TABLE ADD [UNIQUE INDEX | PRIMARY KEY ]
3) wait for eternity ;)

Suggested fix:
I noticed that mysql uses slow per-row algorithm when creating unique indexes, while it uses fast filesort algorithm when creating non-unique indexes. Why don't use fast algorithm for creating unique indexes too?

Thank you for a problem report. Have you got this behaviour (slow UNIQUE index creation while non-UNIQUE index is created obviously faster, on the same table with the same (unique) data and same my.cnf settings) for MyISAM tables only or for InnoDB also? Please, try to check that with a newer version, 5.0.26, and inform about the results.

sql/sql_table.cc diff for mysql 5.0.24

Attachment: sql_table.cc.diff (application/octet-stream, text), 7.34 KiB.

Valeriy, thank you for your answer. InnoDB tables also shows performance degradation when switching from non-unique to unique keys on the same set of columns.

I investigated the problem and wrote the patch for sql/sql_table.cc file of mysql 5.0.24 sources, which significantly improves performance on the following SQL queries againist MyISAM tables:
ALTER TABLE ... ADD {PRIMARY|UNIQUE} keys (average 8 times faster)
ALTER TABLE ... DROP any_keys (average 5 times faster)

It also shows speed improvements, when the table already contains unique or primary keys on the following query:
ALTER TABLE ... ADD any_keys (average 8 times faster)

The patch uses sort buffer instead of key cache buffer when generating indexes, so speed of adding keys doesn't suffer from small key_cache_size value or lack of the physical RAM. sort_buffer_size value can be quite small (several Mbytes) for huge 100-million-row tables.

The patch has been successfully tested on different MyISAM tables, which contained wide range of rows count (from 100K to 180M).

The patch can be found in the files attached to this bug.

I just downloaded mysql 5.0.26 sources and noticed that nothing changed in sql/sql_table.cc, which can significantly impact on the speed of adding unique indexes to tables.

Here are several clarifications to the patch.

First of all, I forgot to mention, that the patch is based on Peter Zaitsev's note "MySQL: Loading large tables with Unique Keys" ( http://peter-zaitsev.livejournal.com/11772.html ).

Also I forgot to say that the patch works when adding unique (primary) keys only with IGNORE flag in ALTER TABLE, like this:

ALTER IGNORE TABLE ... ADD {UNIQUE|PRIMARY} key

The reason is described in the patch comments: 
"it is not safe to use patch when adding unique or primary key without IGNORE flag, because repair function doesn't restore original data file (.MYD) when duplicate keys found. It just removes duplicate rows from the table."

So, there is no legal way to restore (fast) original data file in the case when duplicate key will be found during repairing the table.
In the case when adding primary (unique) key into table without IGNORE flag, an old (safe and slow) per-row method will be used.

I've fixed several recovery bugs in the patch and tested it with mysql 5.0.26.

The main difference between new and old patch is that MYD file now copied before repairing, not moved. This decision slows downs speed of creating indexes, because mysql does full copy of data file before altering the table, but it allows to make recovery process safe and clear in case if any error occured during altering the table. 

Mysql patched by updated patch still wins unpatched mysql in speed on aforementioned ALTER TABLE quieries.

Note, that new version of the patch can restore original data file in the case when duplicate keys were found during reparing, but the patch still works with UNIQUE and PRIMARY keys only when IGNORE flag set due to another reason, described in the patch: because there are no easy ways to reproduce an old behaviour in case when mysql found duplicate key: printing out something like 
ERROR 1062 (23000): Duplicate entry '...' for key N

updated patch

Attachment: sql_table.cc.diff (application/octet-stream, text), 5.90 KiB.

The problem will be resolved with WL #1333.

Hello, has anyone reviewed the patch? I'm interested on the performance issues this patch fixes, is there a specific reason this was rejected?