Bug #22487 mysql performs extremely slow when creating unique indexes on huge table
Submitted: 19 Sep 2006 17:57 Modified: 14 Mar 2007 12:00
Reporter: Alexander Valyalkin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DDL Severity:S5 (Performance)
Version:5.0.26 OS:Any
Assigned to: Evgeny Potemkin CPU Architecture:Any
Tags: Contribution
Triage: Triaged: D5 (Feature request) / R2 (Low) / E5 (Major)

[19 Sep 2006 17:57] Alexander Valyalkin
Description:
Mysql performs extremely slow when creating unique indexes or primary key on huge table by ALTER TABLE ADD UNIQUE INDEX query. 

I know, that key_buffer_size parameter can improve mysql performance in this case for relative small tables, but become useless when unique index size become bigger than available RAM size.

Nowadays widespreaded computers and operating systems have 4G limit of userspace memory, while table sizes constantly become bigger. So the problem becomes more and more real every day.

It seems that bug #9544 has the same origin.

How to repeat:
1) create table without indexes and fill it by large amount of data (more than 10G)
2) add unique index or primary key to this table by ALTER TABLE ADD [UNIQUE INDEX | PRIMARY KEY ]
3) wait for eternity ;)

Suggested fix:
I noticed that mysql uses slow per-row algorithm when creating unique indexes, while it uses fast filesort algorithm when creating non-unique indexes. Why don't use fast algorithm for creating unique indexes too?
[13 Oct 2006 13:55] Valeriy Kravchuk
Thank you for a problem report. Have you got this behaviour (slow UNIQUE index creation while non-UNIQUE index is created obviously faster, on the same table with the same (unique) data and same my.cnf settings) for MyISAM tables only or for InnoDB also? Please, try to check that with a newer version, 5.0.26, and inform about the results.
[13 Oct 2006 14:55] Alexander Valyalkin
sql/sql_table.cc diff for mysql 5.0.24

Attachment: sql_table.cc.diff (application/octet-stream, text), 7.34 KiB.

[13 Oct 2006 14:56] Alexander Valyalkin
Valeriy, thank you for your answer. InnoDB tables also shows performance degradation when switching from non-unique to unique keys on the same set of columns.

I investigated the problem and wrote the patch for sql/sql_table.cc file of mysql 5.0.24 sources, which significantly improves performance on the following SQL queries againist MyISAM tables:
ALTER TABLE ... ADD {PRIMARY|UNIQUE} keys (average 8 times faster)
ALTER TABLE ... DROP any_keys (average 5 times faster)

It also shows speed improvements, when the table already contains unique or primary keys on the following query:
ALTER TABLE ... ADD any_keys (average 8 times faster)

The patch uses sort buffer instead of key cache buffer when generating indexes, so speed of adding keys doesn't suffer from small key_cache_size value or lack of the physical RAM. sort_buffer_size value can be quite small (several Mbytes) for huge 100-million-row tables.

The patch has been successfully tested on different MyISAM tables, which contained wide range of rows count (from 100K to 180M).

The patch can be found in the files attached to this bug.
[13 Oct 2006 16:06] Alexander Valyalkin
I just downloaded mysql 5.0.26 sources and noticed that nothing changed in sql/sql_table.cc, which can significantly impact on the speed of adding unique indexes to tables.
[17 Oct 2006 7:36] Alexander Valyalkin
Here are several clarifications to the patch.

First of all, I forgot to mention, that the patch is based on Peter Zaitsev's note "MySQL: Loading large tables with Unique Keys" ( http://peter-zaitsev.livejournal.com/11772.html ).

Also I forgot to say that the patch works when adding unique (primary) keys only with IGNORE flag in ALTER TABLE, like this:

ALTER IGNORE TABLE ... ADD {UNIQUE|PRIMARY} key

The reason is described in the patch comments: 
"it is not safe to use patch when adding unique or primary key without IGNORE flag, because repair function doesn't restore original data file (.MYD) when duplicate keys found. It just removes duplicate rows from the table."

So, there is no legal way to restore (fast) original data file in the case when duplicate key will be found during repairing the table.
In the case when adding primary (unique) key into table without IGNORE flag, an old (safe and slow) per-row method will be used.
[24 Oct 2006 16:13] Alexander Valyalkin
I've fixed several recovery bugs in the patch and tested it with mysql 5.0.26.

The main difference between new and old patch is that MYD file now copied before repairing, not moved. This decision slows downs speed of creating indexes, because mysql does full copy of data file before altering the table, but it allows to make recovery process safe and clear in case if any error occured during altering the table. 

Mysql patched by updated patch still wins unpatched mysql in speed on aforementioned ALTER TABLE quieries.

Note, that new version of the patch can restore original data file in the case when duplicate keys were found during reparing, but the patch still works with UNIQUE and PRIMARY keys only when IGNORE flag set due to another reason, described in the patch: because there are no easy ways to reproduce an old behaviour in case when mysql found duplicate key: printing out something like 
ERROR 1062 (23000): Duplicate entry '...' for key N
[24 Oct 2006 16:14] Alexander Valyalkin
updated patch

Attachment: sql_table.cc.diff (application/octet-stream, text), 5.90 KiB.

[13 Mar 2007 20:10] Igor Babaev
The problem will be resolved with WL #1333.
[10 Jul 2009 9:50] andy andy
Hello, has anyone reviewed the patch? I'm interested on the performance issues this patch fixes, is there a specific reason this was rejected?