Bug #22487 mysql performs extremely slow when creating unique indexes on huge table
Submitted: 19 Sep 2006 19:57 Modified: 14 Mar 2007 13:00
Reporter: Alexander Valyalkin
Status: Verified
Category:Server: DDL Severity:S5 (Performance)
Version:5.0.26 OS:Any
Assigned to: Evgeny Potemkin Target Version:
Tags: Contribution
Triage: Triaged: D5 (Feature request) / R2 (Low) / E5 (Major)

[19 Sep 2006 19:57] Alexander Valyalkin
Description:
Mysql performs extremely slow when creating unique indexes or primary key on huge table
by ALTER TABLE ADD UNIQUE INDEX query. 

I know, that key_buffer_size parameter can improve mysql performance in this case for
relative small tables, but become useless when unique index size become bigger than
available RAM size.

Nowadays widespreaded computers and operating systems have 4G limit of userspace memory,
while table sizes constantly become bigger. So the problem becomes more and more real
every day.

It seems that bug #9544 has the same origin.

How to repeat:
1) create table without indexes and fill it by large amount of data (more than 10G)
2) add unique index or primary key to this table by ALTER TABLE ADD [UNIQUE INDEX |
PRIMARY KEY ]
3) wait for eternity ;)

Suggested fix:
I noticed that mysql uses slow per-row algorithm when creating unique indexes, while it
uses fast filesort algorithm when creating non-unique indexes. Why don't use fast
algorithm for creating unique indexes too?
[13 Oct 2006 15:55] Valeriy Kravchuk
Thank you for a problem report. Have you got this behaviour (slow UNIQUE index creation
while non-UNIQUE index is created obviously faster, on the same table with the same
(unique) data and same my.cnf settings) for MyISAM tables only or for InnoDB also?
Please, try to check that with a newer version, 5.0.26, and inform about the results.
[13 Oct 2006 16:55] Alexander Valyalkin
sql/sql_table.cc diff for mysql 5.0.24

Attachment: sql_table.cc.diff (application/octet-stream, text), 7.34 KiB.

[13 Oct 2006 16:56] Alexander Valyalkin
Valeriy, thank you for your answer. InnoDB tables also shows performance degradation when
switching from non-unique to unique keys on the same set of columns.

I investigated the problem and wrote the patch for sql/sql_table.cc file of mysql 5.0.24
sources, which significantly improves performance on the following SQL queries againist
MyISAM tables:
ALTER TABLE ... ADD {PRIMARY|UNIQUE} keys (average 8 times faster)
ALTER TABLE ... DROP any_keys (average 5 times faster)

It also shows speed improvements, when the table already contains unique or primary keys
on the following query:
ALTER TABLE ... ADD any_keys (average 8 times faster)

The patch uses sort buffer instead of key cache buffer when generating indexes, so speed
of adding keys doesn't suffer from small key_cache_size value or lack of the physical
RAM. sort_buffer_size value can be quite small (several Mbytes) for huge 100-million-row
tables.

The patch has been successfully tested on different MyISAM tables, which contained wide
range of rows count (from 100K to 180M).

The patch can be found in the files attached to this bug.
[13 Oct 2006 18:06] Alexander Valyalkin
I just downloaded mysql 5.0.26 sources and noticed that nothing changed in
sql/sql_table.cc, which can significantly impact on the speed of adding unique indexes to
tables.
[17 Oct 2006 9:36] Alexander Valyalkin
Here are several clarifications to the patch.

First of all, I forgot to mention, that the patch is based on Peter Zaitsev's note
"MySQL: Loading large tables with Unique Keys" (
http://peter-zaitsev.livejournal.com/11772.html ).

Also I forgot to say that the patch works when adding unique (primary) keys only with
IGNORE flag in ALTER TABLE, like this:

ALTER IGNORE TABLE ... ADD {UNIQUE|PRIMARY} key

The reason is described in the patch comments: 
"it is not safe to use patch when adding unique or primary key without IGNORE flag,
because repair function doesn't restore original data file (.MYD) when duplicate keys
found. It just removes duplicate rows from the table."

So, there is no legal way to restore (fast) original data file in the case when duplicate
key will be found during repairing the table.
In the case when adding primary (unique) key into table without IGNORE flag, an old (safe
and slow) per-row method will be used.
[24 Oct 2006 18:13] Alexander Valyalkin
I've fixed several recovery bugs in the patch and tested it with mysql 5.0.26.

The main difference between new and old patch is that MYD file now copied before
repairing, not moved. This decision slows downs speed of creating indexes, because mysql
does full copy of data file before altering the table, but it allows to make recovery
process safe and clear in case if any error occured during altering the table. 

Mysql patched by updated patch still wins unpatched mysql in speed on aforementioned
ALTER TABLE quieries.

Note, that new version of the patch can restore original data file in the case when
duplicate keys were found during reparing, but the patch still works with UNIQUE and
PRIMARY keys only when IGNORE flag set due to another reason, described in the patch:
because there are no easy ways to reproduce an old behaviour in case when mysql found
duplicate key: printing out something like 
ERROR 1062 (23000): Duplicate entry '...' for key N
[24 Oct 2006 18:14] Alexander Valyalkin
updated patch

Attachment: sql_table.cc.diff (application/octet-stream, text), 5.90 KiB.

[13 Mar 2007 21:10] Igor Babaev
The problem will be resolved with WL #1333.
[10 Jul 11:50] andy andy
Hello, has anyone reviewed the patch? I'm interested on the performance issues this patch
fixes, is there a specific reason this was rejected?