Bug #9544 very slow index creation
Submitted: 31 Mar 2005 22:10 Modified: 22 Apr 2005 18:21
Reporter: Thad Welch Email Updates:
Status: Not a Bug Impact on me:
Category:MySQL Server: MyISAM storage engine Severity:S2 (Serious)
Version:4.1.10 OS:Linux (Linux, Red Hat 8)
Assigned to: CPU Architecture:Any

[31 Mar 2005 22:10] Thad Welch
I'm trying to create indexes on a table with 308 million rows. It took ~20 minutes to load the table but 10 days to build indexes on it. The table's MYD file is 3.2G and its MYI file is 7.7G.

How to repeat:
CREATE TABLE `individual_genotype_single_bp` (
  `variation_id` int(11) NOT NULL default '0',
  `allele_1` char(1) default NULL,
  `allele_2` char(1) default NULL,
  `individual_id` int(11) default NULL

load ~308 millions rows.

alter table individual_genotype_single_bp add index variation_idx (variation_id), add index individual_idx (individual_id);
[22 Apr 2005 18:21] Hartmut Holzgraefe
Writing the actual table does only append rows at the end of the data file, this can be done very fast. Creating an index on the other hand needs keys to be sorted into the index tree, involving tree reorganization as the ammount of data grows. With two indexes and 308 million rows the time per index key in your case is ~1.5ms which is not too bad.

Creating the table with DELAY_KEY_WRITE=1 option and a big key_buffer may help
to save I/O operations though
[25 Apr 2007 13:39] Miguel Solorzano
Bug: http://bugs.mysql.com/bug.php?id=26504 was marked as duplicate
of this one.
[2 Mar 2009 21:35] Tyson Whitehead
I've ran into this problem as well and I fail to see why MySQL cannot do something like a n-way merge sort on disk first so the index can be built in order.[1]

(isn't this what already happens when repairing indexes)

[1]  Database--principles, Programming, and Performance
[3 Mar 2009 16:35] Tyson Whitehead
MySQL has now been going for several days building my second index alone, so I paused it and tried Postgres last night.  It imported and built both indexes in around 40min.

It really seems that the problem is MySQL is not sorting/partial sorting to improve key insertion locality when adding large number of keys to an index (i.e., building it from scratch or loading millions of rows from a file).  From several hours of googling, reading bug reports and documentation, I have set 

key_buffer_size           = 512MB
myisam_sort_buffer_size   = 128MB
myisam_max_extra_sort_file_size = 18446744073709551615
myisam_max_sort_file_size = 18446744073709551615

to try and get MySQL to use sorting, but *sort_file_size only seems to effect table repair.

From the googling, it also seemed that I am not alone in bumping into this index-building data-loading performance show stopper.  Perhaps then the status of this report should be changed from not a bug to a performance bug or a feature request.
[3 Nov 2009 11:07] Ezekiel Macharia
Insert into table then create index (faster)
Creating an index then updating the table (very slow)