Bug #26504 Adding indexes to a large existing table takes several hours
Submitted: 20 Feb 2007 17:08 Modified: 25 Apr 2007 13:38
Reporter: Eric Greveson Email Updates:
Status: Duplicate Impact on me:
Category:MySQL Server: MyISAM storage engine Severity:S5 (Performance)
Version:5.0.27-community OS:Microsoft Windows (Windows Server 2003 SP2)
Assigned to: Miguel Solorzano CPU Architecture:Any
Tags: alter, create, INDEX, myisam, performance, slow, table

[20 Feb 2007 17:08] Eric Greveson
I have a MyISAM table with about 7000000 rows, with the fields shown in the "How to repeat" section. The file size of the table is about 1.5GB for the data and 1.5GB for the indexes. I needed to alter the table to add an index on a previously unindexed column (due to lack of foresight). The index takes around half a day to create, during which time the table is locked for writing. The machine is a dual-core 3.4GHz Pentium D machine with 2 GB RAM. Server variables: The key buffer is 512MB, sort buffer is 2MB, MyISAM sort buffer is 500M, "Use concurrent inserts" is on, binary logging is on (with a connected replication slave), and other parameters are defaults for the general high-load server configuration. The indexing was tested when no other queries / connections were running. The "Server Connections / Threads" page of the Administrator (v1.2.9) shows the "State" as "copy to tmp table" for upwards of 9 hours. The 

How to repeat:
Create the table:

CREATE TABLE  `db`.`frames` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `gpstime` double NOT NULL,
  `pos` point NOT NULL,
  `altitude` double NOT NULL,
  `pitch` double NOT NULL,
  `roll` double NOT NULL,
  `yaw` double NOT NULL,
  `device_id` int(10) unsigned NOT NULL,
  `path` varchar(256) NOT NULL,
  `frame_num` int(10) unsigned default NULL,
  `pos_accuracy` double NOT NULL default '2' COMMENT 'in metres',
  `rot_accuracy` double NOT NULL default '10' COMMENT 'in degrees',
  `uuid` varchar(45) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `UUIDIndex` (`uuid`),
  KEY `TimeIndex` (`gpstime`),
  KEY `AltIndex` (`altitude`),
  KEY `HeadingIndex` (`pitch`,`roll`,`yaw`),
  KEY `DeviceIndex` (`device_id`),
  SPATIAL KEY `SpatialIndex` (`pos`(32))

Populate with around 7000000 rows of random data, with about 100 characters in each row in the "path" field.

Execute the command:
ALTER TABLE `db`.`frames` ADD INDEX PathIndex(`path`);

This will lock the table for writing and initiate the creation of a temporary table. On a fast machine, this temporary table creation may take around half a day.

Suggested fix:
Part of this problem could be related to Bug #24495 (adding indexes becomes slower as more indexes are added). However, just copying 3GB of files takes a few minutes on the filesystem (NTFS), so the creation of the tmp table should be expected to be faster than half a day? A sensible algorithm to my mind would be:
- lock table for writes
- copy existing data into the new temporary table, interleaving blank chunks for space for the new index where appropriate
- create the new index or indexes, filling in the previously allocated space
- rename the temp table to the working table
- release lock

However I don't know how MySQL works, and this might be a stupid / unworkable suggestion. Additionally, "copy to tmp table" might mean that the indexing is being performed as well as the data copy (in parallel), in which case I don't know what the performance bottleneck is. I can say with certainty that my temporary data file grows fairly fast at the start (a few hundred MB per hour) but after a few hours it starts to slow down and eventually only adds a few 10's of MB per hour, which seems far too slow to me.
[25 Apr 2007 13:38] Miguel Solorzano
Thank you for the bug report. Duplicate of bug: http://bugs.mysql.com/bug.php?id=9544
[26 Dec 2007 13:08] Bugs System
This is a horrible bug / implementation flaw.

It is very counter productive.  The normal process for loading huge amounts of data does not seem to apply to MySQL.  I have a database of 83 million rows and tried to create an index on a field.  The system had to be brought down because it locked the table for way too long.  The time to create the index is vastly longer on IDE drives than SCSI.