Description:
I have a MyISAM table with about 7000000 rows, with the fields shown in the "How to repeat" section. The file size of the table is about 1.5GB for the data and 1.5GB for the indexes. I needed to alter the table to add an index on a previously unindexed column (due to lack of foresight). The index takes around half a day to create, during which time the table is locked for writing. The machine is a dual-core 3.4GHz Pentium D machine with 2 GB RAM. Server variables: The key buffer is 512MB, sort buffer is 2MB, MyISAM sort buffer is 500M, "Use concurrent inserts" is on, binary logging is on (with a connected replication slave), and other parameters are defaults for the general high-load server configuration. The indexing was tested when no other queries / connections were running. The "Server Connections / Threads" page of the Administrator (v1.2.9) shows the "State" as "copy to tmp table" for upwards of 9 hours. The
How to repeat:
Create the table:
CREATE TABLE `db`.`frames` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`gpstime` double NOT NULL,
`pos` point NOT NULL,
`altitude` double NOT NULL,
`pitch` double NOT NULL,
`roll` double NOT NULL,
`yaw` double NOT NULL,
`device_id` int(10) unsigned NOT NULL,
`path` varchar(256) NOT NULL,
`frame_num` int(10) unsigned default NULL,
`pos_accuracy` double NOT NULL default '2' COMMENT 'in metres',
`rot_accuracy` double NOT NULL default '10' COMMENT 'in degrees',
`uuid` varchar(45) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UUIDIndex` (`uuid`),
KEY `TimeIndex` (`gpstime`),
KEY `AltIndex` (`altitude`),
KEY `HeadingIndex` (`pitch`,`roll`,`yaw`),
KEY `DeviceIndex` (`device_id`),
SPATIAL KEY `SpatialIndex` (`pos`(32))
) ENGINE=MyISAM AUTO_INCREMENT=8269975 DEFAULT CHARSET=latin1;
Populate with around 7000000 rows of random data, with about 100 characters in each row in the "path" field.
Execute the command:
ALTER TABLE `db`.`frames` ADD INDEX PathIndex(`path`);
This will lock the table for writing and initiate the creation of a temporary table. On a fast machine, this temporary table creation may take around half a day.
Suggested fix:
Part of this problem could be related to Bug #24495 (adding indexes becomes slower as more indexes are added). However, just copying 3GB of files takes a few minutes on the filesystem (NTFS), so the creation of the tmp table should be expected to be faster than half a day? A sensible algorithm to my mind would be:
- lock table for writes
- copy existing data into the new temporary table, interleaving blank chunks for space for the new index where appropriate
- create the new index or indexes, filling in the previously allocated space
- rename the temp table to the working table
- release lock
However I don't know how MySQL works, and this might be a stupid / unworkable suggestion. Additionally, "copy to tmp table" might mean that the indexing is being performed as well as the data copy (in parallel), in which case I don't know what the performance bottleneck is. I can say with certainty that my temporary data file grows fairly fast at the start (a few hundred MB per hour) but after a few hours it starts to slow down and eventually only adds a few 10's of MB per hour, which seems far too slow to me.