Bug #24495 "create index" gets slower the more indexes there are
Submitted: 22 Nov 2006 3:37 Modified: 27 Sep 2008 10:42
Reporter: P Eger Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DDL Severity:S5 (Performance)
Version:5.1.14-BK, 5.1.12-beta,5.0.24a OS:Linux (Linux, Windows Server 2003 x64)
Assigned to: CPU Architecture:Any

[22 Nov 2006 3:37] P Eger
Description:
Below is a log from our application, showing decreasing performance with each additional index that gets added. Table type is MyISAM, create table statement is below & it was filled with 813449 rows of data at the time.

It looks like each additional "create index" has to scan all the other indexes, though a can't see why it would need to do any more than scan the table.

I have included a couple relevant settings from my.ini

I have not confirmed if this behaviour exists in mysql 5.0.X or on other OSs.

tmp_table_size=25M
myisam_max_sort_file_size=1M
myisam_max_extra_sort_file_size=1M
myisam_sort_buffer_size=35M
key_buffer_size=400M
read_buffer_size=1M
read_rnd_buffer_size=1M
sort_buffer_size=5M

---------------------------------------
create table done
813449 inserts in 10.154 sec
creating indexes
ddl took 11.265 sec : create unique index t401_i00 on t401(c0)
ddl took 24.438 sec : create index t401_i01 on t401(c1)
ddl took 37.375 sec : create index t401_i02 on t401(c4)
ddl took 48.922 sec : create index t401_i03 on t401(c5,c6,c7)
ddl took 58.375 sec : create index t401_i04 on t401(c5,c14,c6)
ddl took 01 min 10.062 sec : create index t401_i05 on t401(c6)
ddl took 01 min 20.422 sec : create index t401_i06 on t401(c7)
ddl took 01 min 33.641 sec : create index t401_i07 on t401(c8)
ddl took 01 min 44.297 sec : create index t401_i08 on t401(c12)
ddl took 01 min 57.906 sec : create index t401_i09 on t401(c13)
ddl took 02 min 09.234 sec : create index t401_i10 on t401(c14,c5,c6)
ddl took 02 min 20.938 sec : create index t401_i11 on t401(c15)
ddl took 02 min 28.359 sec : create index t401_i12 on t401(c16)
ddl took 02 min 37.375 sec : create index t401_i13 on t401(c19)
ddl took 02 min 56.735 sec : create index t401_i14 on t401(c20)
ddl took 03 min 00.703 sec : create index t401_i15 on t401(c23,c5,c6)
ddl took 03 min 14.844 sec : create index t401_i16 on t401(c27)

How to repeat:
1)
create table t401(c0 BINARY(16) not null,c1 int,c2 int,c3 BINARY(16),c4 smallint,c5 tinyint unsigned,c6 smallint,c7 int,c8 smallint,c9 tinyint unsigned,c10 tinyint unsigned,c11 tinyint unsigned,c12 int,c13 tinyint unsigned,c14 varchar(7),c15 int,c16 int,c17 BINARY(16),c18 tinyint unsigned,c19 int,c20 int,c21 smallint,c22 varbinary(256) COMMENT 'C_GENERIC_URL',c23 smallint,c24 int,c25 smallint,c26 varchar(17),c27 tinyint unsigned,c28 int,c29 varbinary(256) COMMENT 'C_GENERIC_URL',c30 tinyint unsigned,c31 varchar(20),c32 varbinary(51) COMMENT 'C_GENERIC_TEXT',c33 smallint,c34 varchar(50)) ENGINE = MyISAM;

2) Fill with 1 million rows of test data.

3) Create & time each of the above indexes.
[22 Nov 2006 4:14] P Eger
I have confirmed this behaviour in Mysql 5.0.24a, same OS as above.

This bug actually makes it quicker to create the indexes before the data is inserted, which is massively counterintuitive!
[22 Nov 2006 12:45] Valeriy Kravchuk
Thank you for a problem report. Verified almost as described with 5.1.14-BK on Linux:

openxs@suse:~/dbs/5.1> bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.14-beta-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t401(c0 BINARY(16) not null,c1 int,c2 int,c3 BINARY(16),c4
    -> smallint,c5 tinyint unsigned,c6 smallint,c7 int,c8 smallint,c9 tinyint
    -> unsigned,c10 tinyint unsigned,c11 tinyint unsigned,c12 int,c13 tinyint
    -> unsigned,c14 varchar(7),c15 int,c16 int,c17 BINARY(16),c18 tinyint unsig
ned,c19
    -> int,c20 int,c21 smallint,c22 varbinary(256) COMMENT 'C_GENERIC_URL',c23
    -> smallint,c24 int,c25 smallint,c26 varchar(17),c27 tinyint unsigned,c28 i
nt,c29
    -> varbinary(256) COMMENT 'C_GENERIC_URL',c30 tinyint unsigned,c31 varchar(
20),c32
    -> varbinary(51) COMMENT 'C_GENERIC_TEXT',c33 smallint,c34 varchar(50)) ENG
INE =
    -> MyISAM;
Query OK, 0 rows affected (0.02 sec)

mysql> alter table t401 modify c0 int not null auto_increment primary key;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t401(c1, c2, c4) values (rand(), rand(), rand());
Query OK, 1 row affected (0.00 sec)

mysql> insert into t401(c1, c2, c4) select rand(), rand(), rand() from t401;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t401(c1, c2, c4) select rand(), rand(), rand() from t401;
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

...

mysql> insert into t401(c1, c2, c4) select rand(), rand(), rand() from t401;
Query OK, 131072 rows affected (4.07 sec)
Records: 131072  Duplicates: 0  Warnings: 0

mysql> create index t401_i01 on t401(c1);
Query OK, 262144 rows affected (10.13 sec)
Records: 262144  Duplicates: 0  Warnings: 0

mysql> create index t401_i02 on t401(c2);
Query OK, 262144 rows affected (11.92 sec)
Records: 262144  Duplicates: 0  Warnings: 0

mysql> create index t401_i04 on t401(c4);
Query OK, 262144 rows affected (13.98 sec)
Records: 262144  Duplicates: 0  Warnings: 0

So, yes, each next index is created slower than previous one. It may be expected behaviour, because (see http://dev.mysql.com/doc/refman/5.1/en/create-index.html) table is really copied  when CREATE INDEX is executed (as CREATE INDEX is mapped to ALTER TABLE). But, in any case, it is counter-intuitive and different from other RDBMSes, so, I still think it is a performance problem to be eventually solved (and/or explicitely documented).
[23 Nov 2006 1:44] P Eger
Glad you can reproduce. This is a pretty big performance limiter for recreating a database/table from scratch. The sequence of 1) create new table 2) insert data bulk 3) create all indexes, is pretty standard method of bulk loading data into a table optimally. Can you recommend another method to eliminate the overhead of maintaining the indexes on every IUD? "load data infile" is a possible option, but may be difficult/impossible since we are accessing through jdbc and this would require shell access. Also, http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html seems to imply that "myisamchk -rq /path/to/db/tbl_name" will recreate all indexes fast after doing a "myisamchk --keys-used=0 -rq". Is this implemented different/faster in myisamchk then in mysqld? Again, shelling to external scripts would be very sub-optimal for us (complexity, portability, error handling, etc), but if that is the only option i would give it a shot.
[28 Nov 2006 13:31] Sergei Golubchik
The workaround is adding all indexes in one statement:

ALTER TABLE xxx ADD INDEX ..., ADD INDEX ..., ADD INDEX ...
[28 Nov 2006 18:45] Mark Matthews
Just a quick comment "LOAD DATA [LOCAL] INFILE" works fine with the JDBC driver, and in fact is a very efficient way of bulk loading tables, compared to crafting the INSERTs yourself.
[6 Dec 2006 3:56] P Eger
Thanks guys, i have implemented both of these suggestions to good effect. alter table with a comma separated list does the trick for index creation & load data infile is far superior to even properly batched insert() statements. Still don't like the slow create index though ;-)
[27 Sep 2008 10:41] Konstantin Osipov
This can only be fixed if the engine supports online add/drop index.
MyISAM does support that, other engines move in that direction.
There is no server bug, since the framework is in place.
[27 Sep 2008 10:42] Konstantin Osipov
Please file a feature request against the engine you use to support online add/drop index. The server framework is in place starting from 5.1