Bug #27152 LOAD DATA INFILE slower when inserting successive datasets
Submitted: 14 Mar 2007 19:18 Modified: 28 Mar 2007 11:21
Reporter: Guillaume Lefranc Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S5 (Performance)
Version:5.0.32-log OS:Linux (Ubuntu 6.06 LTS x86_64)
Assigned to: CPU Architecture:Any

[14 Mar 2007 19:18] Guillaume Lefranc
Description:
When inserting large datasets (around 20 million lines), the successive inserts are a lot slower than the first one.

Verified against 5.0.32-enterprise and 5.1.16-beta.

How to repeat:
CREATE TABLE t1 (
   f1 int(10) unsigned NOT NULL DEFAULT '0',
   f2 int(10) unsigned DEFAULT NULL,
   f3 char(33) CHARACTER SET latin1 NOT NULL DEFAULT '',
   f4 char(15) DEFAULT NULL,
   f5 datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
   f6 char(40) CHARACTER SET latin1 DEFAULT NULL,
   f7 text CHARACTER SET latin1,
  KEY f1_idx (f1),
  KEY f5_idx (f5)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Insert around 20 million lines.

mysql> load data infile '/data/dump/logarchive_200702_aa' into table logarchive_new;
Query OK, 19999963 rows affected, 9384 warnings (3 min 33.41 sec)
Records: 19999963  Deleted: 0  Skipped: 0  Warnings: 2740

mysql> load data infile '/data/dump/logarchive_200702_aa' into table logarchive_new;
Query OK, 19999963 rows affected, 9384 warnings (22 min 59.68 sec)
Records: 19999963  Deleted: 0  Skipped: 0  Warnings: 2740

mysql> load data infile '/data/dump/logarchive_200702_aa' into table logarchive_new;
Query OK, 19999963 rows affected, 9384 warnings (24 min 12.85 sec)
Records: 19999963  Deleted: 0  Skipped: 0  Warnings: 2740

Suggested fix:
This behavior might be related to index reconstruction, not sure if it's really fixable in this state. It looks like the index is not rebuilt in batch state though.
[28 Mar 2007 10:43] MySQL Verification Team
Hello Guillaume,

This is expected behaviour that LOAD DATA into empty MyISAM table is faster then loading the same data into non-empty table. This is quote from our manual:

LOAD DATA INFILE performs the preceding optimization automatically if the MyISAM table into which you insert data is empty. The main difference between automatic optimization and using the procedure explicitly is that you can let myisamchk allocate much more temporary memory for the index creation than you might want the server to allocate for index re-creation when it executes the LOAD DATA INFILE statement.

http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

So to speed up load you should disable keys, then load data and enable keys again.
[28 Mar 2007 11:21] Guillaume Lefranc
Thanks for your answer, I was partially aware of this behaviour. My suggestion was that (enable/disable keys) should be used by default.
[28 Mar 2007 18:50] Sergei Golubchik
It cannot be used by default, because this optimization only helps when you add many rows to the table. If the table is already quite big, and LOAD DATA adds only a small fraction of the existing table size then disabling/enabling keys would make the load slower, not faster.