Bug #64664 Batched inserts are as slow as individual inserts
Submitted: 15 Mar 2012 21:16 Modified: 16 Mar 2012 16:14
Reporter: Stefan Müller Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / J Severity:S5 (Performance)
Version:mysql connector java 5.1.18 OS:Any
Assigned to: CPU Architecture:Any

[15 Mar 2012 21:16] Stefan Müller
Description:
Batching insert statements with JDBC addBatch / executeBatch is just as slow as executing each statement individually. It seems that no batching is done.

A small test showed that this is slows bulk inserts down massively (I measured over a factor 10)!

For data-warehousing applications and a like this is should have a major impact.

Maybe my current approch is not optimal.
I'm very interested in suggestions of how to get large amounts of data into a newly created ISAM table very quickly.

How to repeat:
Insert many (10^6 or so) rows using a prepared statement.

Compare the execution time of

for(...){
  statement.set(...);
  statement.execute();
}

with

for(...){
  statement.set(...);
  statement.addBatch();
  if (every k iterations) statement.execute();
}

The time is equal for both and independent of k.

Now do the same thing but use

INSERT INTO test (...) VALUES (...),(...),(...) usw

to 'batch' the inserts together manually. On my machine the optimum is around 100 rows in one statement, resulting in more than a factor 12 (sic) performance boost. Going higher the factor settles at around 10 to 11.

I'll attach code to demonstrate.

Suggested fix:
The simplest thing might be to re-write the statement internally to use a multi-row insert statement. At least no protocol change would be required.
[15 Mar 2012 21:16] Stefan Müller
Creates and fills a table using different batching methods

Attachment: FillTable.java (application/octet-stream, text), 3.12 KiB.

[16 Mar 2012 16:14] Mark Matthews
Mysql itself doesn't have a "batch" mode for prepared statements. For certain cases, the driver can rewrite batches into much more efficient multi-value inserts, see http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for

(this feature is not enabled by default).

The other option is to use LOAD DATA [LOCAL] INFILE, which is even much more efficient (no SQL parsing is done).