Bug #38531 make innodb faster for concurrent insert/update workloads
Submitted: 2 Aug 2008 14:16 Modified: 12 Aug 2009 17:13
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S4 (Feature request)
Version:5.0,5.1,4.0,4.1 OS:Any
Assigned to: Inaam Rana CPU Architecture:Any
Tags: innodb, performance

[2 Aug 2008 14:16] Mark Callaghan
Description:
There is a simple way to make innodb much faster on servers with 8+ cores for workloads that do mostly short-running insert/update/delete statement.

How to repeat:
1) use mysqldump to dump a database
2) reload the database with concurrent sessions (1 per core)

Suggested fix:
1. Link with a scalable malloc replacement library (tcmalloc, mtmalloc)
2. Replace the memory heap in innobase/mem/mem0*.c with direct calls to malloc/free
[2 Aug 2008 14:46] Mark Callaghan
The functions to change are:
mem_area_alloc -> make it call malloc() and do nothing else
mem_area_free -> make it call free() and do nothing else
[4 Aug 2008 11:45] Susanne Ebrecht
Many thanks for writing a bug report.
[4 Aug 2008 13:39] Mark Callaghan
This change made my workload 6X faster on an 8 core server. The workload is a database reload -- 8 sessions reloading all tables and each session works on different tables. The server has enough IO capacity that the server is CPU bound.
[6 Aug 2008 14:52] Heikki Tuuri
Inaam, please study this.
[17 Aug 2008 1:23] Mark Callaghan
Using tcmalloc rather than libc malloc makes this about 10% faster. Almost all of the speedup is from removing the InnoDB heap.
[29 Aug 2008 5:56] Vadim TKACHENKO
Mark,

Can you provide more details about workload ?
We are trying to repeat, but not able to get significant improvements with your recipe
[29 Aug 2008 12:55] Mark Callaghan
What doesn't make things faster?
1) make mem_area_alloc()/mem_area_free() call malloc()/free() and do
nothing else?
2) linking with tcmalloc?

1) gives the big speedup, 6X
2) makes it about 10% faster after the speedup from 1)

This was measured on an 8 core server. The workload is a database
reload with 8 clients running concurrently on the db server. Each
table is reloaded by a different connection -- no contention for the
same table and indexes. The server is very CPU bound during this test
-- innodb_flush_log_at_trx_commit=0, the InnoDB log files are as large
as possible.
[30 Aug 2008 2:15] Vadim TKACHENKO
Mark,

We trying
1) make mem_area_alloc()/mem_area_free() call malloc()/free() and do
nothing else?

What we do - table sbtest
 CREATE TABLE `sbtest` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(64) NOT NULL default '',
  `email` varchar(64) NOT NULL default '',
  `password` varchar(64) NOT NULL default '',
  `dob` date default NULL,
  `address` varchar(128) NOT NULL default '',
  `city` varchar(64) NOT NULL default '',
  `state_id` tinyint(3) unsigned NOT NULL default '0',
  `zip` varchar(8) NOT NULL default '',
  `country_id` smallint(5) unsigned NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `email_idx` (`email`),
  KEY `country_id_idx` (`country_id`,`state_id`,`city`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000

1000000 rows.

mysqldump sbtest > sbtest.sql

then

for i in 1 2 3 4 5 6 7 8
time mysql t$i < sbtest.sql &

for orginal mysql and for malloc / free - 
the time is almost the same, may be ~10% difference.

--innodb_status_file=0 --innodb_data_home_dir=/home/vadim/data/org --innodb_data_file_path=ibdata1:10M:autoextend --innodb_log_group_home_dir=/home/vadim/data/org --innodb_buffer_pool_size=12G --innodb_additional_mem_pool_size=20M --innodb_log_file_size=1G --innodb_log_files_in_group=2 --innodb_log_buffer_size=200M --innodb_flush_log_at_trx_commit=2 --innodb_lock_wait_timeout=300 --innodb_thread_concurrency=0 --innodb_file_per_table=1
[30 Aug 2008 3:04] Mark Callaghan
Does each db connection insert into a different table or the same table? For my test, each connection inserts into different table. From what you have listed, all connections insert into the same table. That will create a lot of contention and move the bottlenecks elsewhere.

How many cores does this server have? You need 8+ cores to get the speedup because the problem for InnoDB without the code change gets much worse there.

Are the insert statements single-row or multi-row? For my tests, I use the mysqldump option so that each insert statement is multi-row. The clients run on the db server box. Single-row inserts, especially single-row inserts with autocommit will move the bottleneck elsewhere.
[30 Aug 2008 4:11] Mark Callaghan
I now know why you could not reproduce this. I used two workloads and the one I just described did not have much of a speedup. The one that has the speedup uses production data that I cannot share.

The workload with the speedup is the reload of a database using production data (100GB+ after reload) on an 8 core server with 8 concurrent connections. Each connection loads data for a different table and runs on host. The insert statements are multi-row inserts (the output from mysqldump with the right flags). In this case:
5.0.37 with innodb_thread_concurrency=20 -> 110,000 seconds
5.0.37 with innodb_thread_concurrency=4 -> ~35,000 seconds
5.0.37 with innodb malloc heap disabled -> ~15,000 seconds

The test above for 5.0.37 with innodb malloc heap disabled is being rerun for confirmation.

The other workload is easier for others to run. It is described in my earlier update to this. The speedup in that case is less dramatic. The results below are for reload using 1, 2, 4, 8 and 16 concurrent sessions into the same number of tables. The first column describes the binary:
* base -> 5.0.37 with many changes by us but no SMP fixes
* tc4 -> base and innodb_thread_concurrency=4
* smpfix -> base and the smp fixes
* onlymalloc -> base with the innodb malloc heap disabled

In this case, disabling makes a huge difference over base, but is not much better than using innodb_thread_concurrency=4.

base 15.02 33.00 75.04 17458.60 30494.63
tc4 15.14 32.69 76.17 148.57 294.60
smpfix 11.94 20.94 45.39 143.57 294.00
onlymalloc 12.56 22.87 47.67 140.96 280.10
[30 Aug 2008 5:08] Vadim TKACHENKO
Mark,

We insert into different databases / tables.

Box is 8 cores with 16GB but with weak IO subsystem.

By default mysqldump use --opt option and that generate multi-row insert, we use that.

What difference I see your table is big, I will try that size.
[30 Aug 2008 13:56] Mark Callaghan
I have updated the results here to include the 'onlymalloc' binary that has only the change suggested by this bug report. For the reload time test the speedup is significant. For the insert bogus data concurrently test, the speedup is not significant -- and that is what you found as well.

Results for sysbench will be added soon.

http://code.google.com/p/google-mysql-tools/wiki/SmpPerformance
[24 Oct 2008 13:28] Heikki Tuuri
Marking this as a feature request as this involves some new code.
[12 Aug 2009 17:13] Inaam Rana
This feature is introduced in plugin 1.0.3

binaries/source and documentation available at innodb.com