Bug #29413 Maximum performance of OLTP benchmark is not so scalable on multi-cpu
Submitted: 28 Jun 2007 8:38 Modified: 12 Aug 2009 17:14
Reporter: Yasufumi Kinoshita
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S4 (Feature request)
Version: 5.0.41
OS: Any
Assigned to: Inaam Rana
CPU Architecture: Any
Tags: Contribution

[28 Jun 2007 8:38] Yasufumi Kinoshita
Description:
The following five settings are required for OLTP performance to scale with the number of CPUs (cores).
At least in this benchmark trial, all five settings had to be applied together.

<Benchmark>
Conditions:
TPC-C-based workload (scale factor = 160, approximately 16GB of data or more)
Opteron 8220SE 2.8GHz (Dual Core) x 4
Memory 16GB (10GB InnoDB buffer pool)
External RAID storage (Fibre Channel)
A sufficient ramp-up period was allowed

Results:
See the attached graph (performance.png).

<Required Settings>

=====1. Disable "read-ahead" =====

 Read-ahead evicts useful blocks from the buffer pool and causes unnecessary read I/O
 when there are few free blocks and the buffer pool hit rate is high.

=====2. Tune the activities of insert buffer merging =====
 (increase the arguments passed to ibuf_contract_for_n_pages() in srv0srv.c; x100 in our case)

 An enlarged insert buffer slows performance and reduces parallelism.
 It also lowers the buffer pool hit rate when the buffer pool is small.

=====3. Tune the activities of dirty block flushing =====
 (increase the arguments passed to buf_flush_batch() in srv0srv.c; x50 to x100 in our case)

 The cost (risk) of free block allocation grows as the number of dirty blocks increases.

 * innodb_max_dirty_pages_pct alone may not be enough to tune this optimally.
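One way to see why free block allocation gets more expensive as dirty blocks accumulate is a simple probabilistic model. This is our assumption, not something measured in this report: treat each block near the LRU tail as dirty independently with probability d. The expected number of blocks examined before finding a clean, immediately freeable block is then 1/(1 - d), so it doubles at d = 0.5 and reaches about 10 at d = 0.9. The function name is hypothetical.

```c
/* Expected number of LRU-tail blocks scanned to find one clean block.
 * Simplifying assumption (not InnoDB behavior): each block is dirty
 * independently with probability dirty_frac, so the scan length is
 * geometrically distributed with success probability (1 - dirty_frac). */
static double expected_scan_length(double dirty_frac)
{
    if (dirty_frac >= 1.0)
        return -1.0;  /* every block dirty: nothing freeable without a flush */
    return 1.0 / (1.0 - dirty_frac);
}
```

Real LRU tails are not independent (recently flushed regions are clean together), so this only illustrates the trend, not actual scan lengths.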

=====4. Enhance rwlock implementation (like Bug#26442)=====

http://bugs.mysql.com/bug.php?id=26442

It is also effective against contention on tree->lock and block->lock.
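The general idea behind an enhanced rwlock of this kind is to make the uncontended path a single atomic compare-and-swap instead of taking a mutex to update reader/writer counters. The sketch below is a generic C11 illustration of that technique, not the code from the Bug#26442 patch or from InnoDB's sync0rw.c; the type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One atomic word: -1 = held exclusively, n >= 0 = held by n readers. */
typedef struct { atomic_long state; } spin_rwlock_t;

static void rw_init(spin_rwlock_t *l) { atomic_init(&l->state, 0); }

/* Shared (read) lock attempt: add one reader unless a writer holds it. */
static bool rw_try_s_lock(spin_rwlock_t *l)
{
    long cur = atomic_load(&l->state);
    while (cur >= 0) {
        if (atomic_compare_exchange_weak(&l->state, &cur, cur + 1))
            return true;  /* registered as one more reader */
        /* a failed CAS reloaded cur; retry unless a writer appeared */
    }
    return false;
}

/* Exclusive (write) lock attempt: succeed only when completely free. */
static bool rw_try_x_lock(spin_rwlock_t *l)
{
    long expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

/* Blocking variants spin; a real lock would back off, then wait on an event. */
static void rw_s_lock(spin_rwlock_t *l) { while (!rw_try_s_lock(l)) { } }
static void rw_x_lock(spin_rwlock_t *l) { while (!rw_try_x_lock(l)) { } }

static void rw_s_unlock(spin_rwlock_t *l) { atomic_fetch_sub(&l->state, 1); }
static void rw_x_unlock(spin_rwlock_t *l) { atomic_store(&l->state, 0); }
```

A production lock would also bound the spinning, sleep on an OS event when contended, and keep writers from starving behind a steady stream of readers; those parts are omitted here.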

=====5. Disable "doublewrite" =====

Doublewrite processing may have an inherent performance limit,
and that limit does not seem to scale with the number of CPUs (cores).

How to repeat:
Run the TPC-C-based workload (something like OSDL DBT-2?) on an equivalent environment.

Suggested fix:
=====1.=====
 Add a parameter
 such as [innodb_read_ahead = true, false]

OR

 Change the condition
 in buf0rea.c, in buf_read_ahead_random() and buf_read_ahead_linear():

 [from]
 if (buf_pool->n_pend_reads >
                 buf_pool->curr_size / BUF_READ_AHEAD_PEND_LIMIT) {

 (This condition makes little sense with a large buffer pool.)
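 A rough calculation shows why. Assuming BUF_READ_AHEAD_PEND_LIMIT is 2 (our reading of the 5.0 buf0rea.c, not stated in this report; check your tree) and 16KB pages, the 10GB buffer pool used here holds 655,360 pages, so read-ahead is skipped only when more than 327,680 reads are pending, far beyond any realistic I/O queue. The helper names below are hypothetical.

```c
/* Assumed value; verify against buf0rea.c in your source tree. */
#define BUF_READ_AHEAD_PEND_LIMIT 2

/* Pages in a buffer pool of pool_bytes, assuming 16KB InnoDB pages. */
static unsigned long pool_pages(unsigned long long pool_bytes)
{
    return (unsigned long) (pool_bytes / (16ULL * 1024));
}

/* Pending-read count above which the [from] condition skips read-ahead. */
static unsigned long read_ahead_pend_threshold(unsigned long curr_size)
{
    return curr_size / BUF_READ_AHEAD_PEND_LIMIT;
}
```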

 [to]
 if ( [[few n_pend_reads]] && [[few free blocks]] && [[high buffer pool hit rate]] ) {

=====2.===== =====3.=====
 Make these parameters tunable.

 Example:
 mysql-5.0.41_control_flush_and_merge.patch
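 A minimal sketch of what "tunable" could look like. It assumes base values of 5 pages per insert buffer merge and 100 pages per flush batch in the 5.0 master thread (our recollection of srv0srv.c, not stated in this report). The variables srv_ibuf_merge_factor and srv_flush_factor are hypothetical, not real MySQL system variables; their defaults mirror the x100 and x50 multipliers used in this benchmark.

```c
/* Hypothetical tunables (not real MySQL 5.0 system variables). */
static unsigned long srv_ibuf_merge_factor = 100;  /* x100 in our case */
static unsigned long srv_flush_factor      = 50;   /* x50..x100 in our case */

/* Assumed 5.0 master-thread base values. */
#define IBUF_MERGE_BASE_PAGES  5UL
#define FLUSH_BATCH_BASE_PAGES 100UL

/* Scale a base page count; a factor of 0 is treated as 1 so the
 * background work is never disabled by accident. */
static unsigned long scaled_pages(unsigned long base, unsigned long factor)
{
    return base * (factor ? factor : 1);
}

/* Intended call sites in the master thread (shown as comments only):
 *   ibuf_contract_for_n_pages(TRUE,
 *       scaled_pages(IBUF_MERGE_BASE_PAGES, srv_ibuf_merge_factor));
 *   buf_flush_batch(BUF_FLUSH_LIST,
 *       scaled_pages(FLUSH_BATCH_BASE_PAGES, srv_flush_factor),
 *       ut_dulint_max);
 */
```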

=====4.=====
 Example:
 http://bugs.mysql.com/bug.php?id=26442
 mysql-5.0.41_optimize_rwlock_stable.patch
 mysql-5.0.41_optimize_block_mutex_etc.patch

=====5.=====
(Or is simply setting innodb_doublewrite = false enough?)
[28 Jun 2007 8:39] Yasufumi Kinoshita
The benchmark results

Attachment: performance.png (image/png, text), 918 bytes.

[28 Jun 2007 8:41] Yasufumi Kinoshita
example

Attachment: mysql-5.0.41_control_flush_and_merge.patch (application/octet-stream, text), 8.21 KiB.

[2 Jul 2007 0:57] Yasufumi Kinoshita
Sorry, I made a mistake.

<<<<<<wrong>>>>>>
 [to]
 if ( [[few n_pend_reads]] && [[few free blocks]] && [[high buffer pool hit rate]] ) {

<<<<<correct>>>>>>
 [to]
 if ( [[many n_pend_reads]] || ( [[few free blocks]] && [[high buffer pool hit rate]] ) ) {
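 The corrected condition can be written as a standalone predicate. The report gives no concrete thresholds, so the struct ra_limits fields and any values chosen for them are hypothetical. Inside buf_read_ahead_random() and buf_read_ahead_linear() the inputs would presumably come from buf_pool->n_pend_reads, the free-list length, and some hit-rate estimate.

```c
#include <stdbool.h>

/* Hypothetical thresholds for the three bracketed conditions. */
struct ra_limits {
    unsigned long max_pend_reads;   /* "many n_pend_reads": above this */
    unsigned long min_free_blocks;  /* "few free blocks": below this */
    double        min_hit_rate;     /* "high hit rate": at or above this (0..1) */
};

/* True when read-ahead should be skipped:
 * many pending reads, OR (few free blocks AND high hit rate). */
static bool skip_read_ahead(unsigned long n_pend_reads,
                            unsigned long n_free_blocks,
                            double hit_rate,
                            const struct ra_limits *lim)
{
    bool many_pending = n_pend_reads > lim->max_pend_reads;
    bool few_free     = n_free_blocks < lim->min_free_blocks;
    bool high_hit     = hit_rate >= lim->min_hit_rate;

    return many_pending || (few_free && high_hit);
}
```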
[13 Aug 2007 9:53] Yasufumi Kinoshita
Heikki,

I have found two more effective settings to improve the performance of this benchmark workload.

=====6. Parallelize and increase I/O threads (especially read threads)=====

In general, RAID storage reaches its maximum performance only when read I/O requests are issued in parallel.
So we should use parallel I/O threads or asynchronous I/O.

( mysql-5.0.41_control_io-threads.patch.gz )
	Enable "innodb_file_io_threads" (accept values > 4)
	Dispatch I/O requests to threads in stripes of 64 blocks
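The patch's actual dispatch code is not reproduced in this report, so the following is only a hypothetical sketch of "striping by 64 blocks": consecutive page numbers are grouped into stripes of 64, and stripes are assigned round-robin to the read threads. The constant and function names are our own, and a real implementation would probably also mix in the tablespace id.

```c
/* Hypothetical stripe width; happens to equal one 64-page InnoDB extent. */
#define IO_STRIPE_PAGES 64UL

/* Index of the read thread that should handle page_no. */
static unsigned long read_thread_for_page(unsigned long page_no,
                                          unsigned long n_read_threads)
{
    if (n_read_threads == 0)
        return 0;  /* degenerate configuration: everything to thread 0 */
    return (page_no / IO_STRIPE_PAGES) % n_read_threads;
}
```

Keeping 64 neighboring pages on one thread lets that thread see (and potentially merge) adjacent requests, while different stripes proceed in parallel on different threads.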

=====7. Optimize operations related to freeing blocks =====

Under heavy load with no free blocks,
the "free blocks margin check", "LRU flush", "search and free block" operations, etc.
cause buf_pool->mutex contention.

( mysql-5.0.41_split_buf_pool_mutex_fixed_optimistic2.patch.gz )
	Split buf_pool->mutex
	Tune the use of buf_pool->LRU_mutex
		(e.g.) Make buf_flush_LRU_recommendation() and buf_LRU_search_and_free_block() optimistic,
		      etc.
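"Optimistic" here presumably means checking a condition without the mutex first and locking only when work may be needed. The sketch below shows that general pattern (unlocked read, then lock and recheck) on a hypothetical, heavily simplified pool; it is not the code from the patch, and a spinlock stands in for buf_pool->mutex to keep the example self-contained.

```c
#include <stdatomic.h>

/* Hypothetical, simplified pool; writers change n_free only under mutex. */
struct pool {
    atomic_flag   mutex;        /* stand-in for buf_pool->mutex */
    atomic_ulong  n_free;       /* free-list length, readable without mutex */
    unsigned long free_margin;  /* desired minimum number of free blocks */
};

static void pool_lock(struct pool *p)   { while (atomic_flag_test_and_set(&p->mutex)) { } }
static void pool_unlock(struct pool *p) { atomic_flag_clear(&p->mutex); }

static void pool_init(struct pool *p, unsigned long n_free, unsigned long margin)
{
    atomic_flag_clear(&p->mutex);
    atomic_init(&p->n_free, n_free);
    p->free_margin = margin;
}

/* How many blocks to flush from the LRU tail (0 = none). */
static unsigned long lru_flush_recommendation(struct pool *p)
{
    /* Optimistic fast path: a possibly stale, unlocked read. If there are
     * clearly enough free blocks, return without touching the mutex. */
    if (atomic_load_explicit(&p->n_free, memory_order_relaxed) >= p->free_margin)
        return 0;

    /* Slow path: lock and recheck, since the value may have changed. */
    pool_lock(p);
    unsigned long n_free = atomic_load(&p->n_free);
    unsigned long need = n_free >= p->free_margin ? 0 : p->free_margin - n_free;
    pool_unlock(p);
    return need;
}
```

The fast path is acceptable because the result is only a hint: a stale read at worst delays a flush until the next check, while the common "enough free blocks" case no longer touches the contended mutex at all.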

--------------------------------------------------------
The attached results (sequencial_resulut_new.pdf) show the effects of the above two patches.
The buffer pool hit rates in the steady state are relatively high (998/1000 or more),
so I expect even larger effects under more I/O-bound conditions.

- 50 sessions
- No think time

normal:
	No patches, but the best settings.
	So this result is a little better than the previous one.

	(e.g.) innodb_doublewrite = false
	      innodb_thread_concurrency = 0 (unlimited)

base:
	"normal" +
	1. Disable "read-ahead"
	2. Tune the activities of insert buffer merging
	3. Tune the activities of dirty block flushing
	4. Enhance rwlock implementation (like Bug#26442)
	(5. Disable "doublewrite"; same as "normal")

io-t:
	"base" +
	6. Parallelize and increase I/O threads (innodb_file_io_threads = 12)

full:
	"io-t" +
	7. Optimize operations related to freeing blocks

-- Fig. 2 shows I/O performance.
    "base" (no read-ahead, single read thread) gives the worst result.
    But under this condition, multiple read threads are a much better I/O strategy than read-ahead.

-- Fig. 3 shows steady-state performance (with modified blocks increasing gradually).
    "normal": Read-ahead may cause surplus reads and waste I/O bandwidth.
    "base":   I/O bandwidth is used effectively, but the available bandwidth is not very large.
    "io-t":   Peak performance is improved.
              But steady-state performance may be no better than "base"
              when there are many modified blocks
              (because of buf_pool->mutex contention).
    "full":   No problems.
[13 Aug 2007 9:54] Yasufumi Kinoshita
Parallelize and increase IO threads

Attachment: mysql-5.0.41_control_io-threads.patch.gz (application/octet-stream, text), 813 bytes.

[13 Aug 2007 9:55] Yasufumi Kinoshita
Optimize relatives of freeing blocks

Attachment: mysql-5.0.41_split_buf_pool_mutex_fixed_optimistic2.patch.gz (application/octet-stream, text), 7.21 KiB.

[13 Aug 2007 9:56] Yasufumi Kinoshita
new results

Attachment: sequencial_resulut_new.pdf (application/force-download, text), 46.20 KiB.

[13 Aug 2007 9:58] Yasufumi Kinoshita
add tuning parameters in this report

Attachment: mysql-5.0.41_control_flush_and_merge_and_read_ahead.patch.gz (application/octet-stream, text), 2.45 KiB.

[4 Mar 2008 17:18] Heikki Tuuri
Inaam,

please study Yasufumi's ideas about:

1) less readahead
2) insert buffer
3) size of a flush batch

These should improve InnoDB's performance under sysbench.

Regards,

Heikki
[1 Aug 2008 4:50] Ben Handy
I am really interested in using the patch mysql-5.0.41_split_buf_pool_mutex_fixed_optimistic2.patch.gz.  It eliminates a lot of the buffer pool contention on my 8-CPU machines.

This was posted by Yasufumi over a year ago.  Has anybody else been using it?

Yasufumi: have you put much run-time on this over the past year?  Have you encountered any problems with it?

Thanks,
-Ben
[2 Aug 2008 12:53] Yasufumi Kinoshita
latest patch (fixed just a little)

Attachment: mysql-5.0.54_split_buf_pool_mutex_fixed_optimistic.patch.gz (application/x-gzip, text), 7.09 KiB.

[2 Aug 2008 12:58] Yasufumi Kinoshita
Ben,

Thank you for trying it.

The attached file is the latest version of my patch.
It contains only small fixes.
(But I have forgotten the details of the problem that prompted the fix. Sorry...)

Regards,
Yasufumi
[29 Aug 2008 4:40] Yasufumi Kinoshita
Some questionable code has been fixed.

Attachment: mysql-5.0.67_split_buf_pool_mutex_fixed_optimistic_safe2.patch.gz (application/x-gzip, text), 7.17 KiB.

[12 Aug 2009 17:14] Inaam Rana
Fixed in plugin 1.0.4.

Binaries, source, and documentation are available at innodb.com.