Bug #38501 | hold times on prepare_commit_mutex limit read/write scalability | ||
---|---|---|---|
Submitted: | 31 Jul 2008 17:45 | Modified: | 10 Oct 2008 4:06 |
Reporter: | David Lutz | | |
Status: | Duplicate | | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S4 (Feature request) |
Version: | 5.0, 5.1, 6.0 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | Contribution |
[31 Jul 2008 17:45]
David Lutz
[31 Jul 2008 17:50]
Sergei Golubchik
It would also be possible to skip locking the mutex when this binlog/commit ordering is not required. This could be controlled with a command-line option that the user can set.
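Sergei's suggestion can be sketched as a toy Python model (not InnoDB code). The mutex is taken in a prepare step and released after the engine commit, roughly mirroring how prepare_commit_mutex spans innobase_xa_prepare() and innobase_commit(). The `OrderedCommit` class, the `run` helper and the `enforce_binlog_order` flag are hypothetical; the flag stands in for the proposed command-line option and is read once so that acquire and release stay balanced.

```python
import threading


class OrderedCommit:
    """Toy model of prepare -> binlog write -> engine commit.

    `enforce_binlog_order` is a HYPOTHETICAL stand-in for the proposed
    command-line option; it is fixed at construction so every acquire
    in prepare() is matched by a release in engine_commit().
    """

    def __init__(self, enforce_binlog_order: bool):
        self.enforce_binlog_order = enforce_binlog_order
        self.prepare_commit_mutex = threading.Lock()
        self.binlog = []      # order in which transactions reach the binlog
        self.engine_log = []  # order in which the engine commits them

    def prepare(self, trx_id):
        # The mutex is held from prepare until the engine commit finishes.
        if self.enforce_binlog_order:
            self.prepare_commit_mutex.acquire()

    def write_binlog(self, trx_id):
        self.binlog.append(trx_id)

    def engine_commit(self, trx_id):
        self.engine_log.append(trx_id)
        if self.enforce_binlog_order:
            self.prepare_commit_mutex.release()

    def commit(self, trx_id):
        self.prepare(trx_id)
        self.write_binlog(trx_id)
        self.engine_commit(trx_id)


def run(enforce, n_trx=32):
    """Commit n_trx transactions from concurrent threads (hypothetical helper)."""
    oc = OrderedCommit(enforce)
    threads = [threading.Thread(target=oc.commit, args=(i,)) for i in range(n_trx)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return oc
```

With the flag on, the binlog and engine commit orders are identical; with it off, both lists still contain every transaction, but their relative order is no longer guaranteed to match, which is the ordering the following comment says hot backups depend on.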
[5 Aug 2008 16:18]
David Lutz
I've done some more testing to determine the full impact of the prepare_commit_mutex, comparing read/write throughput of MySQL 6.0.5 in four configurations: without binlog enabled, with binlog enabled, with binlog enabled and the proposed prepare_commit_mutex improvement applied, and with the acquire and release of prepare_commit_mutex commented out.

With the 6.0.5 baseline, throughput without binlog enabled scaled moderately to 48 threads, peaking at roughly 1500 TPS. With binlog enabled, scaling stops at 12 threads, peaking at roughly 545 TPS, or 36% of the throughput without binlog. Applying the proposed prepare_commit_mutex improvement helps somewhat, with very moderate scaling to 32 threads at roughly 690 TPS. This is roughly a 25% improvement, but is still only 46% of the throughput without binlog. Commenting out the acquire and release of prepare_commit_mutex gives moderate scaling to 48 threads at roughly 1390 TPS. This is roughly 2.5X the initial binlog throughput and roughly 90% of the throughput without binlog.

This demonstrates the need to break up the serialization of transactions that is currently enforced by the prepare_commit_mutex lock. The proposal to enable/disable the lock with an option is tempting, but I think it will lead to problems. Someone may enable the feature unnecessarily and take a huge performance hit with no benefit, or worse, disable it when they need it, breaking hot backups. It also leaves a very heavy cost for hot backups, even when configured correctly, and the cost would be easily demonstrated by turning the option on and off. I currently plan to go back to the 5.0 source code to see how things were done there, and whether that suggests another option.
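The percentages in this comment can be rechecked from the reported peak figures. A small Python check follows; the dictionary labels are ours, and the TPS values are the ones reported for MySQL 6.0.5 in this comment. Note that 1390 TPS is about 2.55 times 545 TPS, i.e. roughly 155% more.

```python
# Peak read/write throughput (TPS) reported for MySQL 6.0.5 in this comment.
PEAK_TPS = {
    "no binlog": 1500,
    "binlog": 545,
    "binlog + proposed mutex change": 690,
    "binlog, mutex acquire/release removed": 1390,
}


def pct_of(tps, reference):
    """Throughput as a percentage of a reference throughput."""
    return 100.0 * tps / reference


def speedup(tps, reference):
    """Throughput as a multiple of a reference throughput."""
    return tps / reference


if __name__ == "__main__":
    no_binlog = PEAK_TPS["no binlog"]
    binlog = PEAK_TPS["binlog"]
    for label, tps in PEAK_TPS.items():
        print(f"{label:40s} {tps:5d} TPS "
              f"{pct_of(tps, no_binlog):5.1f}% of no-binlog "
              f"{speedup(tps, binlog):4.2f}x binlog")
```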
[7 Aug 2008 13:07]
Heikki Tuuri
Assigning this as a feature request to Inaam. The bug "MySQL/InnoDB group commit is broken in 5.1" is essentially the same bug report.
[13 Aug 2008 19:00]
David Lutz
I think I may have found a solution that eliminates contention on the prepare_commit_mutex and restores the concurrent commit behavior that was lost in the 4.1 to 5.0 upgrade (see Bug#13669). I have a new prototype, inspired by the old 4.1 behavior, that is showing really great performance.

The change is to write to the redo log without a flush, securing a place in the redo log that is in the same order as the binlog, then release the prepare_commit_mutex, then flush the redo log. This preserves binlog/redo log ordering while reducing serialization of InnoDB transactions and allowing concurrent commits, and is similar to the old behavior of MYSQL_LOG::write(Log_event* event_info) in mysql-4.1.22/sql/log.cc. My testing on a multi-core system shows as much as a 2.5X throughput increase on a 48-thread, read/write sysbench OLTP test. At the same time, I see a 75% reduction in disk writes per transaction due to coalescing of concurrent commits.

I will upload a diff showing changes to storage/innobase/handler/ha_innodb.cc from mysql-6.0.5-alpha-pb87. This diff includes the previously suggested changes to innobase_xa_prepare(), as well as additional changes to innobase_commit().

Caveats: this includes the same caveats as the patch submitted at [31 Jul 19:45], as well as the following. In this prototype, the initial call to innobase_commit_low() with trx->flush_log_later == true is made without checking for srv_commit_concurrency > 0 and without acquiring the commit_cond_m mutex or waiting on the commit_cond condition variable. This was done to avoid intermixing acquire and release of prepare_commit_mutex and commit_cond_m, which might lead to deadlock. However, srv_commit_concurrency is checked after prepare_commit_mutex has been released and before performing the flush to disk. If it is necessary to check srv_commit_concurrency before even performing a buffered commit, the prototype could be modified to perform the check before both the buffered commit and the flush to disk.
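The write-then-release-then-flush sequence described in this comment can be illustrated with a toy Python model. It is not the ha_innodb.cc patch, and it makes several simplifying assumptions: `RedoLog`, `flush_up_to` and `run_commits` are hypothetical names, an LSN here is just a record counter, the disk flush is a sleep, and the binlog itself is never flushed. The ordered section under prepare_commit_mutex only appends to the binlog and buffers a redo record, so the slow flush happens after the mutex is released, and one flush can cover every record buffered before it (group commit).

```python
import threading
import time


class RedoLog:
    """HYPOTHETICAL toy redo log: writes are buffered, flushes are counted."""

    def __init__(self, fsync_seconds=0.002):
        self._write_lock = threading.Lock()  # protects the in-memory buffer
        self._flush_lock = threading.Lock()  # one flusher at a time
        self.entries = []                    # records in log order
        self.written_lsn = 0                 # LSN of last buffered record
        self.flushed_lsn = 0                 # highest LSN known durable
        self.flush_count = 0
        self._fsync_seconds = fsync_seconds

    def write(self, record):
        """Append without flushing; return the record's LSN (log position)."""
        with self._write_lock:
            self.entries.append(record)
            self.written_lsn += 1
            return self.written_lsn

    def flush_up_to(self, lsn):
        """Make everything up to `lsn` durable, sharing one flush among waiters."""
        with self._flush_lock:
            if self.flushed_lsn >= lsn:
                return  # an earlier flush already covered this record
            with self._write_lock:
                target = self.written_lsn  # flush everything buffered so far
            time.sleep(self._fsync_seconds)  # stand-in for the disk flush
            self.flushed_lsn = target
            self.flush_count += 1


def run_commits(n_trx, fsync_seconds=0.002):
    """Commit n_trx transactions concurrently (hypothetical driver)."""
    prepare_commit_mutex = threading.Lock()
    binlog = []
    redo = RedoLog(fsync_seconds)

    def commit(trx_id):
        with prepare_commit_mutex:
            binlog.append(trx_id)     # binlog write fixes the commit order
            lsn = redo.write(trx_id)  # buffered redo write takes the same order
        # Mutex released: the next transaction can enter the ordered section
        # while this one waits for its (possibly shared) flush.
        redo.flush_up_to(lsn)

    threads = [threading.Thread(target=commit, args=(i,)) for i in range(n_trx)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return binlog, redo


if __name__ == "__main__":
    binlog, redo = run_commits(48)
    print(f"{len(binlog)} commits, {redo.flush_count} redo flushes")
```

Because each flush targets the highest LSN written so far rather than only the caller's own LSN, a committer that queues behind an in-progress flush often finds its record already durable and returns without flushing again. A check on srv_commit_concurrency, as discussed in the caveats, could be modeled by wrapping the flush_up_to() call in a threading.BoundedSemaphore; that is an illustrative assumption, not how InnoDB implements the limit.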
[13 Aug 2008 19:03]
David Lutz
Diff showing changes to storage/innobase/handler/ha_innodb.cc as described in the comment of [13 Aug 21:00]
Attachment: prepare_commit_mutex.patch3 (application/octet-stream, text), 4.80 KiB.
[20 Aug 2008 22:37]
Inaam Rana
Duplicate of http://bugs.mysql.com/bug.php?id=13669
[10 Oct 2008 4:06]
David Lutz
Please see Bug#13669 for follow up comments and questions about this issue and the proposed fix.
[7 Aug 2009 16:53]
Mark Callaghan
This is a serious performance regression from 4.1. Why was this changed to a feature request?