Bug #46459 Innodb limits concurrency too much when XA is used
Submitted: 29 Jul 2009 18:48 Modified: 13 Jan 2010 16:38
Reporter: Mark Callaghan Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.0.44, 5.1. bzr OS:Any
Assigned to: Inaam Rana
Tags: commit, concurrency, Contribution, GROUP, innodb
Triage: Triaged: D3 (Medium)

[29 Jul 2009 18:48] Mark Callaghan
Description:
Innodb limits concurrency too much when XA is used (when the binlog is enabled). This causes big query pileups and is very hard to diagnose.

The problems are:
1) in innobase_xa_prepare, innobase_release_stat_resources is called after prepare_commit_mutex is locked. When there is a bottleneck on prepare_commit_mutex, the waiting threads may block other threads from entering InnoDB because of innodb_thread_concurrency enforcement.

2) the per-table auto increment lock used by InnoDB is to be unlocked at statement end. When autocommit is used, innobase_xa_prepare is called with the autoinc lock still held and innobase_xa_prepare doesn't release it. So this is another bottleneck.

How to repeat:
I don't have an easy reproduction case for problem 1.

For problem 2:
modify innobase_xa_prepare by adding a call to 'sleep(30)' after this code:
                pthread_mutex_lock(&prepare_commit_mutex);
                trx->active_trans = 2;
                sleep(30); // TODO -- add this code for the test

----

To see that the delay in innobase_xa_prepare during commit does not block use of the autoinc lock by the second session ...

Run this in one session:
select now(); begin; insert into ai values (NULL); commit; select now();

And then run this immediately in a second session:
begin; insert into ai values (NULL); select now(); commit; select now();

----

To see that the delay does block use of the autoinc lock, then the SQL for the first session is changed to use autocommit:

Run this in one session:
select now(); insert into ai values (NULL); select now();

And then run this immediately in a second session:
begin; insert into ai values (NULL); select now(); commit; select now();

---

Output from the first testcase

>>> first session

mysql> select now(); begin; insert into ai values (NULL); commit; select now();
+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:41:24 | 
+---------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

Query OK, 0 rows affected (30.00 sec)

+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:41:54 | 
+---------------------+
1 row in set (0.01 sec)

>>> second session, note that the insert is not delayed
mysql> begin; insert into ai values (NULL); select now(); commit; select now();
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (0.01 sec)

+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:41:30 | 
+---------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (53.72 sec)

+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:42:24 | 
+---------------------+
1 row in set (0.00 sec)

---

Output from the second testcase that shows the delay for the insert done by the second session because autocommit is used by the first session

>>> first session
mysql> select now(); insert into ai values (NULL); select now();
+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:45:07 | 
+---------------------+
1 row in set (0.00 sec)

Query OK, 1 row affected (30.00 sec)

+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:45:37 | 
+---------------------+

>>> output from the second session
mysql> begin; insert into ai values (NULL); select now(); commit; select now();
Query OK, 0 rows affected (0.00 sec)

Query OK, 1 row affected (27.17 sec)

+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:45:37 | 
+---------------------+
1 row in set (0.00 sec)

Query OK, 0 rows affected (30.00 sec)

+---------------------+
| now()               |
+---------------------+
| 2009-07-29 11:46:07 | 
+---------------------+

Suggested fix:
1) when autocommit is used, release the autoinc lock on entry to or before entry to innobase_xa_prepare

2) call innobase_release_stat_resources before locking prepare_commit_mutex in innobase_xa_prepare
[29 Jul 2009 19:30] Mark Callaghan
Call to row_unlock_table_autoinc_for_mysql without autocommit:

#0  0x002c98d4 in lock_table_unlock_auto_inc ()
#1  0x0022ce51 in row_unlock_table_autoinc_for_mysql ()
#2  0x00192482 in innobase_commit (thd=0x82ce00, all=false) at ha_innodb.cc:1697
#3  0x001832e2 in ha_commit_one_phase (thd=0x82ce00, all=false) at handler.cc:749
#4  0x001855aa in ha_commit_trans (thd=0x82ce00, all=false) at handler.cc:719
#5  0x00185800 in ha_autocommit_or_rollback (thd=0x82ce00, error=8574836) at handler.cc:858
#6  0x0012f775 in mysql_insert (thd=0x82ce00, table_list=0x857a98, fields=@0x82deb0, values_list=@0x82ded4, update_fields=@0x82dec8, update_values=@0x82debc, duplic=DUP_ERROR, ignore=false) at sql_insert.cc:910
#7  0x000c99c2 in mysql_execute_command (thd=0x82ce00) at sql_parse.cc:3518
#8  0x000ccef9 in mysql_parse (thd=0x82ce00, inBuf=0x857a10 "insert into ai values (NULL)", length=28, found_semicolon=0xb026ad9c) at sql_parse.cc:6068
#9  0x000cd9a6 in dispatch_command (command=COM_STMT_FETCH, thd=0x82ce00, packet=<value temporarily unavailable, due to optimizations>, packet_length=29) at sql_parse.cc:1812
#10 0x000ceef4 in do_command (thd=0x82ce00) at sql_parse.cc:1586
#11 0x000cf6ca in handle_one_connection (arg=0x3) at sql_parse.cc:1197
#12 0x92b43095 in _pthread_start ()
#13 0x92b42f52 in thread_start ()

The unlock is done in the commit code (in this case the commit is fake). The commit code is called after the prepare code so that explains why this is a problem for autocommit inserts.
[31 Jul 2009 10:27] Sveta Smirnova
Thank you for the report.

Case #2 verified as described.

Possible problem in case # 1 confirmed, although can not create test case showing the problem yet.
[31 Jul 2009 22:59] Mark Callaghan
More notes on this problem and a potential fix:
http://www.facebook.com/MySQLatFacebook#/note.php?note_id=116362990932
[31 Jul 2009 23:00] Mark Callaghan
A kludge that fixes this problem -- needs work

Attachment: autoinc.patch (application/octet-stream, text), 1.60 KiB.

[3 Aug 2009 20:11] Mark Callaghan
A better patch with tests generated from 5.0.75

Attachment: p2 (application/octet-stream, text), 3.51 KiB.

[5 Aug 2009 22:06] Mark Callaghan
Yet another patch. There is no benefit from releasing the autoinc lock early. All of the improvement is from calling innobase_release_stat_resources() before locking prepare_commit_mutex. A patch for 5.0.84:

diff --git a/sql/ha_innodb.cc b/sql/ha_innodb.cc
index 7eb6ef0..dc09a43 100644
--- a/sql/ha_innodb.cc
+++ b/sql/ha_innodb.cc
@@ -7253,6 +7253,13 @@ innobase_xa_prepare(
        int error = 0;
         trx_t* trx = check_trx_exists(thd);
 
+       /* Release a possible FIFO ticket and search latch. Since we will
+       reserve the kernel mutex, we have to release the search system latch
+       first to obey the latching order. Do this before blocking on a global
+       mutex. */
+
+       innobase_release_stat_resources(trx);
+
         if (thd->lex->sql_command != SQLCOM_XA_PREPARE) {
 
                 /* For ibbackup to work the order of transactions in binlog
@@ -7285,12 +7292,6 @@ innobase_xa_prepare(
 
         trx->xid=thd->transaction.xid_state.xid;
 
-       /* Release a possible FIFO ticket and search latch. Since we will
-       reserve the kernel mutex, we have to release the search system latch
-       first to obey the latching order. */
-
-       innobase_release_stat_resources(trx);
-
        if (trx->active_trans == 0 && trx->conc_state != TRX_NOT_STARTED) {
 
          sql_print_error("trx->active_trans == 0, but trx->conc_state != "
[12 Aug 2009 17:07] Inaam Rana
Fixed in plugin 1.0.4. with the fix of group commit bug#13669

binaries/source and documentation available at innodb.com
[13 Jan 2010 16:38] Calvin Sun
Set to "Won't fix" given the issue is fixed in the plugin and 5.0 is EOL'd.