Bug #99638 xa transaction with low performance
Submitted: 20 May 2020 0:53 Modified: 3 Dec 2020 19:31
Reporter: phoenix Zhang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:8.0.18 OS:Any
Assigned to: CPU Architecture:Any
Tags: xa

[20 May 2020 0:53] phoenix Zhang
Description:
When use sysbench tool do xa performance test, it has a low performance when high concurrency. 

By analysis, the performance neck will be in 

std::this_thread::sleep_for(sleep_time);

in source file storage/innobase/include/clone0clone.h, the function wait of class Clone_Sys. 

static int wait(Clone_Msec sleep_time, Clone_Sec timeout,
                  Clone_Sec alert_interval, Wait_Cond_Cbk_Func &&func,
                  ib_mutex_t *mutex, bool &is_timeout)

The whole call stack will be:

(gdb) bt
#0  Clone_Sys::wait(std::chrono::duration<long, std::ratio<1l, 1000l> >, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1l> >, std::function<int (bool, bool&)>&&, PolicyMutex<TTASEventMutex<GenericPolicy> >*, bool&) (sleep_time=..., timeout=..., alert_interval=..., func=..., mutex=0x0, is_timeout=@0x7fabb1786cdf: false)
    at storage/innobase/include/clone0clone.h:1040
#1  0x0000562e315d38e0 in Clone_persist_gtid::wait_thread(bool, bool, unsigned long, bool, bool, std::function<int ()>) (this=0x7fad442eb3f0, start=false, wait_flush=true, flush_number=31,
    compress=false, early_timeout=false, cbk=...) at storage/innobase/clone/clone0repl.cc:588
#2  0x0000562e315d3f1b in Clone_persist_gtid::wait_flush(bool, bool, bool, std::function<int ()>) (this=0x7fad442eb3f0, wait=true, compress_gtid=false, early_timeout=false, cbk=...)
    at storage/innobase/clone/clone0repl.cc:660
#3  0x0000562e314a2775 in trx_undo_gtid_flush_prepare (trx=0x7fad4a6fd6b8) at storage/innobase/trx/trx0undo.cc:592
#4  0x0000562e31491758 in trx_commit_for_mysql (trx=0x7fad4a6fd6b8) at storage/innobase/trx/trx0trx.cc:2357
#5  0x0000562e311329a4 in innobase_commit_low (trx=0x7fad4a6fd6b8) at storage/innobase/handler/ha_innodb.cc:5818
#6  0x0000562e311332c8 in innobase_commit (hton=0x562e356ce0d0, thd=0x7fac38023050, commit_trx=true) at storage/innobase/handler/ha_innodb.cc:6063
#7  0x0000562e2f8218e4 in ha_commit_low (thd=0x7fac38023050, all=true, run_after_commit=false)
#8  0x0000562e30c3290e in MYSQL_BIN_LOG::process_commit_stage_queue (this=0x562e344bc420 <mysql_bin_log>, thd=0x7fac38023050, first=0x7fac38023050)
#9  0x0000562e30c346b4 in MYSQL_BIN_LOG::ordered_commit (this=0x562e344bc420 <mysql_bin_log>, thd=0x7fac38023050, all=true, skip_commit=false)
#10 0x0000562e30c3215e in MYSQL_BIN_LOG::commit (this=0x562e344bc420 <mysql_bin_log>, thd=0x7fac38023050, all=true)
#11 0x0000562e2fe14150 in Sql_cmd_xa_commit::process_internal_xa_commit (this=0x7fac380dfee8, thd=0x7fac38023050, xid_state=0x7fac38020320)
#12 0x0000562e2fe1396e in Sql_cmd_xa_commit::trans_xa_commit (this=0x7fac380dfee8, thd=0x7fac38023050)
#13 0x0000562e2fe14339 in Sql_cmd_xa_commit::execute (this=0x7fac380dfee8, thd=0x7fac38023050)
#14 0x0000562e2fc6f5e4 in mysql_execute_command (thd=0x7fac38023050, first_level=true)
#15 0x0000562e2fc7216f in mysql_parse (thd=0x7fac38023050, parser_state=0x7fabb1789b70, update_userstat=false)
#16 0x0000562e2fc66562 in dispatch_command (thd=0x7fac38023050, com_data=0x7fabb178ac40, command=COM_QUERY)
#17 0x0000562e2fc64950 in do_command (thd=0x7fac38023050)

How to repeat:
Test use sysbench 0.5, write new test file name as oltp_xa.test.

MySQL version is 8.0.18, use source file compile, the build command is:
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo -DBUILD_CONFIG=mysql_release  -DWITH_ZLIB=system -DWITH_BOOST=../boost/

after make & make install, startup the mysql with below my.cnf

[mysqld]
basedir = /xxx
datadir = /xxx/data
socket = /xxx/data/mysql.sock
port = 3333
server-id = 3333
log_bin = mysql-bin
binlog_format = row
enforce_gtid_consistency = 1
gtid_mode = on

innodb_buffer_pool_instances = 4
innodb_buffer_pool_size = 500M
innodb_change_buffering = all
innodb_doublewrite = true
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 800
innodb_log_buffer_size = 256M
innodb_log_file_size = 500M
innodb_log_files_in_group = 4
innodb_read_io_threads = 16
innodb_thread_concurrency = 64
innodb_write_io_threads = 16

Then run sysbench with below command:
./sysbench --db-driver=mysql --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3333  --mysql-db=test  --table_size=100000 --num-threads=400 --events=0 --max-time=10  --rand-type=uniform --test=./tests/db/oltp_xa.lua prepare

./sysbench --db-driver=mysql --mysql-user=root --mysql-host=127.0.0.1 --mysql-port=3333  --mysql-db=test  --table_size=100000 --num-threads=400 --events=0 --max-time=10  --rand-type=uniform --test=./tests/db/oltp_xa.lua run

The test result is:
OLTP test statistics:
    queries performed:
        read:                            73910
        write:                           22173
        other:                           29564
        total:                           125647
    transactions:                        0      (0.00 per sec.)
    read/write requests:                 96083  (9210.17 per sec.)
    other operations:                    29564  (2833.90 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.4323s
    total number of events:              7391
    total time taken by event execution: 4087.0812s
    response time:
         min:                                 35.48ms
         avg:                                552.98ms
         max:                               2214.63ms
         approx.  95 percentile:            1040.59ms

Threads fairness:
    events (avg/stddev):           18.4775/1.06
    execution time (avg/stddev):   10.2177/0.14

BTW, my env basic config is:

Suggested fix:
std::this_thread::sleep_for(sleep_time) for xa will lead the problem, use below patch will improve the performance.

diff --git a/storage/innobase/include/clone0clone.h b/storage/innobase/include/clone0clone.h
index ba3901c1926..74ab57bce0c 100644
--- a/storage/innobase/include/clone0clone.h
+++ b/storage/innobase/include/clone0clone.h
@@ -1060,7 +1060,7 @@ class Clone_Sys {
         mutex_exit(mutex);
       }
 
-      std::this_thread::sleep_for(sleep_time);
+      ut_delay(10);

after this patch, the sysbench test result will improve a lot in my env, the qps from 9210.17 to 25834.32.

Threads started!

OLTP test statistics:
    queries performed:
        read:                            100100
        write:                           30030
        other:                           40040
        total:                           170170
    transactions:                        0      (0.00 per sec.)
    read/write requests:                 130130 (25834.32 per sec.)
    other operations:                    40040  (7949.02 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          5.0371s
    total number of events:              10010
    total time taken by event execution: 1921.8934s
    response time:
         min:                                  9.65ms
         avg:                                192.00ms
         max:                               1717.77ms
         approx.  95 percentile:             718.14ms

Threads fairness:
    events (avg/stddev):           25.0250/2.56
    execution time (avg/stddev):   4.8047/0.32
[20 May 2020 0:54] phoenix Zhang
the sysbench test file.

Attachment: oltp_xa.lua (text/x-lua), 1.52 KiB.

[20 May 2020 1:00] phoenix Zhang
adding the patch as contribution

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
[20 May 2020 1:00] phoenix Zhang
my patch is

Attachment: xa_performance.diff (text/x-patch), 369 bytes.

[20 May 2020 1:19] phoenix Zhang
BTW, my env basic config is:
cpu: 64core
memory: 256G
[20 May 2020 12:42] MySQL Verification Team
Hi Mr. Zhang,

Thank you for your bug report.

However, in order to consider whether this change is acceptable, we need to know exactly how did you run the sysbench.

Thanks in advance.
[20 May 2020 12:51] MySQL Verification Team
One more question:

Have you tried tuning it with:

 --innodb-adaptive-max-sleep-delay=#
[20 May 2020 13:31] MySQL Verification Team
Hi Mr. Zhang,

We have discussed this problem with our Development and it turns out that the change would not be effective.

This is the explanation .....

The callstack involves  Clone_persist_gtid::wait_thread, which calls :
  /* Sleep for 1 millisecond */
  Clone_Msec sleep_time(1);
...
  static_cast<void>(Clone_Sys::wait(sleep_time, time_out, alert_interval,
                                    wait_cond, nullptr, is_timeout));

so the       std::this_thread::sleep_for(sleep_time) request the thread to sleep for 0.001s (1 millisecond).
However, ut_delay(10) does not go to sleep at all, and does "some amount" of spinning, defined to be something like 0.00001s (10 microseconds) on 100 MHz Pentium, so probably much less on modern machines.

Not a bug.

So, there are two differences: duration and (not)releasing the CPU.
[21 May 2020 0:37] phoenix Zhang
Hi.

While use sleep 1 millisecond maybe too expensive for mysql server. When high concurrency(in my sysbench test above, it is 400 threads), the ordered_commit request queue become large, and will wait for much more time than 1ms. And, this will lead low performance of xa transactions. 

If use ut_delay replace the origin std::this_thread::sleep_for, the qps of sysbench have a obvious high improvement (9210.17 to 25834.32). 

So, i still think here should do optimize, maybe not use ut_delay, but it should have some way to reduce the sleep time.
[9 Jun 2020 13:33] MySQL Verification Team
Hi Mr. Zhang,

We have had further discussions and concluded that this would be a good performance improvement request.

Verified as performance improvement bug.

Thanks.
[3 Dec 2020 19:31] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 8.0.23 release, and here's the proposed changelog entry from the documentation team:

Persisting GTID values for XA transactions affected XA transaction
performance. Two GTID values are generated for XA transactions, one for
the prepare stage and another for the commit stage. The first GTID value
is written to the undo log and later overwritten by the second GTID value.
Writing of the second GTID value could only occur after flushing the first
GTID value to the gtid_executed table. Space is now reserved in the undo
log for both XA transaction GTID values. 

Thank you for the bug report.
[4 Dec 2020 12:54] MySQL Verification Team
Thank you, Daniel.