Bug #83270 Performance degradation when the server has many gtids
Submitted: 5 Oct 2016 14:13    Modified: 11 Apr 2017 7:25
Reporter: Daniël van Eeden (OCA)    Status: Closed
Category: MySQL Server: Replication    Severity: S5 (Performance)
Version: 5.7.15    OS: Any
Assigned to:    CPU Architecture: Any
Tags: GTID, performance

[5 Oct 2016 14:13] Daniël van Eeden
Description:
If the gtid_executed list is very long, this affects performance:

- FLUSH LOGS becomes slow and consumes more CPU
- while it runs, commits are blocked.

How to repeat:
Without any new writes:
mysql> flush binary logs;
Query OK, 0 rows affected (1 min 19.48 sec)

From perf top:
  75.07%  mysqld                                   [.] Sid_map::add_node
   3.88%  mysqld                                   [.] Gtid_set::ensure_sidno

Binlog is 3.3M with just a Previous-GTIDs event.

Runtimes: Count: 1000, Min: 0.00762486, Max: 0.01346445, Avg: 0.00826310
Runtimes: Count: 1000, Min: 0.00756955, Max: 0.02338958, Avg: 0.00912160
Runtimes: Count: 1000, Min: 0.00761366, Max: 0.35665417, Avg: 0.00892370
Runtimes: Count: 1000, Min: 0.00765824, Max: 0.01592278, Avg: 0.00831140
<flush logs>
Runtimes: Count: 1000, Min: 0.00770998, Max: 39.81745243, Avg: 0.05046528
Runtimes: Count: 1000, Min: 0.00765586, Max: 0.34519124, Avg: 0.00912425
Runtimes: Count: 1000, Min: 0.00761461, Max: 0.01139593, Avg: 0.00833480
Runtimes: Count: 1000, Min: 0.00766683, Max: 0.01132226, Avg: 0.00836014
Runtimes: Count: 1000, Min: 0.00767994, Max: 0.01914334, Avg: 0.00865974
Runtimes: Count: 1000, Min: 0.00764441, Max: 0.34583831, Avg: 0.00866367
Runtimes: Count: 1000, Min: 0.00741029, Max: 0.02059412, Avg: 0.00839262
Runtimes: Count: 1000, Min: 0.00708032, Max: 0.02272081, Avg: 0.00833124
Runtimes: Count: 1000, Min: 0.00695872, Max: 0.01966143, Avg: 0.00770949
Runtimes: Count: 1000, Min: 0.00693130, Max: 0.01122618, Avg: 0.00752847
Runtimes: Count: 1000, Min: 0.00682974, Max: 0.02170372, Avg: 0.00801383
Runtimes: Count: 1000, Min: 0.00657010, Max: 0.35488176, Avg: 0.00948849
Runtimes: Count: 1000, Min: 0.00635290, Max: 0.01187325, Avg: 0.00696669
<flush logs>
Runtimes: Count: 1000, Min: 0.00677776, Max: 125.19222689, Avg: 0.20068376
Runtimes: Count: 1000, Min: 0.00990081, Max: 0.36910009, Avg: 0.01126712
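
For reference, a minimal sketch of a client loop that could produce
per-batch statistics in this format (assumptions: mysql-connector-python,
sandbox credentials, and a hypothetical test.t1 table):

#!/usr/bin/env python
# Hedged sketch, not the reporter's tool: times batches of 1000 small
# committed writes and prints Count/Min/Max/Avg per batch.
import time
import mysql.connector

cnx = mysql.connector.connect(user='msandbox', password='msandbox',
                              host='127.0.0.1', port=3306)
cur = cnx.cursor()
while True:
    times = []
    for _ in range(1000):
        start = time.time()
        cur.execute('INSERT INTO test.t1 VALUES (NULL)')  # t1 is hypothetical
        cnx.commit()
        times.append(time.time() - start)
    print('Runtimes: Count: %d, Min: %.8f, Max: %.8f, Avg: %.8f'
          % (len(times), min(times), max(times), sum(times) / len(times)))
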
[5 Oct 2016 14:14] Daniël van Eeden
Script to add many gtids to a server

Attachment: gtidtrasher.py (text/x-python), 678 bytes.
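
A rough sketch of how such a gtid trasher could work: commit one empty
transaction per randomly generated server UUID, so gtid_executed
accumulates many distinct UUIDs. The connector and credentials are
assumptions; this is not necessarily the attached script.

#!/usr/bin/env python
# Hedged sketch of a gtid trasher: an empty transaction committed under
# an explicit gtid_next is recorded in gtid_executed just like a
# replicated transaction would be.
import uuid
import mysql.connector

cnx = mysql.connector.connect(user='msandbox', password='msandbox',
                              host='127.0.0.1', port=3306)
cur = cnx.cursor()
for _ in range(100000):
    cur.execute("SET gtid_next = '%s:1'" % uuid.uuid4())
    cur.execute('BEGIN')
    cur.execute('COMMIT')
cur.execute("SET gtid_next = 'AUTOMATIC'")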

[5 Oct 2016 14:16] Daniël van Eeden
mysql [localhost] {msandbox} ((none)) > show processlist;
+----+----------+-----------------+------+---------+------+----------------+------------------+
| Id | User     | Host            | db   | Command | Time | State          | Info             |
+----+----------+-----------------+------+---------+------+----------------+------------------+
|  2 | msandbox | localhost       | NULL | Query   |   62 | starting       | flush logs       |
|  5 | msandbox | localhost:39436 | NULL | Query   |   62 | closing tables | COMMIT           |
|  6 | msandbox | localhost       | NULL | Query   |    0 | starting       | show processlist |
+----+----------+-----------------+------+---------+------+----------------+------------------+
3 rows in set (0.00 sec)
[5 Oct 2016 14:49] Daniël van Eeden
It now takes 2m+ to restart an empty mysql sandbox which has many gtids.

$ time ./restart
.............................................................................................................. sandbox server started

real	2m9.215s
user	0m0.060s
sys	0m0.132s
[5 Oct 2016 15:14] Daniël van Eeden
Restoring a mysqldump fails unless max_allowed_packet is increased.

Replication fails unless max_allowed_packet is increased:
2016-10-05T15:12:52.479639Z 19 [ERROR] Failed on request_dump() for channel ''
[6 Oct 2016 6:07] MySQL Verification Team
Hello Daniël,

Thank you for the report and test case.
Verified as described with 5.7.15 build.

Thanks,
Umesh
[6 Oct 2016 6:07] MySQL Verification Team
test results

Attachment: 83270_5.7.15.results (application/octet-stream, text), 71.29 KiB.

[6 Oct 2016 6:44] MySQL Verification Team
-- Shutdown

[umshastr@hod03]/export/umesh/server/binaries/GABuilds/mysql-advanced-5.7.15: time bin/mysqladmin -uroot -S /tmp/mysql_ushastry.sock shutdown
[1]+  Done                    bin/mysqld --gtid-mode=ON --log-bin --enforce-gtid-consistency --server-id=1 --basedir=/export/umesh/server/binaries/GABuilds/mysql-advanced-5.7.15 --datadir=/export/umesh/server/binaries/GABuilds/mysql-advanced-5.7.15/83270 --core-file --socket=/tmp/mysql_ushastry.sock --port=3306 --log-error=/export/umesh/server/binaries/GABuilds/mysql-advanced-5.7.15/83270/log.err 2>&1

real    2m5.020s
user    19m45.599s
sys     0m46.736s
[31 Mar 2017 17:01] Paul DuBois
Posted by developer:
 
Noted in 5.7.18, 8.0.1 changelogs.

FLUSH BINARY LOG could become slow with data replicated from many
servers.
[11 Apr 2017 7:25] Daniël van Eeden
commit 85afc80f307d5ade34975cd580584853c082ad8f
Author: Venkatesh Duggirala <venkatesh.duggirala@oracle.com>
Date:   Mon Feb 20 21:45:21 2017 +0530

    Bug#24806259 PERFORMANCE DEGRADATION WHEN THE SERVER HAS MANY GTIDS
    
    Problem: If a server has replicated data from many different servers
    (this can happen over time if the slave's master changes from time
    to time, or if the replication chain is long enough that transactions
    travel through many different masters before reaching the node), then
    the FLUSH BINARY LOGS command takes a long time. Since FLUSH BINARY
    LOGS holds a lock on the binary log, it blocks all operations on the
    server, which degrades performance for the duration.
    Also > restarting such a server takes many minutes.
         > setting GTID_PURGED with a big GTID list takes a long time.
    
    Analysis:
    FLUSH BINARY LOGS calls Gtid_state::save_gtids_of_last_binlog_into_table,
    which prepares logged_gtids_last_binlog to compute the gtids that need
    to be stored in the gtid_executed table. While preparing
    logged_gtids_last_binlog, the server calls
    Gtid_set::add_gtid_set()
       => Sid_map::add_sid()
             => Sid_map::add_node()
    whenever it finds a new UUID to add to the set.
    When the server calls add_node, it has to update many internal
    structures to maintain the consistency of the Gtid_state. Some of
    these structures are declared as dynamic arrays (Preallocated_array).
    Eg:
         > Gtid_set::m_intervals    # Array of interval lists; the
                                    # element at index N points to the
                                    # interval chain for SIDNO N+1.

         > Sid_map::sidno_to_hash   # Array that maps SIDNO to SID;
                                    # the element at index N points to
                                    # a Node with SIDNO N+1.

         > Mutex_cond_array::m_array # Array of mutex/condition pairs,
                                     # one per SIDNO.
    
    As the name says, these array lengths need to be adjusted dynamically
    whenever there is a requirement (no space left in the array). In the
    current code, add_node calls Preallocated_array::reserve(sidno),
    which resizes the array to exactly 'sidno' elements. The sidno mapped
    to a new UUID is just the number of elements in the array + 1, so
    these arrays grow by only one element at a time.
    These dynamic arrays are resized in the following way:
                  > allocate the new memory required to fit the new
                    array size
                  > copy element by element from the old array to the
                    new location.
    The complexity of this algorithm is O(n^2), where n is the number of
    new UUIDs that need to be added to these structures.
    
    A similar problem occurs when the server is restarted or when
    GTID_PURGED is set with a big gtid list. During these two operations,
    the server prepares the gtid_executed set by reading the gtids from
    the gtid_executed table (or from the gtid text, in the case of the
    SET GTID_PURGED command), and calls add_node for every new UUID.
    If the number of unique UUIDs is in the thousands, the restart
    operation can take a long time.
    
    Fix: Preallocated_array::push_back already contains logic to grow
    these dynamic arrays by doubling the current size whenever it runs
    out of space, which makes the total copying cost amortized O(n).
    Hence the server should not call "reserve" explicitly, and should
    instead let push_back handle the case of no space in the array.

 sql/rpl_gtid_mutex_cond_array.cc | 4 +---
 sql/rpl_gtid_owned.cc            | 6 ++----
 sql/rpl_gtid_set.cc              | 4 +---
 3 files changed, 4 insertions(+), 10 deletions(-)
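
To illustrate the complexity difference described in the commit: growing
an array one slot at a time with an exact-size reserve copies O(n^2)
elements in total, while push_back-style capacity doubling copies only
O(n). A small self-contained simulation (illustration only, not the
Preallocated_array implementation):

#!/usr/bin/env python
# Counts element copies under the two growth strategies from the commit
# message. Each reallocation copies all elements already stored.

def grow_with_exact_reserve(n):
    copies, capacity = 0, 0
    for size in range(1, n + 1):
        if size > capacity:
            copies += size - 1  # copy old elements to the new allocation
            capacity = size     # reserve(sidno): exactly one new slot
    return copies

def grow_with_doubling(n):
    copies, capacity = 0, 0
    for size in range(1, n + 1):
        if size > capacity:
            copies += size - 1               # copy old elements
            capacity = max(1, capacity * 2)  # push_back: double capacity
    return copies

for n in (1000, 10000, 100000):
    print(n, grow_with_exact_reserve(n), grow_with_doubling(n))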