Bug #84792 Idle GR Cluster: Member CPU hog
Submitted: 2 Feb 2017 12:49 Modified: 3 May 2017 13:18
Reporter: Kenny Gryp Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:5.7.17 OS:Any
Assigned to: Bogdan Kecman CPU Architecture:Any

[2 Feb 2017 12:49] Kenny Gryp
Description:
Idle system (having one thread executing is normal with group replication)

mysql> show processlist;
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time  | State                                                  | Info             |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+------------------+
|  6 | system user |           | NULL | Connect | 31950 | executing                                              | NULL             |
|  9 | system user |           | NULL | Connect | 31950 | Slave has read all relay log; waiting for more updates | NULL             |
| 18 | root        | localhost | NULL | Query   |     0 | starting                                               | show processlist |
+----+-------------+-----------+------+---------+-------+--------------------------------------------------------+------------------+
3 rows in set (0.00 sec)

mysql> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2017-02-02 12:47:28 0x7fa4147b7700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 59 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 32637 srv_active, 0 srv_shutdown, 0 srv_idle
srv_master_thread log flush and writes: 32635
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 31643
OS WAIT ARRAY INFO: signal count 27569
RW-shared spins 0, rounds 23552, OS waits 11655
RW-excl spins 0, rounds 491978, OS waits 16129
RW-sx spins 607, rounds 18210, OS waits 330
Spin rounds per wait: 23552.00 RW-shared, 491978.00 RW-excl, 30.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 318146983
Purge done for trx's n:o < 318146983 undo n:o < 0 state: running but idle
History list length 40
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421818374558208, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421818374557296, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421818374556384, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 1; buffer pool: 0
3725 OS file reads, 63919338 OS file writes, 62191309 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 1802.44 writes/s, 1753.02 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 0 buffer(s)
Hash table size 34673, node heap has 1 buffer(s)
5252.50 hash searches/s, 0.14 non-hash searches/s
---
LOG
---
Log sequence number 95526973537
Log flushed up to   95526972993
Pages flushed up to 95517680405
Last checkpoint at  95516523644
1 pending log flushes, 0 pending chkp writes
62111633 log i/o's done, 1750.78 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137428992
Dictionary memory allocated 395368
Buffer pool size   8191
Free buffers       1024
Database pages     7165
Old database pages 2624
Modified db pages  216
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 2245, not young 6658
0.00 youngs/s, 0.00 non-youngs/s
Pages read 3270, created 7529, written 1754926
0.00 reads/s, 0.00 creates/s, 50.27 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 7165, unzip_LRU len: 0
I/O sum[2526]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=4229, Main thread ID=140342754014976, state: sleeping
Number of rows inserted 1048588, updated 62098736, deleted 62098716, read 310493636
0.00 inserts/s, 1750.33 updates/s, 1750.33 deletes/s, 8751.63 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

1 row in set (0.14 sec)

top - 12:41:40 up 12:12,  1 user,  load average: 2.17, 2.21, 2.20
Tasks: 104 total,   1 running,  98 sleeping,   5 stopped,   0 zombie
%Cpu(s):  5.9 us, 78.3 sy,  0.0 ni,  6.7 id,  0.4 wa,  0.0 hi,  8.7 si,  0.0 st
KiB Mem :   242356 total,    12112 free,    99848 used,   130396 buff/cache
KiB Swap:  2064380 total,  1191124 free,   873256 used.   109060 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 4229 mysql     20   0 1653528  48564    612 S 63.5 20.0 351:53.18 mysqld
 4894 root      20   0       0      0      0 S 14.0  0.0   0:13.68 kworker/0:2

CPU Usage is constantly high

How to repeat:
I don't know how to repeat exactly, I noticed this 2 times already (when it drains my laptop battery). I collected this data before, then decided to not post it as I blamed myself, seems to be happening again....

Suggested fix:
save my battery lifespan, be more ecological, help reduce global warming!
[2 Feb 2017 12:50] Kenny Gryp
gdb

Attachment: 84792_gdb.txt (text/plain), 50.39 KiB.

[2 Feb 2017 12:50] Kenny Gryp
perf

Attachment: 84792_perf.txt (text/plain), 3.25 KiB.

[2 Feb 2017 13:11] Kenny Gryp
I was able to trace it down to:

my mysql got killed (OOM, see #84785), and then mysql started again with:

[mysqld]
symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

server_id=2
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
relay-log=gr-2-relay-bin
binlog_checksum=NONE
log_slave_updates=ON
log_bin=binlog
binlog_format=ROW

transaction_write_set_extraction=XXHASH64
group_replication_group_name="da7aba5e-dead-da7a-ba55-da7aba5e57ab"
group_replication_start_on_boot=on
super_read_only=on
group_replication_local_address= "gr-2:24901"
group_replication_group_seeds= "gr-1:24901,gr-2:24901,gr-3:24901"
group_replication_bootstrap_group= off

Note that super_read_only=on breaks group replication from starting. (#84733)
But somehow mysql got stuck.

I do not have the CPU problems every time I start mysql. The node must be in some kind of state before it happens
[6 Mar 2017 12:48] Bogdan Kecman
Hi,

I'm having issues reproducing this!
Can you share your full config?
Are you using bare metal or VM for reproducing this?

all best
Bogdan
[6 Mar 2017 14:21] Kenny Gryp
See my last comment, it contains more information on how to reproduce it and includes my configuration.

I did this test on VMs with low amount of RAM (as you need to get MySQL OOM first to reproduce this)
[6 Mar 2017 14:54] Bogdan Kecman
Hi Kenny,

I hoped there's more and that you "just sent parts of config" :( as I can't reproduce this .. I easily get OOM but mysql starts after it ?! .. tried few times (>5)

Now, I have to ask, what binary are you using? Might be related.
Do you have some statistics of "what % of starts go into high cpu usage"?
Is mysql completely "stuck" (eating cpu, not responding), or is responsive, just eating cpu?

thanks
Bogdan
[7 Apr 2017 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[18 Apr 2017 18:49] Kenny Gryp
$ rpm -qa | grep mysql
mysql-community-common-5.7.17-1.el7.x86_64
mysql-community-libs-compat-5.7.17-1.el7.x86_64
mysql-community-libs-5.7.17-1.el7.x86_64
mysql-community-server-5.7.17-1.el7.x86_64
mysql57-community-release-el7-9.noarch
mysql-community-client-5.7.17-1.el7.x86_64

[vagrant@gr-2 ~]$ mpstat 1
Linux 3.10.0-514.2.2.el7.x86_64 (gr-2) 	04/18/2017 	_x86_64_	(1 CPU)

06:49:12 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:49:13 PM  all    5.56    0.00   76.67    1.11    0.00   12.22    0.00    0.00    0.00    4.44
06:49:14 PM  all    6.59    0.00   79.12    0.00    0.00   10.99    0.00    0.00    0.00    3.30
06:49:15 PM  all    4.55    0.00   80.68    0.00    0.00   10.23    0.00    0.00    0.00    4.55
06:49:16 PM  all    4.21    0.00   81.05    1.05    0.00   10.53    0.00    0.00    0.00    3.16
06:49:17 PM  all    4.35    0.00   78.26    0.00    0.00   15.22    0.00    0.00    0.00    2.17
^C
Average:     all    5.04    0.00   79.17    0.44    0.00   11.84    0.00    0.00    0.00    3.51

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                             
 2778 mysql     20   0 1195028  56592    560 S 64.0 23.4  24:28.72 mysqld                                                                                                              

This is running on a VM.
[3 May 2017 13:18] Bogdan Kecman
Hi,

I still can't reproduce this. Used few different VM's, low RAM, your config.. as I said, OOM is easy to get but I don't get the bug (cpu load and process stuck).

How easily you can reproduce this?