Bug #84785 Prevent Large Transactions in Group Replication
Submitted: 2 Feb 2017 3:38 Modified: 9 Jun 2017 9:37
Reporter: Kenny Gryp Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S2 (Serious)
Version:5.7.17 OS:Any
Assigned to: CPU Architecture:Any
Triage: Needs Triage: D5 (Feature request)

[2 Feb 2017 3:38] Kenny Gryp
Description:
You can easily trash a group replication server by generating a large transaction. It will start to use a lot of memory on the node executing the transaction and then later on goes to all nodes.

All writes in the cluster get stuck in the meantime.

How to repeat:
mysql> create table incr (it int auto_increment primary key )engine=innodb;
Query OK, 0 rows affected (0.10 sec)

mysql> insert into incr select null;
Query OK, 1 row affected (0.02 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 8 rows affected (0.01 sec)
Records: 8  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 16 rows affected (0.01 sec)
Records: 16  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 32 rows affected (0.01 sec)
Records: 32  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 64 rows affected (0.01 sec)
Records: 64  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 128 rows affected (0.01 sec)
Records: 128  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;;
Query OK, 256 rows affected (0.01 sec)
Records: 256  Duplicates: 0  Warnings: 0

ERROR: 
No query specified

mysql> insert into incr select null from incr;
Query OK, 512 rows affected (0.01 sec)
Records: 512  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 1024 rows affected (0.02 sec)
Records: 1024  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 2048 rows affected (0.03 sec)
Records: 2048  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 4096 rows affected (0.05 sec)
Records: 4096  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 8192 rows affected (0.08 sec)
Records: 8192  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
^[[AQuery OK, 16384 rows affected (0.16 sec)
Records: 16384  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
^[[A
^[[AQuery OK, 32768 rows affected (0.33 sec)
Records: 32768  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 65536 rows affected (1.62 sec)
Records: 65536  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 131072 rows affected (3.73 sec)
Records: 131072  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 262144 rows affected (5.02 sec)
Records: 262144  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
Query OK, 524288 rows affected (18.71 sec)
Records: 524288  Duplicates: 0  Warnings: 0

mysql> insert into incr select null from incr;
^Z

waiting for minutes...

---TRANSACTION 14181, ACTIVE (PREPARED) 43 sec
mysql tables in use 1, locked 1
1 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 1
MySQL thread id 36, OS thread handle 140258532763392, query id 281 localhost root query end
insert into heloise values (1000, 'bleh')
---TRANSACTION 14176, ACTIVE (PREPARED) 187 sec
mysql tables in use 2, locked 2
18618 lock struct(s), heap size 2384080, 2100255 row lock(s), undo log entries 1048576
MySQL thread id 30, OS thread handle 140258533029632, query id 265 localhost root query end
insert into incr select null from incr

You can see the second one is the large transaction. it's stuck.

It also blocks writes to another table, basically GR is blocked right now.

mysql> show processlist;
+----+-------------+-----------+--------------+---------+------+--------------------------------------------------------+-------------------------------------------+
| Id | User        | Host      | db           | Command | Time | State                                                  | Info                                      |
+----+-------------+-----------+--------------+---------+------+--------------------------------------------------------+-------------------------------------------+
|  4 | root        | localhost | NULL         | Sleep   | 6283 |                                                        | NULL                                      |
| 12 | system user |           | NULL         | Connect | 6535 | executing                                              | NULL                                      |
| 15 | system user |           | NULL         | Connect | 6535 | Slave has read all relay log; waiting for more updates | NULL                                      |
| 30 | root        | localhost | lefredissimo | Query   |  261 | query end                                              | insert into incr select null from incr    |
| 36 | root        | localhost | lefredissimo | Query   |  118 | query end                                              | insert into heloise values (1000, 'bleh') |
| 37 | root        | localhost | NULL         | Query   |    0 | starting                                               | show processlist                          |
+----+-------------+-----------+--------------+---------+------+--------------------------------------------------------+-------------------------------------------+
6 rows in set (0.80 sec)

[vagrant@gr-1 ~]$ sudo cat /proc/`pidof mysqld`/status
Name:	mysqld
State:	S (sleeping)
Tgid:	2596
Ngid:	0
Pid:	2596
PPid:	1
TracerPid:	0
Uid:	27	27	27	27
Gid:	27	27	27	27
FDSize:	128
Groups:	27 
VmPeak:	 1694600 kB
VmSize:	 1629324 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	  179508 kB
VmRSS:	  119108 kB
RssAnon:	  119108 kB
RssFile:	       0 kB
RssShmem:	       0 kB
VmData:	 1569456 kB
VmStk:	     136 kB
VmExe:	   23288 kB
VmLib:	    5836 kB
VmPTE:	    2012 kB
VmSwap:	  758960 kB
Threads:	37
SigQ:	0/864
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000084007
SigIgn:	0000000000003000
SigCgt:	00000001800006e8
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	0000001fffffffff
CapAmb:	0000000000000000
Seccomp:	0
Cpus_allowed:	1
Cpus_allowed_list:	0
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	1356
nonvoluntary_ctxt_switches:	377

Suggested fix:
Feature Request: Quick measure: Add configuration variable to limit the amount of rows a single transaction can change to avoid this from happening!
[2 Feb 2017 3:43] Kenny Gryp
OOM after a while, as expected:

[11609.611675] mysqld invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[11609.611679] mysqld cpuset=/ mems_allowed=0
[11609.611681] CPU: 0 PID: 4025 Comm: mysqld Tainted: G           OE  ------------   3.10.0-514.2.2.el7.x86_64 #1
[11609.611682] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[11609.611683]  ffff88000a670fb0 0000000063f9302a ffff88000041f990 ffffffff816861cc
[11609.611685]  ffff88000041fa20 ffffffff81681177 ffffffff812ae87b ffff88000bb88e50
[11609.611687]  ffff88000bb88e68 002000d000000206 fffeffff00000000 0000000000000001
[11609.611688] Call Trace:
[11609.611694]  [<ffffffff816861cc>] dump_stack+0x19/0x1b
[11609.611696]  [<ffffffff81681177>] dump_header+0x8e/0x225
[11609.611700]  [<ffffffff812ae87b>] ? cred_has_capability+0x6b/0x120
[11609.611703]  [<ffffffff8113cc03>] ? proc_do_uts_string+0x123/0x130
[11609.611706]  [<ffffffff8118476e>] oom_kill_process+0x24e/0x3c0
[11609.611708]  [<ffffffff81184fa6>] out_of_memory+0x4b6/0x4f0
[11609.611710]  [<ffffffff81681c80>] __alloc_pages_slowpath+0x5d7/0x725
[11609.611713]  [<ffffffff8118b0b5>] __alloc_pages_nodemask+0x405/0x420
[11609.611715]  [<ffffffff811d221a>] alloc_pages_vma+0x9a/0x150
[11609.611718]  [<ffffffff811c300b>] read_swap_cache_async+0xeb/0x160
[11609.611719]  [<ffffffff811c3128>] swapin_readahead+0xa8/0x110
[11609.611721]  [<ffffffff811b138c>] handle_mm_fault+0xb1c/0xfe0
[11609.611724]  [<ffffffff81691c94>] __do_page_fault+0x154/0x450
[11609.611725]  [<ffffffff81691fc5>] do_page_fault+0x35/0x90
[11609.611727]  [<ffffffff8168e288>] page_fault+0x28/0x30
[11609.611728] Mem-Info:
[11609.611731] active_anon:21798 inactive_anon:22172 isolated_anon:0
 active_file:144 inactive_file:1106 isolated_file:0
 unevictable:0 dirty:0 writeback:0 unstable:0
 slab_reclaimable:3524 slab_unreclaimable:5768
 mapped:116 shmem:0 pagetables:1524 bounce:0
 free:707 free_pcp:14 free_cma:0
[11609.611733] Node 0 DMA free:1004kB min:128kB low:160kB high:192kB active_anon:3612kB inactive_anon:4788kB active_file:52kB inactive_file:392kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:44kB shmem:0kB slab_reclaimable:1612kB slab_unreclaimable:1636kB kernel_stack:688kB pagetables:296kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:22611 all_unreclaimable? yes
[11609.611736] lowmem_reserve[]: 0 219 219 219
[11609.611738] Node 0 DMA32 free:1824kB min:1828kB low:2284kB high:2740kB active_anon:83580kB inactive_anon:83900kB active_file:524kB inactive_file:4032kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:245696kB managed:226448kB mlocked:0kB dirty:0kB writeback:0kB mapped:420kB shmem:0kB slab_reclaimable:12484kB slab_unreclaimable:21436kB kernel_stack:1888kB pagetables:5800kB unstable:0kB bounce:0kB free_pcp:56kB local_pcp:56kB free_cma:0kB writeback_tmp:0kB pages_scanned:254622 all_unreclaimable? yes
[11609.611741] lowmem_reserve[]: 0 0 0 0
[11609.611743] Node 0 DMA: 35*4kB (UE) 8*8kB (UE) 46*16kB (UEM) 2*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1004kB
[11609.611748] Node 0 DMA32: 456*4kB (UM) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1824kB
[11609.611753] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[11609.611754] 15057 total pagecache pages
[11609.611755] 13802 pages in swap cache
[11609.611756] Swap cache stats: add 7132227, delete 7118425, find 2152785/3366239
[11609.611756] Free swap  = 1299920kB
[11609.611757] Total swap = 2064380kB
[11609.611758] 65422 pages RAM
[11609.611758] 0 pages HighMem/MovableOnly
[11609.611759] 4833 pages reserved
[11609.611759] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[11609.611765] [  461]     0   461     8276        0      20       92             0 systemd-journal
[11609.611767] [  479]     0   479    48780        0      30      396             0 lvmetad
[11609.611768] [  487]     0   487    11616        1      23      507         -1000 systemd-udevd
[11609.611772] [  592]     0   592    13854        0      26      129         -1000 auditd
[11609.611773] [  611]     0   611     6048        0      15       79             0 systemd-logind
[11609.611775] [  615]    81   615     8207       65      16       93          -900 dbus-daemon
[11609.611776] [  626]   998   626   132068        0      52     1008             0 polkitd
[11609.611778] [  631]   997   631    28962        0      26       98             0 chronyd
[11609.611779] [  645]     0   645    31556        0      20      170             0 crond
[11609.611781] [  652]     0   652    27509        1      10       33             0 agetty
[11609.611782] [  663]     0   663    82862       16      82     6116             0 firewalld
[11609.611783] [  742]     0   742    84613        0      20       86             0 VBoxService
[11609.611785] [  746]     0   746   109415        0      64      510             0 NetworkManager
[11609.611786] [  870]     0   870    28206        1      53     3119             0 dhclient
[11609.611787] [ 1169]     0  1169   138288       84      88     2578             0 tuned
[11609.611789] [ 1172]     0  1172    70837        0      40      220             0 rsyslogd
[11609.611790] [ 1190]     0  1190    20617        0      42      219         -1000 sshd
[11609.611791] [ 1730]     0  1730    22459        1      44      280             0 master
[11609.611793] [ 1855]    89  1855    22502        0      43      275             0 qmgr
[11609.611794] [ 2687]    27  2687   429830    31311     468   169962             0 mysqld
[11609.611795] [ 3987]     0  3987    35208        1      70      308             0 sshd
[11609.611797] [ 3990]  1000  3990    35208        0      68      320             0 sshd
[11609.611798] [ 3991]  1000  3991     2942        0      11       99             0 bash
[11609.611799] [ 4010]  1000  4010     8058        0      20      678             0 mysql
[11609.611800] [ 4065]    89  4065    22485        0      42      272             0 pickup
[11609.611802] [ 4127]     0  4127    31330        0      17       65             0 anacron
[11609.611803] [ 4160]     0  4160    28280        0      14       53             0 run-parts
[11609.611804] [ 4171]     0  4171    28280        0      13       53             0 man-db.cron
[11609.611805] [ 4172]     0  4172    28371        0      13       41             0 awk
[11609.611807] [ 4186]     0  4186     1168        0       7       13             0 mandb
[11609.611808] [ 4189]  1000  4189     7541        0      19      182             0 mysql
[11609.611809] [ 4194]  1000  4194     7541        0      20      183             0 mysql
[11609.611811] Out of memory: Kill process 2687 (mysqld) score 349 or sacrifice child
[11609.612796] Killed process 2687 (mysqld) total-vm:1719320kB, anon-rss:125244kB, file-rss:0kB, shmem-rss:0kB
[7 Feb 2017 12:16] Umesh Shastry
Hello Kenny,

Thank you for the report and feedback!
In my few tests, didn't notice any OOM but memory usage reaching ~75% on small boxes with default settings.

--
root@localhost [test]> insert into incr select null from incr1;
Query OK, 2097152 rows affected (55.90 sec)
Records: 2097152  Duplicates: 0  Warnings: 0

root@localhost [test]> insert into incr select null from incr1;
ERROR 3100 (HY000): Error on observer while running replication hook 'before_commit'.

-
root@localhost [test]> insert into incr select null from incr;
Query OK, 4194304 rows affected (1 min 37.53 sec)
Records: 4194304  Duplicates: 0  Warnings: 0

root@localhost [test]> insert into incr select null from incr;
ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.
root@localhost [test]> \q

- In another case noticed

root@localhost [test]> insert into incr select null from incr;
Query OK, 4194304 rows affected (1 min 37.53 sec)
Records: 4194304  Duplicates: 0  Warnings: 0

root@localhost [test]> insert into incr select null from incr;
ERROR 3101 (HY000): Plugin instructed the server to rollback the current transaction.

--

root@localhost [test]> insert into incr select null from incr;
Query OK, 4194304 rows affected (1 min 31.22 sec)
Records: 4194304  Duplicates: 0  Warnings: 0

root@localhost [test]> insert into incr select null from incr;
ERROR 2013 (HY000): Lost connection to MySQL server during query
root@localhost [test]>

In all the occasions, node exited from the cluster.
Verifying this as a feature request to limit the amount of rows a single transaction can change to avoid this from happening. 

Thanks,
Umesh
[9 Jun 2017 9:37] David Moss
Posted by developer:
 
Thank you for your feedback, this has been fixed in upcoming versions and the following was added to the 5.7.19 and 8.0.2 changelogs:
The group_replication_transaction_size_limit option was added to enable you to protect a group against large transactions causing a failure.