Bug #101779 mysql router memory leak
Submitted: 27 Nov 2020 3:28 Modified: 11 Jan 14:01
Reporter: bin zhang Email Updates:
Status: Need Feedback Impact on me:
None 
Category:MySQL Router Severity:S2 (Serious)
Version:8.0.21 OS:CentOS
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: router memory leak

[27 Nov 2020 3:28] bin zhang
Description:
when i use mysql router with mysql innodb cluster .
mysql router occurred  memory leak  .
mysql router took 1mb more memory about few minutes and never stop until linux kill it.
i did not even do any query .
this happened each time ,not accident

How to repeat:
mysql innodb cluster (one RW,two RO)
mysql router start with --bootstrap
all default options
[27 Nov 2020 7:27] bin zhang
This problem was not found in the old version(8.0.16)
[27 Nov 2020 7:29] bin zhang
mysql router 8.0.22 occured another problem cpu high to 15%-16%
[27 Nov 2020 9:45] MySQL Verification Team
Hi,

I just tested with 8.0.22 and I don't see any leak.
Can you retest with 8.0.22

Thanks
Bogdan
[27 Nov 2020 9:56] bin zhang
Thank you for your reply
I've tested it many times, and it comes up every time .
Is it possible that the problem is caused by different OS,which OS did you test with?
[27 Nov 2020 10:09] MySQL Verification Team
Hi,

testing 8.0.22 mysql server and mysql router binaries from dev.mysql.com
os: centos 7 64bit

what os/binaries are you using?

I don't see neither the memory issue nor cpu issue.

thanks
Bogdan
[17 Dec 2020 11:57] MATTHEW GOTT
I am having a similar experience with increasing memory usage by mysqlrouter process.
We are running 8.0.21 on RHEL 6.10.
Not starting with --bootstrap.

Capturing changes to total process memory using pmap once per minute indicates size of each increase is 64M. 
This occurs approx 2 to 4 times per day on each instance:
- process running since Oct 4 is now 15.3GB
- process running since Nov 8 is now 11.0GB
- process restarted 15:41 GMT Dec 16 was initially 1153300K, now 1284372K ( 2 x 64M increases in about 20 hours).

No link established yet to events in application. The level of activity between this instances does not appear to make much difference. A further instance running with just one client connection to the DB is 7.8GB from Nov 8th.
[28 Dec 2020 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[5 Jan 12:01] MATTHEW GOTT
Memory continued to leak in very linear fashion since 17th December.
Step increases in memory usage are always 64MB.
Tracking this on two servers with significantly different numbers of MySQL clients: 
- one VM running 2 Jboss nodes with connection pool size of 40 each, plus one Java client. Memory increase is every 505 to 506 minutes. Server has 16GB RAM.
- another VM running 8 Jboss nodes with connection pool size of 40 each, plus one Java client. Memory increase is every 546 to 547 minutes. Server has 64GB RAM.
[6 Jan 15:03] MySQL Verification Team
https://bugs.mysql.com/bug.php?id=102165 marked as duplicate of this one.
[7 Jan 11:51] MATTHEW GOTT
Yesterday I shut down all application clients on the first of the two servers previously showing the issue. The 64MB increases in memory usage have continued unchanged, at the same intervals:
6/1/2020 13:41,1349908K
-> application clients shut down at 17:27
6/1/2020 22:06,1415444K
7/1/2020 06:30,1480980K
DEBUG logging level still set on MySQL router. /var/log/mysqlrouter/mysqlrouter.log confirms all routing DEBUG messages ceased after 17:27 yesterday, so definitely no connections made to the DB cluster via this router.
[8 Jan 9:24] MATTHEW GOTT
Fault reproduced from scratch with no application involvement or any db connections made through the router at all:
- mysqlrouter stopped and re-started at 2021-01-07 11:53:03
- total mem usage according to pmap initially 606396KB
- increased to 671932KB at 2021-01-07 between 20:00 and 20:01
- increased to 727468KB at 2021-01-08 between 04:24 and 04:25
[8 Jan 9:28] MATTHEW GOTT
mysqlrouter.conf file in use:

# File automatically generated during MySQL Router bootstrap
[DEFAULT]
name=system
user=mysqlrouter
keyring_path=/var/lib/mysqlrouter/keyring
master_key_path=/etc/mysqlrouter/mysqlrouter.key
connect_timeout=15
read_timeout=30
dynamic_state=/var/lib/mysqlrouter/state.json

[logger]
#level = INFO
level = DEBUG

[metadata_cache:UitCluster]
cluster_type=gr
router_id=2
user=mysql_router2_8snojl2jv0j9
metadata_cluster=UitCluster
ttl=0.5
auth_cache_ttl=-1
auth_cache_refresh_interval=2
use_gr_notifications=0

[routing:UitCluster_rw]
bind_address=0.0.0.0
bind_port=6446
socket=/tmp/mysql.sock
destinations=metadata-cache://UitCluster/?role=PRIMARY
routing_strategy=first-available
protocol=classic

[routing:UitCluster_ro]
bind_address=0.0.0.0
bind_port=6447
socket=/tmp/mysqlro.sock
destinations=metadata-cache://UitCluster/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=classic

[routing:UitCluster_x_rw]
bind_address=0.0.0.0
bind_port=64460
socket=/tmp/mysqlx.sock
destinations=metadata-cache://UitCluster/?role=PRIMARY
routing_strategy=first-available
protocol=x

[routing:UitCluster_x_ro]
bind_address=0.0.0.0
bind_port=64470
socket=/tmp/mysqlxro.sock
destinations=metadata-cache://UitCluster/?role=SECONDARY
routing_strategy=round-robin-with-fallback
protocol=x
[8 Jan 9:42] MATTHEW GOTT
/etc/my.cnf from primary db server:

[mysqld]
server-id=300 # Change as required
character-set-server=utf8
collation-server=utf8_general_ci

binlog-format=ROW
log-slave-updates = 1
gtid-mode = ON
enforce-gtid-consistency = ON
master-info-repository=TABLE
relay-log-info-repository=TABLE
relay-log-recovery=1
sync-master-info=1
slave-parallel-workers=2
binlog-checksum = NONE
master-verify-checksum=1
slave-sql-verify-checksum=1
binlog-rows-query-log_events=1
report-port=3306
port=3306

#REPLICATION
binlog_checksum=NONE
enforce_gtid_consistency=ON
gtid_mode=ON
slave_preserve_commit_order=1
slave-parallel-type=LOGICAL_CLOCK
log_slave_updates=1
transaction_write_set_extraction=XXHASH64

innodb_flush_log_at_trx_commit=1
sync_binlog=1
autocommit=OFF
transaction_isolation=READ-COMMITTED
innodb_file_per_table=1
innodb_file_format=barracuda
innodb_large_prefix=1

socket=/usr/local/mysql/data/zdbkirdccd1.sock # Change as required
datadir=/usr/local/mysql/data/
report-host=zdbkirdccd1 # Change as required

sql_mode=REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,TRADITIONAL
#sql_mode=ANSI,TRADITIONAL   # Changed following upgrade to 5.7
safe-user-create
symbolic-links = 0

local_infile=0
secure_file_priv=/usr/local/mysql/
log_warnings=2

skip-grant-tables=FALSE
log-raw=OFF

#LOGGING
slow_query_log=ON
slow_query_log_file=/usr/local/mysql/logs/zdbkirdccd1_slow.log # Change as required
general_log=OFF
general_log_file=/usr/local/mysql/logs/zdbkirdccd1_general.log # Change as required
log_error=/usr/local/mysql/logs/zdbkirdccd1_error.log # Change as required

#BINARY LOGS
log-bin=/usr/local/mysql/binlogs/zdbkirdccd1 # Change as required
log-bin-index=/usr/local/mysql/binlogs/zdbkirdccd1.index # Change as required

#TUNING
innodb_buffer_pool_size=10G
innodb_log_file_size=256M

auto_increment_increment = 1
auto_increment_offset = 2
loose_group_replication_allow_local_disjoint_gtids_join = OFF
loose_group_replication_allow_local_lower_version_join = OFF
loose_group_replication_auto_increment_increment = 7
loose_group_replication_bootstrap_group = OFF
loose_group_replication_components_stop_timeout = 31536000
loose_group_replication_compression_threshold = 1000000
loose_group_replication_enforce_update_everywhere_checks = OFF
loose_group_replication_exit_state_action = READ_ONLY
loose_group_replication_flow_control_applier_threshold = 25000
loose_group_replication_flow_control_certifier_threshold = 25000
loose_group_replication_flow_control_mode = QUOTA
loose_group_replication_force_members =
loose_group_replication_group_name = 0be74118-04c9-11eb-9c19-001a4a34ea16
loose_group_replication_group_seeds = zdbkirdccd2:33061,zdbkirdccd3:33061
loose_group_replication_gtid_assignment_block_size = 1000000
loose_group_replication_ip_whitelist = AUTOMATIC
loose_group_replication_local_address = zdbkirdccd1:33061
loose_group_replication_member_weight = 60
loose_group_replication_poll_spin_loops = 0
loose_group_replication_recovery_complete_at = TRANSACTIONS_APPLIED
loose_group_replication_recovery_reconnect_interval = 60
loose_group_replication_recovery_retry_count = 10
loose_group_replication_recovery_ssl_ca =
loose_group_replication_recovery_ssl_capath =
loose_group_replication_recovery_ssl_cert =
loose_group_replication_recovery_ssl_cipher =
loose_group_replication_recovery_ssl_crl =
loose_group_replication_recovery_ssl_crlpath =
loose_group_replication_recovery_ssl_key =
loose_group_replication_recovery_ssl_verify_server_cert = OFF
loose_group_replication_recovery_use_ssl = ON
loose_group_replication_single_primary_mode = ON
loose_group_replication_ssl_mode = REQUIRED
loose_group_replication_start_on_boot = ON
loose_group_replication_transaction_size_limit = 0
loose_group_replication_unreachable_majority_timeout = 0
super_read_only = ON

[client]
port=3306
socket=/usr/local/mysql/data/zdbkirdccd1.sock # Change as required
[8 Jan 9:44] MATTHEW GOTT
Please ask if you need any more details to replicate, or any other logs, evidence etc. from our environment.
[11 Jan 14:01] MySQL Verification Team
Hi,

I'm still not able to reproduce this. Anyhow, due to some other bugs in the router in the 8.0.22 lot of code that is related to where this leak might come from needs to be rewritten so before 8.0.23 is out I'm not sure there's much we can/should do.

thanks