MySQL Bugs: #84332: compression of mysql.gtid_executed has been prevented when super_read

Bug #84332	compression of mysql.gtid_executed has been prevented when super_read_only = 1
Submitted:	25 Dec 2016 3:30	Modified:	7 Jun 2017 3:49
Reporter:	yu zou	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7.16 5.7.17	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	super_read_only mysql.gtid_executed

Description:
set super_read_only = 1 on slaves(with grid_mode=1), compression of mysql.gtid_executed has been prevented,  and finally cause OOM.

How to repeat:
1 : on slave(with grid_mode=1) :set super_read_only = 1;
2: wait until binlog retation(do not use flush logs)
3: you will see something like this:

>> select * from mysql.gtid_executed;
source_uuid	interval_start	interval_end
3578f86f-b0a1-11e6-b0a2-6c92bf293f74	1	1854357129
3578f86f-b0a1-11e6-b0a2-6c92bf293f74	1856109319	7587598376
3578f86f-b0a1-11e6-b0a2-6c92bf293f74	7587598377	7590158220
3578f86f-b0a1-11e6-b0a2-6c92bf293f74	7590158221	7592717525
3578f86f-b0a1-11e6-b0a2-6c92bf293f74	7592717526	7595273195

4: the memory of mysqld will increase slowly , finally OOM

Suggested fix:
enable compression of mysql.gtid_executed when super_read_only = 1.

with grid_mode=1 ----> with gtid_mode = 1

I'm waiting for my slave to start increasing substantially in memory usage.
In mean time, I've gotten these counts:

mysql> select count(*),now(),version(),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+---------------------+------------+-------------------+-------------+
| count(*) | now()               | version()  | @@super_read_only | @@read_only |
+----------+---------------------+------------+-------------------+-------------+
|    67636 | 2016-12-24 16:39:12 | 5.7.17-log |                 1 |           1 |
+----------+---------------------+------------+-------------------+-------------+
1 row in set (1.46 sec)

mysql> select count(*),now(),version(),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+---------------------+------------+-------------------+-------------+
| count(*) | now()               | version()  | @@super_read_only | @@read_only |
+----------+---------------------+------------+-------------------+-------------+
|    69444 | 2016-12-24 16:39:18 | 5.7.17-log |                 1 |           1 |
+----------+---------------------+------------+-------------------+-------------+
1 row in set (3.07 sec)

mysql> select count(*),now(),version(),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+---------------------+------------+-------------------+-------------+
| count(*) | now()               | version()  | @@super_read_only | @@read_only |
+----------+---------------------+------------+-------------------+-------------+
|    76575 | 2016-12-24 16:39:46 | 5.7.17-log |                 1 |           1 |
+----------+---------------------+------------+-------------------+-------------+
1 row in set (0.22 sec)

I've noticed this on the slave now....

---TRANSACTION 4706006, ACTIVE 253 sec rollback
mysql tables in use 1, locked 1
ROLLING BACK 5798 lock struct(s), heap size 433768, 1257611 row lock(s), undo log entries 22345
MySQL thread id 4, OS thread handle 5164, query id 0 Compressing gtid_executed table

Verifying on 5.7.17.

Though, I'm not seeing a memory leak yet.  But this gtid compression thread taking locks and rolling back is a serious problem.  When table grows larger and it tries to lock entire table row-by-row, innodb buffer pool won't have enough memory for the lock structs.

Another symptom, I'm seeing the slave complain more and more due to this:

[Warning] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 'Lock wait timeout exceeded; try restarting transaction', Error_code: 1205
[Warning] Slave SQL for channel '': Error in Xid_log_event: Commit could not be completed, 'Lock wait timeout exceeded; try restarting transaction', Error_code: 1205

fwiw,  this is fixed in the current source tree of 5.7.
I'll see what change fixed it, as this should be a duplicate of that bug...

In our case, it takes about 3-5 days to OOM, very slowly, it had happened on MySQL5.7.16(5.7.16-log MySQL Community Server (GPL))+centos6.2 & MySQL5.7.17(compile with tcmalloc) +centos 6.4 .

I will suggest we wait for the next version 5.7.18 and test.
I search quite a bit, but wasn't able to find which bugfix fixed this, but I've tested;

mysql> select count(*),now(),version(),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+---------------------+------------+-------------------+-------------+
| count(*) | now()               | version()  | @@super_read_only | @@read_only |
+----------+---------------------+------------+-------------------+-------------+
|        6 | 2016-12-25 14:41:09 | 5.7.18-log |                 1 |           1 |
+----------+---------------------+------------+-------------------+-------------+
1 row in set (0.00 sec)

mysql> select count(*),now(),version(),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+---------------------+------------+-------------------+-------------+
| count(*) | now()               | version()  | @@super_read_only | @@read_only |
+----------+---------------------+------------+-------------------+-------------+
|      753 | 2016-12-25 14:41:09 | 5.7.18-log |                 1 |           1 |
+----------+---------------------+------------+-------------------+-------------+
1 row in set (0.00 sec)

mysql> select count(*),now(),version(),@@super_read_only,@@read_only from mysql.gtid_executed;
+----------+---------------------+------------+-------------------+-------------+
| count(*) | now()               | version()  | @@super_read_only | @@read_only |
+----------+---------------------+------------+-------------------+-------------+
|       69 | 2016-12-25 14:41:10 | 5.7.18-log |                 1 |           1 |
+----------+---------------------+------------+-------------------+-------------+
1 row in set (0.00 sec)

we stumbled upon identical case; Running flush-logs before is a good workaround, but sadly there's another bug that prevents running unattended (http://bugs.mysql.com/bug.php?id=84350 ; https://bugs.mysql.com/bug.php?id=84437)

# shutdown thread 
mysqld_4305.pmp:   1 pthread_join(libpthread.so.0),terminate_compress_gtid_table_thread,mysqld_main,__libc_start_main(libc.so.6),_start

# gtid compress thread
mysqld_4305.pmp:   1 fsync(libpthread.so.0),os_file_flush_func,fil_flush,log_write_up_to,trx_commit_low,trx_commit,trx_rollback_to_savepoint_low,trx_rollback_low,trx_rollback_for_mysql,innobase_rollback,ha_rollback_low,MYSQL_BIN_LOG::rollback,ha_rollback_trans,ha_commit_trans,System_table_access::close_table,Gtid_table_access_context::deinit,Gtid_table_persistor::compress_in_single_transaction,Gtid_table_persistor::compress,compress_gtid_table,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)

Another pair of stacks

mysqld_4304.pmp:   1 pthread_join(libpthread.so.0),terminate_compress_gtid_table_thread,mysqld_main,__libc_start_main(libc.so.6),_start
...snip...
mysqld_4304.pmp:   1 cmp_rec_rec_with_match,btr_estimate_number_of_different_key_vals,dict_stats_update_transient_for_index,dict_stats_update_transient,dict_stats_update,row_update_for_mysql_using_upd_graph,row_update_for_mysql,ha_innobase::delete_row,handler::ha_delete_row,Gtid_table_persistor::compress_first_consecutive_range,Gtid_table_persistor::compress_in_single_transaction,Gtid_table_persistor::compress,compress_gtid_table,pfs_spawn_thread,start_thread(libpthread.so.0),clone(libc.so.6)

Post by developer:
==================

This bug is fixed in 5.7.18 with the following bug

"Bug #22857926	ASSERTION `! IS_SET()' AT SQL_ERROR.CC:38 IN READ_ONLY MODE FOR MANY RPL CMDS". 

Note copied from 5.7.18 release notes:
======================================
Replication: The server prevented several
       replication-related administrative statements from
       working if the read_only system variable was enabled.
       (Bug #22857926)

Regards,
Venkatesh.