Bug #85371 | Memory leak in multi-source replication when binlog_rows_query_log_events=1 | ||
---|---|---|---|
Submitted: | 8 Mar 2017 10:46 | Modified: | 24 Nov 2017 8:42 |
Reporter: | Przemyslaw Malkowski | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.7.17 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | multi source replication |
[8 Mar 2017 10:46]
Przemyslaw Malkowski
[8 Mar 2017 10:49]
Przemyslaw Malkowski
Forgot to add sysbench "run" test, after the prepare.
[9 Mar 2017 13:10]
MySQL Verification Team
Hello Przemyslaw, Thank you for the report and test case. Verified as described with 5.7.17 build. Thanks, Umesh
[9 Mar 2017 13:11]
MySQL Verification Team
test results
Attachment: 85371.5.7.17.results (application/octet-stream, text), 17.33 KiB.
[12 Apr 2017 6:24]
Wang Wen'an
We have the similar problem in multi-source replication(with two master); And not sure wether "set global binlog_rows_query_log_events = OFF" works(set to OFF now and wait for results) : ( checks in ps by "select * from memory_summary_by_thread_by_event_name order by CURRENT_NUMBER_OF_BYTES_USED desc limit 5\G" after enabled memory monitor; start multi-source replication after restart mysql(RO): *************************** 2. row *************************** THREAD_ID: 18189 EVENT_NAME: memory/sql/Log_event COUNT_ALLOC: 521692 COUNT_FREE: 0 SUM_NUMBER_OF_BYTES_ALLOC: 117988604 SUM_NUMBER_OF_BYTES_FREE: 0 LOW_COUNT_USED: 111993 CURRENT_COUNT_USED: 521692 HIGH_COUNT_USED: 521692 LOW_NUMBER_OF_BYTES_USED: 25286276 CURRENT_NUMBER_OF_BYTES_USED: 117988604 HIGH_NUMBER_OF_BYTES_USED: 117988604 *************************** 3. row *************************** THREAD_ID: 18183 EVENT_NAME: memory/sql/Log_event COUNT_ALLOC: 521426 COUNT_FREE: 0 SUM_NUMBER_OF_BYTES_ALLOC: 117732632 SUM_NUMBER_OF_BYTES_FREE: 0 LOW_COUNT_USED: 111583 CURRENT_COUNT_USED: 521426 HIGH_COUNT_USED: 521426 LOW_NUMBER_OF_BYTES_USED: 25154914 CURRENT_NUMBER_OF_BYTES_USED: 117732632 HIGH_NUMBER_OF_BYTES_USED: 117732632 two hours later: *************************** 1. row *************************** THREAD_ID: 18189 EVENT_NAME: memory/sql/Log_event COUNT_ALLOC: 2297022 COUNT_FREE: 0 SUM_NUMBER_OF_BYTES_ALLOC: 525744164 SUM_NUMBER_OF_BYTES_FREE: 0 LOW_COUNT_USED: 111993 CURRENT_COUNT_USED: 2297022 HIGH_COUNT_USED: 2297022 LOW_NUMBER_OF_BYTES_USED: 25286276 CURRENT_NUMBER_OF_BYTES_USED: 525744164 HIGH_NUMBER_OF_BYTES_USED: 525744164 *************************** 2. row *************************** THREAD_ID: 18183 EVENT_NAME: memory/sql/Log_event COUNT_ALLOC: 2296412 COUNT_FREE: 0 SUM_NUMBER_OF_BYTES_ALLOC: 524600639 SUM_NUMBER_OF_BYTES_FREE: 0 LOW_COUNT_USED: 111583 CURRENT_COUNT_USED: 2296412 HIGH_COUNT_USED: 2296412 LOW_NUMBER_OF_BYTES_USED: 25154914 CURRENT_NUMBER_OF_BYTES_USED: 524600639 HIGH_NUMBER_OF_BYTES_USED: 524600639 thread info in sys.session: mysql> select thd_id,conn_id,user,command,state,current_memory from session where thd_id in (18183,18189)\G *************************** 1. row *************************** thd_id: 18183 conn_id: 18158 user: sql/slave_sql command: Sleep state: Slave has read all relay log; waiting for more updates current_memory: 532.28 MiB *************************** 2. row *************************** thd_id: 18189 conn_id: 18164 user: sql/slave_sql command: Sleep state: Slave has read all relay log; waiting for more updates current_memory: 533.50 MiB 2 rows in set (0.10 sec)
[3 Aug 2017 14:14]
Vlad Lesin
The following analysis is for 5.7: 1) When binlog_rows_query_log_events is enabled, Rows_query_log_event is created, written to binary log on master prior Rows_log_event. 2) Slave is reading this event from relay log and applying it, see Rows_query_log_event::do_apply_event(). When the event is applied the event itself is stored in Relay_log_info::rows_query_ev: int Rows_query_log_event::do_apply_event(Relay_log_info const *rli) { DBUG_ENTER("Rows_query_log_event::do_apply_event"); DBUG_ASSERT(rli->info_thd == thd); /* Set query for writing Rows_query log event into binlog later.*/ thd->set_query(m_rows_query, strlen(m_rows_query)); DBUG_ASSERT(rli->rows_query_ev == NULL); const_cast<Relay_log_info*>(rli)->rows_query_ev= this; /* Tell worker not to free the event */ worker= NULL; DBUG_RETURN(0); } 3) Usually Relay_log_info::rows_query_ev is removed after Rows_log_event execution with the following stack trace: at ./sql/log_event.h:3795 at ./sql/rpl_rli.cc:1777 at ./sql/log_event.cc:11175 at ./sql/rpl_slave.cc:4750 But if we look into Rows_log_event::do_apply_event() the function rows_event_stmt_cleanup() is called at the end: int Rows_log_event::do_apply_event(Relay_log_info const *rli) { ... end: if (get_flags(STMT_END_F)) { if((error= rows_event_stmt_cleanup(rli, thd))) { ... } ... } ... } 4) But when the statement must be skipped because it has been already applied from another channel, Rows_log_event::do_apply_event() exits before rows_event_stmt_cleanup() call. See the following code: int Rows_log_event::do_apply_event(Relay_log_info const *rli) { ... enum_gtid_statement_status state= gtid_pre_statement_checks(thd); if (state == GTID_STATEMENT_EXECUTE) { if (gtid_pre_statement_post_implicit_commit_checks(thd)) state= GTID_STATEMENT_CANCEL; } if (state == GTID_STATEMENT_CANCEL) { uint error= thd->get_stmt_da()->mysql_errno(); DBUG_ASSERT(error != 0); rli->report(ERROR_LEVEL, error, "Error executing row event: '%s'", thd->get_stmt_da()->message_text()); thd->is_slave_error= 1; DBUG_RETURN(-1); } else if (state == GTID_STATEMENT_SKIP) DBUG_RETURN(0); ... end: if (get_flags(STMT_END_F)) { if((error= rows_event_stmt_cleanup(rli, thd))) { ... } ... } ... } So in this case rli->rows_query_ev will not be deleted. In debug build such case will lead to crash due to the corresponding assert in Rows_query_log_event::do_apply_event(): int Rows_query_log_event::do_apply_event(Relay_log_info const *rli) { ... DBUG_ASSERT(rli->rows_query_ev == NULL); ... } But for release build there will be just memory leak. How to fix: delete Rows_query_log_event instance if the next Rows_log_event instance read from relay log must be skipped.
[3 Aug 2017 14:18]
Vlad Lesin
delete Rows_query_log_event instance if the next Rows_log_event instance read from relay log must be skipped (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: lp-1667307.patch (text/x-patch), 6.09 KiB.
[5 Aug 2017 18:16]
Laurynas Biveinis
Bug 85371 fix for 8.0.2 (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: bug85371-8.0.2.patch (application/octet-stream, text), 9.34 KiB.
[24 Nov 2017 8:42]
Margaret Fisher
Posted by developer: Changelog entry added for MySQL 5.7.21 and 8.0.4: In a multi-source replication topology, a memory leak could occur on the slave when binlog_rows_query_log_events was enabled on the master, and a statement already applied from another channel was skipped on the slave. In this situation, the instance of the Rows_query log event stored on the slave was not being deleted. The log event instance is now cleaned up and the memory is freed. Thanks to Vlad Lesin for his contribution to the patch.
[28 Nov 2017 6:30]
Venkatesh Venugopal
Bug#86248 has been marked as a duplicate of this bug.