| Bug #61545 | commit stalls when mysql_binlog_send does disk read | ||
|---|---|---|---|
| Submitted: | 16 Jun 2011 20:13 | Modified: | 15 Jan 2014 8:30 |
| Reporter: | Mark Callaghan | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S5 (Performance) |
| Version: | 5.1.52 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | Contribution, LOCK_log, replication, stall | ||
[16 Jun 2011 22:11]
Mark Callaghan
And then run PMP to get thread stacks when there is a pileup. But you need a workload that reads binlog files that are not in the OS buffer cache.
[20 Jun 2011 23:42]
Mark Callaghan
I am evaluating a patch that does not lock LOCK_log for inactive log files:
diff --git a/sql/sql_repl.cc b/sql/sql_repl.cc
index fa27757..1cc4bf3 100644
--- a/sql/sql_repl.cc
+++ b/sql/sql_repl.cc
@@ -412,6 +412,10 @@ void mysql_binlog_send(THD* thd, char* log_ident, my_off_t pos,
int left_events = max_binlog_dump_events;
#endif
int old_max_allowed_packet= thd->variables.max_allowed_packet;
+ /* Optimization for bugs.mysql.com 61545 to avoid holding LOCK_log while
+ reading old binlogs */
+ bool is_active = TRUE;
+
DBUG_ENTER("mysql_binlog_send");
DBUG_PRINT("enter",("log_ident: '%s' pos: %ld", log_ident, (long) pos));
@@ -471,6 +475,8 @@ impossible position";
goto err;
}
+ is_active = mysql_bin_log.is_active(log_file_name);
+
/*
We need to start a packet with something other than 255
to distinguish it from error
@@ -535,7 +541,8 @@ impossible position";
Try to find a Format_description_log_event at the beginning of
the binlog
*/
- if (!(error = Log_event::read_log_event(&log, packet, log_lock)))
+ if (!(error = Log_event::read_log_event(&log, packet,
+ is_active ? log_lock : NULL)))
{
/*
The packet has offsets equal to the normal offsets in a binlog
@@ -601,7 +608,8 @@ impossible position";
while (!net->error && net->vio != 0 && !thd->killed)
{
- while (!(error = Log_event::read_log_event(&log, packet, log_lock)))
+ while (!(error = Log_event::read_log_event(&log, packet,
+ is_active ? log_lock: NULL)))
{
#ifndef DBUG_OFF
if (max_binlog_dump_events && !left_events--)
@@ -613,6 +621,15 @@ impossible position";
}
#endif
+ /* If log_file_name is the current binlog, is_active must be set */
+ DBUG_ASSERT(!(mysql_bin_log.is_active(log_file_name) && !is_active));
+ if (mysql_bin_log.is_active(log_file_name) && !is_active)
+ {
+ errmsg = "bad state in mysql_binlog_send";
+ my_errno= ER_UNKNOWN_ERROR;
+ goto err;
+ }
+
if ((*packet)[EVENT_TYPE_OFFSET+1] == FORMAT_DESCRIPTION_EVENT)
{
binlog_can_be_corrupted= test((*packet)[FLAGS_OFFSET+1] &
@@ -845,8 +862,8 @@ impossible position";
&linfo);
goto err;
}
-
DBUG_PRINT("info", ("Binlog filename: new binlog %s", log_file_name));
+ is_active = mysql_bin_log.is_active(log_file_name);
packet->length(0);
packet->append('\0');
[23 Jul 2011 11:00]
Sveta Smirnova
Thank you for the report. Verified as described.
[15 Jan 2014 8:30]
Jon Stephens
This was fixed as part of the refactoring of the replication dump thread that was implemented in 5.7.2.

Description: We get stalls and convoys of many threads waiting to commit with the binlog enabled when another thread holds LOCK_log while doing a disk read of a binlog file. This can happen when a replication slave asks for binlog data that is not in the OS buffer cache. The thread stack for the lock holder is: my_read,_my_b_read,Log_event::read_log_event,mysql_binlog_send,dispatch_command,do_command,handle_one_connection,start_thread,clone How to repeat: Read read_log_event to see that it locks LOCK_log before doing a read. And then run poor-mans-profiler to dump thread stacks during stalls to see many threads waiting on LOCK_log during commit when one thread is in read_log_event. int Log_event::read_log_event(IO_CACHE* file, String* packet, pthread_mutex_t* log_lock) { ulong data_len; int result=0; char buf[LOG_EVENT_MINIMAL_HEADER_LEN]; DBUG_ENTER("Log_event::read_log_event"); if (log_lock) pthread_mutex_lock(log_lock); if (my_b_read(file, (uchar*) buf, sizeof(buf))) { /* And then run