| Bug #75781 | log lock may not be unlocked if add_logged_gtid failed. | ||
|---|---|---|---|
| Submitted: | 5 Feb 2015 2:49 | Modified: | 26 Mar 2015 9:03 |
| Reporter: | Fangxin Flou (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Utilities: Binlog Events | Severity: | S3 (Non-critical) |
| Version: | 5.6.22 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
| Tags: | binlog, GTID | ||
[6 Feb 2015 17:45]
MySQL Verification Team
It is evident from the source code analysis that log_lock is taken before the conditional block and released after it. However, there is a jump to the err tag, which does not have any unlock operation done. That means that under some scenarios, that mutex lock will not be unlocked, which may lead to serious server hanging and deadlocks. Fully verified.
[26 Mar 2015 9:03]
David Moss
Thanks for your feedback. This has been fixed in an upcoming release and the following has been added to the changelog: If an error was encountered while adding a GTID to the received GTID set, the log lock was not being correctly released. This could cause a deadlock.
[23 Jun 2015 15:29]
Laurynas Biveinis
commit b547bd853506fe51bb05febf16cf24b6f051864a
Author: Luis Soares <luis.soares@oracle.com>
Date: Fri Feb 27 10:20:03 2015 +0000
BUG#20492319: LOG LOCK MAY NOT BE UNLOCKED IF ADD_LOGGED_GTID FAILED.
When a Gtid_log_event is received from the master,
and after being queued in the relay_log, its GTID
is saved in the received GTID set. This happens
inside queue_event and is done while holding a lock,
the relay_log lock. The function that is called to
save the GTID can return an error. That is fine
and is being tested properly. However, in the event
of an error there will be a jump to the end of the
queue_event function, but the relay_log mutex ends
up NOT being released.
This patch adds the missing mysql_mutex_unlock call.
[2 Jul 2015 10:17]
Luis Soares
Thanks to Fangxin for the bug report and the suggested fix.

Description: this is the current code in rpl_slave.cc /* write the event to the relay log */ if (likely(rli->relay_log.append_buffer(buf, event_len, mi) == 0)) { mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos); DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos())); rli->relay_log.harvest_bytes_written(&rli->log_space_total); if (event_type == GTID_LOG_EVENT) { global_sid_lock->rdlock(); int ret= rli->add_logged_gtid(gtid.sidno, gtid.gno); if (!ret) rli->set_last_retrieved_gtid(gtid); global_sid_lock->unlock(); if (ret != 0) goto err; } } ...... err: if (unlock_data_lock) mysql_mutex_unlock(&mi->data_lock); DBUG_PRINT("info", ("error: %d", error)); if (error) mi->report(ERROR_LEVEL, error, ER(error), (error == ER_SLAVE_RELAY_LOG_WRITE_FAILURE)? "could not queue event from master" : error_msg.ptr()); DBUG_RETURN(error); if add_logged_gtid failed, the log_lock may not be released accord the current logic, I am still checking when add_logged_gtid will failure, I am not whether it is a really bug, or it's designed. Thanks. How to repeat: N/A Suggested fix: diff -rc --exclude='*.orig' --exclude=sql_yacc.cc --exclude=sql_yacc.h mysql-5.6.22/sql/rpl_slave.cc mysql-5.6.22-temp/sql/rpl_slave.cc *** mysql-5.6.22/sql/rpl_slave.cc 2014-11-21 13:39:51.000000000 +0800 --- mysql-5.6.22-temp/sql/rpl_slave.cc 2015-02-05 10:40:49.000000000 +0800 *************** *** 7038,7044 **** --- 7038,7047 ---- rli->set_last_retrieved_gtid(gtid); global_sid_lock->unlock(); if (ret != 0) + { + mysql_mutex_unlock(log_lock); goto err; + } } } else