Bug #89331 Changing lock/release order in group commit causes a deadlock
Submitted: 20 Jan 0:36 Modified: 14 Feb 17:02
Reporter: Aliaksei Sandryhaila Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:8.0.3 OS:Red Hat
Assigned to: Venkatesh Duggirala CPU Architecture:Any

[20 Jan 0:36] Aliaksei Sandryhaila
Description:
MySQL 8.0 implementation of group commit first releases the previous stage’s lock, then takes the next stage’s lock. We are implementing START TRANSACTION WITH CONSTISTENT INNODB SNAPSHOT functionality in 8.0, and want to change it to first taking the next stage's lock, then releasing the previous stage's lock. The implementation of this feature for 5.6 is here: https://github.com/facebook/mysql-5.6/commit/c2b8ced

However, if the order is reversed, running the "rpl_nogtid" test suite causes a deadlock in the rpl_nogtid.rpl_semi_sync_deadlock test. However, if we run only this test separately or in combination with any other test (but not the entire suite), the tests passes fine. 

Inspecting the core dump shows that one follower is stuck in the commit stage, line 9749 in sql/binlog.cc, while other followers are stuck in the flush stage waiting for it, line 2413 in sql/binlog.cc (the specified file lines are based on 8.0.3 release branch, commit 020025d). It is not clear what causes the deadlock and why it only occurs when the entire test suite is run.

How to repeat:
1) Apply the below patch to MySQL 8.0, commit 020025: https://github.com/mysql/mysql-server/commit/020025d9f277fd3283b8c75024f590857f58fbbd

2) Run the rpl_nogtid test suite

Patch: 

diff --git a/include/mysql/plugin.h b/include/mysql/plugin.h
index 077e4d2..4cb9e19 100644
--- a/include/mysql/plugin.h
+++ b/include/mysql/plugin.h
@@ -644,10 +644,12 @@ void thd_set_kill_status(const MYSQL_THD thd);
   @param thd Use thread connection handle
   @param file_var Pointer to variable that will hold the file name.
   @param pos_var Pointer to variable that will hold the file position.
+  @param gtid_var Pointer to variable that will hold the GTID.
  */
 void thd_binlog_pos(const MYSQL_THD thd,
                     const char **file_var,
-                    unsigned long long *pos_var);
+                    unsigned long long *pos_var,
+                    const char **gtid_var);
 
 /**
   Return the thread id of a user thread
diff --git a/include/mysql/plugin_audit.h.pp b/include/mysql/plugin_audit.h.pp
index d3df255..040b221 100644
--- a/include/mysql/plugin_audit.h.pp
+++ b/include/mysql/plugin_audit.h.pp
@@ -119,7 +119,8 @@ int thd_killed(const void* thd);
 void thd_set_kill_status(const void* thd);
 void thd_binlog_pos(const void* thd,
                     const char **file_var,
-                    unsigned long long *pos_var);
+                    unsigned long long *pos_var,
+                    const char **gtid_var);
 unsigned long thd_get_thread_id(const void* thd);
 void thd_get_xid(const void* thd, MYSQL_XID *xid);
 void *thd_get_ha_data(const void* thd, const struct handlerton *hton);
diff --git a/include/mysql/plugin_auth.h.pp b/include/mysql/plugin_auth.h.pp
index 460fd77..0b7cc0e 100644
--- a/include/mysql/plugin_auth.h.pp
+++ b/include/mysql/plugin_auth.h.pp
@@ -119,7 +119,8 @@ int thd_killed(const void* thd);
 void thd_set_kill_status(const void* thd);
 void thd_binlog_pos(const void* thd,
                     const char **file_var,
-                    unsigned long long *pos_var);
+                    unsigned long long *pos_var,
+                    const char **gtid_var);
 unsigned long thd_get_thread_id(const void* thd);
 void thd_get_xid(const void* thd, MYSQL_XID *xid);
 void *thd_get_ha_data(const void* thd, const struct handlerton *hton);
diff --git a/include/mysql/plugin_ftparser.h.pp b/include/mysql/plugin_ftparser.h.pp
index 036cc6b..6ef4f4b 100644
--- a/include/mysql/plugin_ftparser.h.pp
+++ b/include/mysql/plugin_ftparser.h.pp
@@ -119,7 +119,8 @@ int thd_killed(const void* thd);
 void thd_set_kill_status(const void* thd);
 void thd_binlog_pos(const void* thd,
                     const char **file_var,
-                    unsigned long long *pos_var);
+                    unsigned long long *pos_var,
+                    const char **gtid_var);
 unsigned long thd_get_thread_id(const void* thd);
 void thd_get_xid(const void* thd, MYSQL_XID *xid);
 void *thd_get_ha_data(const void* thd, const struct handlerton *hton);
diff --git a/include/mysql/plugin_keyring.h.pp b/include/mysql/plugin_keyring.h.pp
index 3a5e07d..aec683f 100644
--- a/include/mysql/plugin_keyring.h.pp
+++ b/include/mysql/plugin_keyring.h.pp
@@ -119,7 +119,8 @@ int thd_killed(const void* thd);
 void thd_set_kill_status(const void* thd);
 void thd_binlog_pos(const void* thd,
                     const char **file_var,
-                    unsigned long long *pos_var);
+                    unsigned long long *pos_var,
+                    const char **gtid_var);
 unsigned long thd_get_thread_id(const void* thd);
 void thd_get_xid(const void* thd, MYSQL_XID *xid);
 void *thd_get_ha_data(const void* thd, const struct handlerton *hton);
diff --git a/include/mysql/services.h.pp b/include/mysql/services.h.pp
index 0981927..416e702 100644
--- a/include/mysql/services.h.pp
+++ b/include/mysql/services.h.pp
@@ -655,7 +655,8 @@ int thd_killed(const void* thd);
 void thd_set_kill_status(const void* thd);
 void thd_binlog_pos(const void* thd,
                     const char **file_var,
-                    unsigned long long *pos_var);
+                    unsigned long long *pos_var,
+                    const char **gtid_var);
 unsigned long thd_get_thread_id(const void* thd);
 void thd_get_xid(const void* thd, MYSQL_XID *xid);
 void *thd_get_ha_data(const void* thd, const struct handlerton *hton);
diff --git a/mysql-test/suite/rpl/r/rpl_gtid_innondb_sys_header.result b/mysql-test/suite/rpl/r/rpl_gtid_innondb_sys_header.result
new file mode 100644
index 0000000..3f15b12
--- /dev/null
+++ b/mysql-test/suite/rpl/r/rpl_gtid_innondb_sys_header.result
@@ -0,0 +1,16 @@
+include/master-slave.inc
+Warnings:
+Note	####	Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note	####	Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
+[connection master]
+create table t1 (a int);
+insert into t1 values(1);
+SET GLOBAL debug = '+d,crash_before_writing_xid';
+insert into t1 values(2);
+ERROR HY000: Lost connection to MySQL server during query
+include/rpl_reconnect.inc
+SET GLOBAL debug = ``;
+include/start_slave.inc
+InnoDB: Last MySQL Gtid master_uuid:2
+drop table t1;
+include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header-master.opt b/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header-master.opt
new file mode 100644
index 0000000..d828b6c
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header-master.opt
@@ -0,0 +1 @@
+--gtid_mode=ON --enforce_gtid_consistency --log_slave_updates
diff --git a/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header-slave.opt b/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header-slave.opt
new file mode 100644
index 0000000..d828b6c
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header-slave.opt
@@ -0,0 +1 @@
+--gtid_mode=ON --enforce_gtid_consistency --log_slave_updates
diff --git a/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header.test b/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header.test
new file mode 100644
index 0000000..68eb897
--- /dev/null
+++ b/mysql-test/suite/rpl/t/rpl_gtid_innondb_sys_header.test
@@ -0,0 +1,35 @@
+source include/master-slave.inc;
+source include/have_debug.inc;
+source include/not_valgrind.inc;
+
+--let $old_debug = `select @@global.debug;`
+
+connection master;
+create table t1 (a int);
+insert into t1 values(1);
+--eval SET GLOBAL debug = '+d,crash_before_writing_xid'
+--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
+--error 2013
+insert into t1 values(2);
+
+--source include/wait_until_disconnected.inc
+--let $rpl_server_number = 1
+--source include/rpl_reconnect.inc
+
+--eval SET GLOBAL debug = `$old_debug`
+
+connection slave;
+disable_warnings;
+source include/start_slave.inc;
+enable_warnings;
+connection master;
+sync_slave_with_master;
+
+connection master;
+--let $master_uuid= query_get_value(select @@server_uuid, @@server_uuid, 1)
+--echo $master_uuid
+--replace_result $master_uuid master_uuid
+--exec grep 'InnoDB: Last MySQL Gtid' $MYSQLTEST_VARDIR/log/mysqld.1.err
+
+drop table t1;
+source include/rpl_end.inc;
diff --git a/sql/binlog.cc b/sql/binlog.cc
index 6b8770a..cdf64df 100644
--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -596,6 +596,8 @@ public:
     cache_log.disk_writes= 0;
     cache_state_map.clear();
     event_counter= 0;
+    m_last_trans_gtid = NULL;
+    last_trans_gtid[0] = 0;
     DBUG_ASSERT(is_binlog_empty());
   }
 
@@ -700,6 +702,11 @@ public:
     return is_binlog_empty() || has_empty_transaction();
   }
 
+  const char* get_last_trans_gtid()
+  {
+    return m_last_trans_gtid;
+  }
+
 protected:
   /*
     This structure should have all cache variables/flags that should be restored
@@ -865,6 +872,9 @@ private:
 
   binlog_cache_data& operator=(const binlog_cache_data& info);
   binlog_cache_data(const binlog_cache_data& info);
+
+  const char *m_last_trans_gtid;
+  char last_trans_gtid[Gtid::MAX_TEXT_LENGTH + 1];
 };
 
 
@@ -1437,10 +1447,15 @@ static int binlog_close_connection(handlerton*, THD *thd)
   DBUG_RETURN(0);
 }
 
-int binlog_cache_data::write_event(THD*, Log_event *ev)
+int binlog_cache_data::write_event(THD *thd, Log_event *ev)
 {
   DBUG_ENTER("binlog_cache_data::write_event");
 
+  global_sid_lock->rdlock();
+  thd->variables.gtid_next.to_string(global_sid_map, last_trans_gtid);
+  global_sid_lock->unlock();
+  m_last_trans_gtid = last_trans_gtid;
+
   if (ev != NULL)
   {
     DBUG_EXECUTE_IF("simulate_disk_full_at_flush_pending",
@@ -9020,6 +9035,16 @@ MYSQL_BIN_LOG::flush_thread_caches(THD *thd)
   binlog_cache_mngr *cache_mngr= thd_get_cache_mngr(thd);
   my_off_t bytes= 0;
   bool wrote_xid= false;
+  
+  // last_gtid_spec is reset after flush
+  const char * last_trans_gtid = NULL;
+  binlog_cache_data *cache_data=
+    cache_mngr->get_binlog_cache_data(true);
+  if (cache_data)
+  {
+    last_trans_gtid = cache_data->get_last_trans_gtid();
+  }
+
   int error= cache_mngr->flush(thd, &bytes, &wrote_xid);
   if (!error && bytes > 0)
   {
@@ -9027,7 +9052,7 @@ MYSQL_BIN_LOG::flush_thread_caches(THD *thd)
       Note that set_trans_pos does not copy the file name. See
       this function documentation for more info.
     */
-    thd->set_trans_pos(log_file_name, my_b_tell(&log_file));
+    thd->set_trans_pos(log_file_name, my_b_tell(&log_file), last_trans_gtid);
     if (wrote_xid)
       inc_prep_xids(thd);
   }
diff --git a/sql/rpl_gtid.h b/sql/rpl_gtid.h
index 52621ca..2f8323e 100644
--- a/sql/rpl_gtid.h
+++ b/sql/rpl_gtid.h
@@ -3743,27 +3743,6 @@ struct Gtid_specification
   }
 };
 
-
-/**
-  Represents a group in the group cache.
-
-  Groups in the group cache are slightly different from other groups,
-  because not all information about them is known.
-
-  Automatic groups are marked as such by setting gno<=0.
-*/
-struct Cached_group
-{
-  /// The gtid for this group.
-  Gtid_specification spec;
-  /**
-    The position of this GTID in the cache, i.e., the total size of
-    all previous groups.
-  */
-  rpl_binlog_pos binlog_offset;
-};
-
-
 /**
   Indicates if a statement should be skipped or not. Used as return
   value from gtid_before_statement.
diff --git a/sql/sql_class.cc b/sql/sql_class.cc
index 3cd65e1..8d8e9c0 100644
--- a/sql/sql_class.cc
+++ b/sql/sql_class.cc
@@ -372,6 +372,7 @@ THD::THD(bool enable_plugins)
    m_trans_end_pos(0),
    m_transaction(new Transaction_ctx()),
    m_attachable_trx(NULL),
+   m_trans_gtid(NULL),
    table_map_for_update(0),
    m_examined_row_count(0),
    m_stage_progress_psi(NULL),
diff --git a/sql/sql_class.h b/sql/sql_class.h
index 0bb0083..d0659ad 100644
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@@ -1560,6 +1560,7 @@ private:
   const char *m_trans_log_file;
   char *m_trans_fixed_log_file;
   my_off_t m_trans_end_pos;
+  const char *m_trans_gtid;
   /**@}*/
   // NOTE: Ideally those two should be in Protocol,
   // but currently its design doesn't allow that.
@@ -2256,7 +2257,8 @@ public:
      transaction written when committing this transaction.
    */
   /**@{*/
-  void set_trans_pos(const char *file, my_off_t pos)
+  void set_trans_pos(const char *file, my_off_t pos,
+                     const char* last_trans_gtid)
   {
     DBUG_ENTER("THD::set_trans_pos");
     DBUG_ASSERT(((file == 0) && (pos == 0)) || ((file != 0) && (pos != 0)));
@@ -2277,19 +2279,26 @@ public:
     }
 
     m_trans_end_pos= pos;
+
+    m_trans_gtid = last_trans_gtid;
+
     DBUG_PRINT("return", ("m_trans_log_file: %s, m_trans_fixed_log_file: %s, "
-                          "m_trans_end_pos: %llu", m_trans_log_file,
-                          m_trans_fixed_log_file, m_trans_end_pos));
+                          "m_trans_end_pos: %llu m_trans_gtid: %s",
+                          m_trans_log_file, m_trans_fixed_log_file,
+                          m_trans_end_pos, m_trans_gtid));
     DBUG_VOID_RETURN;
   }
 
-  void get_trans_pos(const char **file_var, my_off_t *pos_var) const
+  void get_trans_pos(const char **file_var, my_off_t *pos_var,
+                     const char **gtid_var) const
   {
     DBUG_ENTER("THD::get_trans_pos");
     if (file_var)
       *file_var = m_trans_log_file;
     if (pos_var)
       *pos_var= m_trans_end_pos;
+    if (gtid_var)
+      *gtid_var = m_trans_gtid;
     DBUG_PRINT("return", ("file: %s, pos: %llu",
                           file_var ? *file_var : "<none>",
                           pos_var ? *pos_var : 0));
diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
index 10f758d..ee580b4 100644
--- a/sql/sql_parse.cc
+++ b/sql/sql_parse.cc
@@ -5290,7 +5290,7 @@ void THD::reset_for_next_command()
 
   thd->commit_error= THD::CE_NONE;
   thd->durability_property= HA_REGULAR_DURABILITY;
-  thd->set_trans_pos(NULL, 0);
+  thd->set_trans_pos(NULL, 0, NULL);
 
   thd->derived_tables_processing= false;
   thd->parsing_system_view= false;
diff --git a/sql/sql_thd_api.cc b/sql/sql_thd_api.cc
index 3cfeee7..3fba7f2 100644
--- a/sql/sql_thd_api.cc
+++ b/sql/sql_thd_api.cc
@@ -359,9 +359,10 @@ ulong get_max_connections()
 extern "C"
 void thd_binlog_pos(const MYSQL_THD thd,
                     const char **file_var,
-                    unsigned long long *pos_var)
+                    unsigned long long *pos_var,
+                    const char **gtid_var)
 {
-  thd->get_trans_pos(file_var, pos_var);
+  thd->get_trans_pos(file_var, pos_var, gtid_var);
 }
 
 
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index 292f94f..ca7cf54 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -5194,7 +5194,7 @@ innobase_commit(
 			be a NULL pointer. */
 			ulonglong	pos;
 
-			thd_binlog_pos(thd, &trx->mysql_log_file_name, &pos);
+			thd_binlog_pos(thd, &trx->mysql_log_file_name, &pos, &trx->mysql_gtid);
 
 			trx->mysql_log_offset = static_cast<int64_t>(pos);
 
diff --git a/storage/innobase/include/trx0sys.h b/storage/innobase/include/trx0sys.h
index a7428e3..213cdec 100644
--- a/storage/innobase/include/trx0sys.h
+++ b/storage/innobase/include/trx0sys.h
@@ -260,7 +260,8 @@ trx_sys_update_mysql_binlog_offset(
 	int64_t		offset,	/*!< in: position in that log file */
 	ulint		field,	/*!< in: offset of the MySQL log info field in
 				the trx sys header */
-	mtr_t*		mtr);	/*!< in: mtr */
+	mtr_t*		mtr,	/*!< in: mtr */
+	const char*	gtid);	/*!< in: Gtid of the transaction */
 /*****************************************************************//**
 Prints to stderr the MySQL binlog offset info in the trx system header if
 the magic number shows it valid. */
@@ -365,6 +366,8 @@ remains the same. */
 
 /** Maximum length of MySQL binlog file name, in bytes. */
 #define TRX_SYS_MYSQL_LOG_NAME_LEN	512
+/** Maximum length of GTID string */
+#define TRX_SYS_MYSQL_GTID_LEN		57
 /** Contents of TRX_SYS_MYSQL_LOG_MAGIC_N_FLD */
 #define TRX_SYS_MYSQL_LOG_MAGIC_N	873422344
 
@@ -383,6 +386,8 @@ remains the same. */
 						within that file */
 #define TRX_SYS_MYSQL_LOG_NAME		12	/*!< MySQL log file name */
 
+#define TRX_SYS_MYSQL_GTID		(TRX_SYS_MYSQL_LOG_NAME + TRX_SYS_MYSQL_LOG_NAME_LEN)
+
 /** Doublewrite buffer */
 /* @{ */
 /** The offset of the doublewrite buffer header on the trx system header page */
diff --git a/storage/innobase/include/trx0trx.h b/storage/innobase/include/trx0trx.h
index 560bd71..486d728 100644
--- a/storage/innobase/include/trx0trx.h
+++ b/storage/innobase/include/trx0trx.h
@@ -1132,6 +1132,11 @@ struct trx_t {
 					contains a pointer to the latest file
 					name; this is NULL if binlog is not
 					used */
+	const char*	mysql_gtid;
+					/*!< if MySQL binlog is used, this field
+					contains a pointer to the latest GTID;
+					this is NULL if binlog or gtids is not
+					used*/
 	int64_t		mysql_log_offset;
 					/*!< if MySQL binlog is used, this
 					field contains the end offset of the
diff --git a/storage/innobase/trx/trx0sys.cc b/storage/innobase/trx/trx0sys.cc
index 07fc8ef..e8d6a79 100644
--- a/storage/innobase/trx/trx0sys.cc
+++ b/storage/innobase/trx/trx0sys.cc
@@ -134,7 +134,8 @@ trx_sys_update_mysql_binlog_offset(
 	int64_t		offset,	/*!< in: position in that log file */
 	ulint		field,	/*!< in: offset of the MySQL log info field in
 				the trx sys header */
-	mtr_t*		mtr)	/*!< in: mtr */
+	mtr_t*		mtr,	/*!< in: mtr */
+	const char*	gtid)	/*!< in: Gtid of the transaction */
 {
 	trx_sysf_t*	sys_header;
 
@@ -180,6 +181,18 @@ trx_sys_update_mysql_binlog_offset(
 			 + TRX_SYS_MYSQL_LOG_OFFSET_LOW,
 			 (ulint)(offset & 0xFFFFFFFFUL),
 			 MLOG_4BYTES, mtr);
+	if (gtid)
+	{
+		size_t gtid_length = ut_strlen(gtid);
+		if (gtid_length >= TRX_SYS_MYSQL_GTID_LEN) {
+			/* This should not happen */
+			DBUG_ASSERT(0);
+			return;
+		}
+		/* Write Gtid string */
+		mlog_write_string(sys_header + field + TRX_SYS_MYSQL_GTID,
+				  (byte*)gtid, 1 + gtid_length, mtr);
+	}
 }
 
 /*****************************************************************//**
@@ -220,6 +233,10 @@ trx_sys_print_mysql_binlog_offset(void)
 		<< sys_header + TRX_SYS_MYSQL_LOG_INFO
 		+ TRX_SYS_MYSQL_LOG_NAME;
 
+	fprintf(stderr,
+		"InnoDB: Last MySQL Gtid %s\n",
+		sys_header + TRX_SYS_MYSQL_LOG_INFO +
+		TRX_SYS_MYSQL_GTID);
 	mtr_commit(&mtr);
 }
 
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index e9c3b61..d6969c9 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1698,9 +1698,11 @@ trx_write_serialisation_history(
 		trx_sys_update_mysql_binlog_offset(
 			trx->mysql_log_file_name,
 			trx->mysql_log_offset,
-			TRX_SYS_MYSQL_LOG_INFO, mtr);
+			TRX_SYS_MYSQL_LOG_INFO, mtr,
+			trx->mysql_gtid);
 
 		trx->mysql_log_file_name = NULL;
+		trx->mysql_gtid = NULL;
 	}
 
 	return(serialised);
[22 Jan 6:17] Umesh Shastry
Hello Aliaksei,

Thank you for the report and feedback.
Could you please confirm if the given patch in the bug report is same as that in https://github.com/facebook/mysql-5.6/commit/c2b8ced ? 5.6 link has some changes done in "Stage_manager" which is not seen in the patch provided in "How to repeat". Could you please provide 8.0 patch? Thank you!

Thanks,
Umesh
[22 Jan 13:22] Umesh Shastry
rpl_nogtid suit test results after applying patch on top of 8.0.3 - 020025d9f277fd3283b8c75024f590857f58fbbd

Attachment: 89331_build.results (application/octet-stream, text), 40.51 KiB.

[22 Jan 18:20] Aliaksei Sandryhaila
Umesh,

Thank you for looking into this. I wonder if the failure is due to our testing setup. I will look into this again and get back to you asap.
[23 Jan 5:10] Umesh Shastry
Thank you, Aliaksei.
We will wait for further details.

Regards,
Umesh
[24 Jan 20:05] Aliaksei Sandryhaila
Minimal patch to reproduce the reported issue

Attachment: patch.diff (application/octet-stream, text), 2.39 KiB.

[24 Jan 20:11] Aliaksei Sandryhaila
Hi Umesh,

I still can reproduce the issue. I've uploaded the updated minimal patch to reproduce the issue (see patch.diff). The complete list of steps is:
1. Clone the mysql repo from github, checkout branch 8.0 (8.0.3 RC, last commit: 020025d9)
2. mkdir bld; cd bld; cmake .. -DWITH_DEBUG=1; make
3. ./mysql-test-run --suite=rpl_nogtid # rpl_nogtid.rpl_semi_sync_deadlock passes
4. cd ..; patch -p1 < patch.diff; cd bld
5. ./mysql-test-run --suite=rpl_nogtid # Now rpl_nogtid.rpl_semi_sync_deadlock fails (timeout after 900 seconds due to a deadlock)

Can you apply the new patch and see if you can reproduce the issue?

Thank you.
[24 Jan 20:13] Aliaksei Sandryhaila
Missed one step in the reproduction steps above: ./mysql-test-run should be executed from bld/mysql-test directory.
[25 Jan 4:47] Umesh Shastry
Thank you for the detailed steps, Aliaksei.
I'll reattempt today and get back to you if there is any issue.

Regards,
Umesh
[25 Jan 9:33] Umesh Shastry
Thank you, Aliaksei.
With the latest reduced patch I'm seeing the issue finally.

Regards,
Umesh
[25 Jan 10:14] Umesh Shastry
test results

Attachment: 89331_latest.results (application/octet-stream, text), 417.68 KiB.

[25 Jan 10:24] Umesh Shastry
Hi Aliaksei,

I observed that test case failing even when ran individually as against initially you confirmed in the report. Could you please confirm on this? 

[umshastr@hod03]/export/umesh/server/source/GitSource/mysql-8.0.3/bld/mysql-test:
[umshastr@hod03]/export/umesh/server/source/GitSource/mysql-8.0.3/bld/mysql-test: ./mtr --big-test rpl_nogtid.rpl_semi_sync_deadlock
Logging: /export/umesh/server/source/GitSource/mysql-8.0.3/mysql-test/mysql-test-run.pl  --big-test rpl_nogtid.rpl_semi_sync_deadlock
MySQL Version 8.0.3
Checking supported features...
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
 - adding combinations for rpl_nogtid
Checking leftover processes...
Removing old var directory...
Creating var directory '/export/umesh/server/source/GitSource/mysql-8.0.3/bld/mysql-test/var'...
Installing system database...
Using parallel: 1

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
worker[1] Trying to dump core for [mysqltest - pid: 27973, winpid: 27973]
worker[1] Trying to dump core for [mysqld.1 - pid: 27912, winpid: 27912]
worker[1] Trying to dump core for [mysqld.2 - pid: 27914, winpid: 27914]
rpl_nogtid.rpl_semi_sync_deadlock 'mix'  [ fail ]  timeout after 900 seconds
        Test ended at 2018-01-25 11:10:45

Thanks,
Umesh
[25 Jan 19:08] Aliaksei Sandryhaila
Hi Umesh,

Yes, I have confirmed that the rpl_nogtid.rpl_semi_sync_deadlock is failing after patch.diff is applied, even when it is run individually. I guess this is a good news since it makes the failure deterministic.

We would like to understand why this patch leads to a deadlock.

Thank you.
[29 Jan 15:36] Venkatesh Duggirala
Post by Developer:
==================
Hello Aliaksei

Please find analysis on why the test script is hanging after applying your patch on mysql-8.0.3

Background: 
===========

1) Test script has SET GLOBAL debug= '+d,dump_thread_before_read_event';
So dump thread, in Binlog_sender::read_event(), will wait for signal "continue" before sending any events to Slave.

See code below:

  DBUG_EXECUTE_IF("dump_thread_before_read_event",
                  {
                    const char act[]= "now wait_for signal.continue no_clear_event";
                    DBUG_ASSERT(!debug_sync_set_action(m_thd,
                                                       STRING_WITH_LEN(act)));
                  };);

2) Problem is happening when rpl_semi_sync_master_wait_point=AFTER_SYNC (default value). AFTER_SYNC means that semi synchronous replication waits just after the binary log file is flushed, but before the engine commits, and so guarantees that no other sessions can see the data before replicated to slave.

3) Test script has SET GLOBAL debug= 'd,semi_sync_3-way_deadlock'; and
SET DEBUG_SYNC= "before_rotate_binlog SIGNAL signal.continue".
So FLUSH LOGS command after acquiring lock_log, it supposed to send signal "continue".  (so that dump thread will continue its work).

See code below in MYSQL_BIN_LOG::new_file_impl:

  if (need_lock_log)
    mysql_mutex_lock(&LOCK_log);
  else
    mysql_mutex_assert_owner(&LOCK_log);
  DBUG_EXECUTE_IF("semi_sync_3-way_deadlock",
                  DEBUG_SYNC(current_thd, "before_rotate_binlog"););

Analysis:
=========

Problem can be reproduced (explained) just with dump thread + 4 connections (even though test script has 10 connections + dump thread).

Dump thread:   Waiting for signal "continue" before sending any events to Slave (see point-1 & 3 in background section)

Connection 1: executed insert into t1 values (1) 
              => entered into BGC::commit_stage *after acquiring Lock_commit*                    
              => waiting for acknowledgement from slave before completing the 
                 commit (see point-2 in background section). 
                 But slave cannot send acknowledgement as
                 it did not yet receive the events from dump thread.

Connection 2: executed insert into t2 values (2) 
             => finished BGC::sync_stage 
             => change_stage to BGC::commit_stage
             => Because of the patch holding 'Lock_sync'
                and waiting for "Lock_commit"
                ( which is held by connection 1)

Connection-3: executed insert into t3 values (3) 
             => finished BGC::flush_stage
             => change_stage to BGC::sync_stage.
             => Because of the patch, holding "Lock_log"
                and waiting for "Lock_sync"
                (which is held by connection 2).

Connection-4:  executed FLUSH LOGS 
               => waiting for Lock_log 
               => can send "signal.continue" only after
                  acquiring Lock_log 
                  (which is held by connection-3)
               (see point-3 in background section)

As you can see dump thread is waiting for signal that should come from connection-4's FLUSH LOGs.  This forms full cyclic deadlock.

> connection 1 is waiting for dump thread
> connection 2 is waiting for connection 1
> connection 3 is waiting for connection 2
> connection 4 is waiting for connection 3
> dump_thread  is waiting for connection 4.

Fix: The test script should be adjust according to your lock reorder patch.

Hope the analysis clear. Please let us know if you have any questions on the same.

Disclaimer: Please note that we have *only* analyzed why the patch is causing the rpl_semi_sync_deadlock.test to be hang. We have not analyzed whether
the patch has any side effects in other areas of replication code.

Regards,
Venkatesh.
[2 Feb 22:26] Aliaksei Sandryhaila
Hi Venkatesh,

Thank you for the in-depth analysis of the deadlock!

Following your analysis, I've experimented with the test and noticed that the deadlock can be reproduced even with 3 connections (by setting $n= 4) in mysql-test/extra/rpl_tests/rpl_semi_sync_deadlock.test.

However, your deadlock scenario requires 4 connections. Can you please explain why the deadlock occurs with 3 connections?

Thank you.
[3 Feb 2:40] Venkatesh Duggirala
Post by Developer
=================

Hello Aliaksei

If you talk in terms of $n, even in my analysis $n=4 only.
Please see that in my analysis, I have '3' insert queries.
And in the '4' th connection 'FLUSH LOGS' is executed which is needed to explain the analysis. That's why I have mentioned that we need '4' connections.

Please observe that in the test script $n is to create connection1...connection$(n-1) and execute $(n-1) insert queries + there will be one connection 'master1' that executes 'FLUSH LOGS'. 

So when you are setting $n=4 , they are '3' insert connections + flush logs query from "master1" which is exactly same as my analysis.

Hope I am clear, please let me know if you have any other questions on my analysis.

Regards,
Venkatesh.
[3 Feb 5:14] Aliaksei Sandryhaila
Venkatesh,

Thank you for the clarification! Indeed, I missed that we create one more connection to flush logs.

We will investigate how to adjust our patch and/or the tests to make them pass.
[5 Feb 5:45] Umesh Shastry
Hello Aliaksei

Since this turned out to be an expected behavior(!bg), do you agree to close the bug for now? Please confirm!

Thanks,
Umesh
[5 Feb 19:02] Aliaksei Sandryhaila
Modified patch to avoid deadlock

Attachment: modified_patch.txt (text/plain), 2.97 KiB.

[5 Feb 19:10] Aliaksei Sandryhaila
Hi Venkatesh,

I've slightly modified our patch to avoid the cyclic deadlock (see modified_patch.txt).Specifically, I modified sql/binlog.cc to execute line

DBUG_EXECUTE_IF("semi_sync_3-way_deadlock", DEBUG_SYNC(current_thd, "before_rotate_binlog"););

in new_file_impl() to execute before acquiring LOCK_log.

This modification should have resolved the cyclic deadlock because it should unblock the dump thread. However, the deadlock still occurs. Can you please clarify why this happens?

Thank you.
[13 Feb 9:04] Venkatesh Duggirala
Post by Developer:
==================
Hello Aliaksei

For your patch to work, the correct place to put DEBUG_SYNC point is 
inside MYSQL_BIN_LOG::rotate_and_purge function and *not the place
you have specified in* modified_patch.diff.

Below is the diff that solves the problem. But please note that *to me*
it looks like this change is going to change the main intention
of the test script purpose. Please analyze the same before
you do the changes

The test script is written to find out if there is a deadlock occurs
when a FLUSH LOGs hold LOCK_log and wait for sessions to commit.

You can see from the commit log of mysql-test/extra/rpl_tests/rpl_semi_sync_deadlock.test, 
the purpose of the test. 

Copy pasting the text below for quick reference
=======================================================================================

    * Deadlock on semi-synchronous replication
      WL#6355 introduces a feature that semi-synchronous waits for ACK from
      slave when the transaction's events is written to disc but before the
      transaction is committed.
    
      It introduced a three way deadlock.
    
        SESSION 1(Rotate)              SESSION 2(Trx Commit) DUMP THREAD
        ---------                      ---------             -----------
      1                                Flush Stage
                                       Signal update
      2 Rotate(hold LOCK_log)
        WAITING for Session2 to commit
      3                                Sync Stage            WAITING for Session1
                                                             to release LOCK_log
      4                                semisync
                                       WAITING for ACK
    
      Session1 had to hold LOCK_log to block other sessions appending new events.
      It also blocked dump threads reading events.
=======================================================================================

In any case, please find the below diff that shows where exactly you have to put
"before_rotate_binlog" sync point.

@@ -7026,6 +7035,8 @@ int MYSQL_BIN_LOG::rotate_and_purge(THD *thd, bool force_rotate) {
   ha_binlog_wait(thd);
 
   DBUG_ASSERT(!is_relay_log);
+  DBUG_EXECUTE_IF("semi_sync_3-way_deadlock",
+                  DEBUG_SYNC(current_thd, "before_rotate_binlog"););
   mysql_mutex_lock(&LOCK_log);

Please let me know if you have any more questions on the same.

Regards,
Venkatesh.
[14 Feb 17:01] Aliaksei Sandryhaila
Venkatesh,

Thank you for the final clarification!
[14 Feb 17:02] Aliaksei Sandryhaila
The reason for the deadlock has been verified.