MySQL Bugs: #70536: can't use LOGICAL_CLOCK if gtid is enabled

Bug #70536	can't use LOGICAL_CLOCK if gtid is enabled
Submitted: 6 Oct 2013 14:05
Modified: 21 Oct 2013 16:15
Reporter: zhai weixiang (OCA)
Status: Closed
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.7.2
OS: Any
CPU Architecture: Any

Description:

If gtid_mode and enforce_gtid_consistency are enabled (on both master and slave), only the first worker thread is chosen to apply events.

With GTIDs disabled, everything works as expected.

I followed this blog post step by step: http://geek.rohitkalhans.com/2013/09/enhancedMTS-configuration.html

I don't know whether this is related to some new configuration option, so I looked into the related code.

The LOGICAL_CLOCK value (commit_seq_no) is stored only in the first event of a transaction: the GTID_LOG_EVENT if GTIDs are enabled, or the QUERY_EVENT ("BEGIN") if GTIDs are disabled. So, in my opinion, once the GTID_LOG_EVENT has been checked, there is no need to check the subsequent QUERY_EVENT ("BEGIN").

However, Mts_submode_logical_clock::assign_group does check it:

  switch (ev->get_type_code())
  {
  case QUERY_EVENT:
    commit_seq_no= static_cast<Query_log_event*>(ev)->commit_seq_no;
    break;
  /* ... other cases elided ... */
  }

Here commit_seq_no is -1, because the sequence number is stored only in the GTID_LOG_EVENT. This value is checked later:

  if (/* Rewritten event without commit seq_number. */
      commit_seq_no == SEQ_UNINIT ||
      /* Not same as last seq number. */
      commit_seq_no != mts_last_known_commit_parent ||
      /* First event after a submode switch. */
      first_event ||
      /* Require a fresh group to be started. */
      force_new_group)
  {
    mts_last_known_commit_parent= commit_seq_no;
    worker_seq= 0;
    /* ... */
  }

Here worker_seq is reset to 0 and mts_last_known_commit_parent is set to -1. This is the key reason the first worker thread is always used: worker_seq indicates which worker thread the next transaction is assigned to, and because every BEGIN carries -1, it is reset to 0 for every transaction.

Code quoted from Mts_submode_logical_clock::get_least_occupied_worker:

  if (rli->last_assigned_worker)
    worker= rli->last_assigned_worker;
  else
  {
    if (worker_seq < ws->elements)
    {
      worker= *((Slave_worker **)dynamic_array_ptr(ws, worker_seq));
      worker_seq++;
    }
    /* ... */
  }

 

How to repeat:
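Easy to repeat: enable gtid_mode and enforce_gtid_consistency on both master and slave, configure the multi-threaded slave with LOGICAL_CLOCK as in the blog post above, and replicate some transactions; only the first worker thread applies them.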

Suggested fix:
A simple fix:

=== modified file 'sql/log_event.cc'
--- sql/log_event.cc    2013-08-16 19:17:40 +0000
+++ sql/log_event.cc    2013-10-06 14:00:35 +0000
@@ -3021,10 +3021,13 @@
       insert_dynamic(&rli->curr_group_da, (uchar*) &ptr_curr_ev);
       rli->curr_group_seen_begin= true;
       rli->mts_end_group_sets_max_dbs= true;
-      if (schedule_next_event(this, rli))
+      if (!rli->curr_group_seen_gtid)
       {
-        rli->abort_slave= 1;
-        DBUG_RETURN(NULL);
+        if (schedule_next_event(this, rli))
+        {
+          rli->abort_slave= 1;
+          DBUG_RETURN(NULL);
+        }
       }
 
       DBUG_ASSERT(rli->curr_group_da.elements == 2);

Corrected the synopsis.

Thank you for the report.

Verified as described.

Test case for MTR:

Attachment: rpl_bug70536.test (application/octet-stream, text), 1.23 KiB.

Slave option file (the master should use the same options):

Attachment: rpl_bug70536-slave.opt (application/octet-stream, text), 74 bytes.

Documented fix in the MySQL 5.7.3 changelog, as follows:

        When GTIDs were used with an intra-schema multi-threaded slave,
        transactions were assigned to the first worker thread only.

Closed.

mysql-server$ bzr log -r 6741
------------------------------------------------------------
revno: 6741
committer: Rohit Kalhans<rohit.kalhans@oracle.com>
branch nick: mysql-trunk
timestamp: Mon 2013-10-21 11:42:47 +0530
message:
  BUG#17590616:CAN'T USE LOGICAL_CLOCK IF GTID IS ENABLED
        
  Problem: When GTID is used with Intra-schema
  multi-threaded slave the transactions are only
  assigned to the first worker, thereby compromising
  the multi-threaded nature of replication slave.
        
  Background: Master stores the commit parent in the
  first query of a transaction (i.e. BEGIN query) if
  GTID is disabled or in GTID event if GTID is enabled.
  But in the slave we called the scheduler method for
  both the GTID event and the begin event. But since in
  case of GTID the BEGIN event does not have the commit
  parent, the scheduler logic blocks the execution of the
  event until the previous transaction is applied completely.
  This causes the slave coordinator to always assign the event
  to the first worker only.
        
  Fix: We fix this problem by calling the scheduler method for
  the "BEGIN" only when the coordinator has not seen the GTID
  event already. This ensures that it is never called for the
  first event of the transaction when GTID is enabled.