Bug #70327 Assersion error when setting future binlog file/pos with semisync
Submitted: 13 Sep 2013 8:05 Modified: 3 Sep 2014 15:03
Reporter: Yoshinori Matsunobu (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6.13 OS:Any
Assigned to: CPU Architecture:Any

[13 Sep 2013 8:05] Yoshinori Matsunobu
Description:
See how to repeat.

How to repeat:
1. Setup one master and two slaves. Install semisync master plugin and slave plugin on all instances, and configure semisync replication.

2. Run any DML queries. I used mysqlslap like below.
mysqlslap --concurrency=100 --iterations=1 --engine=innodb --auto-generate-sql --auto-generate-sql-load-type=write --number-of-queries=1000000  --host=master_host

3. While running #2, stop one of the slaves and set ahead binlog name/pos and start slave. In the below case, master's file should be mysql-bin.000001.
mysql --host=slave2 -e "stop slave; change master to master_log_file='mysql-bin.000002', master_log_pos=4; start slave;"

Then mysqld gets assertion failure.
------
mysqld: /export/home/pb2/build/sb_0-9714019-1373474833.36/mysql-5.6.13/plugin/semisync/semisync_master.cc:764: int ReplSemiSyncMaster::commitTrx(const char*, my_off_t): Assertion `!active_tranxs_->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos)' failed.
07:55:43 UTC - mysqld got signal 6 ;
------
[13 Sep 2013 8:54] MySQL Verification Team
Verified as described.

----------------------------------------------------------

On the slave;

mysql> > stop slave; change master to master_log_file='mysql-bin.000002'; start slave;
Query OK, 0 rows affected (0.06 sec)

Query OK, 0 rows affected (0.02 sec)

Query OK, 0 rows affected (0.00 sec)

Then the server crashes with signal 6;

This is because active_tranxs_ doesn't include the future binlog position. (The same problem will happen if a very old position is specified.)

----------------------------------------------------------
2013-09-13 17:45:49 22182 [Note] Stop asynchronous binlog_dump to slave (server_id: 101)
2013-09-13 17:49:06 22182 [Note] Start semi-sync binlog_dump to slave (server_id: 101), pos(mysql-bin.000002, 20179840)
2013-09-13 17:49:06 22182 [Note] Stop semi-sync binlog_dump to slave (server_id: 101)
2013-09-13 17:49:06 22182 [Note] Stop semi-sync binlog_dump to slave (server_id: 101)
mysqld: /export/home/pb2/build/sb_0-9714019-1373474833.36/mysql-5.6.13/plugin/semisync/semisync_master.cc:764: int ReplSemiSyncMaster::commitTrx(const char*, my_off_t): Assertion `!active_tranxs_->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos)' failed.
08:49:06 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
[3 Sep 2014 15:03] David Moss
The following was added to the 5.6.21 and 5.7.5 changelog with commit 4127:
When using semisynchronous replication, if the binary log position was changed to a future position on a slave then an assertion error was generated on the master. This fix ensures that in such a situation the future position is correctly acknowledged and an error is instead generated on the slave.
[3 Sep 2014 15:04] David Moss
Posted by developer:
 
The following was added to the 5.6.21 and 5.7.5 changelog with commit 4127:
When using semisynchronous replication, if the binary log position was changed to a future position on a slave then an assertion error was generated on the master. This fix ensures that in such a situation the future position is correctly acknowledged and an error is instead generated on the slave.
[25 Sep 2014 11:36] Laurynas Biveinis
revno: 6055
committer: Sujatha Sivakumar <sujatha.sivakumar@oracle.com>
branch nick: Bug17453826_mysql-5.6
timestamp: Wed 2014-07-02 11:34:11 +0530
message:
  Bug#17453826:ASSERTION ERROR WHEN SETTING FUTURE BINLOG
  FILE/POS WITH SEMISYNC
  
  Problem:
  ========
  When DMLs are in progress on the master stopping a slave and
  setting ahead binlog name/pos will cause an assert on the
  master.
  
  Analysis:
  ========
  Trx1 is waiting for ack on certain position meanwhile change
  master command is issued from the slave to master with a
  future log file name and position. Upon receipt master's
  dump thread assumes that it has received ack till the future
  log file and position and clears the active transaction list
  and sets the reply_file_name_ and reply_file_pos_ to future
  file name and pos. since reply file name and position are
  ahead of Trx1's current position Trx1 completes
  successfully. Execute a new DML Trx2. During commit Trx2
  waits for an ack on some position but the reply file name
  and position are already pointing to future file name and
  position so this current transaction thinks that it has
  already got an ack and no need to wait and proceeds further.
  But the active transaction has not actually received any ack
  the transaction node is not cleared from active transaction
  list. During the exit of commit operation there is an assert
  which expects active transaction node to be cleared from the
  transaction list which fails this leads to an assert.
  
  Fix:
  ===
  Ideally when a request for change master is received the
  request file name and position should be validated for
  correctness and only after the validation the transmit_start
  hook should be fired.  Hence moved the transmit_start hook
  position to appropriate place where it is placed post the
  validation of file name and position. Post fix when a future
  log file name and position is specified
  ER_MASTER_FATAL_ERROR_READING_BINLOG error will be returned
  to slave.
  
  Note: During fixing this bug one more bug was identified.
  
  Rpl_semi_sync_master_clients this parameter can become
  negative at certain conditions causing UNINSTALL PLUGIN
  rpl_semi_sync_master to fail.
  
  'Rpl_semi_sync_master_clients' client is incremented when
  dump thread initiates transmission and when a new slave
  connection is established. This value is decremented when
  dump thread exits and slave gets disconnected. On a certain
  condition if IO thread stops due to an error and dump thread
  exited before "transmit_start" hook dump thread will still
  try to call "transmit_stop" and this will cause
  Rpl_semi_sync_master_clients= -1 which is invalid state.
  Hence as part of the fix a new flag is introduced such that
  "transmit_stop" will be called only when "transmit_start" is
  called.
[25 Sep 2014 11:43] Laurynas Biveinis
revno: 6072
committer: Sujatha Sivakumar <sujatha.sivakumar@oracle.com>
branch nick: Bug17453826_mysql-5.6
timestamp: Fri 2014-07-11 11:20:54 +0530
message:
  Bug#17453826:ASSERTION ERROR WHEN SETTING FUTURE BINLOG
  FILE/POS WITH SEMISYNC
  
  Fixing a post push test issue.
[30 Sep 2014 13:47] David Moss
Posted by developer:
 
Reopened by merge, no new changelog entries needed. Set back to Closed.