Bug #81375  Assert `m_status == DA_ERROR' in Diagnostics_area::mysql_errno: sql_error.h:385
Submitted: 11 May 2016 6:46   Modified: 21 Sep 2016 17:10
Reporter: Narendra Chauhan
Status: Closed
Category: MySQL Server: XA transactions   Severity: S2 (Serious)
Version: 5.7   OS: Any
CPU Architecture: Any

[11 May 2016 6:46] Narendra Chauhan
Description:
Scenario:- In a Group Replication setup of two members, M1 (bootstrap) and M2, each member runs an XA transaction that inserts the same primary-key value. When XA PREPARE hits the resulting certification conflict, the server should report an error and roll back the XA transaction.
Expected Result:- An error is reported and the duplicate XA transaction is rolled back.
Actual Result:- ASSERTION FAILURE (server abort).

Steps followed:-
=================
In a GROUP of two members (M1 and M2):-
M1> change master to master_user='root' for channel 'group_replication_recovery'; start group_replication;
M1> use test;
M1> create table t1 (a int primary key);
M1> xa start 'xa1'; insert into t1 values (1); xa end 'xa1';

M2> change master to master_user='root' for channel 'group_replication_recovery'; start group_replication;
M2> use test;
M2> xa start 'xa2'; insert into t1 values (1); xa end 'xa2';
M2> xa prepare 'xa2';

M1> xa prepare 'xa1';  ## Assertion failure here

mysqld.1.err:-
==============
2016-05-11T06:27:04.421888Z 0 [Note] Plugin group_replication reported: '[XCOM_BINDING_DEBUG] ::xcom_receive_data():: Delivered message to client handler= 1799399780'
2016-05-11T06:27:04.485804Z 3 [Note] InnoDB: Forced rollback : MySQL thread id 3, OS thread handle 139966765750016, query id 50 localhost root starting
xa prepare 'xa1'
mysqld: /Narendra/mysql_work/git_repo/mysql-5.7-wl9053/sql/sql_error.h:385: uint Diagnostics_area::mysql_errno() const: Assertion `m_status == DA_ERROR' failed.
06:27:04 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=3
max_threads=151
thread_count=4
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61086 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f4c240009e0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f4c8d5a4dd0 thread_stack 0x40000
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(my_print_stacktrace+0x35)[0x1866704]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(handle_fatal_signal+0x405)[0xea04cc]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f4cb4362cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f4cb37b90d5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f4cb37bc83b]
/lib/x86_64-linux-gnu/libc.so.6(+0x2ed9e)[0x7f4cb37b1d9e]
/lib/x86_64-linux-gnu/libc.so.6(+0x2ee42)[0x7f4cb37b1e42]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_ZNK16Diagnostics_area11mysql_errnoEv+0x3a)[0xea2ec2]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_ZN9XID_STATE9set_errorEP3THD+0x32)[0x163ab50]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_Z15ha_rollback_lowP3THDb+0x17f)[0xf191a7]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_ZN13MYSQL_BIN_LOG8rollbackEP3THDb+0x295)[0x17c7019]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_Z17ha_rollback_transP3THDb+0x135)[0xf19314]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_Z10ha_prepareP3THD+0x1a4)[0xf18110]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_ZN18Sql_cmd_xa_prepare16trans_xa_prepareEP3THD+0xca)[0x163a338]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_ZN18Sql_cmd_xa_prepare7executeEP3THD+0x23)[0x163a493]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_Z21mysql_execute_commandP3THDb+0x6e67)[0x1525214]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x543)[0x15273a4]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command+0xc2e)[0x151be7d]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(_Z10do_commandP3THD+0x4b9)[0x151ad55]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(handle_connection+0x1f2)[0x1652aab]
/Narendra/mysql_work/git_repo/mysql-5.7-wl9053/BIN/INSTALL/bin/mysqld(pfs_spawn_thread+0x170)[0x1892d95]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f4cb435ae9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f4cb387638d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f4c24007de0): xa prepare 'xa1'
Connection ID (thread ID): 3
Status: NOT_KILLED

I'm attaching detailed gdb output with this bug page.

How to repeat:
Details:-
==========
GR Version: 0.8.0
MySQL Version: 5.7.14

Steps to repro:-
=================
1. ./mtr group_replication.group_replication_bug.test --start
2. ./mysql -uroot -S/tmp/mysqld1.sock --prompt='M1> '
M1> change master to master_user='root' for channel 'group_replication_recovery'; start group_replication;
M1> use test;
M1> create table t1 (a int primary key);
M1> xa start 'xa1'; insert into t1 values (1); xa end 'xa1';
3. ./mysql -uroot -S/tmp/mysqld2.sock --prompt='M2> '
M2> change master to master_user='root' for channel 'group_replication_recovery'; start group_replication;
M2> use test;
M2> xa start 'xa2'; insert into t1 values (1); xa end 'xa2';
M2> xa prepare 'xa2';

4. Assertion failure here
M1> xa prepare 'xa1';
ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysqld1.sock' (111)
ERROR: 
Can't connect to the server

ERROR 2006 (HY000): MySQL server has gone away

Where,
$ cat group_replication_bug.cnf
!include ../my.cnf

[mysqld]
rpl_stop_slave_timeout=30
lock_wait_timeout=30
innodb_lock_wait_timeout=10
report-host=localhost
log-bin
enforce-gtid-consistency=ON
log-slave-updates=ON
gtid-mode=on
binlog-format=row
transaction-write-set-extraction=MURMUR32
binlog-checksum=NONE
master-info-repository=TABLE
relay_log_info_repository=TABLE
plugin-load=group_replication.so
plugin-dir=lib/plugin
relay-log-recovery=on
slave_parallel_workers=0
group_replication_components_stop_timeout = 30
group_replication_recovery_reconnect_interval = 2
group_replication_recovery_retry_count = 3
group_replication_allow_local_lower_version_join = OFF
group_replication_auto_increment_increment = 7
group_replication_group_name = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'
group_replication_recovery_complete_at = 'TRANSACTIONS_CERTIFIED'
group_replication_start_on_boot = OFF
group_replication_peer_addresses="127.0.0.1:11300,127.0.0.1:11301,127.0.0.1:11302"

[mysqld.1]
socket=/tmp/mysqld1.sock
group_replication_local_address="127.0.0.1:11300"
group_replication_bootstrap_group=1

[mysqld.2]
socket=/tmp/mysqld2.sock
group_replication_local_address="127.0.0.1:11301"

[mysqld.3]
socket=/tmp/mysqld3.sock
group_replication_local_address="127.0.0.1:11302"

[ENV]
SERVER_MYPORT_3=@mysqld.3.port
SERVER_MYSOCK_3='/tmp/mysqld3.sock'

And,
$ cat group_replication_bug.test
--source include/have_group_replication_plugin.inc
--let $group_replication_group_name= aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa

--disable_abort_on_error
--echo
--connection server1
--echo [SERVER1]
--echo ## Start MEMBER#1
change master to master_user='root' for channel 'group_replication_recovery';
START GROUP_REPLICATION;
--let $group_replication_member_state= ONLINE
--source ../inc/group_replication_wait_for_member_state.inc

--echo
--connection server2
--echo [SERVER2]
--echo ## Start MEMBER#2
change master to master_user='root' for channel 'group_replication_recovery';
START GROUP_REPLICATION;
--let $group_replication_member_state= ONLINE
--source ../inc/group_replication_wait_for_member_state.inc

--echo
--connection server1
--echo [SERVER1]
USE test;
CREATE TABLE t1 (a INT PRIMARY KEY);
--echo
--echo == XA trxn: xa1 ==
XA START 'xa1';
INSERT INTO t1 VALUES (1);
XA END 'xa1';

--echo
--connection server2
--echo [SERVER2]
--echo == XA conflicting trxn: xa2 ==
USE test;
XA START 'xa2';
INSERT INTO t1 VALUES (1);
XA END 'xa2';
--echo
XA PREPARE 'xa2';

--connection server1
--echo [SERVER1]
--echo
--echo == ISSUE HERE: THIS SHOULD ROLL BACK, BUT THE ASSERTION FAILED ==
XA PREPARE 'xa1';
SHOW BINLOG EVENTS;

--enable_abort_on_error

--die "OK TO STOP HERE"

Note: I was not able to reproduce the issue through an MTR test case, but I can consistently reproduce it from the client using the MTR --start option (with the steps listed above under 'Steps to repro').
[21 Sep 2016 17:10] Paul Dubois
Posted by developer:
 
Noted in 8.0.1 changelog.

For debug builds, failure to prepare a branch of an XA transaction
could lead to a server exit.
[21 Feb 2017 16:12] Paul Dubois
Noted in 5.7.17 changelog.