Bug #97707 | group replication lost transaction when init by ansible | ||
---|---|---|---|
Submitted: | 20 Nov 2019 12:39 | Modified: | 12 Sep 2023 10:49 |
Reporter: | Haixing Weng (OCA) | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Group Replication | Severity: | S1 (Critical) |
Version: | all versions with MGR | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | group replication; ansible |
[20 Nov 2019 12:39]
Haixing Weng
[20 Nov 2019 12:40]
Haixing Weng
binlog and gtid_executed are inconsistent in MGR
Attachment: status_in_mgr.txt (text/plain), 11.01 KiB.
[20 Nov 2019 12:49]
Haixing Weng
cat make-group-replication.sh
#!/bin/bash
#set -o nounset
set -o pipefail

usage() {
    echo
    echo "Usage:"
    echo " make-group-replication.sh node"
    echo " node:"
    echo " primary"
    echo " secondary"
    echo
    echo "Help:"
    echo " zhengsilong@unionpay.com"
    echo
}

basic_single_escape() {
    echo "$1" | sed 's/\(['"'"'\]\)/\\\1/g'
}

process_path() {
    path=$1
    tmp_path1=`echo $path | sed -e "s=\(\.\.\/\)=???=g" | sed -e "s=[^?]==g" |sed -e "s=???=\.\.\/=g"`
    if [[ ! -z $tmp_path1 ]]; then
        tmp_path1=$(cd $tmp_path1;pwd)
        tmp_path2=`echo $path | sed -e "s=\(\.\.\/\)==g"`
        path=$tmp_path1/$tmp_path2
        #echo $path
    fi
    echo $path
}

read_config() {
    scriptdir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" # script directory
    cd $scriptdir
    conf_file="upsql-deploy.cfg"
    linux_user=`sed '/^linux_user=/!d;s/.*=//' $conf_file`
    linux_group=`sed '/^linux_group=/!d;s/.*=//' $conf_file`
    linux_user_pwd=`sed '/^linux_user_pwd=/!d;s/.*=//' $conf_file`
    tarfile=`sed '/^tar_file=/!d;s/.*=//' $conf_file`
    # instance_name
    instance_name=`sed '/^instance_name=/!d;s/.*=//' $conf_file`
    port=`sed '/^port=/!d;s/.*=//' $conf_file`
    basedir=`sed '/^base_dir=/!d;s/.*=//' $conf_file`
    conf_dir=`sed '/^conf_dir=/!d;s/.*=//' $conf_file`
    sock_dir=`sed '/^sock_dir=/!d;s/.*=//' $conf_file`
    datadir=`sed '/^data_dir=/!d;s/.*=//' $conf_file`
    binlogdir=`sed '/^binlog_dir=/!d;s/.*=//' $conf_file`
    relaylogdir=`sed '/^relaylog_dir=/!d;s/.*=//' $conf_file`
    redologdir=`sed '/^redolog_dir=/!d;s/.*=//' $conf_file`
    undologdir=`sed '/^undolog_dir=/!d;s/.*=//' $conf_file`
    dba_name=`sed '/^dba_name=/!d;s/.*=//' $conf_file`
    dba_pwd=`sed '/^dba_pwd=/!d;s/.*=//' $conf_file`
    root_pass=$dba_pwd
    rpl_user=`sed '/^rpl_user=/!d;s/.*=//' $conf_file`
    master_user=$rpl_user
    rpl_pwd=`sed '/^rpl_pwd=/!d;s/.*=//' $conf_file`
    # master host
    master_host=`sed '/^master_host=/!d;s/.*=//' $conf_file`
    # master port
    master_port=`sed '/^master_port=/!d;s/.*=//' $conf_file`
    mode=`sed '/^mode=/!d;s/.*=//' $conf_file`
    decompress=`sed '/^install=/!d;s/.*=//' $conf_file`
    upsql_create=`sed '/^upsql_create=/!d;s/.*=//' $conf_file`
    upsql_sec=`sed '/^upsql_sec=/!d;s/.*=//' $conf_file`
    upsql_repl=`sed '/^upsql_repl=/!d;s/.*=//' $conf_file`
}

add_mgr_config(){
    cat $scriptdir/group-replication.cfg >> $conf_dir/$instance_name.cnf
}

execute_sql() {
    source /tmp/mybashrc
    $basedir/bin/mysql -u$dba_name -p$root_pass -S $sock_dir/$instance_name.sock -BNe "$1"
}

# need one param
if [[ $# -ne 1 ]]; then
    usage
    exit 1
fi

node=$1
if [[ x$node != x"primary" && x$node != x"secondary" ]]; then
    echo "Error: need one param of 'primary' or 'secondary'"
    exit 1
fi

# root
if [[ $LOGNAME != root ]]; then
    echo "Please use the root account operation."
    exit 1
fi

read_config
add_mgr_config

count_rpluser=`execute_sql "select count(*) from mysql.user where user = '$rpl_user'"`
if [[ $count_rpluser -eq 0 ]]; then
    execute_sql "create user $rpl_user; alter user $rpl_user identified by '$rpl_pwd'"
    execute_sql "grant replication slave on *.* to $rpl_user"
fi

execute_sql "drop user if exists root@localhost"
execute_sql "create user root@localhost; alter user root@localhost identified by '$root_pass'"
execute_sql "grant all on *.* to root@localhost with grant option"
execute_sql "reset master"
execute_sql "change master to master_user = '$rpl_user', master_password = '$rpl_pwd' for channel 'group_replication_recovery'"

# user
su - $linux_user <<EOF
source /tmp/mybashrc
$scriptdir/upsqlimgm $instance_name restart -p $root_pass -u $dba_name -c $conf_dir/$instance_name.cnf -b $basedir -s $sock_dir/$instance_name.sock
EOF

if [[ $node == "primary" ]]; then
    execute_sql "set global group_replication_bootstrap_group = 1"
fi

execute_sql "start group_replication"

if [[ $node == "primary" ]]; then
    execute_sql "set global group_replication_bootstrap_group = 0"
fi
[21 Nov 2019 1:42]
Haixing Weng
ansible playbook and config file template
Attachment: upsql.cnf (application/octet-stream, text), 6.57 KiB.
[26 Nov 2019 5:35]
Haixing Weng
Any ideas about this issue?
[23 Dec 2019 6:27]
Haixing Weng
The vital clue to the incorrect MGR cluster is a signal sent by ansible. I found that when ansible is used to initialize MGR, it somehow sends SIGHUP to mysqld frequently, and if mysqld receives SIGHUP before its signal handler is completely initialized, it shuts down through an abnormal path. Because of the SIGHUP, the primary's binlog ends without a rotate event, which I think leads to an unexpectedly established MGR.
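The timing effect described above can be reproduced in miniature with plain shell processes. The sketch below is only an illustration of the general signal behaviour and does not use mysqld or ansible: a child shell stands in for a starting daemon, and the function name hup_during_startup and the 1 s / 3 s delays are made up for the example. When SIGHUP arrives before the child has installed its handler, the default action terminates the child; when the handler is already installed, the child survives.

```shell
#!/bin/bash
# Illustration only: a child shell stands in for a daemon that is starting up.
# hup_during_startup is a hypothetical helper, not part of any MySQL tooling.
#
# hup_during_startup MODE
#   MODE=late  : the child does 3s of "startup work" before installing its handler
#   MODE=early : the child installs its handler first, then does the same work
# In both modes the parent sends SIGHUP one second after launch and then
# prints the child's exit status.
hup_during_startup() {
    mode=$1
    if [ "$mode" = late ]; then
        # Handler is installed only after the startup work has finished.
        bash -c 'sleep 3; trap ":" HUP; exit 0' >/dev/null &
    else
        # Handler is installed before any startup work begins.
        bash -c 'trap ":" HUP; sleep 3; exit 0' >/dev/null &
    fi
    pid=$!
    sleep 1                 # let the child get into its startup work
    kill -HUP "$pid"        # the signal arrives mid-startup
    wait "$pid"
    status=$?               # 128 + signal number if the child was killed
    echo "$status"
}
```

Calling `hup_during_startup late` and `hup_during_startup early` prints the two exit statuses. A status above 128 means the child was terminated by the signal rather than exiting on its own, which is the shell-level analogue of a server going down without its normal shutdown sequence.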
[16 Jan 2020 0:56]
MySQL Verification Team
Hi, Thanks for the test case, verified!
[17 Jan 2020 1:44]
Haixing Weng
Hi, could you please tell me something about the cause of the defect, so that I can temporarily patch the database kernel or the ansible playbook in our production environment?