MySQL Bugs: #80241: Starting group replication causes MySQL to crash with sig 6 with selinux

Bug #80241	Starting group replication causes MySQL to crash with sig 6 with selinux
Submitted:	2 Feb 2016 19:35	Modified:	2 Aug 2016 19:31
Reporter:	Ben Stillman
Status:	Closed
Category:	MySQL Server: Group Replication	Severity:	S3 (Non-critical)
Version:	0.7.0-labs	OS:	CentOS (el7-x86-64bit)
Assigned to:		CPU Architecture:	Any
Tags:	group replication, selinux

Description:
When "START GROUP_REPLICATION;" is issued, the server crashes with signal 6 if SELinux is enabled and enforcing and has not been configured to allow the XCom port. The error message in the log says the port is in use, which it is not. After disabling SELinux, or configuring SELinux to allow the XCom port, group replication starts fine. Seen with the el7-x86-64bit build on CentOS 7.

To configure selinux to allow MySQL to use the xcom port (change 3309 to your port):
/usr/sbin/semanage port -a -t mysqld_port_t -p tcp 3309
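
Before adding the port, it can help to check whether it already carries the mysqld_port_t label. The following is a minimal sketch, not an official tool: the function name port_has_mysqld_label is made up for illustration, and it assumes the "type, protocol, comma-separated ports and ranges" layout that "semanage port -l" prints.

```shell
# Hypothetical helper: reads `semanage port -l` output on stdin and returns
# success (exit status 0) if the given TCP port is listed under mysqld_port_t.
# Entries may be single ports ("3306,") or ranges ("63132-63164").
port_has_mysqld_label() {
  awk -v want="$1" '
    $1 == "mysqld_port_t" && $2 == "tcp" {
      for (i = 3; i <= NF; i++) {
        entry = $i
        sub(/,$/, "", entry)              # drop the trailing comma, if any
        n = split(entry, range, "-")      # "a-b" is a range, "a" a single port
        low = range[1] + 0
        high = (n == 2 ? range[2] : range[1]) + 0
        if (want + 0 >= low && want + 0 <= high) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  '
}

# Possible usage (run as root; 3309 stands in for your XCom port):
#   /usr/sbin/semanage port -l | port_has_mysqld_label 3309 \
#     || /usr/sbin/semanage port -a -t mysqld_port_t -p tcp 3309
```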

[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configuring Xcom group: XCom Group ID=1038572691 Name=e04c5d29-c9d8-11e5-b48f-08002784681c
[XCOM_BINDING_DEBUG] ::initialize_peer_nodes():: Configured Peer Nodes: 192.168.56.102:3309
[XCOM_BINDING_DEBUG] ::initialize_peer_nodes():: Configured Peer Nodes: 192.168.56.103:3309
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Total number of peers: 2
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Local Node: 192.168.56.101:3309
[XCOM_BINDING_DEBUG] ::initialize_xcom():: Configured Bootstrap: true
2016-02-02T19:06:20.883351Z 4 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2016-02-02T19:06:20.902030Z 6 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './localhost-relay-bin-group_replication_applier.000001' position: 4
2016-02-02T19:06:20.902533Z 3 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2016-02-02T19:06:20.902723Z 3 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
2016-02-02T19:06:20.902730Z 3 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 1'
[XCOM BINDING DEBUG] ::join()
Unable to announce tcp port 3309. Port already in use?mysqld: /export/home2/pb2/build/sb_0-17587881-1452683313.64/build/BUILD/mysql-server/plugin/group_replication/gcs/src/bindings/xcom/xcom/task.c:735: add_fd: Assertion `fd >= 0' failed.
connecting to 192.168.56.101 3309

state 0 action xa_init

state 3525 action xa_terminate

new state x_start

19:06:20 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=2
max_threads=151
thread_count=4
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68185 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...

stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xeb6b5b]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x788f91]
/lib64/libpthread.so.0(+0xf100)[0x7faf10b80100]
/lib64/libc.so.6(gsignal+0x37)[0x7faf0f5745f7]
/lib64/libc.so.6(abort+0x148)[0x7faf0f575ce8]
/lib64/libc.so.6(+0x2e566)[0x7faf0f56d566]
/lib64/libc.so.6(+0x2e612)[0x7faf0f56d612]
/usr/lib64/mysql/plugin/group_replication.so(accept_tcp+0x302)[0x7faec1d18752]
/usr/lib64/mysql/plugin/group_replication.so(tcp_server+0x76)[0x7faec1d262c6]
/usr/lib64/mysql/plugin/group_replication.so(task_loop+0x56)[0x7faec1d17a16]
/usr/lib64/mysql/plugin/group_replication.so(xcom_taskmain2+0x67)[0x7faec1d1eda7]
/usr/lib64/mysql/plugin/group_replication.so(_ZN19Gcs_xcom_proxy_impl9xcom_initEi+0x20)[0x7faec1d10310]
/usr/lib64/mysql/plugin/group_replication.so(+0x7af24)[0x7faec1d28f24]
/lib64/libpthread.so.0(+0x7dc5)[0x7faf10b78dc5]
/lib64/libc.so.6(clone+0x6d)[0x7faf0f63521d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

How to repeat:
Start group replication on el7 with SELinux enforcing and the XCom port not added to the SELinux policy.

Suggested fix:
Don't crash; return a more relevant error message instead.

I am facing the same crash on the "START GROUP_REPLICATION;" command, even with SELinux disabled.

Hi Team,

I am facing the same issue too, in my case on Oracle Linux 7.
Even after disabling SELinux, the crash still occurs.

Thanks,
Naresh

Marked Bug #82275 (Crash in accept_tcp) as a duplicate

Hi All,

Thank you for trying out Group Replication and providing helpful feedback! 

I was not able to repeat the problem with the new 0.8 Beta release (available now on labs.mysql.com).

I'm using Oracle Linux 7 x86_64 with UEK4:
bash# uname -a
Linux hanode3 3.10.0-327.el7.x86_64 #1 SMP Fri Nov 20 00:18:34 PST 2015 x86_64 x86_64 x86_64 GNU/Linux

bash# cat /etc/os-release 
NAME="Oracle Linux Server"
VERSION="7.2"
ID="ol"
VERSION_ID="7.2"
PRETTY_NAME="Oracle Linux Server 7.2"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:7:2:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.2
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.2

SELinux is enabled and enforcing with the targeted policy. The mysql policy module is also currently active:
bash# sestatus 
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

bash# getenforce 
Enforcing

bash# semodule -l | grep mysql
mysql	1.14.1	

The MySQL GCS (XCom) binds without problems to port 6606, which I added to the SELinux policy:
bash# semanage port -l | grep mysql
mysqld_port_t                  tcp      6606, 1186, 3306, 63132-63164
mysqlmanagerd_port_t           tcp      2273

When I tried to use 3309, which is not in the policy, it failed as expected, but without a crash.
...
2016-07-27T17:53:46.104869Z 0 [Note] Plugin group_replication reported: 'state 0 action xa_init'
2016-07-27T17:53:46.115957Z 0 [ERROR] Plugin group_replication reported: 'Unable to announce tcp port 3309. Port already in use?'
2016-07-27T17:53:46.116162Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error joining the group while waiting for the network layer to become ready.'
2016-07-27T17:53:46.116177Z 0 [Note] Plugin group_replication reported: 'state 4115 action xa_exit'
2016-07-27T17:53:46.116350Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
2016-07-27T17:53:46.116363Z 0 [ERROR] Plugin group_replication reported: '[GCS] The member was unable to join the group. Local port: 3309'
2016-07-27T17:54:46.105271Z 4 [ERROR] Plugin group_replication reported: 'Timeout on wait for view after joining group'
2016-07-27T17:54:46.105321Z 4 [Note] Plugin group_replication reported: 'Requesting to leave the group despite of not being a member'
2016-07-27T17:54:46.105339Z 4 [ERROR] Plugin group_replication reported: '[GCS] The member is leaving a group without being on one.'
2016-07-27T17:54:46.105426Z 4 [Note] Plugin group_replication reported: 'Destroying SSL'
2016-07-27T17:54:46.105433Z 4 [Note] Plugin group_replication reported: 'Success destroying SSL'
2016-07-27T17:54:46.105465Z 4 [Note] Plugin group_replication reported: 'auto_increment_increment is reset to 1'
2016-07-27T17:54:46.105468Z 4 [Note] Plugin group_replication reported: 'auto_increment_offset is reset to 1'
2016-07-27T17:54:46.105706Z 8 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed
2016-07-27T17:54:46.107848Z 5 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
...
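
To see which port a failed start was trying to use, the error log can be scanned for the "Unable to announce tcp port" message shown above. This is only a rough sketch: the function name failed_announce_ports is hypothetical, the log path in the usage comment is an assumption, and the pattern relies on the message wording from this report.

```shell
# Hypothetical helper: reads a MySQL error log on stdin and prints each port
# that could not be announced, one per line, without duplicates.
failed_announce_ports() {
  sed -n 's/.*Unable to announce tcp port \([0-9][0-9]*\)\..*/\1/p' | sort -u
}

# Possible usage (the log path is an assumption; adjust to your log_error setting):
#   failed_announce_ports < /var/log/mysqld.log
# Each printed port can then be compared with `semanage port -l | grep mysqld_port_t`.
```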

Can you let me know if you're still able to repeat it with Group Replication 0.8?

Thanks again!

I'm closing this as fixed in 0.8 (available on labs.mysql.com today). Please do let me know if you still see related issues and we can always re-open it. 

Thanks again!