Bug #91671 | stop slave sql_thread for channel 'group_replication_applier' could not return | ||
---|---|---|---|
Submitted: | 17 Jul 2018 2:08 | Modified: | 27 Nov 2018 12:05 |
Reporter: | Zhenghu Wen (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Group Replication | Severity: | S3 (Non-critical) |
Version: | 5.7.22 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[17 Jul 2018 2:08]
Zhenghu Wen
[17 Jul 2018 10:47]
MySQL Verification Team
Hello Zhenghu,

Thank you for the report and feedback.
I tried to follow your steps from "how to repeat" but did not see any issues, i.e. no blocking was observed. The GR setup was the same as in Bug#91042 (91042_5.7.22.results), with sysbench 1.0.13 (using bundled LuaJIT 2.1.0-beta2), and I tried almost all of the tests, such as those below:

bulk_insert.lua
oltp_common.lua
oltp_delete.lua
oltp_insert.lua
oltp.lua
oltp_point_select.lua
oltp_read_only.lua
oltp_read_write.lua
oltp_update_index.lua
oltp_update_non_index.lua
oltp_write_only.lua
select_random_points.lua
select_random_ranges.lua

bin/sysbench --tables=10 --table-size=5000000 --threads=50 --mysql-db=sbtest --mysql-user=root --mysql-socket=/tmp/mysql_hod03.sock --time=300 share/sysbench/select_random_points.lua prepare

^^ While the above was running, I tried the following on a non-primary node:

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 5e18a539-898e-11e8-8d96-0010e05f3e06 | hod03       |        3333 | ONLINE       |
| group_replication_applier | 64179dc1-898e-11e8-9db2-0010e05f4178 | hod04       |        3333 | ONLINE       |
| group_replication_applier | 6a03f852-898e-11e8-ac3f-0010e0734b98 | hod06       |        3333 | ONLINE       |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> SHOW STATUS LIKE 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name                    | Value                                |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 5e18a539-898e-11e8-8d96-0010e05f3e06 |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

mysql> stop slave sql_thread for channel 'group_replication_applier';
Query OK, 0 rows affected (0.02 sec)

mysql> stop slave sql_thread for channel 'group_replication_applier';
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> start slave sql_thread for channel 'group_replication_applier';
Query OK, 0 rows affected (0.00 sec)

Could you please provide the exact sysbench prepare command you were using, and the config details from all the nodes? Thank you!

Thanks,
Umesh
[18 Jul 2018 1:06]
Zhenghu Wen
Hi, Umesh:

sysbench version is:

hzwenzhh@db-181:~$ /home/hzwenzhh/sysbench/bin/sysbench --version
sysbench 0.5

sysbench prepare script is:

./mysql/bin/mysql -uroot -S ./node1/mysql.sock -e 'create database sbtest'
/home/hzwenzhh/sysbench/bin/sysbench --mysql-host=127.0.0.1 --mysql-port=10001 --mysql-user=rpl_user --mysql-password=rpl_pass --mysql-db=sbtest --test=/home/hzwenzhh/sysbench/db/oltp.lua --oltp_tables_count=10 --oltp-table-size=1000000000 --rand-init=on --num-threads=256 --report-interval=2 prepare
[18 Jul 2018 1:29]
Zhenghu Wen
Hi, Umesh:

Maybe this script will help you:

./stop-slave-applier-loop.sh 2

hzwenzhh@db-181:~$ cat stop-slave-applier-loop.sh
while true; do
  echo "begin stop slave applier channel"
  ./mysql/bin/mysql -uroot -S ./node$1/mysql.sock -e "stop slave sql_thread for channel 'group_replication_applier'"
  echo "finish stop slave, and start again"
  ./mysql/bin/mysql -uroot -S ./node$1/mysql.sock -e "start slave sql_thread for channel 'group_replication_applier'"
  echo "finish start slave, will do next loop..."
  sleep 5
done
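The loop in the comment above blocks forever once a STOP SLAVE call fails to return, so an unattended run gives no clear signal that the hang was hit. A minimal sketch of a variant that flags the hang is shown below. It is not part of the original report: the helper names `run_with_deadline` and `applier_loop` and the 60-second default limit are hypothetical, and it assumes the GNU coreutils `timeout` command is available (it exits with status 124 when the limit is reached). The client path, socket path, and statements are taken from the reporter's script.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): run a command, giving up after
# LIMIT seconds. Returns 0 if it finished, 1 if it hung, 2 if it failed.
run_with_deadline() {
  local limit="$1"
  shift
  local rc=0
  # GNU coreutils `timeout` exits with 124 when the time limit is reached.
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "HUNG after ${limit}s: $*"
    return 1
  fi
  if [ "$rc" -ne 0 ]; then
    echo "FAILED rc=$rc: $*"
    return 2
  fi
  echo "OK: $*"
  return 0
}

# Same stop/start cycle as the reporter's script, but it returns as soon as
# one statement does not come back within the limit (assumed default: 60s).
applier_loop() {
  local node="$1"
  local limit="${2:-60}"
  while true; do
    run_with_deadline "$limit" ./mysql/bin/mysql -uroot -S "./node${node}/mysql.sock" \
      -e "stop slave sql_thread for channel 'group_replication_applier'" || return 1
    run_with_deadline "$limit" ./mysql/bin/mysql -uroot -S "./node${node}/mysql.sock" \
      -e "start slave sql_thread for channel 'group_replication_applier'" || return 1
    sleep 5
  done
}
```

Calling `applier_loop 2` would mirror `./stop-slave-applier-loop.sh 2`. Note that `timeout` only terminates the client process; the stuck statement may well remain in the server, so the instance should be inspected right after a "HUNG" line is printed.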
[18 Jul 2018 11:32]
MySQL Verification Team
Thank you for the details. Verified as described. Thanks, Umesh
[27 Nov 2018 12:05]
David Moss
Posted by developer:

Thank you for your feedback. This has been fixed in upcoming versions, and the following was added to the 5.7.25 / 8.0.14 changelog:

When stopping replication, any channels that had pending transactions could cause a deadlock in Group Replication.