Bug #91042 stop group_replication hang with select replication_connection_status
Submitted: 28 May 2018 6:19 Modified: 8 Aug 2018 15:45
Reporter: Zhenghu Wen (OCA)
Status: Closed
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:5.7.22 OS:Any
Assigned to: CPU Architecture:Any

[28 May 2018 6:19] Zhenghu Wen
Description:
When the statement "select * from performance_schema.replication_connection_status where channel_name like 'group_replication_%'" and "stop group_replication" are executed concurrently, neither returns.

show processlist result:
| 47 | root        | localhost | performance_schema | Query   | 8646 | starting                              | stop group_replication                                                                               |
| 59 | root        | localhost | NULL               | Query   | 8646 | Sending data                          | select * from performance_schema.replication_connection_status where channel_name like 'group_replic |

How to repeat:
1. Execute sqla in a loop:
while true;
do
./mysql/bin/mysql -uroot -S ./node$1/mysql.sock -Bse "select * from performance_schema.replication_connection_status where channel_name like 'group_replication_%'"
sleep 0.5
done

2. Open a client and execute sqlb:
stop group_replication;

3. Neither statement returns.
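
To notice the hang without blocking a terminal indefinitely, the probe from step 1 can be run with a time limit. This is a minimal sketch, assuming Python 3 and the same client path, socket layout and options as in step 1; the helper names and the choice of time limit are hypothetical and not part of the original report.

```python
import subprocess

# Query taken from the description above.
QUERY = ("select * from performance_schema.replication_connection_status "
         "where channel_name like 'group_replication_%'")


def build_probe(node, mysql="./mysql/bin/mysql"):
    """Build the client command from step 1 for a given node number (hypothetical helper)."""
    return [mysql, "-uroot", "-S", f"./node{node}/mysql.sock", "-Bse", QUERY]


def returns_in_time(cmd, timeout_s):
    """Run cmd; return True if it finishes within timeout_s seconds, False if it had to be killed."""
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        return False
```

Calling returns_in_time(build_probe(1), 10) in a loop while "stop group_replication" runs in another client would report False once the select stops returning.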

Analysis:
sqlb (stop group_replication) acquires the mutex LOCK_group_replication_handler and waits for the SQL thread to stop.
sqla (the select on replication_connection_status) acquires rli->data_lock and mi->data_lock, then waits for LOCK_group_replication_handler (held by sqlb).
The SQL thread, however, is waiting for rli->data_lock (held by sqla) and cannot stop, so sqla and sqlb are deadlocked.
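
The three-way wait described above can be checked as a cycle in a wait-for graph. The following is an illustrative model only, not server code: the node labels follow the report, while the function name and the two example graphs are hypothetical. An edge "A -> B" means "A is blocked until B releases a lock or exits".

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None if there is none."""
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:
            # node is one of its own ancestors: cut the path at its first occurrence
            return path[path.index(node):] + [node]
        visiting.add(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# Situation from the report.
reported = {
    # holds LOCK_group_replication_handler, waits for the SQL thread to exit
    "stop group_replication": ["sql thread"],
    # waits for rli->data_lock, which the select holds
    "sql thread": ["select"],
    # holds rli->data_lock and mi->data_lock, waits for LOCK_group_replication_handler
    "select": ["stop group_replication"],
}

# Same actors, assuming the select no longer holds rli->data_lock
# while it waits for LOCK_group_replication_handler.
reordered = {
    "stop group_replication": ["sql thread"],
    "sql thread": [],
    "select": ["stop group_replication"],
}
```

find_cycle(reported) returns a path through all three actors, while find_cycle(reordered) returns None: removing any one edge of the cycle is enough to break the deadlock.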

Suggested fix:
Maybe we should change the lock acquisition order in table_replication_connection_status.
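
One possible reading of this suggestion, sketched in Python rather than in the server's C++: the reader takes the handler lock and the data lock one after the other instead of waiting for the handler lock while holding the data lock. This is an assumption about the general idea, not the contributed patch; the lock objects, function name and dictionary keys are hypothetical stand-ins.

```python
import threading

# Hypothetical stand-ins for the server mutexes named in the report.
handler_lock = threading.Lock()  # plays the role of LOCK_group_replication_handler
data_lock = threading.Lock()     # plays the role of rli->data_lock


def read_status_reordered(plugin_info, channel_info):
    """Collect status fields without ever holding both locks at once."""
    # Step 1: copy plugin-side fields under the handler lock only.
    with handler_lock:
        snapshot = dict(plugin_info)
    # Step 2: copy channel-side fields under the data lock only.
    with data_lock:
        snapshot.update(channel_info)
    return snapshot
```

With this ordering a reader blocked on the handler lock holds nothing that the SQL thread needs, so the SQL thread can still stop. The trade-off is that the two halves of the snapshot are no longer read atomically.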
[28 May 2018 9:29] Umesh Shastry
Hello Zhenghu,

Thank you for the report and feedback.

Thanks,
Umesh
[28 May 2018 9:30] Umesh Shastry
5.7.22 - test results

Attachment: 91042_5.7.22.results (application/octet-stream, text), 29.97 KiB.

[1 Jun 2018 1:12] Zhenghu Wen
Add an input parameter "locked" to get_group_replication_connection_status_info.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bugfix-91042.patch (application/octet-stream, text), 2.57 KiB.

[1 Jun 2018 1:38] Zhenghu Wen
To be applied together with bugfix-91042.patch.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: bugfix-91042-add.patch (application/octet-stream, text), 1.75 KiB.

[8 Aug 2018 15:45] David Moss
Posted by developer:
 
Thank you for your feedback. This has been fixed in upcoming versions, and the following was added to the 5.7.24 changelog:
Attempting to uninstall the plugin while START GROUP_REPLICATION executed could result in unexpected behavior.