MySQL Bugs: #89194: Wrong certification lead to data inconsistency and GR breakage.

Bug #89194	Wrong certification lead to data inconsistency and GR breakage.
Submitted:	11 Jan 2018 19:14	Modified:	11 Apr 2018 16:23
Reporter:	Jean-François Gagné	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Group Replication	Severity:	S2 (Serious)
Version:	5.7.20, 8.0.3	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hi,

This bug is a variation, for Group Replication, of Bug#86078 (Bad Write Set tracking with UNIQUE KEY on a DELETE followed by an INSERT).  Contrary to Bug#86078 and Bug#89141 where the consequences of the bad Write Set tracking are minor, this one will cause a false-positive certification which leads to data inconsistency and group breakage.

As a side note and related to “How to repeat” below, I am surprised that the INSERT on the member running the while commits before the INSERT from the other node as the INSERT from the other node was certified before the INSERT from this node.  This leads to different ordering (and views) of transaction on the different multi-writer nodes of the group.  It looks like violation of total ordering of transactions: I will also open a bug on that and add the bug number in the comments.

See below blog post for more details:
https://jfg-mysql.blogspot.com/2018/01/more-write-set-in-mysql-5-7-group-replication-certi...

Many thanks for looking into that,

JFG

How to repeat:
Deploy an instance of Group Replication and enable multi-writer/primary (SET GLOBAL group_replication_single_primary_mode = OFF).

Run the following as initialization on one member of the group:

> SELECT version();
+------------+
| version()  |
+------------+
| 5.7.20-log |
+------------+
1 row in set (0.00 sec)

> CREATE DATABASE test_jfg_ws;
Query OK, 1 row affected (0.01 sec)

> CREATE TABLE test_jfg_ws.test_jfg_ws (
    ->   id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   str varchar(80) NOT NULL UNIQUE);
Query OK, 0 rows affected (0.01 sec)

> CREATE TABLE test_jfg_ws.test_jfg_ws2 (
    ->   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   str VARCHAR(80) NOT NULL);
Query OK, 0 rows affected (0.01 sec)

To trigger the false-positive certification (which leads to data inconsistency and group breakage), you can create a table with many rows (one million in my case) by running the following commands in a Linux shell on one member of the group:

# sql="INSERT INTO test_jfg_ws.test_jfg_ws2 (str) VALUES (RAND());"
# for i in $(seq 100); do
>   ( echo "BEGIN;"; yes "$sql" | head -n 10000; echo "COMMIT;"; ) | mysql;
> done

The following ALTER needs to take more than two seconds so below has a chance to trigger the false-positive certification.  If it does not take more than two seconds, you can add more rows in the table by running the above loop more times.

> ALTER TABLE test_jfg_ws.test_jfg_ws2 MODIFY COLUMN str VARCHAR(60);
Query OK, 1000000 rows affected (3.57 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

Then, with the help of above table, trigger a false-positive certification by running the two below script in the Linux shell of two different members of the group:

### On one member, run this in a Linux shell:
# while sleep 1; do
>   mysql <<< "SHOW PROCESSLIST" | grep -q "test_jfg_ws.test_jfg_ws2" || continue;
>   mysql <<< "INSERT INTO test_jfg_ws.test_jfg_ws (str) VALUES ('B')";
>   break;
> done

### On a different member, run this in a Linux shell:
# mysql <<< "
>   ALTER TABLE test_jfg_ws.test_jfg_ws2 MODIFY COLUMN str VARCHAR(80);
>   INSERT INTO test_jfg_ws.test_jfg_ws (str) VALUES ('b');"

The two inserts above are conflicting, but certification does not detect the conflict because of Bug#86078.  We will get a broken group in MySQL 5.7.20 and 8.0.3:

> SELECT MEMBER_ID, MEMBER_STATE FROM performance_schema.replication_group_members;
+--------------------------------------+--------------+
| MEMBER_ID                            | MEMBER_STATE |
+--------------------------------------+--------------+
| _member_uuid_                        | ERROR        |
+--------------------------------------+--------------+
1 row in set (0.00 sec)

We also get below in the error log of the waiting member (where the while was run):

2018-01-01T19:30:23.914570Z 89 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'b' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168, Error_code: 1062
2018-01-01T19:30:23.914599Z 89 [Warning] Slave: Duplicate entry 'b' for key 'str' Error_code: 1062
2018-01-01T19:30:23.914604Z 89 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 144
2018-01-01T19:30:23.914615Z 89 [ERROR] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2018-01-01T19:30:23.914667Z 86 [ERROR] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2018-01-01T19:30:23.914742Z 86 [ERROR] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'

And we also get the following in the error log of the other member (where the ALTER was run):

2018-01-01T19:30:20.976713Z 86 [ERROR] Slave SQL for channel 'group_replication_applier': Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'B' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168, Error_code: 1062
2018-01-01T19:30:20.976744Z 86 [Warning] Slave: Duplicate entry 'B' for key 'str' Error_code: 1062
2018-01-01T19:30:20.976766Z 86 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 65
2018-01-01T19:30:20.976777Z 86 [ERROR] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2018-01-01T19:30:20.976853Z 83 [ERROR] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2018-01-01T19:30:20.976992Z 83 [ERROR] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.' 

We can also see the following in Performance Schema on the two members:

> SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE CHANNEL_NAME = 'group_replication_applier'\G
*************************** 1. row ***************************
         CHANNEL_NAME: group_replication_applier
            WORKER_ID: 0
            THREAD_ID: NULL
        SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: UUID:111
    LAST_ERROR_NUMBER: 1062
   LAST_ERROR_MESSAGE: Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'b' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168
 LAST_ERROR_TIMESTAMP: 2018-01-06 15:21:52
1 row in set (0.00 sec)

> SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE CHANNEL_NAME = 'group_replication_applier'\G
*************************** 1. row ***************************
         CHANNEL_NAME: group_replication_applier
            WORKER_ID: 0
            THREAD_ID: NULL
        SERVICE_STATE: OFF
LAST_SEEN_TRANSACTION: UUID:1000003
    LAST_ERROR_NUMBER: 1062
   LAST_ERROR_MESSAGE: Could not execute Write_rows event on table test_jfg_ws.test_jfg_ws; Duplicate entry 'B' for key 'str', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 168
 LAST_ERROR_TIMESTAMP: 2018-01-06 15:21:49
1 row in set (0.00 sec) 

And looking at the data on each node, we can see that it is inconsistent:

> SELECT * FROM test_jfg_ws.test_jfg_ws;
+----+-----+
| id | str |
+----+-----+
| 13 | B   |
+----+-----+
2 rows in set (0.00 sec)

> SELECT * FROM test_jfg_ws.test_jfg_ws;
+----+-----+
| id | str |
+----+-----+
| 11 | b   |
+----+-----+
2 rows in set (0.00 sec)

This inconsistency is caused by the false-positive certification (no conflict detected while one should have been detected).

Suggested fix:
Like for Bug#89141, backport the solution to Bug#86078 in MySQL 5.7.

Hello Jean,

Thank you for the report and detailed steps.

Thanks,
Umesh

5.7.20 - test results

Attachment: 89194_5.7.20.results (application/octet-stream, text), 23.38 KiB.

Fixed as a duplicate of

Bug#88120 Backport BUG#26277771 to 5.7