Bug #84900 | Getting inconsistent result on different nodes | |
---|---|---|---
Submitted: | 9 Feb 2017 2:39 | Modified: | 17 Feb 2017 18:38 |
Reporter: | Vadim Tkachenko | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Group Replication | Severity: | S1 (Critical) |
Version: | 5.7.17 | OS: | Any |
Assigned to: | | CPU Architecture: | Any
[9 Feb 2017 2:39]
Vadim Tkachenko
[9 Feb 2017 12:45]
MySQL Verification Team
Hello Vadim,

Thank you for the report and feedback.

Thanks,
Umesh
[9 Feb 2017 12:46]
MySQL Verification Team
test results
Attachment: 84900.results (application/octet-stream, text), 7.82 KiB.
[9 Feb 2017 16:10]
Nuno Carvalho
Posted by developer:

Hi Vadim,

Thank you for evaluating Group Replication.

Can you please include the member state after "select count(*) from t1 where balance=1000;" on each member? That is, execute:

SELECT * FROM performance_schema.replication_group_members;

Best regards,
Nuno Carvalho
[9 Feb 2017 17:02]
Vadim Tkachenko
Node1:

mysql> select count(*) from t1 where balance=1000;
+----------+
| count(*) |
+----------+
|  9475864 |
+----------+
1 row in set (4.13 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | d144d406-eee7-11e6-a430-0242ac120002 | 133f838e4f91 |        3306 | ONLINE       |
| group_replication_applier | d77757d9-eee7-11e6-a3b2-0242ac120003 | b7fcf6e3a4a1 |        3306 | ONLINE       |
| group_replication_applier | dda0209b-eee7-11e6-a37f-0242ac120004 | 2a66e0cd885e |        3306 | ONLINE       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec)

Node2:

mysql> select count(*) from t1 where balance=1000;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (4.87 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | d144d406-eee7-11e6-a430-0242ac120002 | 133f838e4f91 |        3306 | ONLINE       |
| group_replication_applier | d77757d9-eee7-11e6-a3b2-0242ac120003 | b7fcf6e3a4a1 |        3306 | ONLINE       |
| group_replication_applier | dda0209b-eee7-11e6-a37f-0242ac120004 | 2a66e0cd885e |        3306 | ONLINE       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec)
[17 Feb 2017 9:09]
Nuno Carvalho
Hi Vadim,

Thank you for this bug report and for giving Group Replication a try. We value all the feedback we can get.

Summary
=======

In this particular case, the system is operating properly.

Changing 30,000,000 rows in a single transaction means that these changes need to be *propagated*, and then *applied* on the receiving end. The originating node only waits for the propagation to take place, and this may take more time for transactions changing a large number of rows than for those changing a small number, in GR or in any other replicated system.

Yes, GR has provisioning for on-the-wire compression, which can help, but then it is a balance of how many CPU cycles you trade for doing compression instead of working on propagating data. So it really depends on the context (workload, network properties, ...). See the sketch after this comment.

Also, you may be running into a misconception or wrong assumption from the start. GR is not a synchronous replication system. Changes are propagated to each node in the group and eventually applied to the database. Yes, there is a *logical* synchronization point, servers agree on message delivery order, but the entire process is asynchronous.

Scenario
========

Here are some specific details about the scenario reported in this bug.

For a transaction to commit, the majority of the group has to agree on the order of that transaction in the global sequence of transactions. Deciding to commit or abort a transaction is done by each server individually, but all servers make the same decision. After the agreement on the order of a given transaction, each server applies the transaction asynchronously. That is, all servers apply data at their own pace, which means that data may be committed sooner on some members than on others.

This is exactly what is happening in this scenario. After the order of T1 is agreed by all servers, server 1 only has to commit it to InnoDB and the binary log, since all other processing is already done. Server 2, however, still needs to process everything: read the replication events, apply the rows, and then commit to InnoDB and the binary log. Since this transaction changes more than 30,000,000 rows, when you query server 2 it is still applying T1. You can check that with SELECT @@GLOBAL.GTID_EXECUTED and the SHOW PROCESSLIST command.

Example:

[connection server1]
> SELECT @@GLOBAL.GTID_EXECUTED;
62682ba7-f114-11e6-a941-a6c3dd0a98de:1-6

[connection server2]
> SELECT @@GLOBAL.GTID_EXECUTED;
62682ba7-f114-11e6-a941-a6c3dd0a98de:1-5

T1 is being read from the relay log and applied by the applier thread.

[connection server2]
> SHOW PROCESSLIST;
Id  User         Host  db    Command  Time  State                                                   Info
...
9   system user        NULL  Connect  62    executing                                               NULL
12  system user        NULL  Connect  11    Reading event from the relay log

T1 was read completely and is being applied.

[connection server2]
> SHOW PROCESSLIST;
Id  User         Host  db    Command  Time  State                                                   Info
...
9   system user        NULL  Connect  71    executing                                               NULL
12  system user        NULL  Connect  20    Slave has read all relay log; waiting for more updates  NULL

We can also check the progress in the replication performance_schema tables.

T1 was written to the relay log; RECEIVED_TRANSACTION_SET already contains T1 (62682ba7-f114-11e6-a941-a6c3dd0a98de:6).
[connection server2]
> SELECT * FROM performance_schema.replication_connection_status WHERE channel_name='group_replication_applier';
CHANNEL_NAME               group_replication_applier
GROUP_NAME                 62682ba7-f114-11e6-a941-a6c3dd0a98de
SOURCE_UUID                62682ba7-f114-11e6-a941-a6c3dd0a98de
THREAD_ID                  NULL
SERVICE_STATE              ON
COUNT_RECEIVED_HEARTBEATS  0
LAST_HEARTBEAT_TIMESTAMP   0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET   62682ba7-f114-11e6-a941-a6c3dd0a98de:1-6

The applier has already seen T1.

[connection server2]
> SELECT * FROM performance_schema.replication_applier_status_by_worker WHERE channel_name='group_replication_applier';
CHANNEL_NAME           group_replication_applier
WORKER_ID              0
THREAD_ID              29
SERVICE_STATE          ON
LAST_SEEN_TRANSACTION  62682ba7-f114-11e6-a941-a6c3dd0a98de:6

T1 is applied.

[connection server2]
> SELECT @@GLOBAL.GTID_EXECUTED;
62682ba7-f114-11e6-a941-a6c3dd0a98de:1-6

In the current version, Group Replication does not provide synchronous replication; that is stated in the manual at:
https://dev.mysql.com/doc/refman/5.7/en/group-replication-frequently-asked-questions.html#...
https://dev.mysql.com/doc/refman/5.7/en/group-replication-flow-control.html

We will state that more clearly at the beginning of the Group Replication manual chapter.

For reference, T1 is:

begin;
update t1 set balance=1000 where id < 70000000;
update t2 set balance=1000 where id < 70000000;
commit;

If you feel that I've missed something, please let me know.

Best regards,
Nuno Carvalho
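For context on the on-the-wire compression mentioned above: in MySQL 5.7 it is controlled per member by the group_replication_compression_threshold system variable (message payloads larger than the threshold, in bytes, are LZ4-compressed; 0 disables compression). A minimal sketch follows; the 1 MiB value is only an illustrative choice, not a tuned recommendation, and the plugin is restarted here to be safe, since many Group Replication options are only picked up at plugin start.

-- Run on each member: adjust the compression threshold (bytes).
STOP GROUP_REPLICATION;
SET GLOBAL group_replication_compression_threshold = 1048576;  -- ~1 MiB, example value
START GROUP_REPLICATION;

-- Confirm the setting is in effect.
SHOW VARIABLES LIKE 'group_replication_compression_threshold';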
[17 Feb 2017 18:38]
Vadim Tkachenko
OK, I see your point. Then please consider this a feature request: provide a way for read transactions to read the actual, not stale, data.
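While no such feature exists in 5.7, one client-side mitigation is possible when the reading session knows which write it depends on: block on the reader until the writer's GTID has been applied locally, using the WAIT_FOR_EXECUTED_GTID_SET() function. A minimal sketch, reusing the GTID from the example above; the 60-second timeout is an arbitrary choice:

-- On the writer, after COMMIT, capture the executed GTID set:
SELECT @@GLOBAL.GTID_EXECUTED;

-- On the reader, wait (up to 60 seconds) for that transaction to be
-- applied locally before reading; returns 0 on success, 1 on timeout.
SELECT WAIT_FOR_EXECUTED_GTID_SET('62682ba7-f114-11e6-a941-a6c3dd0a98de:6', 60);
SELECT count(*) FROM t1 WHERE balance = 1000;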