| Bug #84900 | Getting inconsistent result on different nodes | | |
|---|---|---|---|
| Submitted: | 9 Feb 2017 2:39 | Modified: | 17 Feb 2017 18:38 |
| Reporter: | Vadim Tkachenko | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Group Replication | Severity: | S1 (Critical) |
| Version: | 5.7.17 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[9 Feb 2017 2:39]
Vadim Tkachenko
[9 Feb 2017 12:45]
MySQL Verification Team
Hello Vadim,

Thank you for the report and feedback.

Thanks,
Umesh
[9 Feb 2017 12:46]
MySQL Verification Team
test results
Attachment: 84900.results (application/octet-stream, text), 7.82 KiB.
[9 Feb 2017 16:10]
Nuno Carvalho
Posted by developer:

Hi Vadim,

Thank you for evaluating Group Replication.

Can you please include the member state after "select count(*) from t1
where balance=1000;" on each member? That is, execute:

SELECT * FROM performance_schema.replication_group_members;

Best regards,
Nuno Carvalho
[9 Feb 2017 17:02]
Vadim Tkachenko
Node1:

mysql> select count(*) from t1 where balance=1000;
+----------+
| count(*) |
+----------+
|  9475864 |
+----------+
1 row in set (4.13 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | d144d406-eee7-11e6-a430-0242ac120002 | 133f838e4f91 |        3306 | ONLINE       |
| group_replication_applier | d77757d9-eee7-11e6-a3b2-0242ac120003 | b7fcf6e3a4a1 |        3306 | ONLINE       |
| group_replication_applier | dda0209b-eee7-11e6-a37f-0242ac120004 | 2a66e0cd885e |        3306 | ONLINE       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec)

Node2:

mysql> select count(*) from t1 where balance=1000;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (4.87 sec)

mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
| group_replication_applier | d144d406-eee7-11e6-a430-0242ac120002 | 133f838e4f91 |        3306 | ONLINE       |
| group_replication_applier | d77757d9-eee7-11e6-a3b2-0242ac120003 | b7fcf6e3a4a1 |        3306 | ONLINE       |
| group_replication_applier | dda0209b-eee7-11e6-a37f-0242ac120004 | 2a66e0cd885e |        3306 | ONLINE       |
+---------------------------+--------------------------------------+--------------+-------------+--------------+
3 rows in set (0.00 sec)
[17 Feb 2017 9:09]
Nuno Carvalho
Hi Vadim,
Thank you for this bug report and for giving Group Replication a
try. We value all the feedback we can get.
Summary
=======
In this particular case, the system is operating properly. Changing
30,000,000 rows in a single transaction means that these changes need
to be *propagated*, and then *applied*, on the receiving end. The
originating node only waits for the propagation to take place, and
applying may take more time for transactions changing a large number
of rows than for those changing a small number, in GR or any other
replicated system.
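For example (a sketch, not part of the original report), a large update
like T1 can be split into smaller transactions so that each one
propagates and applies quickly; the batch size below is an arbitrary
illustration:

-- Repeat this statement, one transaction per run, until it reports
-- 0 rows affected; each batch replicates and applies independently.
UPDATE t1 SET balance = 1000
 WHERE id < 70000000 AND balance <> 1000
 LIMIT 100000;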
Yes, GR does provide on-the-wire compression, which can help, but then
it is a balance of how many CPU cycles you trade for doing compression
instead of working on propagating data. So it really depends on the
context (workload, network properties, ...).
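As an illustration of that trade-off (a sketch; the value shown is the
5.7 default), the system variable
group_replication_compression_threshold controls when messages are
compressed before being sent to the group:

-- Compress replication messages larger than this many bytes
-- (0 disables compression; the 5.7 default is 1000000).
SET GLOBAL group_replication_compression_threshold = 1000000;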
Also, you may be running into a misconception or wrong assumption from
the start. GR is not a synchronous replication system. Changes are
propagated to each node in the group and eventually applied to the
database. Yes, there is a *logical* synchronization point, where
servers agree on the message delivery order, but the entire process is
asynchronous.
Scenario
========
Here are some specific details about the scenario reported in this
bug.
For a transaction to commit, the majority of the group has to agree on
its order in the global sequence of transactions. Deciding to commit
or abort a transaction is done by each server individually, but all
servers make the same decision.
After the agreement on the order of a given transaction in the global
sequence of transactions, each server applies the transaction
asynchronously. That is, each server applies data at its own pace,
which means that data may be committed sooner on some members than on
others.
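One way to observe that backlog on a member (a sketch; these are 5.7
performance_schema columns) is the Group Replication statistics table,
which shows how many transactions are still queued:

-- On the lagging member: transactions waiting in the certification
-- queue versus transactions already checked.
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_TRANSACTIONS_CHECKED
  FROM performance_schema.replication_group_member_stats;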
This is exactly what is happening in this scenario: after the order of
T1 is agreed by all servers, server 1 only has to commit it to InnoDB
and the binary log, since all other processing is already done. Server
2, however, still needs to process everything: read the replication
events, apply the rows, and then commit to InnoDB and the binary log.
Since this transaction changes more than 30,000,000 rows, when you
query server 2 it is still applying T1. You can check that with
SELECT @@GLOBAL.GTID_EXECUTED and the SHOW PROCESSLIST command.
Example:
[connection server1]
> SELECT @@GLOBAL.GTID_EXECUTED;
62682ba7-f114-11e6-a941-a6c3dd0a98de:1-6
[connection server2]
> SELECT @@GLOBAL.GTID_EXECUTED;
62682ba7-f114-11e6-a941-a6c3dd0a98de:1-5
T1 is being read from relay log and applied by applier thread.
[connection server2]
> SHOW PROCESSLIST;
Id User Host db Command Time State Info
...
9 system user NULL Connect 62 executing NULL
12 system user NULL Connect 11 Reading event from the relay log
T1 was read completely and is being applied.
[connection server2]
> SHOW PROCESSLIST;
Id User Host db Command Time State Info
...
9 system user NULL Connect 71 executing NULL
12 system user NULL Connect 20 Slave has read all relay log; waiting for more updates NULL
We can also check the progress in the replication performance_schema
tables. T1 was written to the relay log; RECEIVED_TRANSACTION_SET
already contains T1 (62682ba7-f114-11e6-a941-a6c3dd0a98de:6).
[connection server2]
> SELECT * FROM performance_schema.replication_connection_status WHERE
channel_name='group_replication_applier';
CHANNEL_NAME group_replication_applier
GROUP_NAME 62682ba7-f114-11e6-a941-a6c3dd0a98de
SOURCE_UUID 62682ba7-f114-11e6-a941-a6c3dd0a98de
THREAD_ID NULL
SERVICE_STATE ON
COUNT_RECEIVED_HEARTBEATS 0
LAST_HEARTBEAT_TIMESTAMP 0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET 62682ba7-f114-11e6-a941-a6c3dd0a98de:1-6
The applier has already seen T1.
[connection server2]
> SELECT * FROM performance_schema.replication_applier_status_by_worker
WHERE channel_name='group_replication_applier';
CHANNEL_NAME group_replication_applier
WORKER_ID 0
THREAD_ID 29
SERVICE_STATE ON
LAST_SEEN_TRANSACTION 62682ba7-f114-11e6-a941-a6c3dd0a98de:6
T1 is applied.
[connection server2]
> SELECT @@GLOBAL.GTID_EXECUTED;
62682ba7-f114-11e6-a941-a6c3dd0a98de:1-6
In the current version, Group Replication does not provide synchronous
replication; that is stated in the manual at:
https://dev.mysql.com/doc/refman/5.7/en/group-replication-frequently-asked-questions.html#...
https://dev.mysql.com/doc/refman/5.7/en/group-replication-flow-control.html
We will state that more clearly at the beginning of the Group
Replication manual chapter.
For reference, T1 is:
begin;
update t1 set balance=1000 where id < 70000000;
update t2 set balance=1000 where id < 70000000;
commit;
If you feel that I've missed something, please let me know.
Best regards,
Nuno Carvalho
[17 Feb 2017 18:38]
Vadim Tkachenko
ok, I see your point. Then please consider this as a feature request: provide a way for a read transaction to read the actual, not stale, data.
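One interim approximation of that (a sketch; the GTID value below is
illustrative, and WAIT_FOR_EXECUTED_GTID_SET() is available from 5.7)
is to have the reader block until a known GTID set has been applied
locally before querying:

-- On the reading member: wait up to 60 seconds for the writer's
-- transaction to be applied locally (returns 0 on success, 1 on
-- timeout), then read.
SELECT WAIT_FOR_EXECUTED_GTID_SET('62682ba7-f114-11e6-a941-a6c3dd0a98de:6', 60);
SELECT count(*) FROM t1 WHERE balance = 1000;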
