Bug #46762 | Missing gap-event causes the slave to not stop resulting in data inconsistency | ||
---|---|---|---|
Submitted: | 17 Aug 2009 17:32 | Modified: | 23 Sep 2009 10:39 |
Reporter: | Premraj Nallasivampillai | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Documentation | Severity: | S2 (Serious) |
Version: | MySQL Cluster 7.0.6 | OS: | Linux (RedHat EL 4 update 6 - 32bit) |
Assigned to: | Jon Stephens | CPU Architecture: | Any |
Tags: | cluster, geo-replication |
[17 Aug 2009 17:32]
Premraj Nallasivampillai
[17 Aug 2009 17:34]
Premraj Nallasivampillai
binlogs, relay logs, my.cnf, etc.
Attachment: mysqld_logs.zip (application/x-zip-compressed, text), 338.84 KiB.
[18 Aug 2009 8:50]
Susanne Ebrecht
Do you also see this replication problem when using MySQL Server with InnoDB/MyISAM, or is it cluster-related?
[18 Aug 2009 13:20]
Martin Skold
This cannot happen in a non-cluster setup, since the master is the node/process that stores the rows and will still binlog them even if it is not connected to the slave. In a cluster, if the master SQL node is disconnected, rows can still be inserted into the cluster, but they are not binlogged. When the master SQL node reconnects to the cluster, a gap event should be inserted to tell the slave to stop, since it is probably out of sync. In an HA setup one usually has two master SQL nodes and monitors them using some external clusterware. If one master SQL node fails, this is detected externally and one fails over to the other master SQL node (using a different binlog/position).
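The behaviour described above can be illustrated with a toy model. This is only a sketch of the idea, not MySQL code: the class `MasterSqlNode`, the function `apply_binlog`, and the `GAP` marker are hypothetical names invented for illustration, and the binlog is reduced to a plain Python list.

```python
from dataclasses import dataclass, field

GAP = "GAP"  # hypothetical marker standing in for the gap event


@dataclass
class MasterSqlNode:
    """Toy master SQL node: it binlogs only the rows it sees while connected."""
    connected: bool = True
    binlog: list = field(default_factory=list)

    def observe(self, row):
        # Rows written to the cluster while this node is disconnected
        # never reach its binlog.
        if self.connected:
            self.binlog.append(row)

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        # On reconnect, record a gap so that a slave knows rows may be missing.
        self.connected = True
        self.binlog.append(GAP)


def apply_binlog(binlog):
    """Toy slave: apply events in order and stop at the first gap marker."""
    applied = []
    for event in binlog:
        if event == GAP:
            return applied, True   # stopped: probably out of sync
        applied.append(event)
    return applied, False


master = MasterSqlNode()
master.observe("row1")
master.disconnect()
master.observe("row2")   # stored in the cluster, but missing from this binlog
master.reconnect()
master.observe("row3")

applied, stopped = apply_binlog(master.binlog)
print(applied, stopped)  # ['row1'] True
```

Without the gap marker, the toy slave would apply `row1` and `row3` and silently miss `row2`, which is the inconsistency this report is about.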
[18 Aug 2009 13:51]
Martin Skold
I did not find any attached (ndb) logs from the cluster. We need to analyze the cluster logs to see whether the SQL node was actually detached from the cluster. A TCP disconnect should be visible in the cluster log, as should any heartbeat failures that occur when SQL nodes are not responding. I will keep analyzing by testing a setup myself, but we need to check your logs as well.
[18 Aug 2009 13:59]
Martin Skold
Please note these warnings found in the mysqld logs from all the clusters:

090812 14:10:38 [Warning] NDB: server id set to zero will cause any other mysqld with bin log to log with wrong server id
090812 14:15:42 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=db7-relay-bin' to avoid this problem.

Replication does not seem to be properly set up.
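A quick way to catch the two conditions behind these warnings before starting a test run is to inspect the option file. The following is a minimal sketch, assuming the options live in a `[mysqld]` section of a my.cnf-style file; the function name `check_replication_config` and the sample file contents are hypothetical, and the checks cover only the two warnings quoted above.

```python
import configparser


def check_replication_config(cnf_text):
    """Return a list of problems matching the two warnings quoted above."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options such as "ndbcluster"
        strict=False,
        interpolation=None,    # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    section = parser["mysqld"] if parser.has_section("mysqld") else {}
    # Option files accept dashes and underscores interchangeably in names.
    options = {key.replace("_", "-"): value for key, value in section.items()}

    problems = []
    if options.get("server-id") in (None, "", "0"):
        problems.append("server-id is unset or zero")
    if not options.get("relay-log"):
        problems.append("relay-log is not set")
    return problems


# Hypothetical option file, for illustration only.
sample = """
[mysqld]
ndbcluster
log-bin=mysql-bin
server-id=0
"""
print(check_replication_config(sample))
# ['server-id is unset or zero', 'relay-log is not set']
```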
[18 Aug 2009 14:36]
Premraj Nallasivampillai
cluster logs
Attachment: cluster-1.zip (application/x-zip-compressed, text), 363.79 KiB.
[18 Aug 2009 16:32]
Premraj Nallasivampillai
The configuration shortcomings are not relevant to this issue.
[19 Aug 2009 9:46]
Martin Skold
Please add all logs from the same test run. The oldest cluster logs seem to be from a run on 2009-08-13, but the mysqld log says 090812. We need to correlate the time at which the cluster detected that the mysqld was disconnected (or missed heartbeats) with what the mysqld was doing at that same moment. To save time in analysis, it is important to always attach all logs!
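Correlating the two logs means putting their timestamps on a common timeline. A minimal sketch of that step follows. The mysqld timestamp format (`YYMMDD HH:MM:SS`) is taken from the warnings quoted earlier in this report; the cluster log format (`YYYY-MM-DD HH:MM:SS` at the start of each line) is an assumption, the cluster line below is a placeholder rather than real log output, and all function names are hypothetical.

```python
from datetime import datetime


def parse_mysqld_line(line):
    # mysqld log lines in this report start with "YYMMDD HH:MM:SS" (15 chars).
    stamp = datetime.strptime(line[:15], "%y%m%d %H:%M:%S")
    return stamp, "mysqld", line[16:]


def parse_cluster_line(line):
    # Assumed cluster log prefix: "YYYY-MM-DD HH:MM:SS" (19 chars).
    stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    return stamp, "cluster", line[20:]


def merge_logs(mysqld_lines, cluster_lines):
    """Merge both logs into one list of (time, source, message), oldest first."""
    events = [parse_mysqld_line(line) for line in mysqld_lines]
    events += [parse_cluster_line(line) for line in cluster_lines]
    return sorted(events, key=lambda event: event[0])


mysqld_lines = [
    "090812 14:10:38 [Warning] NDB: server id set to zero will cause any "
    "other mysqld with bin log to log with wrong server id",
]
cluster_lines = [
    "2009-08-13 09:00:00 <placeholder cluster log message>",  # not real output
]

merged = merge_logs(mysqld_lines, cluster_lines)
for stamp, source, message in merged:
    print(stamp.isoformat(), source, message[:40])
```

If the two sources end up on different dates, as in the logs attached so far, they come from different runs and cannot be correlated.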
[31 Aug 2009 6:49]
Martin Skold
This does not appear to be a bug, but the documentation seems a bit unclear about when the GAP event is received and what action to take. Changing category and assigning to docs.
[31 Aug 2009 10:17]
Lars Thalmann
It does not seem that any feedback is requested anymore. Changing to 'verified' state.
[23 Sep 2009 10:39]
Jon Stephens
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.