Bug #16875 | Using stale MySQLD FRM files can cause restored cluster to fail | ||
---|---|---|---|
Submitted: | 28 Jan 2006 21:12 | Modified: | 22 May 2006 9:28 |
Reporter: | Jonathan Miller | | |
Status: | Closed | | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 4.1 -> | OS: | |
Assigned to: | Tomas Ulin | CPU Architecture: | Any |
[28 Jan 2006 21:12]
Jonathan Miller
[28 Jan 2006 21:42]
Jonas Oreland
I could not find any trace files... BTW: can you start using the ndb_error_reporter tool that Stewart wrote?
[28 Jan 2006 23:14]
Jonathan Miller
I restored the database again, then went to each MySQLD, wiped the file system clean, and recreated the TPCB database for each of the 8 processes. The test then started and the cluster has stayed up. Before this, I could use some mysqld processes without issues, but with others the cluster would come down as soon as the test started.
[28 Jan 2006 23:33]
Jonathan Miller
Every once in a while I would get: ERROR 1412 (HY000): Table definition has changed, please retry transaction
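Error 1412 asks the client to retry. A minimal client-side sketch of that pattern follows. `DatabaseError`, `run_with_retry` and the constant name are hypothetical stand-ins rather than any real driver's API, and a retry like this only papers over the symptom; it does not fix the stale table objects discussed later in this report.

```python
import time

# Error number taken from the message above; the constant name is chosen here for readability.
ER_TABLE_DEF_CHANGED = 1412


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL error number."""

    def __init__(self, errno, msg):
        super().__init__(f"ERROR {errno}: {msg}")
        self.errno = errno


def run_with_retry(transaction, attempts=3, delay=0.0):
    """Call transaction(), retrying only when it fails with error 1412.

    Any other error, or a 1412 on the last attempt, is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            if err.errno != ER_TABLE_DEF_CHANGED or attempt == attempts:
                raise
            time.sleep(delay)  # optional back-off before the next attempt
```

Only the "please retry" error is retried, so genuine failures such as lock timeouts still surface to the caller immediately.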
[29 Jan 2006 1:29]
Jonathan Miller
I just restored the DD Cluster database and totally recreated all the TPCB database files for each MySQLD process. The test started without issue.
[30 Jan 2006 12:29]
Jonathan Miller
What do you need feedback on?
[30 Jan 2006 12:33]
Jonathan Miller
Sorry, I did not see the question. I think the way to reproduce this is to have several MySQLD instances, use them for a while, restore the database, and attempt a transaction such as an insert. You will get a temporary error and the cluster goes down. If you remove all the files for each mysqld and recreate them before attaching to the cluster with the restored database, then attach and create the new database, all is fine.
[31 Jan 2006 8:35]
Jonas Oreland
Jeb, when you say "restored", did you do an initial start before restoring?
[31 Jan 2006 11:57]
Jonathan Miller
Tomas, I will be moving to the 64-bit tests today and will see if I can get this down to a set of steps on my side. Jonas, actually I would do an rm -rf ndb_#_fs before attempting the restore. This ensured that the NDB file system and the disk data and undo files were removed before the restore, as --initial does not remove disk data files. Thanks, JBM
[31 Jan 2006 12:07]
Jonas Oreland
OK, then this is a known bug, also present in 4.1 and 5.0. The problem is that mysqld keeps a copy of a table object (tableid, tableversion), and after an initial start/restore this table might not be the same one. So mysqld sends data with a tableid/tableversion that NDB does not know is incorrect, which yields unpredictable results. The solution is to close all NDB objects/the NDB handler on cluster failure and let mysqld retry instead. Tomas suggested that we fix this in 5.1 but not in 4.1 or 5.0. The problem can only occur with an initial start/restore while keeping the mysqld processes alive.
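The failure mode described above can be modelled in a few lines. This is a toy sketch, not NDB code: `DataNodes`, `SqlNode` and `run` are hypothetical, and it assumes that an initial start hands out table ids again from zero and that the restore recreates the tables in a different order. The point it shows is that a cached (tableid, tableversion) pair can silently match a *different* table, and that dropping the cache on cluster failure avoids this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableHandle:
    table_id: int
    table_version: int


class DataNodes:
    """Hypothetical model of the NDB dictionary: table name -> (table_id, table_version)."""

    def __init__(self):
        self.tables = {}  # name -> TableHandle
        self.rows = {}    # TableHandle -> list of rows
        self._next_id = 0

    def create_table(self, name):
        handle = TableHandle(self._next_id, 1)
        self._next_id += 1
        self.tables[name] = handle
        self.rows[handle] = []
        return handle

    def initial_start(self):
        # Assumption: an initial start wipes the dictionary and table ids are
        # handed out again from zero when the restore recreates the tables.
        self.tables.clear()
        self.rows.clear()
        self._next_id = 0

    def insert(self, handle, row):
        # The data nodes only check that *some* table has this (id, version);
        # they cannot tell which table the SQL node actually meant.
        if handle not in self.rows:
            raise KeyError("unknown table id/version")
        self.rows[handle].append(row)


class SqlNode:
    """Hypothetical mysqld that caches table handles between statements."""

    def __init__(self, cluster, invalidate_on_failure):
        self.cluster = cluster
        self.cache = {}
        self.invalidate_on_failure = invalidate_on_failure

    def on_cluster_failure(self):
        # The fix described above: drop every cached table object so the next
        # statement re-reads the dictionary instead of reusing stale ids.
        if self.invalidate_on_failure:
            self.cache.clear()

    def insert(self, table, row):
        if table not in self.cache:
            self.cache[table] = self.cluster.tables[table]
        self.cluster.insert(self.cache[table], row)


def run(invalidate):
    """Return the rows that end up in 'branches' after the scenario."""
    cluster = DataNodes()
    cluster.create_table("accounts")
    cluster.create_table("branches")
    node = SqlNode(cluster, invalidate)
    node.insert("accounts", "a1")       # caches accounts -> (0, 1)
    cluster.initial_start()             # cluster failure + initial start
    node.on_cluster_failure()
    cluster.create_table("branches")    # restore recreates tables in another order
    cluster.create_table("accounts")
    node.insert("accounts", "a2")       # meant for accounts
    return cluster.rows[cluster.tables["branches"]]
```

Without invalidation, `run(False)` puts the row meant for `accounts` into `branches`, because the stale handle (0, 1) now belongs to that table; with invalidation, `run(True)` leaves `branches` empty.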
[31 Jan 2006 12:14]
Jonathan Miller
I am okay with not fixing it in 4.1, but I am not totally sure why we would want to leave out 5.0. But I am glad that you know what is causing the issue. JBM
[2 Feb 2006 3:46]
Stewart Smith
If we solved this by introducing a cluster unique id and sending it around when nodes join we could then solve the potential yucky situation of where a (arguably dumb) administrator starts swapping nodes between two different clusters. Even if the option is to barf saying "trying to join a different cluster, aborting connect" it would be better than now :)
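A minimal sketch of the cluster unique id idea, assuming the id is generated once when the cluster is created and handed to each node on join. `Cluster`, `WrongClusterError` and the join signature are hypothetical and are not part of any NDB API.

```python
import uuid
from dataclasses import dataclass, field


class WrongClusterError(Exception):
    """Raised when a node tries to join a cluster other than the one it last belonged to."""


@dataclass
class Cluster:
    # Hypothetical: a unique id generated once, when the cluster is created.
    cluster_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    members: set = field(default_factory=set)

    def join(self, node_id, remembered_cluster_id):
        # A node that remembers no id is brand new and may join any cluster.
        if remembered_cluster_id is not None and remembered_cluster_id != self.cluster_id:
            raise WrongClusterError(
                f"node {node_id}: trying to join a different cluster, aborting connect")
        self.members.add(node_id)
        return self.cluster_id  # the node stores this for its next connect
```

The same comparison could also tell a reconnecting mysqld that the cluster it knew is gone, for example after an initial start that generated a new id. In that case it could discard its cached table objects rather than abort, though that is an extrapolation, not something this report specifies.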
[15 May 2006 12:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6383
[16 May 2006 6:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6430
[16 May 2006 17:22]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6473
[17 May 2006 4:42]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/6489
[22 May 2006 6:43]
Tomas Ulin
Pushed to 4.1.20, 5.0.22, 5.1.11.
[22 May 2006 9:28]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented bugfix in 4.1.20/5.0.22/5.1.11 changelogs. Documented DD limitation in 5.1 Manual Cluster Chapter DD section.