Bug #41031 | All ndbd nodes crash on backup failure when giving <backup id> manually | ||
---|---|---|---|
Submitted: | 25 Nov 2008 19:58 | Modified: | 19 Feb 2009 8:04 |
Reporter: | Daniel Salinas | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | ndb-6.3.17 ndb-6.4.0 | OS: | Linux (rhel5.2) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | ndb cluster segfault |
[25 Nov 2008 19:58]
Daniel Salinas
[25 Nov 2008 20:18]
Daniel Salinas
To clarify, I am running these packages across my cluster:
- MySQL-Cluster-gpl-client-6.3.17-0.rhel5.x86_64.rpm
- MySQL-Cluster-gpl-devel-6.3.17-0.rhel5.x86_64.rpm
- MySQL-Cluster-gpl-extra-6.3.17-0.rhel5.x86_64.rpm
- MySQL-Cluster-gpl-management-6.3.17-0.rhel5.x86_64.rpm
- MySQL-Cluster-gpl-server-6.3.17-0.rhel5.x86_64.rpm
- MySQL-Cluster-gpl-shared-6.3.17-0.rhel5.x86_64.rpm
- MySQL-Cluster-gpl-storage-6.3.17-0.rhel5.x86_64.rpm
- MySQL-Cluster-gpl-tools-6.3.17-0.rhel5.x86_64.rpm
[25 Nov 2008 21:13]
Daniel Salinas
Thanks to the masterful work of Matthew Montgomery, we traced this back to backups. I had zeroed my cluster and was preparing the import when I saw the crashes. Everything linked back to a hot backup script that cron was running every 5 minutes. Whichever node was master when the backup job was kicked off would die and restart. This appears to be a problem not with ndbd itself but with online backup. I am moving the severity to S3, since backups appear to work fine when there is table data in NDB. If there is no table data in the cluster, online backup takes down both the management node that kicked off the backup and the master (oldest) data node.
[25 Nov 2008 21:38]
Daniel Salinas
In the spirit of retesting, I created a test table and verified that backups ran. I then dropped the NDB table, and backups still ran. The only other relevant detail is that these ndbd nodes were all started with --initial and no data was imported. Also, I am using a custom backup ID in the format MMDDHHmm; I am not sure whether that has anything to do with it.
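A minimal sketch of how a cron script might build the MMDDHHmm backup ID mentioned above, assuming Python and a local timestamp. The helpers `custom_backup_id` and `start_backup_command` are hypothetical and not part of any MySQL Cluster tool; the resulting string is the command a script would hand to the management client (for example via `ndb_mgm -e`).

```python
from datetime import datetime


def custom_backup_id(ts: datetime) -> int:
    """Hypothetical helper: encode a timestamp as an MMDDHHmm backup ID."""
    # strftime zero-pads each field; int() then drops a leading zero in the month.
    return int(ts.strftime("%m%d%H%M"))


def start_backup_command(ts: datetime) -> str:
    """Hypothetical helper: build the management-client command string."""
    return f"START BACKUP {custom_backup_id(ts)}"


if __name__ == "__main__":
    # Timestamp of the original report, 25 Nov 2008 20:58.
    print(start_backup_command(datetime(2008, 11, 25, 20, 58)))
```

Because the ID ends up as an integer, months before October lose their leading zero (for example, 5 Jan 03:07 becomes 1050307).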
[25 Nov 2008 21:54]
Daniel Salinas
It appears this only happens when backing up an empty cluster using a custom backup ID with the START BACKUP command.
[25 Nov 2008 22:00]
Daniel Salinas
So the particular case that causes this is: on a freshly initialized cluster, run START BACKUP <ID> on the management node, where <ID> is a custom backup ID, and the cluster's master data node dies.
[25 Nov 2008 22:07]
MySQL Verification Team
To see this error, you must execute a backup on a completely clean cluster (started with --initial) whose $datadir/BACKUPS directory is completely empty. The START BACKUP command must also include an explicitly defined <backup id>:

ndb_mgm> START BACKUP 1

No backup should have been taken beforehand with a plain "START BACKUP" (without an ID). Workaround (simple): run a plain "START BACKUP" before any START BACKUP <backup id>, or ensure that at least one user-defined table exists in the cluster.
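The trigger conditions and workaround described above can be sketched as a small decision function, assuming the caller already knows the cluster's state. `ClusterState`, `hits_bug_41031`, and `backup_commands` are hypothetical names for illustration; nothing here queries a real cluster, and the sketch only encodes the verification team's workaround without explaining why the crash occurs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    # Hypothetical model of the conditions in this report; does not query NDB.
    initial_start: bool      # all data nodes started with --initial, no data imported
    backup_dir_empty: bool   # $datadir/BACKUPS holds no earlier backups
    user_table_count: int    # number of user-defined NDB tables


def hits_bug_41031(state: ClusterState, explicit_id: bool) -> bool:
    """True when a START BACKUP would match the reported crash conditions."""
    return (explicit_id
            and state.initial_start
            and state.backup_dir_empty
            and state.user_table_count == 0)


def backup_commands(state: ClusterState, backup_id: int) -> List[str]:
    """Workaround: issue a plain START BACKUP before the first
    START BACKUP <id> when the cluster matches the crash conditions."""
    cmds: List[str] = []
    if hits_bug_41031(state, explicit_id=True):
        cmds.append("START BACKUP")
    cmds.append(f"START BACKUP {backup_id}")
    return cmds
```

On a cluster that already has a backup or a user table, the function returns only the explicit-ID command.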
[18 Feb 2009 20:53]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/66803
2862 Jonas Oreland 2009-02-18
ndb - bug#41031 - incorrect handling of start backup <id>
[18 Feb 2009 22:06]
Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:jonas@mysql.com-20090218220511-bgnaexvwjjfq2g6w) (version source revid:jonas@mysql.com-20090218205319-9bapz34b4uam3uno) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)
[18 Feb 2009 22:08]
Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:jonas@mysql.com-20090218220353-ih9lxz0jg5od9k2c) (version source revid:jonas@mysql.com-20090218205235-emzevgpji2jb2gwf) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)
[19 Feb 2009 6:44]
Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090219064350-7jj9hsvvbgsp88g5) (version source revid:tomas.ulin@sun.com-20090219064350-7jj9hsvvbgsp88g5) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)
[19 Feb 2009 7:08]
Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090219070811-p36a79y85qfv5vsz) (version source revid:tomas.ulin@sun.com-20090219070811-p36a79y85qfv5vsz) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)
[19 Feb 2009 8:04]
Jon Stephens
Documented bugfix in the NDB-6.2.17, 6.3.23, and 6.4.3 changelogs as follows: Given a MySQL Cluster containing no data (that is, whose data nodes had all been started using --initial, and into which no data had yet been imported) and having an empty backup directory, executing START BACKUP with a user-specified backup ID caused the data nodes to crash.
[19 Feb 2009 10:40]
Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:jonas@mysql.com-20090219103836-vz65tl5a9n7rji1h) (version source revid:jonas@mysql.com-20090219103836-vz65tl5a9n7rji1h) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)
[19 Feb 2009 10:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/66877
2866 Tomas Ulin 2009-02-19
remove sleep and add comment after bug#41031 was fixed
modified: mysql-test/suite/ndb_team/t/ndb_autodiscover3.test
[19 Feb 2009 13:03]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/66904
2865 Tomas Ulin 2009-02-19
remove sleep and add comment after bug#41031 was fixed
modified: mysql-test/suite/ndb_team/t/ndb_autodiscover3.test