| Bug #41915 | Rare crash during recovery due to incorrect checkpoint of MM-part of disktable | ||
|---|---|---|---|
| Submitted: | 7 Jan 2009 10:05 | Modified: | 26 May 2009 7:57 |
| Reporter: | Pekka Nousiainen | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Disk Data | Severity: | S2 (Serious) |
| Version: | mysql-5.1-telco-6.x | OS: | Any |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
| Tags: | mysql-5.1.x-telco-6.x | ||
[23 Jan 2009 14:04]
Pekka Nousiainen
Same test case but with 2 (instead of 4) LQH threads. Crash at line 1065 in Dbtup::disk_page_alloc.. ddassert(pagePtr.p->uncommitted_used_space > 0). This 2-thread case took several hours.
[27 Apr 2009 10:25]
Pekka Nousiainen
probably fixed by these: http://lists.mysql.com/commits/72427 http://lists.mysql.com/commits/71907
[26 May 2009 3:32]
Jonas Oreland
During a checkpoint, DD and MM create a consistent point that they both restore to. There was a bug in the MM part that (really rarely) could include/exclude one row that should be in the snapshot. This would later cause a crash during/after recovery.
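The consistency requirement described above can be illustrated with a toy model. This is only a sketch and is not NDB code: `RowOp`, `snapshot`, and `consistent` are hypothetical names, and the sequence-number comparison is an assumed stand-in for however the real checkpoint decides which rows belong to the restore point. It shows why a single row wrongly excluded from the in-memory snapshot leaves the two parts disagreeing after restore.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Toy model only: none of these names exist in the NDB source.
struct RowOp {
    std::uint32_t row_id;  // hypothetical row identifier
    std::uint64_t seq;     // hypothetical sequence number of the insert
};

// Rows that belong to a snapshot taken at restore point `point`:
// every row whose insert has seq <= point.
std::set<std::uint32_t> snapshot(const std::vector<RowOp>& ops,
                                 std::uint64_t point) {
    std::set<std::uint32_t> rows;
    for (const RowOp& op : ops) {
        if (op.seq <= point) {
            rows.insert(op.row_id);
        }
    }
    return rows;
}

// Recovery is only safe in this model if the in-memory (MM) part and the
// on-disk (DD) part restored exactly the same set of rows.
bool consistent(const std::set<std::uint32_t>& mm,
                const std::set<std::uint32_t>& dd) {
    return mm == dd;
}
```

In this model, taking the DD snapshot at point 20 and the MM snapshot at point 19 over inserts with sequence numbers 10, 20, and 30 gives DD two rows and MM one row, so `consistent` returns false. That is the kind of one-row mismatch the comment above describes.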
[26 May 2009 4:14]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/74930
2936 Jonas Oreland 2009-05-26
ndb - bug#41915 - fix spurious crash in recovery of DD tables
[26 May 2009 4:52]
Bugs System
Pushed into 5.1.34-ndb-7.0.6 (revid:jonas@mysql.com-20090526044928-bx3798wzc46ypnop) (version source revid:jonas@mysql.com-20090526044928-bx3798wzc46ypnop) (merge vers: 5.1.34-ndb-7.0.6) (pib:6)
[26 May 2009 4:53]
Bugs System
Pushed into 5.1.34-ndb-6.2.18 (revid:jonas@mysql.com-20090526041403-0qtjtehbumdqqdgc) (version source revid:jonas@mysql.com-20090526041403-0qtjtehbumdqqdgc) (merge vers: 5.1.34-ndb-6.2.18) (pib:6)
[26 May 2009 4:54]
Bugs System
Pushed into 5.1.34-ndb-6.3.26 (revid:jonas@mysql.com-20090526042602-qei3xzhbx53556k8) (version source revid:jonas@mysql.com-20090526042602-qei3xzhbx53556k8) (merge vers: 5.1.34-ndb-6.3.26) (pib:6)
[26 May 2009 5:02]
Jonas Oreland
note to docs: 1) read my comment above on checkpoint 2) this seems to be somewhat more likely if using ndbmtd in 7.x
[26 May 2009 7:57]
Jon Stephens
Documented bugfix in the NDB-6.2.18, 6.3.26, and 7.0.6 changelogs as follows:
During a checkpoint, restore points are created for both the on-disk and
in-memory parts of a Disk Data table. Under certain rare conditions,
the in-memory restore point could include or exclude a row that
should have been in the snapshot. This would later lead to a crash
during or following recovery.
[7.0.6 version only:]
This issue was somewhat more likely to be encountered when using
ndbmtd.

Description:
testSystemRestart -v -n SR_DD_1 D1

crash in DbtupDiskAlloc.cpp line 1102:

Dbtup::disk_page_free(...
if (tabPtrP->m_attributes[DD].m_no_of_varsize == 0)...
ndbassert(* (src + 1) != Tup_fixsize_page::FREE_RECORD);

How to repeat:
see description

Suggested fix:
may be related to bug#41398