MySQL Bugs: #41915: Rare crash during recovery due to incorrect checkpoint of MM-part of disktable

Bug #41915	Rare crash during recovery due to incorrect checkpoint of MM-part of disktable
Submitted:	7 Jan 2009 10:05	Modified:	26 May 2009 7:57
Reporter:	Pekka Nousiainen
Status:	Closed
Category:	MySQL Cluster: Disk Data	Severity:	S2 (Serious)
Version:	mysql-5.1-telco-6.x	OS:	Any
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	mysql-5.1.x-telco-6.x

Description:
Running testSystemRestart -v -n SR_DD_1 D1 crashes in DbtupDiskAlloc.cpp, line 1102:
Dbtup::disk_page_free(...
  if (tabPtrP->m_attributes[DD].m_no_of_varsize == 0)...
    ndbassert(* (src + 1) != Tup_fixsize_page::FREE_RECORD);

How to repeat:
Run the test case given in the description.

Suggested fix:
May be related to Bug #41398.

With the same test case but 2 LQH threads (instead of 4), the crash occurs at line 1065:
Dbtup::disk_page_alloc..
  ddassert(pagePtr.p->uncommitted_used_space > 0)
This 2-thread case took several hours to hit the crash.

Probably fixed by these commits:
http://lists.mysql.com/commits/72427
http://lists.mysql.com/commits/71907

During a checkpoint, the disk data (DD) part and the in-memory (MM) part create a
consistent point that they both restore to. A bug in the MM part could, very rarely,
cause one row to be wrongly included in or excluded from the snapshot. This would
later cause a crash during or after recovery.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/74930

2936 Jonas Oreland	2009-05-26
      ndb - bug#41915 - fix spurious crash in recovery of DD tables

Pushed into 5.1.34-ndb-7.0.6 (revid:jonas@mysql.com-20090526044928-bx3798wzc46ypnop) (version source revid:jonas@mysql.com-20090526044928-bx3798wzc46ypnop) (merge vers: 5.1.34-ndb-7.0.6) (pib:6)

Pushed into 5.1.34-ndb-6.2.18 (revid:jonas@mysql.com-20090526041403-0qtjtehbumdqqdgc) (version source revid:jonas@mysql.com-20090526041403-0qtjtehbumdqqdgc) (merge vers: 5.1.34-ndb-6.2.18) (pib:6)

Pushed into 5.1.34-ndb-6.3.26 (revid:jonas@mysql.com-20090526042602-qei3xzhbx53556k8) (version source revid:jonas@mysql.com-20090526042602-qei3xzhbx53556k8) (merge vers: 5.1.34-ndb-6.3.26) (pib:6)

Note to docs:
1) Read my comment above on checkpoints.
2) This seems to be somewhat more likely when using ndbmtd in 7.x.

Documented bugfix in the NDB-6.2.18, 6.3.26, and 7.0.6 changelogs as follows:

      During a checkpoint, restore points are created for both the on-disk and
      in-memory parts of a Disk Data table. Under certain rare conditions, 
      the in-memory restore point could include or exclude a row that
      should have been in the snapshot. This would later lead to a crash
      during or following recovery.

      [7.0.6 version only:] 
      This issue was somewhat more likely to be encountered when using
      ndbmtd.