Bug #32992 falcon_read-committed.test now occasionally causes a duplicate key error
Submitted: 5 Dec 2007 9:36
Modified: 29 Sep 2008 19:49
Reporter: Alexander Nozdrin
Status: Can't repeat
Category: MySQL Server: Falcon storage engine
Severity: S1 (Critical)
Version: 6.0
OS: Any
Assigned to: Hakan Küçükyılmaz
CPU Architecture: Any
Tags: pushbuild, test failure

[5 Dec 2007 9:36] Alexander Nozdrin
Description:
falcon.falcon_read-committed   [ fail ]

==29889== Memcheck, a memory error detector.
==29889== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==29889== Using LibVEX rev 1732, a library for dynamic binary translation.
==29889== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==29889== Using valgrind-3.2.3, a dynamic binary instrumentation framework.
==29889== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==29889== For more details, rerun with: -v
==29889== 
mysqltest: At line 11: query 'CREATE TABLE t1 (a int)' failed: 2013: Lost connection to MySQL server during query

The result from queries just before the failure was:
SET storage_engine = Falcon;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a int);

More results from queries before failure can be found in /dev/shm/var-falcon-111/log/falcon_read-committed.log
==29889== 
==29889== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 3 from 1)
==29889== malloc/free: in use at exit: 256 bytes in 1 blocks.
==29889== malloc/free: 201 allocs, 200 frees, 241,680 bytes allocated.
==29889== For counts of detected errors, rerun with: -v
==29889== searching for pointers to 1 not-freed blocks.
==29889== checked 1,322,552 bytes.
==29889== 
==29889== LEAK SUMMARY:
==29889==    definitely lost: 0 bytes in 0 blocks.
==29889==      possibly lost: 0 bytes in 0 blocks.
==29889==    still reachable: 256 bytes in 1 blocks.
==29889==         suppressed: 0 bytes in 0 blocks.
==29889== Reachable blocks (those to which a pointer was found) are not shown.
==29889== To see them, rerun with: --leak-check=full --show-reachable=yes

Stopping All Servers
Restoring snapshot of databases
Resuming Tests

How to repeat:
https://intranet.mysql.com/secure/pushbuild/xref.pl?testname=falcon.falcon_read-committed
[5 Dec 2007 10:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/39292

ChangeSet@1.2740, 2007-12-05 13:05:02+03:00, anozdrin@station. +2 -0
  Disable failing tests:
    - BUG#32992: falcon_read-committed.test causes a server crash
      occasionally;
    - BUG#32993: falcon_ts.test fails sporadically;
    - BUG#32994: subselect_sj fails sporadically (timeout).
[5 Dec 2007 10:34] Alexander Nozdrin
The test case was disabled in 6.0-runtime.
[5 Dec 2007 11:53] Hakan Küçükyılmaz
Verified as described. Test case fails from time to time.
[6 Dec 2007 10:01] Bugs System
Pushed into 6.0.5-alpha
[6 Dec 2007 20:51] Kevin Lewis
This might be associated with the previous test, falcon_options2.test, which restarts the engine.  Jim will need to work on this.  It might be similar to the 'final bug fix' Jim pushed Monday into 6.0.4 before the clone-off, which was related to restarting the engine several times.  I don't know how common this crash would be; I think it is very uncommon and related to quick restarts.  Regarding the fix he pushed Monday, Jim had this to say...

The bug was a failure to rebuild a Section page when required during 
recovery.  The bug has been there indefinitely.  Something changed in 
the timing and sequence of the test suite that brought it out.  And 
there was probably no way we could have caught the bug by systematic 
testing.

I caught the bug by doing a full database validation following recovery 
that would force a bugcheck on any database corruption.  While this 
validated the diagnosis and fix, it should also give us a greater 
confidence that the recovery is robust and tested.

I will add a configuration parameter post beta to force post-recovery 
validation.  Since recovery is automatic on every server start and 
validation is potentially quite expensive, I don't want to make it the 
default.  But we should always use it when testing.

Please confirm that my fix got into the code base before the 
clone-off?   I consider it essential for database integrity.
[17 Dec 2007 17:01] Kevin Lewis
Triage values set according to this assessment by Jim:

For the last week or two we have been getting sporadic failures on the 
Falcon pushbuild test suite on a server restart.  It is reproducible in 
the sense that almost every pushbuild shows a single instance of the 
problem.  However, we have not gotten a failure on a hand-run test series.  
As a result, we don't have anything that might point us at the problem 
-- no left-over database files, no traces, nothing.

I believe that there is a timing-sensitive recovery problem.  Its 
infrequency, however, suggests that it is an artifact of the test 
suite.  The fact that we have never seen it when manually running the 
test suite or during regular debugging and testing suggests to me that 
it is highly unlikely to occur in situ. 

Unless we can find a way to reproduce it in an environment that can 
capture state, I suggest that we send Falcon to beta with the problem 
outstanding.

The recent history of recovery bugs has been that a snapshot of the 
database and log files has led to diagnosis, fix, and recovery of the 
database.  My estimate is that the probability of hitting the bug in the 
field is probably about 40%, and if so, a fix leading to a successful 
recovery is about 80%.  On the other hand, without post-failure files or 
a trace, there is absolutely nothing to go on.
[23 Jan 2008 21:39] Kevin Lewis
Jim Starkey pushed a change to Falcon that requires it to wait for a Page Cache Flush at the end of recovery.  It used to do this, but when the pool of IO threads was added, the recovery process just called for a flush to happen and then returned.  Now the flush waits for all the IO threads to complete.

When recovery does not wait for the flush to complete, new blocks can be written to the serial log before all the pages from the previous recovery have been written to disk.  These blocks indicate that recovery should start after the previous one.  If a shutdown happens before those pages are fully written, the following recovery will not start early enough.

This very likely solves the double recovery problems exhibited by Bugs 32992, 33608 & 33665.  I am putting these bugs into Patch Pending so that this fix can be verified, or not.
[25 Jan 2008 23:08] Kevin Lewis
Explanation of the double recovery problem by Ann Harrison:
==============================
We have been plagued recently with failures in recovery after
a previous recovery had not succeeded.  That turns out to
have been the result of a change to the page cache checkpoint
algorithm.  The fix was simple, once the cause was understood.
I'm going to try to explain the interaction of recovery and
checkpointing.

When Falcon starts, it determines whether it was shut down
cleanly.  If not, it invokes recovery to apply the remaining
active parts of the serial log.  Recovery begins by finding
the position in the log after which changes may need to be
applied.  That position is called the recovery start point.
Every entry in the serial log includes the recovery start
point that was current when the entry was written.  If the
shutdown was not clean, Falcon begins recovery by identifying
the active log, reading the last entry, then backing up to its
recovery start point.
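The start-point lookup Ann describes can be modeled in miniature. The sketch below is illustrative Python, not Falcon's actual C++ code; the entry layout and field names (`position`, `recovery_start`) are invented for the example:

```python
# Toy model of the serial-log recovery start-point lookup (hypothetical
# layout; Falcon itself is written in C++). Each log entry records the
# recovery start point that was current when the entry was written.
def find_recovery_start(log):
    """Return the position recovery should back up to: the recovery
    start point stored in the last entry of the active log."""
    if not log:
        return 0  # empty log: nothing to replay
    last_entry = log[-1]                 # read the last entry
    return last_entry["recovery_start"]  # back up to its start point
```

Under this model, if a shutdown was not clean, replay begins at the returned position and applies every later entry.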

Recovery consists of applying changes in the serial log to
page images in the page cache, until they are all done, then
flushing the cache by forcing a checkpoint.

The test systems often shut down the server before Falcon has
moved all changes from the serial log to the main database.
After some errors, the test system restarts the server again,
causing Falcon to start recovery, then shut down again.  For
the last several weeks, recovery has failed on the second restart.
One odd characteristic Jim noticed was that if he restarted from
an earlier stable point in the serial log, recovery succeeded.

Originally, a cache checkpoint consisted of a single I/O thread
writing all the dirty pages from the cache, using buffered I/O
without an fsync.  That was pretty fast, but not durable after
an ill-timed crash.  Adding an fsync caused a checkpoint to take
longer than the interval between checkpoints, which was a
performance problem.

A checkpoint now causes dirty pages in the cache to be
consolidated in a separate area so contiguous pages can be
flushed in a single write, then a bevy of I/O threads push
the page clusters to disk using direct I/O or buffered I/O
with a sync on each write (O_SYNC) depending on the capabilities
of the system.

The original page cache checkpoint got all pages into the
system file cache before it continued.  The new system is
asynchronous.
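The consolidation step can be illustrated with a small sketch: dirty page numbers are grouped into runs of contiguous pages so each run can go to disk in one large write. Again, this is hedged illustrative Python with invented names, not the Falcon page-cache code:

```python
# Toy model of checkpoint consolidation: group sorted dirty page
# numbers into contiguous runs, so each run can be flushed with a
# single large write by an I/O thread (names are illustrative).
def cluster_dirty_pages(page_numbers):
    runs = []
    for p in sorted(page_numbers):
        if runs and p == runs[-1][-1] + 1:
            runs[-1].append(p)   # extends the current contiguous run
        else:
            runs.append([p])     # gap: start a new run
    return runs
```

For example, dirty pages {1, 2, 3, 5, 7} would be written as three clusters rather than five single-page writes.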

The problem was that the recovery system declared the
database recovered before the dirty pages in the page cache
were actually on disk. Other processing began, including
writes to the serial log with a new recovery start point.
If the server stopped at that point, the next attempt to
recover would fail because the first recovery did not get
fully applied to the database.

The solution is for recovery to wait until the I/O threads
are complete.
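The fix, in miniature: the checkpoint must join its I/O workers before recovery is declared complete. Below is a hedged Python sketch of that synchronization pattern only; Falcon's real implementation is C++ with its own thread pool, and every name here is invented:

```python
import threading

class ToyCheckpoint:
    """Toy model of the fix described above: the flush hands page
    clusters to a pool of I/O threads and, crucially, joins them all
    before recovery is declared done. All names are illustrative."""

    def __init__(self):
        self.pages_on_disk = []
        self.lock = threading.Lock()

    def _write_cluster(self, cluster):
        # Stand-in for a direct-I/O or O_SYNC write of contiguous pages.
        with self.lock:
            self.pages_on_disk.extend(cluster)

    def flush_and_wait(self, clusters):
        workers = [threading.Thread(target=self._write_cluster, args=(c,))
                   for c in clusters]
        for w in workers:
            w.start()
        # The bug: recovery used to return here, before the writes had
        # finished. The fix is to wait for every I/O thread.
        for w in workers:
            w.join()
        return sorted(self.pages_on_disk)
```

Only after `flush_and_wait` returns is it safe to write new serial-log blocks carrying a later recovery start point.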
[11 Feb 2008 19:51] Kevin Lewis
Patch is in mysql-6.0-falcon-team and mysql-6.0-release version 6.0.4
[12 Mar 2008 23:01] Bugs System
Pushed into 6.0.4-alpha
[20 Apr 2008 21:21] Konstantin Osipov
The test now fails with a different symptom:
mysqltest: At line 30: query 'INSERT INTO t1 VALUES (1)' failed: 1022: Can't write; duplicate key in table 't1'

The result from queries just before the failure was:
SET storage_engine = Falcon;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a int);
SHOW CREATE TABLE t1;
Table	Create Table
t1	CREATE TABLE `t1` (
  `a` int(11) DEFAULT NULL
) ENGINE=Falcon DEFAULT CHARSET=latin1
DROP TABLE t1;
SET storage_engine = Falcon;
SET @@tx_isolation = 'REPEATABLE-READ';
SET @@autocommit=0;
SET @@tx_isolation = 'REPEATABLE-READ';
SET @@autocommit=0;
BEGIN;
CREATE TABLE t1 (a int);
COMMIT;
BEGIN;
INSERT INTO t1 VALUES (1);

https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=mysql-6.0&entry=kosipov@bodhi.no...

The same failure occurs on other Windows hosts:
https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=mysql-6.0&entry=kosipov@bodhi.no...
[26 May 2008 14:01] Ingo Strüwing
I'm going to disable the test for now. Please re-enable after fix.
[26 May 2008 14:03] Ingo Strüwing
Weird - I found it in mysql-test/suite/falcon/t/disabled.def already, but the test failed anyway...
[10 Jul 2008 17:00] Kevin Lewis
Hakan,  The current issue with this test case is that it fails intermittently.  It has been moved to falcon_team to make pushbuild green.  falcon_team has not been running since the move to bazaar.  The error is: failed: 1022: Can't write; duplicate key in table 't1'.  But 't1' does not have ANY keys.  I suspect this is a testcase issue somehow.

Please try to tweak the testcase until it occurs more reliably or does not occur at all.  I ran it 50 times in a row and could not reproduce this.