Bug #36631 Assertion in SerialLogControl::nextRecord
Submitted: 9 May 2008 16:09 Modified: 26 May 2010 17:46
Reporter: Vladislav Vaintroub Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.11 OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any
Tags: F_CHILL THAW

[9 May 2008 16:09] Vladislav Vaintroub
Description:
falcon_bug_36294  fails with assertion on pushbuild machines (and also on my own).

pushbuild crash (search for nextRecord):
https://intranet.mysql.com/secure/pushbuild/getlog.pl?dir=mysql-6.0-falcon-team&entry=jsta...

The callstack:
0000000077B34EA0    ntdll.dll!DbgBreakPoint()
00000001407DA980    mysqld.exe!Error::debugBreak()[error.cpp:94]
00000001407DA85B    mysqld.exe!Error::error()[error.cpp:71]
00000001407DA93C    mysqld.exe!Error::assertionFailed()[error.cpp:77]
000000014089CFA2    mysqld.exe!SerialLogControl::nextRecord()[seriallogcontrol.cpp:316]
00000001408D894B   
mysqld.exe!SerialLogTransaction::commit()[seriallogtransaction.cpp:87]
00000001408D8DCA   
mysqld.exe!SerialLogTransaction::doAction()[seriallogtransaction.cpp:158])´

...

The line in question:
	ASSERT(*input >= (LOW_BYTE_FLAG | srlEnd) && *input < (LOW_BYTE_FLAG | srlMax));

*input points to the middle of the record (all records in the test case
are  repeat('a',500) )

How to repeat:
I can repeat this test on QuadCore x64 Windows Vista (built relwithdebinfo)
I cannot repeat this bug in the debug version anymore.
I cannot repeat this bug on my laptop
[9 May 2008 16:10] Vladislav Vaintroub
error log

Attachment: master.err (application/octet-stream, text), 3.16 KiB.

[9 May 2008 16:17] Vladislav Vaintroub
I attached the error log taken from running the test with falcon_debug_flags=4

perl mysql-test-run.pl --mysqld=--falcon_debug_flags=4 --suite=falcon_team falcon_bug_36294

Please note:
At the very end of file, after the crash callstack has been written, there is a 
debug message

4: Record thaw: transaction 84, 10301 records, 5243209 bytes

This message is written by the non-crashing thread. It suggest that there are chill/thaw mechanism involved and  it probably  does not play nice with positioning in SerialLogControl .
[14 May 2008 23:47] Vladislav Vaintroub
It looks like I was right suspecting thaw mechanism in conflicting with SerialLog.nextRecord(). Got an interesting callstack today:

Thread 1 (assertion):
 	Error::assertionFailed
 	SerialLogControl::nextRecord
        SerialLogTransaction::commit
        ....
Thread 2 (thaw):
       ReadFile
       SerialLogFile::read
       SerialLogWindow::activateWindow  
       SerialLog::findWindowGivenOffset
       SRLUpdateRecords::thaw
       Transaction::thaw
       RecordVersion::thaw

In particular, ReadFile in Thread2 reads into the same 1MB buffer that is used by Thread1 in commit. This buffer is known as window->buffer in SerialLogTransaction::commit() in Thread1 and as this->buffer in SerialLogWindow::activateWindow()in Thread2 (and address is 0x0000000004270200)

The SerialLogControl::input in the assert expression
ASSERT(*input >= (LOW_BYTE_FLAG | srlEnd) && *input < (LOW_BYTE_FLAG | srlMax));
is located near the end of this buffer(has address  0x0000000004370000)
[5 Aug 2008 12:38] Philip Stoev
You may get this crash using the following procedure, however there are plenty of other crashes that may be triggered as well:

Please clone the mysql-test-extra-6.0 repository and execute

$ cd mysql-test-extra-6.0/mysqltest/gentest
$ perl runall.pl \
  --basedir=/path/to/6.0-falcon/ \
  --engine=falcon \
  --grammar=conf/chill_thaw.yy \
  --threads=20 --queries=100000 \
  --mysqld=--falcon_debug_mask=7167 \
  --mysqld=--falcon_record_chill_threshold=32K
[3 Sep 2008 16:14] Kevin Lewis
Chris,  This is most likely a synchonization issue between a thaw and a gopher.  Philip has seen it again by lowering the chill threshold.  Please as this to your list of chill/thaw bugs.
[24 Oct 2008 5:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56961

2880 Christopher Powers	2008-10-24
      Bug#36631,"Assertion in SerialLogControl::nextRecord"
      Bug#38541,"Falcon RecordVersion::thaw assertion """"bytesRestored > 0 || data.record == NULL"""
      Bug#38567,"Falcon crash in  Record::getEncodedRecord at Record.cpp:743"
      Bug#38569,"Falcon assertion in SRLUpdateIndex::thaw : type == srlUpdateIndex"
      Bug#39694,"Crash in StorageTable::setRecord during falcon_chill_thaw test"
      Bug#39695,"Crash in SRLUpdateRecords::thaw during falcon_chill_thaw test"
      Bug#39696,"Assertion in Table.cpp (dup->state == recDeleted) fails during falcon_chill_thaw"
      
      Fix 1 of 3 for chill/thaw bugs.
      
      Modified Record::hasRecord() to thaw if necessary, removed redundant record thaws, added exception handling
[24 Oct 2008 5:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56962

2881 Christopher Powers	2008-10-24
      Bug#36631,"Assertion in SerialLogControl::nextRecord"
      Bug#38541,"Falcon RecordVersion::thaw assertion """"bytesRestored > 0 || data.record == NULL"""
      Bug#38567,"Falcon crash in  Record::getEncodedRecord at Record.cpp:743"
      Bug#38569,"Falcon assertion in SRLUpdateIndex::thaw : type == srlUpdateIndex"
      Bug#39694,"Crash in StorageTable::setRecord during falcon_chill_thaw test"
      Bug#39695,"Crash in SRLUpdateRecords::thaw during falcon_chill_thaw test"
      Bug#39696,"Assertion in Table.cpp (dup->state == recDeleted) fails during falcon_chill_thaw"
      
      Fix 2 of 3 for chill/thaw bugs.
      
      Synchronized record thaw operations using a pool of sync objects in the associated table.
      Record chill operations do not need to be synchronized because chills are performed in one place, and only by the active Transaction.
[26 Oct 2008 15:52] Kevin Lewis
Two patches approved to problems with concurent thaws.
1) enhance hasRecord to thaw when needed.
2) serialize thaws with a hash table of SyncOjects.
[28 Oct 2008 8:10] Bugs System
Pushed into 6.0.8-alpha  (revid:cpowers@mysql.com-20081024053538-6y1ma8lydl93oxza) (version source revid:alik@mysql.com-20081028074021-o49ijs3x4pxj1gbm) (pib:5)
[13 Nov 2008 10:59] Philip Stoev
This was seen using the iuds6 test against 6.0.8 with the default chill threshold.
[18 Nov 2008 12:27] Hakan Küçükyılmaz
I can see both pushes in the clone-off:

bzr branch -r tag:clone-6.0.8-build mysql-6.0-local-master/ mysql-6.0-clone-6.0.8-build
hakan@lu0011:~/work/mysql/mysql-6.0-clone-6.0.8-build$ bzr log > log.txt

grep for 36631

        ------------------------------------------------------------
        revno: 2749.45.2
        committer: Christopher Powers <cpowers@mysql.com>
        branch nick: mysql
        timestamp: Fri 2008-10-24 00:35:38 -0500
        message:
          Bug#36631,"Assertion in SerialLogControl::nextRecord"
          Bug#38541,"Falcon RecordVersion::thaw assertion """"bytesRestored > 0 || data.record == NULL"""

        ------------------------------------------------------------
        revno: 2749.45.1
        committer: Christopher Powers <cpowers@mysql.com>
        branch nick: mysql
        timestamp: Fri 2008-10-24 00:06:52 -0500
        message:
          Bug#36631,"Assertion in SerialLogControl::nextRecord"
          Bug#38541,"Falcon RecordVersion::thaw assertion """"bytesRestored > 0 || data.record == NULL"""
[13 Dec 2008 9:05] MC Brown
No documentation needed (internal change only)
[16 Dec 2008 19:33] Hakan Küçükyılmaz
Re-verified running falcon_bug_36294 on lu0009 with non-debug build.

falcon_team.falcon_bug_36294   [ fail ]

mysqltest: At line 65: query 'DROP TABLE ftest' failed: 2006: MySQL server has gone away

Version: '6.0.9-alpha-log'  socket: '/data0/work/mysql/mysql-6.0-falcon-team/mysql-test/var/tmp/master.sock'  p
ort: 9306  Source distribution
[Falcon] Error: assertion (*input >= (LOW_BYTE_FLAG | srlEnd) && *input < (LOW_BYTE_FLAG | srlMax)) failed at l
ine 316 in file SerialLogControl.cpp

081216 22:31:54 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60607 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(my_print_stacktrace+0x2e) [0xa65dde]
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(handle_segfault+0x35d) [0x640939]
/lib64/libpthread.so.0 [0x2b986c7b6c10]
/lib64/libpthread.so.0(raise+0x2d) [0x2b986c7b6abd]
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(Error::error(char const*, ...)+0xf9) [0x8741e5]
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(SerialLogControl::nextRecord()+0x4f) [0x8b357b]
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(SerialLogTransaction::commit()+0xc6) [0x8b9276]
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(Gopher::gopherThread()+0x145) [0x8e6c4f]
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(Thread::thread()+0x95) [0x842a11]
/data0/work/mysql/mysql-6.0-falcon-team/sql/mysqld(Thread::thread(void*)+0x11) [0x842bf1]
/lib64/libpthread.so.0 [0x2b986c7af143]
/lib64/libc.so.6(__clone+0x6d) [0x2b986cff974d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
[16 Dec 2008 19:34] Hakan Küçükyılmaz
Note: I am renaming the test to falcon_bug_36294-big.test, because it takes too much run time on PB.
[24 Dec 2008 8:55] Philip Stoev
Also seen in 6.0.9
[3 Feb 2009 3:18] Christopher Powers
Several factors contributed to the failure of falcon_bug_36294-big.test.

1. Record cache overflow.

The primary issue with this bug is that the Scavenger could not keep up with the inflow of record versions. Recent improvements to the Scavenger helped, but several issues remained and were addressed in other bugs.

The Falcon scavenger has been reworked and is now more responsive, however, the fixes in Bug#42424 further improve the scavenger response, and are necessary for the testcase in this bug to complete successfully. 

2. The default Falcon page cache size was too small.

The default Falcon page cache size was 4MB, much too small for memory-intensive operations such as the testcase in this bug.

Changing the default page cache size from 4MB to 250MB improved the overall performance of the system, including record scavenging. This was addressed in Bug #42510, "Falcon: Default memory parameters must be changed".

3. The record backlog facility is never enabled by the Scavenger.

The record backlog is a fallback mechanism triggered by a low-memory condition. The new scavenger did not properly enable record backlogging. This has been corrected in Bug #42505, "Falcon: Record backlogging not enabled by the Scavenger".

Finally, there have been several fixes to the chill/thaw mechanism over the last several months (tag: CHILLTHAW). The 36294 testcase creates very large transactions, which the chill/thaw and backlog mechanisms now handle correctly.
[3 Feb 2009 3:22] Christopher Powers
Relevant fixes:

1)  http://lists.mysql.com/commits/64759
2990 Christopher Powers	2009-01-31
      Bug #42505, "Falcon: Record backlogging not enabled by the Scavenger"

      Fixed several issues with record backlogging.

2)  http://lists.mysql.com/commits/64959
2995 Christopher Powers	2009-02-02
      Bug #42505 "Falcon: Record backlogging not enabled by the Scavenger"
      
      Limit updateCardinalities() to once every 20 scavenge cycles. This
      function is too expensive to call every scavenge cycle. 
      
      Made Changes per Kevin's comments in the bug report, with one
      exception: Database::scavengeRecords() still determines low
      memory with retiredActiveMemory > recordScavengeFloor. Using 
      spaceRemaining > recordScavengeFloor does not enable backlogging
      when needed (confirmed against falcon_bug_36294-big.test).

3)  http://lists.mysql.com/commits/64768
2991 Christopher Powers	2009-02-01
      Bug #42424, "Serious performance degradation after new scavenger"
      
      Several fixes for the Scavenger:
      
      1. Don't run updateCardinalities() in a low memory state
      2. Distinguish between scheduled and load-based or 'forced' scavenges
      3. Progessively increase the time that record alloc waits for a scavenge
      4. Retire records during a low memory state, even if active memory is
         below the cache threshold

4)  http://lists.mysql.com/commits/64769
2992 Christopher Powers	2009-02-01
      Bug #42424, "Serious performance degradation after new scavenger"
      
      Increased scavenger wait increment from 10 to 20ms.
      Use progressive scavenger wait timeout for record allocation methods.

5)  http://lists.mysql.com/commits/64771
2993 Christopher Powers	2009-02-01
      Bug #42510, "Falcon: Default memory parameters must be changed"
      Bug #36442, "Falcon: crash during optimize table"
      
      Increased default page cache size from 4MB to 250MB
      Increased record scavenge threshold from 67 to 90
      Increased record scavenge floor from 50 to 80
[12 Apr 2009 15:48] Philip Stoev
This issue just happened with the following RQG command

  perl runall.pl    --engine=Falcon   --reporters=Deadlock,ErrorLog,Backtrace,Recovery,Shutdown   --mysqld=--loose-falcon-lock-wait-timeout=1   --mysqld=--loose-innodb-lock-wait-timeout=1   --mysqld=--log-output=none   --mysqld=--skip-safemalloc   --mysqld=--falcon-page-size=8K --rows=10000 --threads=4 --mysqld=--falcon-scavenge-schedule='1 1 1 1 1' --varchar-length=1024   --mask=41491   --queries=100000000   --duration=900   --basedir=/export/home/pb2/test/sb_1-462247-1239511953.98/mysql-6.0.11-alpha-linux-x86_64-test  --gendata=conf/varchar.zz  --grammar=conf/varchar.yy

Note that the invalid scavenge schedule essentially disables the scheduled scavenger.
[22 May 2009 20:46] Kevin Lewis
This bug has a long and sordid history.  Several times, it seemed to go away, and it has been linked to several patches which seemed to work at the time.  But unfortunately, I reproduced this on my laptop running once again with the current codebase.  I reproduced it with this revision-id: 
------------------------------------------------------------
revno: 2711
revision-id: kevin.lewis@sun.com-20090521152120-5hgesdnekamgt5ce
parent: kevin.lewis@sun.com-20090520182109-7gy8i1injcvihkun
committer: Kevin Lewis <kevin.lewis@sun.com>
branch nick: mysql-6.0-falcon-team
timestamp: Thu 2009-05-21 10:21:20 -0500
------------------------------------------------------------

This seems to happen so rarely that I felt like I needed to shift all my attention to debugging it.  So here is what I know;

The assert is in SerialLogControl::nextRecord().
	ASSERT(*input >= (LOW_BYTE_FLAG | srlEnd) && *input < (LOW_BYTE_FLAG | srlMax));

SerialLogControl::input has the value of 0x05510600.  As the memory dump shows below, it is pointing into the middle of a record, just like previously reported.  

0x055103E0  70 64 70 64 6c 77 6c 77 6e 79 6e 79 6c 6a 6c 6a  pdpdlwlwnynyljlj
0x055103F0  76 78 76 78 6b 61 6b 61 65 6e 65 6e 37 d4 56 3a  vxvxkakaenen7ÔV:
0x05510400  aa 14 37 fd 94 f1 15 a9 71 79 71 79 63 74 63 74  ª.7ý”ñ.©qyqyctct
0x05510410  68 67 68 67 62 63 62 63 63 74 63 74 4a 00 a8 63  hghgbcbcctctJ.¨c
0x05510420  79 63 79 70 66 70 66 6b 6a 6b 6a 72 66 72 66 71  ycypfpfkjkjrfrfq
. . .   // this skipped part is all the same record.
0x055105F0  77 62 77 62 66 69 66 69 73 71 73 71 61 65 61 65  wbwbfifisqsqaeae
0x05510600  70 65 70 67 6a 67 6a 78 6a 78 6a 6f 70 6f 70 76  pepgjgjxjxjopopv
0x05510610  67 76 67 63 6d 63 6d 73 68 73 68 37 e1 0e 23 8e  gvgcmcmshsh7á.#Ž
0x05510620  14 37 26 87 a4 5c b9 63 78 63 78 68 77 68 77 79  .7&.¤\.cxcxhwhwy
0x05510630  61 79 61 76 6a 76 6a 6f 73 6f 73 70 77 70 77 77  ayavjvjosospwpww
0x05510640  72 77 72 78 65 78 65 78 77 78 77 99 75 74 75 74  rwrxexexwxw™utut
0x05510650  49 54 69 65 69 65 6b 67 6b 67 78 69 78 69 79 71  ITieiekgkgxixiyq
0x05510660  79 71 75 6e 75 6e 74 70 74 70 62 64 62 64 6a 78  yqununtptpbdbdjx
0x05510670  6a 78 64 66 64 66 77 74 77 74 66 6c 66 6c 6b 78  jxdfdfwtwtflflkx
0x05510680  6b 78 6b 79 6b 79 6a 79 6a 79 79 62 79 62 74 6f  kxkykyjyjyybybto
0x05510690  74 6f 6c 76 6c 76 62 68 62 68 74 77 74 77 69 79  tolvlvbhbhtwtwiy
0x055106A0  69 79 6a 70 6a 70 b9 6a 78 6a 78 72 61 72 61 79  iyjpjp.jxjxraray
0x055106B0  76 79 76 73 6d 73 6d 71 78 71 78 61 64 61 64 6e  vyvsmsmqxqxadadn
0x055106C0  78 6e 78 6c 62 6c 62 71 61 71 61 81 ff 01 46 b0  xnxlblbqaqa.ÿ.F°
0x055106D0  11 d8 ff ff 9d 6c 6a 6c 6a 64 61 64 61 4a 00 d4  .Øÿÿ.ljljdadaJ.Ô
0x055106E0  6b 67 6b 67 78 72 78 72 68 72 68 72 79 73 79 73  kgkgxrxrhrhrysys

How did this happen?  I was running RQG on a Centrino Duo laptop with Windows XP and these settings;

Starting
 gentest.pl \
 --gendata=conf/falcon_backlog.zz \
 --engine=Falcon \
 --threads=10 \
 --queries=100000 \
 --duration=1200 \
 --dsn1=dbi:mysql:host=127.0.0.1:port=12500:user=root:database=test \
 --grammar=conf/falcon_backlog.yy
The engine was running in the debugger with these non-default settings;
falcon-debug-mask=4095
falcon-lock-wait-timeout=10
falcon-page-cache-size=20Mb
falcon-page-size=16kb
falcon-record-chill-threshold=200Kb
falcon-record-memory-max=50Mb
falcon-serial-log-buffers=20  # this is a default

At the time of the crash, there are 181 SerialLogWindow s linked up in the SerialLog.  The last 19 of them have buffers attached to them, so the last 19 Mb of the serial log is cached.  The other buffer is being used by a SerialLogWindow about 52 Mb back from the end. The failing thread is doing this;

Error::debugBreak() Line 92	C++
Error::error(const char * string=0x00ca8c1c, ...) Line 73	C++
Error::assertionFailed(const char * text=0x00cca9e0, const char * fileName=0x00cca9c4, int line=322) Line 78 + 0x16 bytes	C++
SerialLogControl::nextRecord() Line 322 + 0x34 bytes	C++
SerialLogTransaction::commit() Line 88 + 0x16 bytes	C++
SerialLogTransaction::doAction() Line 159	C++
Gopher::gopherThread() Line 71 + 0xf bytes	C++

In SerialLogTransaction::commit(), it has been stepping through a whole bunch of SerialLogRecords in order commit the ones for transaction number 72.  It has called nextRecord() 40289 times (skipped=40154, processed=135).  So this transaction is a long one, probably explicitly designed that way by Philip to test backlogging.  The gopher thread working on it has traversed most of the SerialLogWindows, reusing the buffer and re-reading from the seria log file for each window.  

It certainly appears like the buffer was reread from disk with different stuff than what this gopher thread was previously reading.  But I cannot figure out how that could happen.

The SerialLogControl::inputBlock.blockNumber is 490.  It is the first block in the buffer since it has the same address.  This block is 197633 byes long, ony 20% of the 1Mb buffer.   The input pointer that asserts is pointing to 0x05510600 and control->inputEnd is one byt past that; 0x05510601, which is correct (buffer=0x054e0200 + 197633 ).  But the raw dump above shows that this is nothing like the end of a block.  The contents of the buffer must have been overwritten.
[10 Jun 2009 7:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/75978

2722 Kevin Lewis	2009-06-10
      Bug#36631 - Make sure that SerialLogControl::currentLength is always set correctly.  If it is too short and a buffer needs to be re-read, not enough of the serial log will be read back inot the buffer. When it gets to the end of the current length and calls SerialLogControl::nextRecord() it will hit an assert when it reads the previous contents of that buffer, which is now junk.  The problem is that the block length indicates a larger window than the currentLength.  
      
      SerialLogControl::currentLength is now updated in each location that the current writePtr is extended.  This is three more places than before.
      
      In addition, some comments and some just-in-case code is added to SRLUpdateRecords::thaw().  Since this thaw is not calling setWindow(), but only doing a subset of that function, the task of clearing anddeactivating the old inputWindow that is done in setWindow() is repeated here.
[10 Jun 2009 23:35] Kevin Lewis
This bug, or assert, can happen whenever the number of serial log windows that are needed to keep track of uncompleted transactions exceeds the setting specified by falcon-serial-log-buffer.  This is 20 by default, and each buffer is 1 Mb.  So this could happen with any transaction that exceeds 20 Mb.  

The bug was labeled as a Chill/Thaw bug because it was only seen in those tests, where the falcon-record-chill-threshold is set unusually low.  There is an additional bug in the chill code that causes serial log windows to be truncated at a much smaller size than the full 1 Mb.  This bug does not have any other affect than to waist memory and cause serial log windows to be re-read quite often.  Since these windows into the serial log were smaller than they should have been, the extra reading increased the likelihood of the assert that was fixed in the previous patch.

The next patch will allow all serial log windows to be almost a full 1Mb even when the falcon-record-chill-threshold is set below 1Mb.
[10 Jun 2009 23:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76077

2725 Kevin Lewis	2009-06-10
      Bug#36631 - Make sure all serial log windows are almost 1 Mb even when the falcon-record-chill-threshold is set below 1 Mb.  
      
      SRLUpdateRecords::append() steps through the records in a transaction with two embedded loops.  The inner loop has two places where execution can break out.  When it does, and there are still more records to process, writeWindow will be flushed and a new writeWindow will be created.  One of these break conditions is appropriate for starting a new window and the other is not.
      
      The first condition has this comnment.
         // If this is a different savepoint, break out of the inner loop 
         // and start another serial log record 
      
      There should be a different serial log record for each savepoint, but it does not need to start a new writeWindow.  When large transactions must chill records changed at different savepoints, this code path was causing the serial log windows to use less that the full 1 Mb available in the buffer.  This waisted space and increased the chances that the previous patch for this bug occurred.
      
      The second break in the inner loop has this comment;
         // Ensure record fits within current window 
      
      The code is designed to predict if a record will not fit in the current window.  If not, it would explicitly flush the old writeWindow, start a new writeWindow, and put the current record there.  The ability to break to a new window at any time is built into putdata, but for the purposes af thawing, we need to be able to thaw an entire record from the same serialLogRecord.  
      
      To distinguish between thew two purposes of breaking out of the inner loop, the code now uses a boolean called  forceNewWindow.
[17 Jun 2009 15:19] John Embretsen
I see this bug very often when running the falcon_bug_22154-big test on various hosts including a Linux Red Hat 4-core and a Solaris 10 x86_64 8-core:

falcon.falcon_bug_22154-big              [ fail ]
        Test ended at 2009-06-15 16:58:16

CURRENT_TEST: falcon.falcon_bug_22154-big
mysqltest: At line 50: query 'call pj9()' failed: 2013: Lost connection to MySQL server during query
(...)
Program terminated with signal 6, Aborted.
#0  0x0000003f1100b122 in pthread_kill () from /lib64/libpthread.so.0
#0  0x0000003f1100b122 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000c8a492 in my_write_core (sig=6) at stacktrace.c:309
#2  0x00000000006c4b2f in handle_segfault (sig=6) at mysqld.cc:2711
#3  <signal handler called>
#4  0x0000003f1100dd3d in raise () from /lib64/libpthread.so.0
#5  0x00000000009f6228 in Error::debugBreak () at Error.cpp:94
#6  0x00000000009f6359 in Error::error (string=0xe8a1d0 "assertion (%s) failed at line %d in file %s\n") at Error.cpp:71
#7  0x00000000009f63f7 in Error::assertionFailed (text=0xe941d0 "*input >= (LOW_BYTE_FLAG | srlEnd) && *input < (LOW_BYTE_FLAG | srlMax)",
    fileName=0xe93f48 "SerialLogControl.cpp", line=322) at Error.cpp:78
#8  0x0000000000a4ea9c in SerialLogControl::nextRecord (this=0x465fd600) at SerialLogControl.cpp:322
#9  0x0000000000a5b2d0 in SerialLogTransaction::commit (this=0xee7fb30) at SerialLogTransaction.cpp:88
#10 0x0000000000a5b3ba in SerialLogTransaction::doAction (this=0xee7fb30) at SerialLogTransaction.cpp:158
#11 0x0000000000a98f25 in Gopher::gopherThread (this=0x2aaab0343ee0) at Gopher.cpp:71
#12 0x0000000000a9909f in Gopher::gopherThread (arg=0x2aaab0343ee0) at Gopher.cpp:37
#13 0x00000000009ae0ef in Thread::thread (this=0x2aaab034ded0) at Thread.cpp:167
#14 0x00000000009ae317 in Thread::thread (parameter=0x2aaab034ded0) at Thread.cpp:146
#15 0x0000003f110062e7 in start_thread () from /lib64/libpthread.so.0
#16 0x0000003f104ce3bd in clone () from /lib64/libc.so.6

The bug status is "Patch approved", does that mean there is a patch ready to be pushed for this bug?
[17 Jun 2009 15:45] Kevin Lewis
John will open a new bug against falcon_bug_22154-big.
[17 Jun 2009 17:50] John Embretsen
Filed Bug#45566	("falcon_bug_22154-big fails with assertion in SerialLogControl::nextRecord") for the latest test failure (falcon_bug_22154-big) seeing this assertion.