Bug #28174 Falcon concurrent BLOB updates leads to crash
Submitted: 30 Apr 2007 21:21 Modified: 26 Jul 2007 9:22
Reporter: Hakan Küçükyılmaz Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version: OS:Any
Assigned to: Jim Starkey CPU Architecture:Any

[30 Apr 2007 21:21] Hakan Küçükyılmaz
Description:
Using falcon_concurrent_blob_updates leads to crash:

./falcon_concurrent_blob_updates -r600 -s4

Running initializations ...
Pre-generating 16777216 bytes of random data.
About to spawn 2 threads
1 2
Completed spawning new database worker threads.
Runtime is 600.
Seed value is 4.
Isolation level is READ-COMMITTED.
Testcase is now running, so watch for error output.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1076373856 (LWP 28502)]
0x0000000000905270 in Cache::findBuffer (this=0x2a9e459d48,
dbb=0x2a9e497cf0, pageNumber=1187, lockType=Exclusive,
    sync=0x4027ff90) at Cache.cpp:334
334                                     for (candidate = bdb;
candidate->higher; candidate = candidate->higher->higher)
(gdb) p  candidate
$1 = (Bdb *) 0x2a9e3e47c8
(gdb) p bdb
$2 = (Bdb *) 0x2a9e3e47c8
(gdb) p candidate->higher
$3 = (class PagePrecedence *) 0x0
(gdb)

Full backtrace
(gdb) bt
#0  0x0000000000905270 in Cache::findBuffer (this=0x2a9e459d48,
dbb=0x2a9e497cf0, pageNumber=1187,
    lockType=Exclusive, sync=0x4027ff90) at Cache.cpp:334
#1  0x0000000000904f7f in Cache::fakePage (this=0x2a9e459d48,
dbb=0x2a9e497cf0, pageNumber=1187,
    type=PAGE_data_overflow, transId=30149) at Cache.cpp:259
#2  0x0000000000872b13 in Dbb::fakePage (this=0x2a9e497cf0,
pageNumber=1187, pageType=PAGE_data_overflow,
    transId=30149) at Dbb.cpp:193
#3  0x00000000008ace8c in PageInventoryPage::allocPage
(dbb=0x2a9e497cf0, pageType=PAGE_data_overflow,
    transId=30149) at PageInventoryPage.cpp:98
#4  0x0000000000872c25 in Dbb::allocPage (this=0x2a9e497cf0,
pageType=PAGE_data_overflow, transId=30149)
    at Dbb.cpp:226
#5  0x00000000008c8546 in Section::storeTail (this=0x2a9e4d4478,
stream=0x40280400, maxRecord=4084,
    pLength=0x402801f8, transId=30149, earlyWrite=true) at
Section.cpp:658
#6  0x00000000008c8189 in Section::storeRecord (this=0x2a9e4d4478,
recordLocatorPage=0x2aa057f000,
    indexPageNumber=104, index=0x2aa057ffc0, stream=0x40280400,
transId=30149, earlyWrite=true) at Section.cpp:572
#7  0x00000000008c7eaf in Section::updateRecord (this=0x2a9e4d4478,
recordNumber=502, stream=0x40280400,
    transId=30149, earlyWrite=true) at Section.cpp:511
#8  0x0000000000872f76 in Dbb::updateRecord (this=0x2a9e497cf0,
sectionId=44, recordId=502, stream=0x40280400,
    transId=30149, earlyWrite=true) at Dbb.cpp:310
#9  0x0000000000872f29 in Dbb::updateBlob (this=0x2a9e497cf0,
sectionId=44, recordId=502, stream=0x40280400,
    transId=30149) at Dbb.cpp:304

How to repeat:
Compile and run attached C program like

./falcon_concurrent_blob_updates -r600 -s4
[30 Apr 2007 21:24] Hakan Küçükyılmaz
C program for concurrent blob update with Falcon

Attachment: falcon_concurrent_blob_updates.tar.gz (application/x-gzip, text), 3.62 KiB.

[1 May 2007 0:19] MySQL Verification Team
Hakan,

With latest bk source and fresh tables:

[miguel@light 5.1f]$ ./falcon_concurrent_blob_updates -r600 -s4
Running initializations ...
Pre-generating 16777216 bytes of random data.
About to spawn 2 threads
1 2 
Completed spawning new database worker threads.
Runtime is 600.
Seed value is 4.
Isolation level is READ-COMMITTED.
Testcase is now running, so watch for error output.
Waiting for worker threads to finish ...
[miguel@light 5.1f]$ 

With a lot of the below errors messages:

070430 21:03:42 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Record has changed since last read in table 't1'
070430 21:03:42 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Sort aborted
070430 21:03:53 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Record has changed since last read in table 't1'
070430 21:03:53 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Sort aborted
070430 21:03:55 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Record has changed since last read in table 't1'

<CUT>

070430 21:13:04 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Sort aborted
070430 21:13:07 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Record has changed since last read in table 't1'
070430 21:13:07 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Sort aborted
070430 21:13:27 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Record has changed since last read in table 't1'
070430 21:13:27 [ERROR] /home/miguel/dbs/5.1f/libexec/mysqld: Sort aborted
[Thread -1441453168 (LWP 1062) exited]
[Thread -1309070448 (LWP 1065) exited]

Please ret-test with complete new tables.
[1 May 2007 3:56] Hakan Küçükyılmaz
Miguel,

I figuread out how it failed. I was going to start the test in a loop. It fails in it's second iteration:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1082665312 (LWP 11817)]
0x00000000009321f0 in Cache::findBuffer (this=0x2aaaab00cd50, dbb=0x2aaaab04ab00, pageNumber=11810,
    lockType=Exclusive, sync=0x4087fa40) at Cache.cpp:334
334                                     for (candidate = bdb; candidate->higher; candidate = candidate->higher->higher)
(gdb) p candidate->higher
$1 = (class PagePrecedence *) 0x0

To reproduce:
SEED=1
while ( [ $? -eq 0 ] )
    do
    ./falcon_concurrent_blob_updates -r660 -s$SEED
    SEED=$(($SEED + 3))
done
[1 May 2007 12:33] MySQL Verification Team
Thank you for the feedback.
[16 Jun 2007 12:20] Hakan Küçükyılmaz
Crashes or hits assertion.

Start with fresh installation (no Falcon databases at all).

Run one after other without restarting the server

./falcon_concurrent_blob_updates -r660 -s1
./falcon_concurrent_blob_updates -r660 -s1
./falcon_concurrent_blob_updates -r660 -s1

I got a crash on my first run. Backtrace is

Program received signal SIGILL, Illegal instruction.
[Switching to LWP 484]
0x00002b3db3b43fed in ?? ()
(gdb) bt
#0  0x00002b3db3b43fed in ?? ()
#1  0x00000000007d2622 in Error::debugBreak () at Error.cpp:92
#2  0x00000000007d271d in Error::error (string=<value optimized out>) at Error.cpp:69
#3  0x00000000007d277c in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>, line=484)
    at Error.cpp:76
#4  0x00000000007f5057 in RecordVersion::thaw (this=0x2b3db5571ad0) at RecordVersion.cpp:300
#5  0x00000000007f52c3 in RecordVersion::fetchVersion (this=0x2b3db5571ad0, trans=0x2b3db53b7f58)
    at RecordVersion.cpp:106
#6  0x000000000079eef2 in StorageDatabase::nextRow (this=<value optimized out>, storageTable=0x2b3db528e610,
    recordNumber=6694, lockForUpdate=false) at StorageDatabase.cpp:236
#7  0x00000000007a2821 in StorageTable::next (this=0x0, recordNumber=4, lockForUpdate=255)
    at StorageTable.cpp:126
#8  0x000000000079bd82 in StorageInterface::rnd_next (this=0xfaeef8, buf=0xfaf130 "����\005")
    at ha_falcon.cpp:470
#9  0x00000000006e8f93 in rr_sequential (info=0xfc8058) at records.cc:363
#10 0x000000000066731c in join_init_read_record (tab=0xfc7fd8) at sql_select.cc:11480
#11 0x000000000067610b in sub_select (join=0x2aaab00119a8, join_tab=0x1e4, end_of_records=4)
    at sql_select.cc:10830
#12 0x000000000067cf09 in do_select (join=0x2aaab00119a8, fields=0xf4b120, table=0x0, procedure=0x0)
    at sql_select.cc:10594
#13 0x0000000000685f56 in JOIN::exec (this=0x2aaab00119a8) at sql_select.cc:2099
#14 0x0000000000686b92 in mysql_select (thd=0xf49618, rref_pointer_array=0xf4b200, tables=0xfc71c0,
    wild_num=0, fields=<value optimized out>, conds=0xfc7660, og_num=0, order=0x0, group=0x0, having=0x0,
    proc_param=0x0, select_options=2148813312, result=0xfc7898, unit=0xf4ac50, select_lex=0xf4b018)
    at sql_select.cc:2275
#15 0x0000000000687094 in handle_select (thd=0xf49618, lex=0xf4abb0, result=0xfc7898,
    setup_tables_done_option=0) at sql_select.cc:258

On an other run I got an assertion on the second run:
./falcon_concurrent_blob_updates -r660 -s1
./falcon_concurrent_blob_updates -r660 -s1

Exception: assertion failed at line 560 in file Section.cpp

Exception during cleanupRecords of table TEST.T1: assertion failed at line 560 in file Section.cpp

Transactions: 3103 committed, 128 rolled back, 0 active, 2 post-commit
Activity on falcon_master.fts: 4 fetches, 4 reads, 13 writes
Activity on /data/mysql-sap/falcon_user.fts: 218923 fetches, 1239 reads, 23576 writes
SerialLog: 0 reads, 1741 writes, 742 transactions, 741 blocks, 2 windows
Stalled threads
  Thread 2ad095010690 (1116502336) sleep=0, grant=0, locks=1, who 0, javaThread 0
      prior=cccccccccccccccc, next=cccccccccccccccc, parent=0
Stalled synchronization objects:
------------------------------------
Stalled threads
  Thread 2ad094fc0088 (1074272576) sleep=0, grant=0, locks=1, who 0, javaThread 0
      prior=cccccccccccccccc, next=cccccccccccccccc, parent=0
    Pending Section::insertStub state 0 (1) syncObject 0x2ad095010350
  Thread 2ad095010690 (1116502336) sleep=1, grant=0, locks=1, who 0, javaThread 0
      prior=cccccccccccccccc, next=cccccccccccccccc, parent=0
Stalled synchronization objects:
  SyncObject 2ad095010350: state -1, monitor 0, waiters 1
    Exclusive thread 95010690 (1116502336), type 1;
    Waiting thread 94fc0088 (1074272576), type 1; Section::insertStub
------------------------------------
[26 Jun 2007 19:01] Ann Harrison
This looks a lot like 29081 - a crash after failing to find a buffer.
Does it still happen?
[26 Jun 2007 19:47] Hakan Küçükyılmaz
First run of ./falcon_concurrent_blob_updates -r660 -s1 succeeds now without crash but consequent run is not possible due to hang:

Exception: assertion failed at line 560 in file Section.cpp

Exception during cleanupRecords of table TEST.T1: assertion failed at line 560 in file Section.cpp

Transactions: 1198 committed, 48 rolled back, 2 active, 1 post-commit
SerialLog: 0 reads, 648 writes, 171 transactions, 612 completed, 171 blocks, 1 windows
Activity on falcon_master.fts: 4 fetches, 4 reads, 0 writes
Activity on /data/mysql-sap/falcon_user.fts: 60490 fetches, 405 reads, 7386 writes
Stalled threads
  Thread 2b1b63dddc60 (1116502336) sleep=0, grant=0, locks=1, who 0, javaThread 0
      prior=cccccccccccccccc, next=cccccccccccccccc, parent=0
Stalled synchronization objects:
------------------------------------
Stalled threads
  Thread 2b1b63d68088 (1074272576) sleep=0, grant=0, locks=1, who 0, javaThread 0
      prior=cccccccccccccccc, next=cccccccccccccccc, parent=0
Stalled synchronization objects:
------------------------------------
Transactions: 0 committed, 0 rolled back, 2 active, 1 post-commit
SerialLog: 0 reads, 1 writes, 2 transactions, 0 completed, 3 blocks, 1 windows
Activity on falcon_master.fts: 0 fetches, 0 reads, 2 writes
Activity on /data/mysql-sap/falcon_user.fts: 6 fetches, 0 reads, 24 writes
Transactions: 0 committed, 0 rolled back, 2 active, 1 post-commit
SerialLog: 0 reads, 1 writes, 1 transactions, 0 completed, 3 blocks, 1 windows
Transactions: 0 committed, 0 rolled back, 2 active, 1 post-commit
SerialLog: 0 reads, 1 writes, 0 transactions, 0 completed, 0 blocks, 1 windows
Transactions: 0 committed, 0 rolled back, 2 active, 1 post-commit
Transactions: 0 committed, 0 rolled back, 2 active, 1 post-commit
Transactions: 0 committed, 0 rolled back, 2 active, 1 post-commit
Transactions: 0 committed, 0 rolled back, 2 active, 1 post-commit
[26 Jun 2007 20:16] Hakan Küçükyılmaz
Now tried with default settings of falcon_concurrent_blog_updates.

1) It succeeds,
2) it hangs,
3) or it crashes.

Backtrace of a crash:

akan@lu0011:~/work/mysql/falcon/falcon_concurrent_blob_updates$ ./falcon_concurrent_blob_updates
Running initializations ...
Pre-generating 16777216 bytes of random data.
About to spawn 2 threads
1 2
Completed spawning new database worker threads.
Runtime is 500.
Seed value is 303.
Isolation level is READ-COMMITTED.
Testcase is now running, so watch for error output.

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 19809]
0x00000000007a8515 in Table::expungeRecordVersions (this=0x2b6a0b692660, record=0x1f9bcd8) at Table.cpp:1592
1592            for (Record *rec = prior; rec; rec = rec->getPriorVersion())
(gdb) bt
#0  0x00000000007a8515 in Table::expungeRecordVersions (this=0x2b6a0b692660, record=0x1f9bcd8) at Table.cpp:1592
#1  0x00000000007f66d5 in RecordVersion::scavenge (this=0x1f9bcd8, oldestActive=<value optimized out>)
    at RecordVersion.cpp:150
#2  0x00000000007f44c3 in RecordLeaf::retireRecords (this=<value optimized out>, table=<value optimized out>,
    base=<value optimized out>, age=-1, oldestActiveTransaction=26014) at RecordLeaf.cpp:126
#3  0x000000000083b581 in RecordGroup::retireRecords (this=<value optimized out>, table=0x2b6a0b692660,
    base=<value optimized out>, age=-1, oldestActiveTransaction=26014) at RecordGroup.cpp:122
#4  0x000000000083b581 in RecordGroup::retireRecords (this=<value optimized out>, table=0x2b6a0b692660,
    base=<value optimized out>, age=-1, oldestActiveTransaction=26014) at RecordGroup.cpp:122
#5  0x00000000007a7573 in Table::retireRecords (this=0x2b6a0b692660, age=-1, oldestActiveTransaction=26014)
    at Table.cpp:1515
#6  0x00000000007a75ec in Table::cleanupRecords (this=0x1fa41d8, threshold=33144024,
    oldestActiveTransaction=33144024) at Table.cpp:2299
#7  0x00000000007c52f2 in Database::cleanupRecords (this=0x2b6a0b443b98, threshold=-1,
    oldestActiveTransaction=26014) at Database.cpp:2062
#8  0x00000000007c580f in Database::retireRecords (this=0x2b6a0b443b98) at Database.cpp:1749
#9  0x00000000007c5910 in Database::scavenge (this=0x2b6a0b443b98) at Database.cpp:1651
#10 0x00000000007ff4b3 in Scavenger::scavenge (this=<value optimized out>) at Scavenger.cpp:58
#11 0x00000000007ff4ca in Scavenger::execute (this=0x1fa41d8, scheduler=0x1f9bcd8) at Scavenger.cpp:68
#12 0x000000000080034a in Scheduler::schedule (this=0x2b6a0b64e8a8) at Scheduler.cpp:136
#13 0x0000000000800443 in Scheduler::schedule (lpParameter=0x1fa41d8) at Scheduler.cpp:158
#14 0x00000000007ae54e in Thread::thread (this=0x2b6a0b68b6c8) at Thread.cpp:159
#15 0x00000000007ae6bf in Thread::thread (parameter=0x1fa41d8) at Thread.cpp:138
[27 Jun 2007 17:24] Jim Starkey
Problem turned out to be a short window in which a SerialLogWindow
had an incorrect "currentLength", preventing a successful record
"thaw" operation.
[28 Jun 2007 15:53] Hakan Küçükyılmaz
Running falcon_concurrent_blob_updates succeeds in most cases and sometimes it hangs due to a deadlock.

In one of twenty runs I had a crash:

./falcon_concurrent_blob_updates -n5 -r540

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 9224]
0x00000000007a8515 in Table::expungeRecordVersions (this=0x2b0a8a9d7870, record=0x2aaab113dab8)
    at Table.cpp:1592
1592            for (Record *rec = prior; rec; rec = rec->getPriorVersion())
(gdb) bt
#0  0x00000000007a8515 in Table::expungeRecordVersions (this=0x2b0a8a9d7870, record=0x2aaab113dab8)
    at Table.cpp:1592
#1  0x00000000007f6725 in RecordVersion::scavenge (this=0x2aaab113dab8, oldestActive=<value optimized out>)
    at RecordVersion.cpp:150
#2  0x00000000007f44c3 in RecordLeaf::retireRecords (this=<value optimized out>, table=<value optimized out>,
    base=<value optimized out>, age=-1, oldestActiveTransaction=43309) at RecordLeaf.cpp:126
#3  0x000000000083b641 in RecordGroup::retireRecords (this=<value optimized out>, table=0x2b0a8a9d7870,
    base=<value optimized out>, age=-1, oldestActiveTransaction=43309) at RecordGroup.cpp:122
#4  0x000000000083b641 in RecordGroup::retireRecords (this=<value optimized out>, table=0x2b0a8a9d7870,
    base=<value optimized out>, age=-1, oldestActiveTransaction=43309) at RecordGroup.cpp:122
#5  0x00000000007a7573 in Table::retireRecords (this=0x2b0a8a9d7870, age=-1, oldestActiveTransaction=43309)
    at Table.cpp:1515
#6  0x00000000007a75ec in Table::cleanupRecords (this=0x2aaab113bc58, threshold=-1324098888,
    oldestActiveTransaction=2970868408) at Table.cpp:2299
#7  0x00000000007c52f2 in Database::cleanupRecords (this=0x2b0a8a787b98, threshold=-1,
    oldestActiveTransaction=43309) at Database.cpp:2062
#8  0x00000000007c580f in Database::retireRecords (this=0x2b0a8a787b98) at Database.cpp:1749
#9  0x00000000007c5910 in Database::scavenge (this=0x2b0a8a787b98) at Database.cpp:1651
#10 0x00000000007ff503 in Scavenger::scavenge (this=<value optimized out>) at Scavenger.cpp:58
#11 0x00000000007ff51a in Scavenger::execute (this=0x2aaab113bc58, scheduler=0x2aaab113dab8) at Scavenger.cpp:68
#12 0x000000000080039a in Scheduler::schedule (this=0x2b0a8a9928a8) at Scheduler.cpp:136
#13 0x0000000000800493 in Scheduler::schedule (lpParameter=0x2aaab113bc58) at Scheduler.cpp:158
#14 0x00000000007ae54e in Thread::thread (this=0x2b0a8a9cf6c8) at Thread.cpp:159
#15 0x00000000007ae6bf in Thread::thread (parameter=0x2aaab113bc58) at Thread.cpp:138
[6 Jul 2007 13:39] Hakan Küçükyılmaz
Hits assertion in 5 out of 7 runs of

  ./falcon_concurrent_blob_updates -r540 -s963

I get two different assertions.

0x00002b774aed4fed in raise () from /lib/libpthread.so.0
(gdb) bt
#0  0x00002b774aed4fed in raise () from /lib/libpthread.so.0
#1  0x00000000007fc7da in Error::debugBreak () at Error.cpp:92
#2  0x00000000007fc8d0 in Error::error (string=<value optimized out>) at Error.cpp:69
#3  0x00000000007fc930 in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>, line=20047)
    at Error.cpp:76
#4  0x000000000082bb1c in Section::deleteLine (this=0x2b774c788900, bdb=0x2b774c590e98,
    line=<value optimized out>, sectionPageNumber=7, transId=0, locatorPage=0x2aaaaac9e000, locatorLine=98)
    at Section.cpp:561
#5  0x000000000082cf19 in Section::updateRecord (this=0x2b774c788900, recordNumber=<value optimized out>,
    stream=0x0, transId=0, earlyWrite=false) at Section.cpp:483
#6  0x00000000007f4e19 in Dbb::updateRecord (this=0x2b774c784620, section=0x4e44, recordId=20047, stream=0x4,
    transaction=<value optimized out>, earlyWrite=false) at Dbb.cpp:325

and

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 1116236096 (LWP 20015)]
0x00002ab9d089bfed in raise () from /lib/libpthread.so.0
(gdb) bt
#0  0x00002ab9d089bfed in raise () from /lib/libpthread.so.0
#1  0x00000000007fc7da in Error::debugBreak () at Error.cpp:92
#2  0x00000000007fc8d0 in Error::error (string=<value optimized out>) at Error.cpp:69
#3  0x00000000007fc930 in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>, line=20015)
    at Error.cpp:76
#4  0x000000000082bb1c in Section::deleteLine (this=0x2ab9d214f900, bdb=0x2ab9d1f44b80,
    line=<value optimized out>, sectionPageNumber=7, transId=0, locatorPage=0x2aaaaabeb000, locatorLine=159)
    at Section.cpp:561
#5  0x000000000082cf19 in Section::updateRecord (this=0x2ab9d214f900, recordNumber=<value optimized out>,
    stream=0x0, transId=0, earlyWrite=false) at Section.cpp:483
#6  0x00000000007f4e19 in Dbb::updateRecord (this=0x2ab9d214b5e0, section=0x4e24, recordId=20015, stream=0x4,
    transaction=<value optimized out>, earlyWrite=false) at Dbb.cpp:325
#7  0x00000000007d0f21 in Table::expungeBlob (this=0x2ab9d2151268, blob=<value optimized out>) at Table.cpp:1717
[7 Jul 2007 14:55] Hakan Küçükyılmaz
Not fixed.

7 out of 10 runs failed.

Hitting always same assertion. Please note that we see same assertion as in Bug#29613.

(gdb) f 4
#4  0x000000000082bbec in Section::deleteLine (this=0x2ba197508828, bdb=0x2ba19731d248,
    line=<value optimized out>, sectionPageNumber=7, transId=0, locatorPage=0x2aaaaaddc000, locatorLine=366)
    at Section.cpp:561
561                             ASSERT(false);
(gdb) l
556                     return spaceAvailable;
557                     }
558
559             for (int n = 0; n <= locatorPage->maxLine; ++n)
560                     if (n != locatorLine && locatorPage->elements[n].page == bdb->pageNumber)
561                             ASSERT(false);
562
563             if (!dbb->serialLog->recovering)
564                     dbb->serialLog->logControl->sectionLine.append(dbb, sectionPageNumber, bdb->pageNumber);
565
(gdb) p n
$1 = -1
(gdb) p locatorLine
$2 = 366

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 1116236096 (LWP 17336)]
0x00002ba195c54fed in raise () from /lib/libpthread.so.0
(gdb) bt
#0  0x00002ba195c54fed in raise () from /lib/libpthread.so.0
#1  0x00000000007fc85a in Error::debugBreak () at Error.cpp:92
#2  0x00000000007fc950 in Error::error (string=<value optimized out>) at Error.cpp:69
#3  0x00000000007fc9b0 in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>, line=17336)
    at Error.cpp:76
#4  0x000000000082bbec in Section::deleteLine (this=0x2ba197508828, bdb=0x2ba19731d248,
    line=<value optimized out>, sectionPageNumber=7, transId=0, locatorPage=0x2aaaaaddc000, locatorLine=366)
    at Section.cpp:561
#5  0x000000000082cfe9 in Section::updateRecord (this=0x2ba197508828, recordNumber=<value optimized out>,
    stream=0x0, transId=0, earlyWrite=false) at Section.cpp:483
#6  0x00000000007f4e99 in Dbb::updateRecord (this=0x2ba197504800, section=0x43ad, recordId=17336, stream=0x4,
    transaction=<value optimized out>, earlyWrite=false) at Dbb.cpp:325
#7  0x00000000007d0f51 in Table::expungeBlob (this=0x2ba19750a190, blob=<value optimized out>)
    at Table.cpp:1717
#8  0x00000000007d10c2 in Table::garbageCollect (this=0x2ba19750a190, leaving=0x160de38,
    staying=0x2ba1975efce0, transaction=<value optimized out>, quiet=<value optimized out>) at Table.cpp:1752
#9  0x00000000007d1159 in Table::expungeRecordVersions (this=0x2ba19750a190, record=0x2ba1975efce0)
    at Table.cpp:1600
#10 0x000000000081f585 in RecordVersion::scavenge (this=0x4, oldestActive=<value optimized out>)
    at RecordVersion.cpp:150
#11 0x000000000081d293 in RecordLeaf::retireRecords (this=<value optimized out>, table=<value optimized out>,
    base=<value optimized out>, age=-1, oldestActiveTransaction=36938) at RecordLeaf.cpp:126
#12 0x0000000000864a91 in RecordGroup::retireRecords (this=<value optimized out>, table=0x2ba19750a190,
    base=<value optimized out>, age=-1, oldestActiveTransaction=36938) at RecordGroup.cpp:122
#13 0x0000000000864a91 in RecordGroup::retireRecords (this=<value optimized out>, table=0x2ba19750a190,
    base=<value optimized out>, age=-1, oldestActiveTransaction=36938) at RecordGroup.cpp:122
#14 0x00000000007d0193 in Table::retireRecords (this=0x2ba19750a190, age=-1, oldestActiveTransaction=36938)
    at Table.cpp:1519
[7 Jul 2007 15:07] Hakan Küçükyılmaz
Please note that the failure is happening with
 ./falcon_concurrent_blob_updates -r540 -s963
[9 Jul 2007 9:20] Hakan Küçükyılmaz
Reproduced it on fluffy.
SEED=1
while ( [ $? -eq 0 ] );
     do
     ./falcon_concurrent_blob_updates -r540 -s$SEED;
     SEED=$(($SEED + 3));
done

...
...
...

Completed spawning new database worker threads.
Runtime is 540.
Seed value is 16.
Isolation level is READ-COMMITTED.
Testcase is now running, so watch for error output.
Waiting for worker threads to finish ...

(gdb) f 4
#4  0x00000000008ef475 in ~Record (this=0x2aab1c873000) at Record.cpp:129
129             ASSERT(!active);
(gdb) p active
$1 = 1 '\001'

Backtrace:

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 1085045088 (LWP 10547)]
0x00002aaaab16643a in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x00002aaaab16643a in raise () from /lib64/tls/libc.so.6
#1  0x00000000008b7338 in Error::debugBreak () at Error.cpp:92
#2  0x00000000008b744c in Error::error (string=0xc45508 "assertion failed at line %d in file %s\n")
    at Error.cpp:69
#3  0x00000000008b74e3 in Error::assertionFailed (fileName=0xc4a7f0 "Record.cpp", line=129) at Error.cpp:76
#4  0x00000000008ef475 in ~Record (this=0x2aab1c873000) at Record.cpp:129
#5  0x00000000008f08cf in ~RecordVersion (this=0x2aab1c873000) at RecordVersion.cpp:65
#6  0x00000000008ee7f6 in Record::release (this=0x2aab1c873000) at Record.cpp:517
#7  0x0000000000880198 in Transaction::commitRecords (this=0x2aaaabb9a7a0) at Transaction.cpp:697
#8  0x0000000000883369 in Transaction::commit (this=0x2aaaabb9a7a0) at Transaction.cpp:288
#9  0x0000000000899d83 in Connection::commit (this=0x2aaafed92fc8) at Connection.cpp:269
#10 0x00000000008623db in StorageConnection::commit (this=0x2aaafec88850) at StorageConnection.cpp:153
#11 0x000000000086683f in StorageHandler::commit (this=0x2aaaab8d6048, mySqlThread=0x1094278)
    at StorageHandler.cpp:211
#12 0x000000000085ab29 in StorageInterface::commit (hton=0x102d878, thd=0x1094278, all=true)
    at ha_falcon.cpp:966
#13 0x000000000078bcff in ha_commit_one_phase (thd=0x1094278, all=true) at handler.cc:779
#14 0x000000000078c414 in ha_commit_trans (thd=0x1094278, all=true) at handler.cc:749
#15 0x0000000000677baf in end_trans (thd=0x1094278, completion=COMMIT) at sql_parse.cc:555
#16 0x0000000000684084 in mysql_execute_command (thd=0x1094278) at sql_parse.cc:3557
#17 0x0000000000686e07 in mysql_parse (thd=0x1094278, inBuf=0x2aaaf7502a08 "COMMIT", length=6,
    found_semicolon=0x40ac6fa0) at sql_parse.cc:5381
#18 0x0000000000687a09 in dispatch_command (command=COM_QUERY, thd=0x1094278, packet=0x2aaaf7507ad9 "COMMIT",
    packet_length=7) at sql_parse.cc:907
#19 0x0000000000688d8e in do_command (thd=0x1094278) at sql_parse.cc:669
#20 0x000000000067711e in handle_one_connection (arg=0x109e058) at sql_connect.cc:1091
#21 0x00002aaaab46cfa5 in start_thread () from /lib64/tls/libpthread.so.0
#22 0x00002aaaab1f4cb2 in clone () from /lib64/tls/libc.so.6
[13 Jul 2007 9:19] Hakan Küçükyılmaz
Still hitting assertion

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 1116236096 (LWP 14686)]
0x00002ae972d77fed in raise () from /lib/libpthread.so.0
(gdb) bt
#0  0x00002ae972d77fed in raise () from /lib/libpthread.so.0
#1  0x00000000008e55b4 in Error::debugBreak () at Error.cpp:92
#2  0x00000000008e56c8 in Error::error (string=0xc73288 "assertion failed at line %d in file %s\n")
    at Error.cpp:69
#3  0x00000000008e575f in Error::assertionFailed (fileName=0xc7aa8e "Section.cpp", line=561) at Error.cpp:76
#4  0x0000000000933568 in Section::deleteLine (this=0x2ae97462b990, bdb=0x2ae97440c220, line=14,
    sectionPageNumber=7, transId=0, locatorPage=0x2aaaaac02000, locatorLine=167) at Section.cpp:561
#5  0x0000000000935519 in Section::updateRecord (this=0x2ae97462b990, recordNumber=167, stream=0x0,
    transId=0, earlyWrite=true) at Section.cpp:486
#6  0x00000000008db189 in Dbb::updateRecord (this=0x2ae974627658, section=0x2ae97462b990, recordId=167,
    stream=0x0, transaction=0x0, earlyWrite=true) at Dbb.cpp:325
#7  0x00000000008a16f2 in Table::expungeBlob (this=0x2ae97462d2f8, blob=0x42885b70) at Table.cpp:1723
#8  0x00000000008a719a in Table::garbageCollect (this=0x2ae97462d2f8, leaving=0x2aaab0512ad8,
    staying=0x2aaab0512da8, transaction=0x0, quiet=false) at Table.cpp:1758
#9  0x00000000008a7283 in Table::expungeRecordVersions (this=0x2ae97462d2f8, record=0x2aaab0512da8)
    at Table.cpp:1607
#10 0x000000000091f118 in RecordVersion::scavenge (this=0x2aaab0512da8, oldestActive=33052)
    at RecordVersion.cpp:150
#11 0x000000000091b535 in RecordLeaf::retireRecords (this=0x2ae9746af468, table=0x2ae97462d2f8, base=186,
    age=-1, oldestActiveTransaction=33052) at RecordLeaf.cpp:167
#12 0x0000000000989fbd in RecordGroup::retireRecords (this=0x2aaab04081d0, table=0x2ae97462d2f8, base=1,
    age=-1, oldestActiveTransaction=33052) at RecordGroup.cpp:122
#13 0x0000000000989fbd in RecordGroup::retireRecords (this=0x2aaab0407e80, table=0x2ae97462d2f8, base=0,
    age=-1, oldestActiveTransaction=33052) at RecordGroup.cpp:122
#14 0x000000000089ff66 in Table::retireRecords (this=0x2ae97462d2f8, age=-1, oldestActiveTransaction=33052)
    at Table.cpp:1525
#15 0x00000000008a005a in Table::cleanupRecords (this=0x2ae97462d2f8, threshold=-1,
    oldestActiveTransaction=33052) at Table.cpp:2313
[13 Jul 2007 12:50] Hakan Küçükyılmaz
Latest version

Attachment: falcon_concurrent_blob_updates.tar.gz (application/x-gzip, text), 3.62 KiB.

[16 Jul 2007 14:32] Kevin Lewis
This bug has now morphed from the crash in the Record destructor to an ASSERT in Section::deleteLine().  The other problem has apparently been fixed.  What Hakan neglected to mention with the previous call stack is that he was able to reproduce this crash with on both fluffy and his laptop by running falcon_concurrent_blob_updates repeatedly it in a loop with a different seed each time.  On both machines, the seed value that failed would not cause the assert if run once.  So this bug is a timing issue with multiple threads and difficult to reproduce.  Hakan ran it on his laptop for 1.5 hours and on Fluffy for 11 iterations. 

Coincidentaly, this is the same assert that was being hit with the JVER test reported in Bug#29613.  For that bug, JVer was run serveral times before the assert was hit, and the bug was closed because the problem apparently went away with a patch tested on July 12.

Considering the difficulty in reproducing this bug, and the beta timeframe, we should consider trying to fix this after beta.
[16 Jul 2007 15:02] Hakan Küçükyılmaz
For the record:

It needs one iteration with seed value of 963 on my laptop and eleven itarations starting with seed value of 1 and incrementing by 3 on fluffy.
[25 Jul 2007 10:58] Hakan Küçükyılmaz
Original crash does not happen anymore. New failure will be tracked in new bug report.
[26 Jul 2007 9:22] MC Brown
Internal change only - no documentation needed.