Bug #30463 Falcon: assert hit in Cache::fakePage()
Submitted: 16 Aug 2007 18:10 Modified: 24 Aug 2007 10:27
Reporter: Kevin Lewis Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0.2 OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any

[16 Aug 2007 18:10] Kevin Lewis
Description:
After Bug#30364 was fixed, this bug can be repeated with;
falcon_concurrent_blob_updates -r300 -n5 -s1

The follwing assert is hit;
ASSERT(!bdb->syncObject.isLocked());  
in Cache::fakePage() line 275. 

This was reported at one point in both Bug#30032 and Bug#28174.  Each
time, the bug morphed into some other problem and this ASSERT was lost
sight of, or  not repeated exactly.  Now that many of the other
problems have been fixed, I can reproduce this readily with 5 or more
threads and less than 5 minutes run time.

How to repeat:
falcon_concurrent_blob_updates -r300 -n5 -s1
[16 Aug 2007 21:03] MySQL Verification Team
Thank you for the bug report.

Program received signal SIGILL, Illegal instruction.
[Switching to Thread 1183643968 (LWP 3993)]
0x00002af5f3e96fdd in raise () from /lib64/libpthread.so.0
(gdb) bt full
#0  0x00002af5f3e96fdd in raise () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00000000009393c0 in Error::debugBreak () at Error.cpp:92
No locals.
#2  0x00000000009394d4 in Error::error (string=0xd7a028 "assertion failed at line %d in file %s\n") at Error.cpp:69
        buffer = "assertion failed at line 508 in file Record.cpp\n\000�\214F", '\0' <repeats 44 times>, "��\214F\000\000\000\000��\214F\000\000\000\000��\214F\000\000\000\000��\214F\000\000\000\000\b�\214F", '\0' <repeats 12 times>, "\020�\214F\000\000\000\000�\210\217", '\0' <repeats 29 times>, "\220�\225��*", '\0' <repeats 42 times>, "�s�\001\000\000\000\000\001", '\0' <repeats 14 times>
        args = {{gp_offset = 24, fp_offset = 48, overflow_arg_area = 0x468cd1d0, reg_save_area = 0x468cd100}}
#3  0x000000000093956b in Error::assertionFailed (fileName=0xd7f2e0 "Record.cpp", line=508) at Error.cpp:76
No locals.
#4  0x0000000000970357 in Record::release (this=0x2af5f515d248) at Record.cpp:508
No locals.
#5  0x00000000008e6419 in StorageDatabase::nextRow (this=0x2af5f50a8538, storageTable=0x2af5f4f26190, recordNumber=4220,
    lockForUpdate=true) at StorageDatabase.cpp:278
        exception = (SQLException &) @0x1d673e0: {_vptr.SQLException = 0xd83050}
        storageConnection = (class StorageConnection *) 0x2af5f50eb420
        connection = (class Connection *) 0x2af5f50e44d0
        table = (class Table *) 0x2af5f50d92c8
        transaction = (class Transaction *) 0x2af5f50f3f30
        candidate = (class Record *) 0x2af5f515d248
        record = (class Record *) 0x0
#6  0x00000000008ebc21 in StorageTable::next (this=0x2af5f4f26190, recordNumber=0, lockForUpdate=true) at StorageTable.cpp:126
        ret = 0
#7  0x00000000008df65e in StorageInterface::rnd_next (this=0x1fb3798, buf=0x1fb39d8 "�/�c�") at ha_falcon.cpp:480
        _db_func_ = 0xd3536f "mysql_update"
        _db_file_ = 0xd35146 "sql_update.cc"
        _db_level_ = 5
        _db_framep_ = (char **) 0x468cd340
#8  0x00000000007d0865 in rr_sequential (info=0x468cd560) at records.cc:363
        tmp = 0
#9  0x00000000007686c7 in mysql_update (thd=0x1d613a8, table_list=0x1d3a448, fields=@0x1d62eb0, values=@0x1d63288, conds=0x0,
    order_num=0, order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:558
        using_limit = false
---Type <return> to continue, or q <return> to quit---
        safe_update = false
        used_key_is_modified = false
        transactional_table = true
        will_batch = false
        can_compare_record = true
        res = 0
        error = 0
        loc_error = 1183635136
        used_index = 64
        dup_key_found = 0
        need_sort = true
        want_privilege = 0
        table_count = 1
        updated = 0
        found = 0
        old_covering_keys = {map = 0}
        table = (TABLE *) 0x1fb3e78
        select = (SQL_SELECT *) 0x0
[22 Aug 2007 19:35] Kevin Lewis
This assert in cache::fakePage is being hit because the page it is trying to fake (allocate) is free, but the PageWriter is still trying to take it off the DirtyPage list.  Since the PageWriter may have been in the position of waiting on a shared lock when the page was freed, it may still have the lock when the page is attempted to be re-used, which is where fakePage is called.  If fakePage would ignore this assert in this condition, the PageWriter with the shared lock will clean up it's references and release the lock, and everything would proceed fine.

So this assert, that the page being allocated is locked, is changed to a condition where, if true, the pageType must be PAGE_free and the lock type must be Shared.

Also, the Shared lock is avoided in probePage when pageType == PAGE_free.
This helps a little, but quite often the PageWriter gets to the addRef before
the page is actually freed. So changing probePage cannot always keep the PageWriter from getting a Shared lock on a freed page. Hence the change to fakePage is also necessary.

Note that this change just avoids the assert.  There never was a risk of corruption.  It was a valid concurrency situation.
[23 Aug 2007 14:59] Hakan Küçükyılmaz
I don't see the assertion anymore. I tried

   falcon_concurrent_blob_updates -r300 -n5 -s1
and
   falcon_concurrent_blob_updates -r900 -n15 -s1
[24 Aug 2007 10:27] MC Brown
A note has been added to the 6.0.1 changelog: 

An assertion could be thrown during high number of concurrent updates of BLOB fields.