Bug #43146 Falcon crash in EncodedDataStream::skip
Submitted: 24 Feb 2009 13:25 Modified: 15 May 2009 13:25
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0-falcon-team, 6.0 bzr OS:Any
Assigned to: Kevin Lewis
Triage: Triaged: D1 (Critical)

[24 Feb 2009 13:25] Philip Stoev
When executing a RQG workload, Falcon crashed as follows:

#4  0x00000000009e27b2 in EncodedDataStream::skip (
    ptr=0x2aaac07872e6 <Address 0x2aaac07872e6 out of bounds>)
    at EncodedDataStream.h:350
#5  0x0000000000a1a626 in Record::getEncodedValue (this=0x13552130,
    fieldId=25, value=0x441afd00) at Record.cpp:662
#6  0x0000000000a1b66c in Record::getRawValue (this=0x13552130, fieldId=25,
    value=0x441afd00) at Record.cpp:370
#7  0x0000000000a1b9df in Record::getValue (this=0x13552130, fieldId=25,
    value=0x441afd00) at Record.cpp:322
#8  0x00000000009f014c in Index::makeKey (this=0x2aaaaae46aa8,
    record=0x13552130, key=0x441b0260) at Index.cpp:636
#9  0x00000000009f02f0 in Index::duplicateKey (this=0x2aaaaae46aa8,
    key=0x441b2d20, record=0x13552130) at Index.cpp:698
#10 0x00000000009f04a4 in Index::garbageCollect (this=0x2aaaaae46aa8,
    leaving=0x13114380, staying=0x13552130, transaction=0x0, quiet=false)
    at Index.cpp:656
#11 0x000000000099c3cd in Table::garbageCollect (this=0x2aaaaadb4308,
    leaving=0x13114380, staying=0x13552130, transaction=0x0, quiet=false)
    at Table.cpp:2146
#12 0x0000000000a1c891 in RecordLeaf::pruneRecords (this=0x2aaaaae4eca0,
    table=0x2aaaaadb4308, base=1, recordScavenge=0x441b5b40)
    at RecordLeaf.cpp:163
#13 0x0000000000a8e737 in RecordGroup::pruneRecords (this=0x2aaaaae4e950,
    table=0x2aaaaadb4308, base=0, recordScavenge=0x441b5b40)
    at RecordGroup.cpp:122
#14 0x0000000000993017 in Table::pruneRecords (this=0x2aaaaadb4308,
    recordScavenge=0x441b5b40) at Table.cpp:1910
#15 0x00000000009ca3fb in Database::pruneRecords (this=0x2aaaaabecbc8,
    recordScavenge=0x441b5b40) at Database.cpp:1873
#16 0x00000000009ca535 in Database::scavengeRecords (this=0x2aaaaabecbc8,
    forced=false) at Database.cpp:1821
#17 0x00000000009ca8da in Database::scavenge (this=0x2aaaaabecbc8,
    forced=false) at Database.cpp:1777
#18 0x00000000009ca9fc in Database::scavengerThreadMain (this=0x2aaaaabecbc8)
    at Database.cpp:1950
#19 0x00000000009caa8b in Database::scavengerThreadMain (
    database=0x2aaaaabecbc8) at Database.cpp:1939
#20 0x00000000009a40f9 in Thread::thread (this=0x2aaaaadf7e78)
    at Thread.cpp:166
#21 0x00000000009a4321 in Thread::thread (parameter=0x2aaaaadf7e78)
    at Thread.cpp:145

(gdb) list
345             static int              init(void);
346             static const UCHAR* decode (const UCHAR *ptr, Value *value, bool copyFlag);
348             inline static const unsigned char* skip (const unsigned char *ptr)
349                     {
350                     int length = lengths[*ptr++]; <<<<<<< HERE
352                     if (length < 0)
353                             {
354                             int count = 0;

(gdb) print ptr
$1 = (const unsigned char *) 0x2aaac07872e6 <Address 0x2aaac07872e6 out of bounds>

in Record::getEncodedValue:

(gdb) list
657                                     {
658                                     const UCHAR *p = (UCHAR*) data.record + vector[highWater];
660                                     while (highWater < index)
661                                             {
662                                             const UCHAR *q = EncodedDataStream::skip(p); <<<<<<<< HERE
663                                             vector[++highWater] = (USHORT) (q - (UCHAR*) data.record);
664                                             p = q;
665                                             }
666                                     }

(gdb) print data
$12 = {record = 0x2aaac0297290 "L"}
(gdb) print data.record
$13 = 0x2aaac0297290 "L"
(gdb) print p
$14 = (const UCHAR *) 0x2aaac07872e6 <Address 0x2aaac07872e6 out of bounds>
(gdb) print q
$15 = (const UCHAR *) 0x2aaac07872e6 <Address 0x2aaac07872e6 out of bounds>

How to repeat:
  perl runall.pl  --mem --rows=1000 --threads=4 --mysqld=--falcon-checkpoint-schedule='1 1 1 1 1'    --basedir=/export/home/pb2/test/sb_1-342525-1235379243.16/mysql-6.0.10-alpha-linux-x86_64-test   --mask=3334   --queries=10000000   --duration=900   --engine=Falcon   --grammar=conf/combinations.yy   --gendata=conf/combinations.zz   --reporters=Deadlock,ErrorLog,Backtrace,Recovery   --mysqld=--loose-falcon-lock-wait-timeout=1   --mysqld=--loose-innodb-lock-wait-timeout=1   --mysqld=--log-output=none   --mysqld=--skip-safemalloc  


   perl runall.pl   --rows=10000 --threads=16 --mysqld=--falcon-record-chill-threshold=1    --basedir=/export/home/pb2/test/sb_1-342525-1235379243.16/mysql-6.0.10-alpha-linux-x86_64-test   --mask=1764   --queries=10000000   --duration=900   --engine=Falcon   --grammar=conf/combinations.yy   --gendata=conf/combinations.zz   --reporters=Deadlock,ErrorLog,Backtrace,Recovery   --mysqld=--loose-falcon-lock-wait-timeout=1   --mysqld=--loose-innodb-lock-wait-timeout=1   --mysqld=--log-output=none   --mysqld=--skip-safemalloc
[24 Feb 2009 15:01] Sveta Smirnova
Thank you for the report.

Crash verified using current 6.0 tree:

#0  0x002ce402 in __kernel_vsyscall ()
#0  0x002ce402 in __kernel_vsyscall ()
#1  0x0046264f in pthread_kill () from /lib/libpthread.so.0
#2  0x0881c8e7 in my_write_core (sig=11) at stacktrace.c:309
#3  0x082bc7a8 in handle_segfault (sig=11) at mysqld.cc:2689
#4  <signal handler called>
#5  0x0035841d in memmove () from /lib/libc.so.6
#6  0x082b9880 in well_formed_copy_nchars (to_cs=0x8ba37e0, to=0xb49b023 "20011210000338", ' ' <repeats 50 times>, "##\f\037A\022", to_length=64, from_cs=0x8ba37e0, from=0x0, from_length=64, nchars=64, 
    well_formed_error_pos=0x976ee908, cannot_convert_error_pos=0x976ee904, from_end_pos=0x976ee900) at sql_string.cc:1018
#7  0x08290df6 in Field_string::store (this=0xb49c360, from=0x0, length=4294967295, cs=0x8ba37e0) at field.cc:6304
#8  0x0855dc28 in StorageInterface::decodeRecord (this=0xb49a860, buf=0xb49ac48 "\001") at ha_falcon.cpp:3231
#9  0x0855e304 in StorageInterface::rnd_next (this=0xb49a860, buf=0xb49ac48 "\001") at ha_falcon.cpp:652
#10 0x084146b9 in ha_partition::rnd_next (this=0xb49a478, buf=0xb49ac48 "\001") at ha_partition.cc:3541
#11 0x083fffc7 in find_all_keys (param=0x976eed54, select=0x0, sort_keys=0xb7401ca0, buffpek_pointers=0x976eeb7c, tempfile=0x976eec68, indexfile=0x0) at filesort.cc:587
#12 0x08401d2a in filesort (thd=0xb450d68, table=0xb476d20, sortorder=0xb4631d8, s_length=1, select=0x0, max_rows=6, sort_positions=true, examined_rows=0x976ef038) at filesort.cc:245
#13 0x083889b3 in mysql_update (thd=0xb450d68, table_list=0xb462a20, fields=@0xb45201c, values=@0xb45224c, conds=0x0, order_num=1, order=0xb4630e8, limit=6, handle_duplicates=DUP_ERROR, ignore=false)
    at sql_update.cc:426
#14 0x082d3d88 in mysql_execute_command (thd=0xb450d68) at sql_parse.cc:3014
#15 0x082da2bd in mysql_parse (thd=0xb450d68, inBuf=0xb4627b0 "UPDATE `table1000_falcon_key_pk_parts_2_int_autoinc` SET `char_64_utf8_not_null` = 'so'  ORDER BY `enum_utf8_not_null` LIMIT 6", 
    length=126, found_semicolon=0x976eff20) at sql_parse.cc:5752
#16 0x082dacff in dispatch_command (command=COM_QUERY, thd=0xb450d68, 
    packet=0xb456731 "UPDATE `table1000_falcon_key_pk_parts_2_int_autoinc` SET `char_64_utf8_not_null` = 'so'  ORDER BY `enum_utf8_not_null` LIMIT 6", packet_length=126) at sql_parse.cc:1009
#17 0x082dc063 in do_command (thd=0xb450d68) at sql_parse.cc:691
#18 0x082c9f0b in handle_one_connection (arg=0xb450d68) at sql_connect.cc:1146
#19 0x0045fbd4 in start_thread () from /lib/libpthread.so.0
#20 0x003b74fe in clone () from /lib/libc.so.6
Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `/users/ssmirnova/build/mysql-6.0/libexec/mysqld --no-defaults --basedir=/users/'.
Program terminated with signal 11, Segmentation fault.

warning: svr4_current_sos: Can't read pathname for load map: Input/output error
[24 Feb 2009 15:31] Sveta Smirnova
Philip, second run gave backtrace similar to one you described:

Thread 23 (process 14801):
#0  0x08846527 in my_tosort_unicode (uni_plane=0x8b83820, wc=0xa4648dec) at ctype-utf8.c:1970
#1  0x08846422 in my_strnxfrm_unicode (cs=0x8b84120, dst=0xa464e804 "", dstlen=10809, nweights=1100, src=0xa4648e84 "et50522160546wjfhiuekonxslufrypalaepbppfyxrbwhvdcdlnuybkedmhhqb", srclen=3, flags=0)
    at ctype-utf8.c:2002
#2  0x08556d9c in falcon_strnxfrm (cs=0x8b84120, dst=0xa464e804 "", dstlen=10809, nweights=1100, src=0xa4648e83 "get50522160546wjfhiuekonxslufrypalaepbppfyxrbwhvdcdlnuybkedmhhqb", srclen=3)
    at ha_falcon.cpp:299
#3  0x085e3ab7 in MySQLCollation::makeKey (this=0xb70958a0, value=0xa464e43c, key=0xa464e7f4, partialKey=0, maxKeyLength=1100) at MySQLCollation.cpp:72
#4  0x085d1d41 in Index::makeKey (this=0xb70ad198, field=0xb70a4ca0, value=0xa464e43c, segment=0, indexKey=0xa464e7f4) at Index.cpp:312
#5  0x085d1ebc in Index::makeKey (this=0xb70ad198, count=1, values=0xa464e73c, indexKey=0xa464e7f4) at Index.cpp:363
#6  0x085d2564 in Index::makeKey (this=0xb70ad198, record=0x9eb91a48, key=0xa464e7f4) at Index.cpp:639
#7  0x085d26a0 in Index::duplicateKey (this=0xb70ad198, key=0xa4651288, record=0x9eb91a48) at Index.cpp:698
#8  0x085d281a in Index::garbageCollect (this=0xb70ad198, leaving=0x96891458, staying=0x969a3e48, transaction=0x0, quiet=false) at Index.cpp:656
#9  0x08584233 in Table::garbageCollect (this=0xb73d7020, leaving=0x96891458, staying=0x969a3e48, transaction=0x0, quiet=false) at Table.cpp:2145
#10 0x085fc8f5 in RecordLeaf::pruneRecords (this=0xb70afc08, table=0xb73d7020, base=0, recordScavenge=0xa4653f40) at RecordLeaf.cpp:163
#11 0x086681d7 in RecordGroup::pruneRecords (this=0xb70b19f0, table=0xb73d7020, base=0, recordScavenge=0xa4653f40) at RecordGroup.cpp:122
#12 0x0857cf2b in Table::pruneRecords (this=0xb73d7020, recordScavenge=0xa4653f40) at Table.cpp:1909
#13 0x085ae1a6 in Database::pruneRecords (this=0xb727c640, recordScavenge=0xa4653f40) at Database.cpp:1868
#14 0x085b0dac in Database::scavengeRecords (this=0xb727c640, forced=false) at Database.cpp:1816
#15 0x085b1161 in Database::scavenge (this=0xb727c640, forced=false) at Database.cpp:1776
#16 0x085b1270 in Database::scavengerThreadMain (this=0xb727c640) at Database.cpp:1945
#17 0x085b131b in Database::scavengerThreadMain (database=0xb727c640) at Database.cpp:1934
#18 0x0858b959 in Thread::thread (this=0xb70831d8) at Thread.cpp:166
#19 0x0858be19 in Thread::thread (parameter=0xb70831d8) at Thread.cpp:145
#20 0x0045fbd4 in start_thread () from /lib/libpthread.so.0
#21 0x003b74fe in clone () from /lib/libc.so.6
Current language:  auto; currently c

So probably no new test case needed.
[30 Mar 2009 22:04] Kevin Lewis
Philip reported that this problem cannot be repeated after the changes the following changes were added;
1) Olav created a TransactionState object that outlives the Transaction
2) A CycleManager was added to protect doomed records until after all temporary stack pointers to them have gone away.
[15 May 2009 13:25] MC Brown
Internal/test fix. No changelog entry required.
[15 May 2009 13:40] MC Brown
A note have been added to the 6.0.11 changelog: 

The Falcon CycleManager has been updated, which addresses a number of issues when examining records in various transaction states and their visisbility/isolation in relation to other threads.