Bug #44850 Falcon; StorageTable::dataStream can point to freed memory
Submitted: 13 May 2009 16:08 Modified: 26 May 2010 17:47
Reporter: Kevin Lewis Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.11 OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any
Tags: F_CHILL THAW

[13 May 2009 16:08] Kevin Lewis
Description:
In this run of Pushbuild 2 (kevin.lewis@sun.com, 2009-05-13 02:08:14),
falcon_chill_thaw_compare experienced this crash;
====================

# 12:17:00 #3  0x082c2082 in handle_segfault (sig=11) at mysqld.cc:2710
# 12:17:00 #4  <signal handler called>
# 12:17:00 #5  0x0079120c in memmove () from /lib/libc.so.6
# 12:17:00 #6  0x082b9554 in well_formed_copy_nchars (to_cs=0x8bc5da0, # 12:17:00     to=0xa5b9f45 ' ' <repeats 200 times>..., to_length=255, from_cs=0x8bc5da0, # 12:17:00     from=0xcccccccc <Address 0xcccccccc out of bounds>, from_length=255, # 12:17:00     nchars=255, well_formed_error_pos=0xac94c6b8, # 12:17:00     cannot_convert_error_pos=0xac94c6b4, from_end_pos=0xac94c6b0)
# 12:17:00     at sql_string.cc:1018
# 12:17:00 #7  0x08297e54 in Field_string::store (this=0xa5b9e58, # 12:17:00     from=0xcccccccc <Address 0xcccccccc out of bounds>, length=3435973836, # 12:17:00     cs=0x8bc5da0) at field.cc:6304
# 12:17:00 #8  0x08559059 in StorageInterface::decodeRecord (this=0xa5b9b48, # 12:17:00     buf=0xa5b9f30 "ñ", 'Ì' <repeats 16 times>) at ha_falcon.cpp:3315
# 12:17:00 #9  0x0855ccc2 in StorageInterface::index_next (this=0xa5b9b48, # 12:17:00     buf=0xa5b9f30 "ñ", 'Ì' <repeats 16 times>) at ha_falcon.cpp:1893
# 12:17:00 #10 0x0855c7ee in StorageInterface::multi_range_read_next (this=0xa5b9b48, # 12:17:00     rangeInfo=0xac94c810) at ha_falcon.cpp:1985
# 12:17:00 #11 0x083e746b in QUICK_RANGE_SELECT::get_next (this=0xa5bec88)
# 12:17:00     at opt_range.cc:8579
# 12:17:00 #12 0x083ff151 in rr_quick (info=0xa5a9fcc) at records.cc:322
# 12:17:00 #13 0x083448a7 in join_init_read_record (tab=0xa5a9f88) at sql_select.cc:17086
# 12:17:00 #14 0x08347cf5 in sub_select (join=0xa5ab930, join_tab=0xa5a9f88, # 12:17:00     end_of_records=false) at sql_select.cc:16280
# 12:17:00 #15 0x08354335 in do_select (join=0xa5ab930, fields=0xac4d43a4, table=0x0, # 12:17:00     procedure=0x0) at sql_select.cc:15844
# 12:17:00 #16 0x0836eee4 in JOIN::exec (this=0xa5ab930) at sql_select.cc:2886
# 12:17:00 #17 0x08369a8d in mysql_select (thd=0xac4d2fc0, rref_pointer_array=0xac4d4414, # 12:17:00     tables=0xa5a8d48, wild_num=1, fields=@0xac4d43a4, conds=0xa5a94b8, # 12:17:00     og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, # 12:17:00     select_options=2147764736, result=0xa5a9608, unit=0xac4d4074, # 12:17:00     select_lex=0xac4d4310) at sql_select.cc:3067
# 12:17:00 #18 0x0836f1f4 in handle_select (thd=0xac4d2fc0, lex=0xac4d4018, # 12:17:00     result=0xa5a9608, setup_tables_done_option=0) at sql_select.cc:310
# 12:17:00 #19 0x082d23af in execute_sqlcom_select (thd=0xac4d2fc0, all_tables=0xa5a8d48)
# 12:17:00     at sql_parse.cc:4949
# 12:17:00 #20 0x082d3482 in mysql_execute_command (thd=0xac4d2fc0) at sql_parse.cc:2157
# 12:17:00 #21 0x082dbfbb in mysql_parse (thd=0xac4d2fc0, # 12:17:00     inBuf=0xa5a8900 "SELECT * FROM `table1000_falcon_int_autoinc` WHERE `bigint_key` > 'ododkfkfpcpcnjnjrerehjhjlililqlqojojmxmxxnxnqvqvhkhkehehkrkrhnhnififnlnltvtvtftfaaaaununpqpqywywukukalalaoaogkgkxdxdkxkxreregtgtyuyuh"..., length=324, # 12:17:00     found_semicolon=0xac94de80) at sql_parse.cc:5964
# 12:17:00 #22 0x082dd13d in dispatch_command (command=COM_QUERY, thd=0xac4d2fc0, # 12:17:00     packet=0xac4d8e99 "", packet_length=324) at sql_parse.cc:1049
# 12:17:00 #23 0x082de3ce in do_command (thd=0xac4d2fc0) at sql_parse.cc:731
# 12:17:00 #24 0x082cad73 in handle_one_connection (arg=0xac4d2fc0) at sql_connect.cc:1146
# 12:17:00 #25 0x0089a45b in start_thread () from /lib/libpthread.so.0
# 12:17:00 #26 0x007f1c4e in clone () from /lib/libc.so.6 
===================================

But this call stack, StorageInterface::decodeRecord() is trying to access address 0xcccccccc when using StorageTable::dataStream, which is an EncodedDataStream.  It is set in StorageTable::setRecord(Record* newRecord, bool locked) where it does this;

    // The following is confusing because Record::getEncodeRecord returns pointer to
    // the actual data fields while Record::getEncodedSize return length of the data fields plus
    // a two byte format number.
    
    dataStream.setData((const UCHAR*) record->getEncodedRecord(), record->getEncodedSize() - sizeof(USHORT));

Record::getEncodedRecord returns a pointer directly into the Record::data.record buffer!  This record should NEVER get chilled while StorageTable has an active pointer into it!  But we do not have any explicit protection against that.  There may be some 'just got lucky' reasons why this does not happen.  It might be that the latest changes may have upset that balance.

How to repeat:
runall.pl \ 
--basedir=/export/home/pb2/test/sb_1-512849-1242209795.19/mysql-6.0.12-alpha-linux-i686-test \ 
--mysqld=--loose-innodb-lock-wait-timeout=5 \ 
--mysqld=--table-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-lock-wait-timeout=5 \ 
--mysqld=--loose-falcon-debug-mask=2 \ 
--mysqld=--skip-safemalloc \ 
--grammar=conf/falcon_chill_thaw.yy \ 
--gendata=conf/falcon_chill_thaw.zz \ 
--mysqld=--falcon-record-chill-threshold=1K \ 
--mysqld=--falcon-index-chill-threshold=1K \ 
--threads=1 \ 
--vardir1=/export/home/pb2/test/sb_1-512849-1242209795.19/mysql-6.0.12-alpha-linux-i686-test/vardirs/chillthaw-vardir \ 
--vardir2=/export/home/pb2/test/sb_1-512849-1242209795.19/mysql-6.0.12-alpha-linux-i686-test/vardirs/default-vardir \ 
--reporters=Deadlock,ErrorLog,Backtrace \ 
--duration=1200 \ 
--mysqld=--log-output=file \ 
--queries=100000 \ 
--engine=falcon

Suggested fix:
A possible solution is to have a 'doNotChill' counter that uses interlocked increment and decrement.  The chill process already protects the 'current' record from chilling by temporarily setting the state to recNoChill.  But it saves off the real state while it does this, which possibly causes other reading threads to not know what this RecordVersion really is.  So the current solution is not a good one.
[14 May 2009 5:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73998

2705 Kevin Lewis	2009-05-14
      Bug#44850 - Delete the member variable StorageTable::dataStream since it can point into Storagetable::record.data.record, which can be chilled while it exists.
      Instead, use local pointers that are protected by a CycleLock so that even if the data.record  is chilled, that buffer will be accessable until the next cycle.
      Fix Record::deleteData so that it works for chilled records.
      Also, fix Record::getDataMemUsage() so that it does not thaw.  That was an inadvertent change from a recent patch.