Description:
In this run of Pushbuild 2 (kevin.lewis@sun.com, 2009-05-13 02:08:14),
falcon_chill_thaw_compare experienced this crash;
====================
# 12:17:00 #3 0x082c2082 in handle_segfault (sig=11) at mysqld.cc:2710
# 12:17:00 #4 <signal handler called>
# 12:17:00 #5 0x0079120c in memmove () from /lib/libc.so.6
# 12:17:00 #6 0x082b9554 in well_formed_copy_nchars (to_cs=0x8bc5da0, # 12:17:00 to=0xa5b9f45 ' ' <repeats 200 times>..., to_length=255, from_cs=0x8bc5da0, # 12:17:00 from=0xcccccccc <Address 0xcccccccc out of bounds>, from_length=255, # 12:17:00 nchars=255, well_formed_error_pos=0xac94c6b8, # 12:17:00 cannot_convert_error_pos=0xac94c6b4, from_end_pos=0xac94c6b0)
# 12:17:00 at sql_string.cc:1018
# 12:17:00 #7 0x08297e54 in Field_string::store (this=0xa5b9e58, # 12:17:00 from=0xcccccccc <Address 0xcccccccc out of bounds>, length=3435973836, # 12:17:00 cs=0x8bc5da0) at field.cc:6304
# 12:17:00 #8 0x08559059 in StorageInterface::decodeRecord (this=0xa5b9b48, # 12:17:00 buf=0xa5b9f30 "ñ", 'Ì' <repeats 16 times>) at ha_falcon.cpp:3315
# 12:17:00 #9 0x0855ccc2 in StorageInterface::index_next (this=0xa5b9b48, # 12:17:00 buf=0xa5b9f30 "ñ", 'Ì' <repeats 16 times>) at ha_falcon.cpp:1893
# 12:17:00 #10 0x0855c7ee in StorageInterface::multi_range_read_next (this=0xa5b9b48, # 12:17:00 rangeInfo=0xac94c810) at ha_falcon.cpp:1985
# 12:17:00 #11 0x083e746b in QUICK_RANGE_SELECT::get_next (this=0xa5bec88)
# 12:17:00 at opt_range.cc:8579
# 12:17:00 #12 0x083ff151 in rr_quick (info=0xa5a9fcc) at records.cc:322
# 12:17:00 #13 0x083448a7 in join_init_read_record (tab=0xa5a9f88) at sql_select.cc:17086
# 12:17:00 #14 0x08347cf5 in sub_select (join=0xa5ab930, join_tab=0xa5a9f88, # 12:17:00 end_of_records=false) at sql_select.cc:16280
# 12:17:00 #15 0x08354335 in do_select (join=0xa5ab930, fields=0xac4d43a4, table=0x0, # 12:17:00 procedure=0x0) at sql_select.cc:15844
# 12:17:00 #16 0x0836eee4 in JOIN::exec (this=0xa5ab930) at sql_select.cc:2886
# 12:17:00 #17 0x08369a8d in mysql_select (thd=0xac4d2fc0, rref_pointer_array=0xac4d4414, # 12:17:00 tables=0xa5a8d48, wild_num=1, fields=@0xac4d43a4, conds=0xa5a94b8, # 12:17:00 og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, # 12:17:00 select_options=2147764736, result=0xa5a9608, unit=0xac4d4074, # 12:17:00 select_lex=0xac4d4310) at sql_select.cc:3067
# 12:17:00 #18 0x0836f1f4 in handle_select (thd=0xac4d2fc0, lex=0xac4d4018, # 12:17:00 result=0xa5a9608, setup_tables_done_option=0) at sql_select.cc:310
# 12:17:00 #19 0x082d23af in execute_sqlcom_select (thd=0xac4d2fc0, all_tables=0xa5a8d48)
# 12:17:00 at sql_parse.cc:4949
# 12:17:00 #20 0x082d3482 in mysql_execute_command (thd=0xac4d2fc0) at sql_parse.cc:2157
# 12:17:00 #21 0x082dbfbb in mysql_parse (thd=0xac4d2fc0, # 12:17:00 inBuf=0xa5a8900 "SELECT * FROM `table1000_falcon_int_autoinc` WHERE `bigint_key` > 'ododkfkfpcpcnjnjrerehjhjlililqlqojojmxmxxnxnqvqvhkhkehehkrkrhnhnififnlnltvtvtftfaaaaununpqpqywywukukalalaoaogkgkxdxdkxkxreregtgtyuyuh"..., length=324, # 12:17:00 found_semicolon=0xac94de80) at sql_parse.cc:5964
# 12:17:00 #22 0x082dd13d in dispatch_command (command=COM_QUERY, thd=0xac4d2fc0, # 12:17:00 packet=0xac4d8e99 "", packet_length=324) at sql_parse.cc:1049
# 12:17:00 #23 0x082de3ce in do_command (thd=0xac4d2fc0) at sql_parse.cc:731
# 12:17:00 #24 0x082cad73 in handle_one_connection (arg=0xac4d2fc0) at sql_connect.cc:1146
# 12:17:00 #25 0x0089a45b in start_thread () from /lib/libpthread.so.0
# 12:17:00 #26 0x007f1c4e in clone () from /lib/libc.so.6
===================================
But this call stack, StorageInterface::decodeRecord() is trying to access address 0xcccccccc when using StorageTable::dataStream, which is an EncodedDataStream. It is set in StorageTable::setRecord(Record* newRecord, bool locked) where it does this;
// The following is confusing because Record::getEncodeRecord returns pointer to
// the actual data fields while Record::getEncodedSize return length of the data fields plus
// a two byte format number.
dataStream.setData((const UCHAR*) record->getEncodedRecord(), record->getEncodedSize() - sizeof(USHORT));
Record::getEncodedRecord returns a pointer directly into the Record::data.record buffer! This record should NEVER get chilled while StorageTable has an active pointer into it! But we do not have any explicit protection against that. There may be some 'just got lucky' reasons why this does not happen. It might be that the latest changes may have upset that balance.
How to repeat:
runall.pl \
--basedir=/export/home/pb2/test/sb_1-512849-1242209795.19/mysql-6.0.12-alpha-linux-i686-test \
--mysqld=--loose-innodb-lock-wait-timeout=5 \
--mysqld=--table-lock-wait-timeout=5 \
--mysqld=--loose-falcon-lock-wait-timeout=5 \
--mysqld=--loose-falcon-debug-mask=2 \
--mysqld=--skip-safemalloc \
--grammar=conf/falcon_chill_thaw.yy \
--gendata=conf/falcon_chill_thaw.zz \
--mysqld=--falcon-record-chill-threshold=1K \
--mysqld=--falcon-index-chill-threshold=1K \
--threads=1 \
--vardir1=/export/home/pb2/test/sb_1-512849-1242209795.19/mysql-6.0.12-alpha-linux-i686-test/vardirs/chillthaw-vardir \
--vardir2=/export/home/pb2/test/sb_1-512849-1242209795.19/mysql-6.0.12-alpha-linux-i686-test/vardirs/default-vardir \
--reporters=Deadlock,ErrorLog,Backtrace \
--duration=1200 \
--mysqld=--log-output=file \
--queries=100000 \
--engine=falcon
Suggested fix:
A possible solution is to have a 'doNotChill' counter that uses interlocked increment and decrement. The chill process already protects the 'current' record from chilling by temporarily setting the state to recNoChill. But it saves off the real state while it does this, which possibly causes other reading threads to not know what this RecordVersion really is. So the current solution is not a good one.