Bug #36971 Falcon crash in RecordVersion::fetchVersionRecursive on I_S query
Submitted: 26 May 2008 14:43
Modified: 25 Sep 2008 20:40
Reporter: Philip Stoev
Status: Can't repeat
Category: MySQL Server: Falcon storage engine
Severity: S1 (Critical)
Version: 6.0-falcon-team
OS: Any
Assigned to: Christopher Powers
CPU Architecture: Any

[26 May 2008 14:43] Philip Stoev
Description:
When executing a workload containing an I_S query, Falcon crashed as follows:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00581c78 in pthread_kill () from /lib/libpthread.so.0
#2  0x084497d1 in write_core (sig=11) at stacktrace.c:302
#3  0x082a3c90 in handle_segfault (sig=11) at mysqld.cc:2626
#4  <signal handler called>
#5  0x08554567 in RecordVersion::fetchVersionRecursive (this=0xb04e9f80, trans=0x0) at RecordVersion.cpp:138
#6  0x085546dd in RecordVersion::fetchVersion (this=0xb04e9f80, trans=0x0) at RecordVersion.cpp:127
#7  0x08501d83 in Context::fetchNext (this=0xb708e234, statement=0xb71b72a8) at Context.cpp:119
#8  0x08520d0c in FsbExhaustive::fetch (this=0xb709a1f8, statement=0xb71b72a8) at FsbExhaustive.cpp:57
#9  0x08521349 in FsbSieve::fetch (this=0xb7174750, statement=0xb71b72a8) at FsbSieve.cpp:56
#10 0x0854562a in NSelect::next (this=0xb7130560, statement=0xb71b72a8, resultSet=0xb703e378) at NSelect.cpp:369
#11 0x0855e3c6 in ResultSet::next (this=0xb703e378) at ResultSet.cpp:134
#12 0x084be954 in StorageHandler::getTablesInfo (this=0xb722c028, infoTable=0xa9afdcdc) at StorageHandler.cpp:1064
#13 0x084af501 in NfsPluginHandler::getTablesInfo (thd=0xa9b38438, tables=0x971bc38, cond=0x99631f0) at ha_falcon.cpp:3132
#14 0x08405ab1 in get_schema_tables_result (join=0x995e0c0, executed_place=PROCESSED_BY_JOIN_EXEC) at sql_show.cc:6313
#15 0x0833ce6c in JOIN::exec (this=0x995e0c0) at sql_select.cc:2271
#16 0x08339ce8 in mysql_select (thd=0xa9b38438, rref_pointer_array=0xa9b398f8, tables=0x971b9e8, wild_num=1, fields=@0xa9b39888, conds=0x0, og_num=0,
    order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2684635648, result=0x995e0b0, unit=0xa9b39564, select_lex=0xa9b397f4)
    at sql_select.cc:2929
#17 0x0833ed83 in handle_select (thd=0xa9b38438, lex=0xa9b39508, result=0x995e0b0, setup_tables_done_option=0) at sql_select.cc:289
#18 0x082b3385 in execute_sqlcom_select (thd=0xa9b38438, all_tables=0x971b9e8) at sql_parse.cc:4824
#19 0x082b4dc2 in mysql_execute_command (thd=0xa9b38438) at sql_parse.cc:2018
#20 0x082bdbf5 in mysql_parse (thd=0xa9b38438,
    inBuf=0x971b748 "SELECT * FROM INFORMATION_SCHEMA.TABLES\nINNER JOIN INFORMATION_SCHEMA.FALCON_TABLES ON (TABLES.TABLE_SCHEMA = FALCON_TABLES.SCHEMA_NAME AND TABLES.TABLE_NAME = FALCON_TABLES.TABLE_NAME)", length=185, found_semicolon=0xa9aff260) at sql_parse.cc:5782
#21 0x082be691 in dispatch_command (command=COM_QUERY, thd=0xa9b38438,
    packet=0xa9b39e49 "SELECT * FROM INFORMATION_SCHEMA.TABLES\nINNER JOIN INFORMATION_SCHEMA.FALCON_TABLES ON (TABLES.TABLE_SCHEMA = FALCON_TABLES.SCHEMA_NAME AND TABLES.TABLE_NAME = FALCON_TABLES.TABLE_NAME)", packet_length=185) at sql_parse.cc:1059
#22 0x082bf952 in do_command (thd=0xa9b38438) at sql_parse.cc:732
#23 0x082ad066 in handle_one_connection (arg=0xa9b38438) at sql_connect.cc:1134
#24 0x0057d32f in start_thread () from /lib/libpthread.so.0
#25 0x0049a27e in clone () from /lib/libc.so.6

The crash happens because the second argument (trans) of fetchVersionRecursive and fetchVersion is null (trans=0x0 in frames #5 and #6). This is because statement->transaction is null.
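To illustrate the failure mode described above, here is a minimal sketch of a version-chain lookup that receives a transaction pointer and may be handed a null one. It assumes the real function dereferences its trans argument, which the trace suggests but this report does not show. The Transaction and Record types, the creatorId/prior fields and the visibility rule are hypothetical stand-ins, not the Falcon classes, and the guard is only one possible way to handle the case, not the actual fix.

```cpp
#include <cstddef>

// Hypothetical stand-in for a transaction; not the Falcon class.
struct Transaction {
    unsigned id;
};

// Hypothetical stand-in for a chain of record versions; not the Falcon class.
struct Record {
    unsigned creatorId;  // id of the transaction that wrote this version
    Record*  prior;      // next older version, or nullptr at the end of the chain

    // Return the newest version visible to `trans`, or nullptr if none is.
    // Without the null check, reading trans->id with trans == nullptr would
    // fault, which is the kind of SIGSEGV shown in frame #5.
    const Record* fetchVersion(const Transaction* trans) const {
        if (trans == nullptr)
            return nullptr;  // assumption: "no transaction" means "nothing visible"
        for (const Record* r = this; r != nullptr; r = r->prior) {
            if (r->creatorId <= trans->id)  // assumed visibility rule, for illustration only
                return r;
        }
        return nullptr;
    }
};
```

A guard like this only hides the symptom; the open question in this report is why statement->transaction was never assigned in the first place.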

How to repeat:
A simplified test case will hopefully follow shortly.
[26 May 2008 15:33] Philip Stoev
Test case for bug 36971

Attachment: bug36971.zip (application/x-zip-compressed, text), 505 bytes.

[26 May 2008 15:36] Philip Stoev
To reproduce this bug, place the .txt file in mysql-test and the .test file in mysql-test/t. Then run:

$ cd mysql-test
$ ./mysql-test-run.pl --stress --stress-test-file=bug36971_run.txt \
  --stress-test-duration=7200 --stress-threads=10 --skip-ndb

A crash will happen within 1000 test cycles.
[4 Jun 2008 16:33] Kevin Lewis
Chris, we notice that the transaction is zero. Maybe the system transaction was busy and the pointer did not get assigned.
[19 Aug 2008 21:14] Philip Stoev
> Philip,
> 
> I'm trying to reproduce this bug on Linux and on Windows, but 
> mysql-test-run --stress does not run the test case on Windows. Any 
> suggestions?
> 
> Chris

Chris,

The test case actually consists of 10 threads running the following queries in a tight loop:

SELECT * FROM INFORMATION_SCHEMA.TABLES
INNER JOIN INFORMATION_SCHEMA.FALCON_TABLES ON (TABLES.TABLE_SCHEMA = FALCON_TABLES.SCHEMA_NAME AND TABLES.TABLE_NAME = FALCON_TABLES.TABLE_NAME);

CREATE TABLE i_s1 (table_name CHAR(128), users integer) ENGINE = Falcon;
DROP TABLE i_s1;

CREATE TABLE i_s2 (`TABLESPACE` CHAR(128), users integer) ENGINE = Falcon;
DROP TABLE i_s2;

You can either use mysql-test-run.pl --stress to run those queries from a Linux client against a Windows server, or wrap the queries in some sort of script that will fork 10 threads.

What I can do for you is create such a script in Perl. Is that an acceptable solution?
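For reference, the "10 threads in a tight loop" workload described above can be sketched in a few lines. This is only an outline under stated assumptions: workload() and runStress() are hypothetical names, and the execute callback is a placeholder for whatever client call actually sends a statement to the server. It must be safe to call concurrently (for example, one connection per thread) and should ignore expected errors such as "table already exists" when threads collide on CREATE and DROP.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// The five statements of the test case, in the order given above.
const std::vector<std::string>& workload() {
    static const std::vector<std::string> queries = {
        "SELECT * FROM INFORMATION_SCHEMA.TABLES "
        "INNER JOIN INFORMATION_SCHEMA.FALCON_TABLES ON "
        "(TABLES.TABLE_SCHEMA = FALCON_TABLES.SCHEMA_NAME AND "
        "TABLES.TABLE_NAME = FALCON_TABLES.TABLE_NAME)",
        "CREATE TABLE i_s1 (table_name CHAR(128), users integer) ENGINE = Falcon",
        "DROP TABLE i_s1",
        "CREATE TABLE i_s2 (`TABLESPACE` CHAR(128), users integer) ENGINE = Falcon",
        "DROP TABLE i_s2",
    };
    return queries;
}

// Start `threads` workers; each runs the whole workload `iterations` times.
// `execute` is a hypothetical hook that sends one statement to the server.
// Returns the total number of statements issued across all workers.
std::size_t runStress(unsigned threads, unsigned iterations,
                      const std::function<void(const std::string&)>& execute) {
    std::atomic<std::size_t> issued{0};
    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&issued, &execute, iterations] {
            for (unsigned i = 0; i < iterations; ++i) {
                for (const std::string& query : workload()) {
                    execute(query);
                    ++issued;
                }
            }
        });
    }
    for (std::thread& worker : pool)
        worker.join();  // wait for every worker before reporting the count
    return issued.load();
}
```

Calling runStress(10, n, execute) with a large n approximates the stress run; the sketch does not stop on a server crash or on a time limit, so a real driver would need to add that.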
[19 Aug 2008 22:45] Christopher Powers
No need for a Perl script. I can run against a remote Windows server.

I did run the test case for 2 hours on Linux--no crash.
[20 Aug 2008 11:45] Philip Stoev
The test case is no longer relevant because it uses INFORMATION_SCHEMA.FALCON_TABLES, which no longer exists.

If the test case is rewritten to not use INFORMATION_SCHEMA.FALCON_TABLES, that is, to do a join between two instances of INFORMATION_SCHEMA.TABLES, the crash does not occur.

Therefore, an older Falcon tree will be required to reproduce it.
[25 Sep 2008 20:40] Christopher Powers
I believe we do indeed have a problem with the I_S interface (getTableSpaceInfo, etc.). However, this crash is not reproducible in the current codebase, and the large number of recent fixes calls into question the value of debugging an older build.

Fortunately, the stack trace is similar to the reproducible crash in Bug#35034, "Falcon crash on complex I_S query", so our attention is best directed there.