Bug #42902 Falcon 100% CPU loop in Index::scanIndex() after recovery
Submitted: 16 Feb 2009 19:49 Modified: 5 Mar 2009 13:20
Reporter: Philip Stoev Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version: OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: F_RECOVERY

[16 Feb 2009 19:49] Philip Stoev
Description:
After recovering from a kill -9 after a simple transactional workload, Falcon entered the following endless loop when executing a SELECT query.

420                     for (; node.node < end; node.getNext(end))
422                             if (highKey)
456                             int number = node.getNumber();
458                             if (number < 0)
461                             bitmap->set (number);
420                     for (; node.node < end; node.getNext(end))

The backtrace is:

#0  IndexRootPage::scanIndex (dbb=0x7f06083372f8, indexId=1, rootPage=7, lowKey=0x7f060835c580, highKey=0x0, searchFlags=1, transId=0, bitmap=0x7f06037c52c8)
    at IndexRootPage.cpp:422
#1  0x00000000009f5dec in Index::scanIndex (this=0x7f06037f7750, lowKey=0x7f060835c580, highKey=0x0, searchFlags=1, transaction=0x7f060833c0d8,
    bitmap=0x7f06037c52c8) at Index.cpp:496
#2  0x000000000097726b in StorageDatabase::indexScan (this=0x7f0603780208, index=0x7f06037f7750, lower=0x7f060835c578, upper=0x0, searchFlags=1,
    storageConnection=0x7f06037f6d00, bitmap=0x7f06037c52c8) at StorageDatabase.cpp:782
#3  0x000000000097f524 in StorageTable::indexScan (this=0x7f060835c528, indexOrder=0) at StorageTable.cpp:259
#4  0x000000000096df07 in StorageInterface::scanRange (this=0x3fcda20, start_key=0x3fcdb00, end_key=0x0, eqRange=false) at ha_falcon.cpp:1792
#5  0x000000000096e133 in StorageInterface::fillMrrBitmap (this=0x3fcda20) at ha_falcon.cpp:1955
#6  0x000000000096f1ef in StorageInterface::multi_range_read_init (this=0x3fcda20, seq=0x7f0608171cb0, seq_init_param=0x40f4bb0, n_ranges=2, mode=4,
    buf=0x7f0608171cf0) at ha_falcon.cpp:1933
#7  0x00000000007f820c in QUICK_RANGE_SELECT::reset (this=0x40f4bb0) at opt_range.cc:8456
#8  0x0000000000748984 in join_init_read_record (tab=0x417bb80) at sql_select.cc:17010
#9  0x000000000074c237 in sub_select (join=0x4175dc0, join_tab=0x417bb80, end_of_records=false) at sql_select.cc:16224
#10 0x0000000000759bb1 in do_select (join=0x4175dc0, fields=0x3fccd30, table=0x0, procedure=0x0) at sql_select.cc:15788
#11 0x0000000000773bbc in JOIN::exec (this=0x4175dc0) at sql_select.cc:2881
#12 0x000000000076e732 in mysql_select (thd=0x3fcae88, rref_pointer_array=0x3fcce10, tables=0x41682f0, wild_num=1, fields=@0x3fccd30, conds=0x4168e48,
    og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, result=0x4168fc0, unit=0x3fcc7c0, select_lex=0x3fccc28)
    at sql_select.cc:3062
#13 0x0000000000773edb in handle_select (thd=0x3fcae88, lex=0x3fcc720, result=0x4168fc0, setup_tables_done_option=0) at sql_select.cc:314
#14 0x00000000006d1183 in execute_sqlcom_select (thd=0x3fcae88, all_tables=0x41682f0) at sql_parse.cc:4757
#15 0x00000000006d2736 in mysql_execute_command (thd=0x3fcae88) at sql_parse.cc:2063
#16 0x00000000006daa99 in mysql_parse (thd=0x3fcae88,
    inBuf=0x4167fb0 "SELECT * FROM `test`.`table10_falcon_int_autoinc` WHERE `int_key` >= -9223372036854775808 OR `int_key` IS NULL", length=110,
    found_semicolon=0x7f0608173f00) at sql_parse.cc:5752
#17 0x00000000006db684 in dispatch_command (command=COM_QUERY, thd=0x3fcae88, packet=0x4158199 "", packet_length=110) at sql_parse.cc:1009
#18 0x00000000006dcbad in do_command (thd=0x3fcae88) at sql_parse.cc:691
#19 0x00000000006cab2d in handle_one_connection (arg=0x3fcae88) at sql_connect.cc:1146
#20 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#21 0x000000315a4e627d in clone () from /lib64/libc.so.6

The problematic query is:

SELECT * FROM `test`.`table10_falcon_int_autoinc` WHERE `int_key` >= -9223372036854775808 OR `int_key` IS NULL

The table contains about 14K records, and the query has been running for over 1 hour with no end in sight.

How to repeat:
The tablespace will be uploaded shortly.
[16 Feb 2009 19:57] Philip Stoev
The tablespace is located at:

http://fedora10.dyndns.org/bug42902-2.zip

The command line that produced the corrupted tablespace is:

$ perl runall.pl  \
  --mem \
  --rows=1000 \
  --threads=4 \
  --mysqld=--falcon-page-cache-size=1K \
  --basedir=/build/bzr/6.0-falcon-team \
  --mask=2515 \
  --queries=10000000 \
  --duration=900 \
  --engine=Falcon \
  --grammar=conf/transactions-flat.yy \
  --gendata=conf/transactions.zz \
  --reporters=Deadlock,ErrorLog,Backtrace,Recovery \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--loose-innodb-lock-wait-timeout=1 \
  --mysqld=--log-output=none \
  --mysqld=--skip-safemalloc   

Which is the same that caused bug #42874 . Note the small falcon-page-cache value, which gets reset to a default of 2Mb.
[5 Mar 2009 13:20] Vladislav Vaintroub
Primary bug for index corruption is Bug #42344