Bug #42341 Falcon assertion (key - (UCHAR*) indexNode < 14) in IndexNode::parseNode
Submitted: 26 Jan 2009 9:37
Modified: 15 May 2009 16:15
Reporter: Philip Stoev
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S1 (Critical)
Version: 6.0-falcon-team
OS: Any
Assigned to: Lars-Erik Bjørk
CPU Architecture: Any
Tags: F_LIMIT
Triage: Triaged: D1 (Critical)

[26 Jan 2009 9:37] Philip Stoev
Description:
When executing a simple random workload with 4 threads, Falcon asserted as follows:

[Falcon] Error: assertion (key - (UCHAR*) indexNode < 14) failed at line 109 in file IndexNode.h

#7  0x0000000000f2d77d in Error::assertionFailed (
    text=0x18b210c "key - (UCHAR*) indexNode < 14",
    fileName=0x18b2100 "IndexNode.h", line=109) at Error.cpp:78
#8  0x0000000000f5936c in IndexNode::parseNode (this=0x7f561d3bd890,
    indexNode=0x7f561d3c03d2) at IndexNode.h:109
#9  0x0000000000f5946e in IndexNode::getNext (this=0x7f561d3bd890,
    end=0x7f561d3c040e) at IndexNode.h:132
#10 0x0000000001069d7c in WalkIndex::getNextNode (this=0x7f561d3b8370)
    at WalkIndex.cpp:96
#11 0x0000000001069f1f in WalkIndex::getNext (this=0x7f561d3b8370,
    lockForUpdate=false) at WalkIndex.cpp:65
#12 0x0000000000e53fc3 in StorageDatabase::nextIndexed (this=0x7f561cff9210,
    storageTable=0x7f561d3787e8, indexWalker=0x7f561d3b8370,
    lockForUpdate=false) at StorageDatabase.cpp:485
#13 0x0000000000e63769 in StorageTable::nextIndexed (this=0x7f561d3787e8,
    recordNumber=0, lockForUpdate=false) at StorageTable.cpp:169
#14 0x0000000000e424b2 in StorageInterface::index_next (this=0x427e010,
    buf=0x427e400 "О©ҐО©ҐО©Ґ") at ha_falcon.cpp:1831
#15 0x0000000000e41f05 in StorageInterface::multi_range_read_next (
    this=0x427e010, rangeInfo=0x7f560d31bc70) at ha_falcon.cpp:1961
#16 0x0000000000aa038d in QUICK_RANGE_SELECT::get_next (this=0x432c680)
    at opt_range.cc:8544
#17 0x0000000000ad9f45 in rr_quick (info=0x43385f8) at records.cc:322
#18 0x000000000094946e in join_init_read_record (tab=0x4338570)
    at sql_select.cc:16975
#19 0x0000000000950517 in sub_select (join=0x425a990, join_tab=0x4338570,
    end_of_records=false) at sql_select.cc:16185
#20 0x000000000096ccd9 in do_select (join=0x425a990, fields=0x42605d8,
    table=0x0, procedure=0x0) at sql_select.cc:15749
#21 0x00000000009a01a4 in JOIN::exec (this=0x425a990) at sql_select.cc:2871
#22 0x0000000000994b1f in mysql_select (thd=0x7f56081bb938,
    rref_pointer_array=0x7f56081bd9e8, tables=0x429dd38, wild_num=0,
    fields=@0x7f56081bd908, conds=0x429e510, og_num=1, order=0x0,
    group=0x429e7f0, having=0x0, proc_param=0x0, select_options=2147764736,
    result=0x429e8c0, unit=0x7f56081bd398, select_lex=0x7f56081bd800)
    at sql_select.cc:3052
#23 0x00000000009a0739 in handle_select (thd=0x7f56081bb938,
    lex=0x7f56081bd2f8, result=0x429e8c0, setup_tables_done_option=0)
    at sql_select.cc:314
#24 0x0000000000842c98 in execute_sqlcom_select (thd=0x7f56081bb938,
    all_tables=0x429dd38) at sql_parse.cc:4747
#25 0x0000000000845a79 in mysql_execute_command (thd=0x7f56081bb938)
    at sql_parse.cc:2062
#26 0x00000000008588c5 in mysql_parse (thd=0x7f56081bb938,
    inBuf=0x429d520 "SELECT `enum_utf8` , `enum_key_latin1` , `datetime_not_null` , `int_key_not_null` FROM `table1000_falcon_int_autoinc` WHERE `char_64_key$
    found_semicolon=0x7f560d31def0) at sql_parse.cc:5735
#27 0x000000000085a0e1 in dispatch_command (command=COM_QUERY,
    thd=0x7f56081bb938, packet=0x7f56081c6349 "", packet_length=206)
    at sql_parse.cc:1008
#28 0x000000000085ce2d in do_command (thd=0x7f56081bb938) at sql_parse.cc:691

The offending query is:

SELECT `enum_utf8` , `enum_key_latin1` , `datetime_not_null` , `int_key_not_null` FROM `table1000_falcon_int_autoinc` WHERE `char_64_key_utf8_not_null` >= 'my' GROUP BY `char_64_key_utf8_not_null`   LIMIT 3

How to repeat:
$ perl runall.pl \
  --mysqld=--falcon-page-size=16K \
  --mem \
  --rows=1000 \
  --threads=4 \
  --mask=2610 \
  --queries=1000000 \
  --duration=300 \
  --basedir=/build/bzr/6.0-falcon-team \
  --engine=Falcon \
  --grammar=conf/combinations.yy \
  --gendata=conf/combinations.zz \
  --reporter=ErrorLog,Backtrace \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--log-output=none
[26 Jan 2009 16:05] Kevin Lewis
Sorry, Vlad is busy with recovery bugs. Chris, please take a look at this index issue.
[27 Jan 2009 17:00] Hakan Küçükyılmaz
Philip,

does the assertion also happen with a 4K page size?
[27 Jan 2009 18:14] Philip Stoev
Hakan, this was only seen with falcon-page-size=16K. However, so far I do not think we have had a LIMIT bug that is specific to a certain page size. I think that any page size will be affected, given the right index sizes and workload.
[30 Jan 2009 15:46] Philip Stoev
In PB2 this is seen frequently with the default page cache size.
[27 Feb 2009 9:36] Lars-Erik Bjørk
This crash seems to happen because we try to use the data past the last record in the bucket. This occurs when the node with the special record number -1, which indicates END_BUCKET, is the only node in the page.

WalkIndex::getNextNode has the following piece of code:

int32 WalkIndex::getNextNode(void)
{
    for (;; first = true)
        {
            if (first)
                {
                first = false;
                recordNumber = node.getNumber();

                if (recordNumber >= 0)
                    return recordNumber;
                else if (recordNumber == END_LEVEL)
                    return -1;
                }
			
            node.getNext(endNodes);

It seems that we fail to check whether recordNumber == END_BUCKET. As a result, further down the call stack from node.getNext(endNodes), IndexNode::parseNode() tries to parse garbage data and asserts on a consistency check.
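As a rough illustration of the failure mode (a toy model, not Falcon code), the sketch below treats a bucket as a plain list of record numbers and repeats the unpatched first-node test. The Bucket type and the walksPastEnd function are hypothetical names invented for this sketch, and END_LEVEL = -2 is an assumed value; only END_BUCKET = -1 comes from this report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::int32_t END_BUCKET = -1;  // stated in this report
constexpr std::int32_t END_LEVEL  = -2;  // assumed value for this sketch

// Toy bucket: only the record numbers of its nodes, in page order.
using Bucket = std::vector<std::int32_t>;

// Repeats the unpatched first-node test, then models node.getNext() as a
// step to the next position. Returns true when that step lands at or
// beyond the end of the bucket, i.e. where no node exists to be parsed.
bool walksPastEnd(const Bucket& bucket) {
    assert(!bucket.empty());  // precondition: a page holds at least one node

    std::size_t pos = 0;
    const std::int32_t recordNumber = bucket[pos];

    if (recordNumber >= 0)
        return false;  // a real record number is returned to the caller
    else if (recordNumber == END_LEVEL)
        return false;  // the walk ends cleanly

    ++pos;  // END_BUCKET was not tested, so the walker advances anyway
    return pos >= bucket.size();
}
```

With a bucket that holds only the END_BUCKET node, walksPastEnd(Bucket{END_BUCKET}) returns true, which corresponds to the garbage parse described above. A bucket such as Bucket{7, END_BUCKET} returns false because the first node carries a valid record number.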

Changing the if from

else if (recordNumber == END_LEVEL)

to

else if (recordNumber == END_LEVEL || recordNumber == END_BUCKET)

prevents the crash.

According to Ann, this check ought to be there: we should expect that a page may contain only the END_BUCKET node, and this case should not slip through WalkIndex::getNextNode().
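The effect of the changed condition can be sketched in isolation. This is a minimal stand-in, not the actual WalkIndex code: FirstNodeAction, originalCheck and patchedCheck are hypothetical names, and END_LEVEL = -2 is an assumed value (END_BUCKET = -1 is taken from the analysis above).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int32_t END_BUCKET = -1;  // stated in this report
constexpr std::int32_t END_LEVEL  = -2;  // assumed value for this sketch

// What the first-node test decides for a given record number.
enum class FirstNodeAction { ReturnRecord, EndWalk, CallGetNext };

// Condition as it stood before the change: END_BUCKET is not tested,
// so it falls through to node.getNext().
FirstNodeAction originalCheck(std::int32_t recordNumber) {
    if (recordNumber >= 0)
        return FirstNodeAction::ReturnRecord;
    else if (recordNumber == END_LEVEL)
        return FirstNodeAction::EndWalk;
    return FirstNodeAction::CallGetNext;
}

// Condition with the proposed change: END_BUCKET ends the walk too.
FirstNodeAction patchedCheck(std::int32_t recordNumber) {
    if (recordNumber >= 0)
        return FirstNodeAction::ReturnRecord;
    else if (recordNumber == END_LEVEL || recordNumber == END_BUCKET)
        return FirstNodeAction::EndWalk;
    return FirstNodeAction::CallGetNext;
}
```

For END_BUCKET, originalCheck yields CallGetNext while patchedCheck yields EndWalk. Valid record numbers and END_LEVEL are handled identically by both versions.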
[2 Mar 2009 8:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67956

3043 lars-erik.bjork@sun.com	2009-03-02
      This is a patch for 
      bug#42341 Falcon assertion (key - (UCHAR*) 
      indexNode < 14) in IndexNode::parseNode
      and
      bug#38130 Falcon assertion in IndexNode::expandKey 
      offset + length <= MAX_PHYSICAL_KEY_LENGTH
      
      
      These crashes happen because we are trying to use the 
      data behind the last node in the bucket, when we 
      are walking the index. The reason for this is that 
      the node with the special record number -1 (which 
      indicates END_BUCKET) is the only node in the page.
      
      WalkIndex::getNextNode has the following piece of code:
      
      int32 WalkIndex::getNextNode(void)
      {
          for (;; first = true)
              {
                  if (first)
                      {
                      first = false;
                      recordNumber = node.getNumber();
      
                      if (recordNumber >= 0)
                          return recordNumber;
                      else if (recordNumber == END_LEVEL)
                          return -1;
                      }
      								
                  node.getNext(endNodes);
      
      We fail to check if recordNumber == END_BUCKET.
      In the case of bug#42341, we try to parse some
      garbage data in IndexNode::parseNode and assert on
      a consistency check.
      In the case of bug#38130, we slip through this
      consistency check, but assert on a second check
      in IndexNode::expandKey 
      
      Changing the if from
      
      else if (recordNumber == END_LEVEL)
      
      to
      
      else if (recordNumber == END_LEVEL || recordNumber == END_BUCKET)
      
      prevents both crashes.
      
      
      modified file 'storage/falcon/WalkIndex.cpp'
      -----------------------------------------------
      Changed the if to prevent reading behind the
      END_BUCKET node.
[2 Mar 2009 14:38] Kevin Lewis
Patch approved
[2 Apr 2009 17:39] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090402144811-yc5kp8g0rjnhz7vy) (version source revid:christopher.powers@sun.com-20090304040340-b4zoglfws0iswqm1) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 16:15] MC Brown
Internal/test fix. No changelog entry required.