Bug #52540 Crash in JOIN_CACHE::set_match_flag_if_none () at sql_join_cache.cc:1883
Submitted: 1 Apr 2010 20:27 Modified: 21 Nov 2010 19:16
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:mysql-6.0-codebase-bugfixing OS:Any
Assigned to: Guilhem Bichot CPU Architecture:Any
Tags: assertion, crashing bug, optimizer_join_cache_level, optimizer_switch

[1 Apr 2010 20:27] Patrick Crews
Description:
Crashing bug in 6.0-codebase-bugfixing tree with optimizer_join_cache_level=4 (not repeatable with other values):

Query:
Using a mix of Innodb and MyISAM tables
SELECT table2 .`col_varchar_1024_utf8_key`  
FROM H table2  LEFT  JOIN F table3  JOIN J table4  ON table3 .`col_int_key`  = table4 .`pk`  LEFT  JOIN O table5  ON table5 .`col_varchar_10_latin1_key`  ON table2 .`col_varchar_10_latin1_key`  > table3 .`col_varchar_10_latin1_key`   ;

Crash output:
Thread 1 (Thread 19830):
#0  0x002fa422 in __kernel_vsyscall ()
#1  0x004ace93 in __pthread_kill (threadid=2929657712, signo=6) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:64
#2  0x08d1f00f in my_write_core (sig=6) at stacktrace.c:328
#3  0x08422222 in handle_segfault (sig=6) at mysqld.cc:2843
#4  <signal handler called>
#5  0x002fa422 in __kernel_vsyscall ()
#6  0x0013a4d1 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#7  0x0013d932 in *__GI_abort () at abort.c:92
#8  0x00133648 in *__GI___assert_fail (assertion=0x9052310 "cache", file=0x9052270 "sql_join_cache.cc", line=1883, 
    function=0x9052760 "bool JOIN_CACHE::set_match_flag_if_none(JOIN_TAB*, uchar*)") at assert.c:81
#9  0x084c8b8f in JOIN_CACHE::set_match_flag_if_none (this=0xab89228, first_inner=0xab831d4, rec_ptr=0xae97e022 "") at sql_join_cache.cc:1883
#10 0x084c9314 in JOIN_CACHE::join_null_complements (this=0xab89228, skip_last=false) at sql_join_cache.cc:2079
#11 0x084c7fc6 in JOIN_CACHE::join_records (this=0xab89228, skip_last=false) at sql_join_cache.cc:1661
#12 0x0858303f in sub_select_cache (join=0xab83990, join_tab=0xab8354c, end_of_records=true) at sql_select.cc:16405
#13 0x085834ad in sub_select (join=0xab83990, join_tab=0xab83390, end_of_records=true) at sql_select.cc:16568
#14 0x085834ad in sub_select (join=0xab83990, join_tab=0xab831d4, end_of_records=true) at sql_select.cc:16568
#15 0x085830c3 in sub_select_cache (join=0xab83990, join_tab=0xab831d4, end_of_records=true) at sql_select.cc:16407
#16 0x085834ad in sub_select (join=0xab83990, join_tab=0xab83018, end_of_records=true) at sql_select.cc:16568
#17 0x08582127 in do_select (join=0xab83990, fields=0xa9d0900, table=0x0, procedure=0x0) at sql_select.cc:16159
#18 0x085483c7 in JOIN::exec (this=0xab83990) at sql_select.cc:2987
#19 0x085490d1 in mysql_select (thd=0xa9cf360, rref_pointer_array=0xa9d0970, tables=0xa9018f8, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, 
    having=0x0, proc_param=0x0, select_options=2147748608, result=0xa903500, unit=0xa9d03ac, select_lex=0xa9d086c) at sql_select.cc:3184
#20 0x0853944d in handle_select (thd=0xa9cf360, lex=0xa9d0350, result=0xa903500, setup_tables_done_option=0) at sql_select.cc:304
#21 0x08459ca5 in execute_sqlcom_select (thd=0xa9cf360, all_tables=0xa9018f8) at sql_parse.cc:5032
#22 0x08448b0c in mysql_execute_command (thd=0xa9cf360) at sql_parse.cc:2295
#23 0x0845dd5a in mysql_parse (thd=0xa9cf360, 
    inBuf=0xa901538 "SELECT table2 .`col_varchar_1024_utf8_key`  \nFROM H table2  LEFT  JOIN F table3  JOIN J table4  ON table3 .`col_int_key`  = table4 .`pk`  LEFT  JOIN O table5  ON table5 .`col_varchar_10_latin1_key`  O"..., length=276, found_semicolon=0xae9ef910) at sql_parse.cc:6060
#24 0x084439ad in dispatch_command (command=COM_QUERY, thd=0xa9cf360, packet=0xa99d841 "", packet_length=279) at sql_parse.cc:1091
#25 0x084421c7 in do_command (thd=0xa9cf360) at sql_parse.cc:775
#26 0x0843e551 in do_handle_one_connection (thd_arg=0xa9cf360) at sql_connect.cc:1173
#27 0x0843e238 in handle_one_connection (arg=0xa9cf360) at sql_connect.cc:1113
#28 0x08e7a303 in pfs_spawn_thread (arg=0xaa1a6a8) at pfs.cc:1011
#29 0x004a780e in start_thread (arg=0xae9f0770) at pthread_create.c:300
#30 0x001dc8de in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

How to repeat:
MTR test case:
Change optimizer_join_cache_level to other values and observe the crash going away.

Test case attached as separate file due to space limitations, will include original + simplified queries
[1 Apr 2010 20:30] Patrick Crews
Full crash output

Attachment: bug52540_crash_output.txt (text/plain), 12.34 KiB.

[1 Apr 2010 20:30] Patrick Crews
Full MTR test case with original + simplified queries (simplified=query0)

Attachment: bug52540_test.txt (text/plain), 20.13 KiB.

[1 Apr 2010 22:12] Sveta Smirnova
Thank you for the report.

Verified as described.
[3 May 2010 8:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107162

3846 Guilhem Bichot	2010-05-03
      Fix for BUG#52540 "Crash in JOIN_CACHE::set_match_flag_if_none () at sql_join_cache.cc:1883"
     @ mysql-test/t/join_cache.test
        minimal test case for bug
     @ sql/sql_select.cc
        As said in a comment in this file:
        "For a nested outer join/semi-join, currently, we either use join buffers for
        all inner tables or for none of them.".
        This is enforced by check_join_cache_usage():
          for (JOIN_TAB *first_inner= tab->first_inner; first_inner;
               first_inner= first_inner->first_upper)
          {
            if (first_inner != tab && !first_inner->use_join_cache)
              goto no_join_cache;
          }
        i.e. if the first table in this join nest doesn't use join cache, or the first table of the upper
        containing join nest doesn't, go to label "no_join_cache" and thus:
        1) don't use join cache for the current table 2) disable join cache for all previous tables of this nest
        and upper containing nests (via revise_cache_usage()).
        In the bug's scenario, before the fix, EXPLAIN showed this plan:
        t2,  t1(with join buffering),  t3,  t4(with join buffering).
        * join buffering was decided for t1
        * it was rejected for t3, because t3 is accessed with EQ_REF (join buffering with EQ_REF requires BKA,
        which we don't have at levels <=4): check_join_cache_usage() did "return 0", forgetting to disable
        join buffering for previous tables of the join nest i.e. t1
        * join buffering was decided for t4, and allowed as it was observed that the first table of
        the upper nest (t1) had join buffering (so it was believed that all tables in between had
        join buffering).
        * at execution time, JOIN_CACHE:set_match_flag_if_none() would walk back the list of join
        buffers from t4 to t1:
          while (cache->join_tab != first_inner)
          {
            cache= cache->prev_cache;
            DBUG_ASSERT(cache);
        But t4 had prev_cache==0 (as previous table t1 wasn't doing join buffering), hence the
        assertion failure. This loop above assumed again that all previous tables should be doing
        join buffering.
        The fix is to make sure that when join buffering is rejected because of EQ_REF, we
        disable join buffering for previous tables, so that assumptions are true.
[10 May 2010 11:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107834

3873 Guilhem Bichot	2010-05-10
      Fix for BUG#52540 "Crash in JOIN_CACHE::set_match_flag_if_none () at sql_join_cache.cc:1883"
     @ mysql-test/t/join_cache.test
        minimal test case for bug
     @ sql/sql_select.cc
        As said in a comment in this file:
        "For a nested outer join/semi-join, currently, we either use join
        buffers for all inner tables or for none of them.".
        This is enforced by check_join_cache_usage():
          for (JOIN_TAB *first_inner= tab->first_inner; first_inner;
               first_inner= first_inner->first_upper)
          {
            if (first_inner != tab && !first_inner->use_join_cache)
              goto no_join_cache;
          }
        i.e. if the first table in this join nest doesn't use join cache, or
        the first table of the upper containing join nest doesn't, go to label
        "no_join_cache" and thus:
        1) don't use join cache for the current table 2) disable join cache for
        all previous tables of this nest and upper containing nests (via
        revise_cache_usage()).
        In the bug's scenario, before the fix, EXPLAIN showed this plan:
        t2,  t1(with join buffering),  t3,  t4(with join buffering).
        * join buffering was decided for t1
        * it was rejected for t3, because t3 is accessed with EQ_REF (join
        buffering with EQ_REF requires BKA, which we don't have at levels <=4):
        check_join_cache_usage() did "return 0", forgetting to disable join
        buffering for previous tables of the join nest i.e. t1
        * join buffering was decided for t4, and allowed as it was observed
        that the first table of the upper nest (t1) had join buffering (so it
        was believed that all tables in between had join buffering).
        * at execution time, JOIN_CACHE:set_match_flag_if_none() would walk
        back the list of join buffers from t4 to t1:
          while (cache->join_tab != first_inner)
          {
            cache= cache->prev_cache;
            DBUG_ASSERT(cache);
        But t4 had prev_cache==0 (as previous table t1 wasn't doing join
        buffering), hence the assertion failure. This loop above assumed again
        that all previous tables should be doing join buffering.
        The fix is to make sure that when join buffering is rejected because of
        EQ_REF, we disable join buffering for previous tables, so that
        assumptions are true.
[10 May 2010 11:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107835

3846 Guilhem Bichot	2010-05-10
      Fix for BUG#52540 "Crash in JOIN_CACHE::set_match_flag_if_none () at sql_join_cache.cc:1883"
     @ mysql-test/t/join_cache.test
        minimal test case for bug
     @ sql/sql_select.cc
        As said in a comment in this file:
        "For a nested outer join/semi-join, currently, we either use join
        buffers for all inner tables or for none of them.".
        This is enforced by check_join_cache_usage():
          for (JOIN_TAB *first_inner= tab->first_inner; first_inner;
               first_inner= first_inner->first_upper)
          {
            if (first_inner != tab && !first_inner->use_join_cache)
              goto no_join_cache;
          }
        i.e. if the first table in this join nest doesn't use join cache, or
        the first table of the upper containing join nest doesn't, go to label
        "no_join_cache" and thus:
        1) don't use join cache for the current table 2) disable join cache for
        all previous tables of this nest and upper containing nests (via
        revise_cache_usage()).
        In the bug's scenario, before the fix, EXPLAIN showed this plan:
        t2,  t1(with join buffering),  t3,  t4(with join buffering).
        * join buffering was decided for t1
        * it was rejected for t3, because t3 is accessed with EQ_REF (join
        buffering with EQ_REF requires BKA, which we don't have at levels <=4):
        check_join_cache_usage() did "return 0", forgetting to disable join
        buffering for previous tables of the join nest i.e. t1
        * join buffering was decided for t4, and allowed as it was observed
        that the first table of the upper nest (t1) had join buffering (so it
        was believed that all tables in between had join buffering).
        * at execution time, JOIN_CACHE:set_match_flag_if_none() would walk
        back the list of join buffers from t4 to t1:
          while (cache->join_tab != first_inner)
          {
            cache= cache->prev_cache;
            DBUG_ASSERT(cache);
        But t4 had prev_cache==0 (as previous table t1 wasn't doing join
        buffering), hence the assertion failure. This loop above assumed again
        that all previous tables should be doing join buffering.
        The fix is to make sure that when join buffering is rejected because of
        EQ_REF, we disable join buffering for previous tables, so that
        assumptions are true.
[11 May 2010 8:25] Guilhem Bichot
queued to 6.0-codebase-bugfixing
[14 May 2010 15:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108366

3160 Guilhem Bichot	2010-05-10
      Fix for BUG#52540 "Crash in JOIN_CACHE::set_match_flag_if_none () at sql_join_cache.cc:1883"
      (Backporting of guilhem@mysql.com-20100510092844-9fjd9g9ivk7258oj )
     @ mysql-test/t/join_cache.test
        minimal test case for bug
     @ sql/sql_select.cc
        As said in a comment in this file:
        "For a nested outer join/semi-join, currently, we either use join
        buffers for all inner tables or for none of them.".
        This is enforced by check_join_cache_usage():
          for (JOIN_TAB *first_inner= tab->first_inner; first_inner;
               first_inner= first_inner->first_upper)
          {
            if (first_inner != tab && !first_inner->use_join_cache)
              goto no_join_cache;
          }
        i.e. if the first table in this join nest doesn't use join cache, or
        the first table of the upper containing join nest doesn't, go to label
        "no_join_cache" and thus:
        1) don't use join cache for the current table 2) disable join cache for
        all previous tables of this nest and upper containing nests (via
        revise_cache_usage()).
        In the bug's scenario, before the fix, EXPLAIN showed this plan:
        t2,  t1(with join buffering),  t3,  t4(with join buffering).
        * join buffering was decided for t1
        * it was rejected for t3, because t3 is accessed with EQ_REF (join
        buffering with EQ_REF requires BKA, which we don't have at levels <=4):
        check_join_cache_usage() did "return 0", forgetting to disable join
        buffering for previous tables of the join nest i.e. t1
        * join buffering was decided for t4, and allowed as it was observed
        that the first table of the upper nest (t1) had join buffering (so it
        was believed that all tables in between had join buffering).
        * at execution time, JOIN_CACHE:set_match_flag_if_none() would walk
        back the list of join buffers from t4 to t1:
          while (cache->join_tab != first_inner)
          {
            cache= cache->prev_cache;
            DBUG_ASSERT(cache);
        But t4 had prev_cache==0 (as previous table t1 wasn't doing join
        buffering), hence the assertion failure. This loop above assumed again
        that all previous tables should be doing join buffering.
        The fix is to make sure that when join buffering is rejected because of
        EQ_REF, we disable join buffering for previous tables, so that
        assumptions are true.
[14 May 2010 18:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108376

3161 Guilhem Bichot	2010-05-10
      Fix for BUG#52540 "Crash in JOIN_CACHE::set_match_flag_if_none () at sql_join_cache.cc:1883"
      (Backporting of guilhem@mysql.com-20100510092844-9fjd9g9ivk7258oj )
     @ mysql-test/t/join_cache.test
        minimal test case for bug
     @ sql/sql_select.cc
        As said in a comment in this file:
        "For a nested outer join/semi-join, currently, we either use join
        buffers for all inner tables or for none of them.".
        This is enforced by check_join_cache_usage():
          for (JOIN_TAB *first_inner= tab->first_inner; first_inner;
               first_inner= first_inner->first_upper)
          {
            if (first_inner != tab && !first_inner->use_join_cache)
              goto no_join_cache;
          }
        i.e. if the first table in this join nest doesn't use join cache, or
        the first table of the upper containing join nest doesn't, go to label
        "no_join_cache" and thus:
        1) don't use join cache for the current table 2) disable join cache for
        all previous tables of this nest and upper containing nests (via
        revise_cache_usage()).
        In the bug's scenario, before the fix, EXPLAIN showed this plan:
        t2,  t1(with join buffering),  t3,  t4(with join buffering).
        * join buffering was decided for t1
        * it was rejected for t3, because t3 is accessed with EQ_REF (join
        buffering with EQ_REF requires BKA, which we don't have at levels <=4):
        check_join_cache_usage() did "return 0", forgetting to disable join
        buffering for previous tables of the join nest i.e. t1
        * join buffering was decided for t4, and allowed as it was observed
        that the first table of the upper nest (t1) had join buffering (so it
        was believed that all tables in between had join buffering).
        * at execution time, JOIN_CACHE:set_match_flag_if_none() would walk
        back the list of join buffers from t4 to t1:
          while (cache->join_tab != first_inner)
          {
            cache= cache->prev_cache;
            DBUG_ASSERT(cache);
        But t4 had prev_cache==0 (as previous table t1 wasn't doing join
        buffering), hence the assertion failure. This loop above assumed again
        that all previous tables should be doing join buffering.
        The fix is to make sure that when join buffering is rejected because of
        EQ_REF, we disable join buffering for previous tables, so that
        assumptions are true.
[20 May 2010 10:04] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100520100225-oe4iuu5kuzsx0knq) (version source revid:alik@sun.com-20100520100057-rmn5y3o3ij726bm7) (merge vers: 6.0.14-alpha) (pib:16)
[20 May 2010 20:58] Guilhem Bichot
"for values of optimizer_join_cache_level smaller or equal to 4, MySQL could crash in queries with outer joins and indexed columns".
[20 May 2010 22:46] Paul DuBois
Noted in 6.0.14 changelog.
[16 Aug 2010 6:33] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:17] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[21 Nov 2010 19:16] Paul DuBois
Noted in 5.6.1 changelog.
[23 Nov 2010 2:16] Paul DuBois
Correction: No 5.6.1 changelog entry. Bug does not appear in any released 5.6.x version.