Bug #54023 crash in fix_semijoin_strategies_for_picked_join_order with semijoin=off
Submitted: 27 May 2010 9:54 Modified: 27 May 2010 14:13
Reporter: John Embretsen Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.5.5_Celosia, 5.6.99_Dahlia, bzr_mysql-next-mr-opt-backporting OS:Any
Assigned to: CPU Architecture:Any

[27 May 2010 9:54] John Embretsen
Description:
Server crash observed while running the Random Query Generator's outer_join grammar. Crashing query is:

SELECT    
    MIN(  table2 . `pk` ) AS field1 , 
    MIN(  table1 . `pk` ) AS field2 , 
    table4 . `col_int_key` AS field3 
FROM    Z AS table1  
  LEFT  JOIN    I AS table2  
  LEFT  JOIN   Q AS table3  
  LEFT  JOIN M AS table4 ON  table3 . `col_varchar_1024_utf8` =  table4 . `col_varchar_1024_latin1`   
  LEFT  JOIN  DD AS table5  
  LEFT  JOIN FF AS table6 ON  table5 . `col_varchar_10_utf8_key` =  table6 . `col_varchar_10_utf8_key`  ON  table4 . `col_int_key` =  table5 . `col_int_key`  ON  table2 . `pk` =  table6 . `col_int_key`   
  LEFT  JOIN  D AS table7  
  LEFT  JOIN I AS table8 ON  table7 . `col_varchar_10_utf8_key` =  table8 . `col_varchar_10_latin1_key`  ON  table5 . `col_varchar_10_latin1` =  table8 . `col_varchar_1024_latin1`   
  LEFT  JOIN  H AS table9  
  LEFT  JOIN G AS table10 ON  table9 . `col_varchar_1024_utf8` =  table10 . `col_varchar_10_utf8`  ON  table4 . `col_int_key` =  table10 . `pk`  ON  table1 . `pk` =  table5 . `pk`   
  LEFT  JOIN EE AS table11 ON  table2 . `pk` =  table11 . `pk`   
  LEFT  JOIN J AS table12 ON  table11 . `col_int_key` =  table12 . `pk`  
WHERE table1 . `col_int` IS  NULL  
GROUP BY field3  
ORDER BY field1 ASC , 
         field3 DESC , 
         field3
';

Backtrace from mysql-next-mr-opt-backporting as of 2010-05-25 is:

---- called from signal handler with signal 11 (SIGSEGV) ------
[8] fix_semijoin_strategies_for_picked_join_order(join = 0x40db920), line 8125 in "sql_select.cc"
[9] get_best_combination(join = 0x40db920), line 8332 in "sql_select.cc"
[10] make_join_statistics(join = 0x40db920, tables_arg = 0x398ee90, conds = 0x406c858, keyuse_array = 0x40e1890), line 4654 in "sql_select.cc"
[11] JOIN::optimize(this = 0x40db920), line 1715 in "sql_select.cc"
[12] mysql_select(thd = 0x398aee0, rref_pointer_array = 0x398cf20, tables = 0x398ee90, wild_num = 0, fields = CLASS, conds = 0x406c858, og_num = 4U, order = 0x406cb38, group = 0x406ca00, having = (nil), proc_param = (nil), select_options = 2147748608ULL, result = 0x406ced0, unit = 0x398c718, select_lex = 0x398cd38), line 3237 in "sql_select.cc"
[13] handle_select(thd = 0x398aee0, lex = 0x398c670, result = 0x406ced0, setup_tables_done_option = 0), line 301 in "sql_select.cc"
[14] execute_sqlcom_select(thd = 0x398aee0, all_tables = 0x398ee90), line 4779 in "sql_parse.cc"
[15] mysql_execute_command(thd = 0x398aee0), line 2260 in "sql_parse.cc"
[16] mysql_parse(thd = 0x398aee0, inBuf = 0x398def0 "SELECT    MIN(  table2 . `pk` ) AS field1 , MIN(  table1 . `pk` ) AS field2 , table4 . `col_int_key` AS field3 FROM    Z AS table1  LEFT  JOIN    I AS table2  LEFT  JOIN   Q AS table3  LEFT  JOIN M AS table4 ON  table3 . `col_varchar_1024_utf8` =  table4 . `col_varchar_1024_latin1`   LEFT  JOIN  DD AS table5  LEFT  JOIN FF AS table6 ON  table5 . `col_varchar_10_utf8_key` =  table6 . `col_varchar_10_utf8_key`  ON  table4 . `col_int_key` =  table5 . `col_int_key`  ON  table2 . `pk` =  table6 . `col_int_key`  " ..., length = 1151U, parser_state = 0xfffffd7fff03cab8), line 5808 in "sql_parse.cc"
[17] dispatch_command(command = COM_QUERY, thd = 0x398aee0, packet = 0x3991f01 "  SELECT    MIN(  table2 . `pk` ) AS field1 , MIN(  table1 . `pk` ) AS field2 , table4 . `col_int_key` AS field3 FROM    Z AS table1  LEFT  JOIN    I AS table2  LEFT  JOIN   Q AS table3  LEFT  JOIN M AS table4 ON  table3 . `col_varchar_1024_utf8` =  table4 . `col_varchar_1024_latin1`   LEFT  JOIN  DD AS table5  LEFT  JOIN FF AS table6 ON  table5 . `col_varchar_10_utf8_key` =  table6 . `col_varchar_10_utf8_key`  ON  table4 . `col_int_key` =  table5 . `col_int_key`  ON  table2 . `pk` =  table6 . `col_int_key`" ..., packet_length = 1155U), line 1085 in "sql_parse.cc"
[18] do_command(thd = 0x398aee0), line 771 in "sql_parse.cc"
[19] do_handle_one_connection(thd_arg = 0x398aee0), line 1188 in "sql_connect.cc"
[20] handle_one_connection(arg = 0x398aee0), line 1127 in "sql_connect.cc"

sql_select.cc lines 8125 to 8129:

    if ((handled_tabs & s->table->map) || pos->sj_strategy == SJ_OPT_NONE)
    {
      remaining_tables |= s->table->map;
      continue;
    }

How to repeat:
Use the RQG (aka. randgen) to generate queries and start the server.

bzr branch lp:randgen
cd randgen

perl ./runall.pl \
--threads=1 \
--grammar=conf/optimizer/outer_join.yy \
--gendata=conf/optimizer/outer_join.zz \
--mysqld1=--optimizer_search_depth=10 \
--queries=100000 \
--duration=1200 \
--basedir=/path/to/mysql/binaries \
--Reporter=Shutdown,Backtrace

The use of --optimizer_search_depth=10 is to avoid extremely long processing time for some large JOIN queries that this grammar produces.

You should observe a crash after a couple of minutes of running time.

Note that the crash occurs even if optimizer_switch is set to include semijoin=off.

Unfortunately, efforts to produce a simplified reproducible test case have not been successful so far.
[27 May 2010 11:47] Manyi Lu
This is likely a duplicate of 52005, at least in 5.1.
[27 May 2010 14:13] John Embretsen
Correct, with the patch for Bug#52005 the crash no longer occurs against 5.1.
The same patch applied against mysql-next-mr-opt-backporting did not eliminate the crash.

It seems that the crash observed against mysql-6.0-codebase-bugfixing (as reported in private comment earlier) is the same as Bug#52357, which was not ported to the -opt-backporting branch yet.

With the ported patch for Bug#52357 at http://lists.mysql.com/commits/109373 applied to mysql-next-mr-opt-backporting we see another crash, namely the one observed against 5.1

With patches for both Bug#52357 and Bug#52005 applied to mysql-next-mr-opt-backporting the 20-minute test run passes without crash.

So, even though Bug#52005 is affecting this test run as well, I conclude by closing this report as duplicate of Bug#52357.