MySQL Bugs: #38795: Automatic search depth and nested join's results in server crash

Bug #38795	Automatic search depth and nested join's results in server crash
Submitted:	14 Aug 2008 12:30	Modified:	4 May 2009 9:25
Reporter:	Pim Pronk	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S1 (Critical)
Version:	5.0.79,/5.1.33/6.0	OS:	Any (CentOS 5)
Assigned to:	Georgi Kodinov	CPU Architecture:	Any
Tags:	crash, innodb, nested join, optimizer_search_depth

Description:
Our application creates sql queries based on the input of the user. Because this can result in a large number of joins (>100), the statistics state can take a very long time (>100 seconds).
While trying to optimize this by setting optimizer_search_depth to 0, I found that with some queries the server crashes.

The server is a Dell PowerEdge with dual quadcore 3Ghz and 8GB ram. 

Our my.cnf:
[mysqld]
datadir = /mysql
port = 3306
socket = /var/lib/mysql/mysql.sock

max_connections = 150
max_connect_errors = 10
table_cache = 100000
max_allowed_packet = 32M
binlog_cache_size = 1M
max_heap_table_size = 256M
sort_buffer_size = 32M
join_buffer_size = 32M
thread_cache_size = 8
thread_concurrency = 8
query_cache_size = 64M
query_cache_limit = 2M
ft_min_word_len = 4
default_table_type = MYISAM
thread_stack = 192K
transaction_isolation = REPEATABLE-READ
tmp_table_size = 256M

query_prealloc_size = 16K
query_alloc_block_size = 2K
query_cache_size = 0M
query_cache_limit = 2M
query_cache_type = 1

#log
log-slow-queries
long_query_time = 5 # 5 seconds for a query is long
log_long_format

log-bin = mysql-bin
binlog-do-db = test
server-id = 1

# MyISAM Speficic
key_buffer_size = 32M
read_buffer_size = 2M
read_rnd_buffer_size = 16M
bulk_insert_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_max_extra_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover

# BDB Specific
skip-bdb

# INNODB Specific
innodb_additional_mem_pool_size = 32M
innodb_buffer_pool_size = 3G
innodb_data_home_dir =
innodb_data_file_path = /dev/sda1:4Graw
innodb_file_io_threads = 5
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_flush_method = O_DIRECT

How to repeat:
I have narrowed this problem down to the following query and database. 

First import the attached test database.

Then run:
mysql> set optimizer_search_depth = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT t1.table1_id
    -> FROM table1 As t1
    -> LEFT JOIN (
    -> table2 As t2
    -> INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id
    -> ) ON t2.table2_id = t1.table2_id
    -> LEFT JOIN (
    -> table4 As t4
    -> LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id
    -> LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
    -> LEFT JOIN (
    -> table7 As t7
    -> INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id
    -> ) ON t7.table4_id = t4.table4_id
    -> ) ON t4.table4_id = t1.table4_id
    -> LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id
    -> WHERE t1.table1_id = 1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>

The error log says:
080814 12:52:18 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.67-community-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edition (GPL)
080814 13:15:45 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=1
max_connections=150
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 5255168 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x1b8421a0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x45068fd0, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
(nil)
New value of fp=0x1b8421a0 failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/using-stack-trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x1b886e60 = SELECT t1.table1_id
FROM table1 As t1
LEFT JOIN (
table2 As t2
INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id
) ON t2.table2_id = t1.table2_id

LEFT JOIN (
table4 As t4
LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id
LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
LEFT JOIN (
table7 As t7
INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id
) ON t7.table4_id = t4.table4_id
) ON t4.table4_id = t1.table4_id
LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id
WHERE t1.table1_id = 1
thd->thread_id=9
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
080814 13:15:45  mysqld restarted

Suggested fix:
Well, it is not a suggested fix but there are three workarounds.

Workaround 1) Move the join with table9 to the top.
mysql> set optimizer_search_depth = 0;
Query OK, 0 rows affected (0.01 sec)

mysql> SELECT t1.table1_id
    -> FROM table1 As t1
    -> LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id
    -> LEFT JOIN (
    -> table2 As t2
    -> INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id
    -> ) ON t2.table2_id = t1.table2_id
    -> LEFT JOIN (
    -> table4 As t4
    -> LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id
    -> LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
    -> LEFT JOIN (
    -> table7 As t7
    -> INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id
    -> ) ON t7.table4_id = t4.table4_id
    -> ) ON t4.table4_id = t1.table4_id
    -> WHERE t1.table1_id = 1;
+-----------+
| table1_id |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

Workaround 2) First run query with search_depth=1, then change back to search_depth = 0
NOTE: this only works for search_depth=1. A larger search_depth (e.g. 2) crashes the server as well. (also see my.cnf: query_cache_size=0).

mysql> set optimizer_search_depth = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT t1.table1_id
    -> FROM table1 As t1
    -> LEFT JOIN (
    -> table2 As t2
    -> INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id
    -> ) ON t2.table2_id = t1.table2_id
    -> LEFT JOIN (
    -> table4 As t4
    -> LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id
    -> LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
    -> LEFT JOIN (
    -> table7 As t7
    -> INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id
    -> ) ON t7.table4_id = t4.table4_id
    -> ) ON t4.table4_id = t1.table4_id
    -> LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id
    -> WHERE t1.table1_id = 1;
+-----------+
| table1_id |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql> set optimizer_search_depth = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT t1.table1_id FROM table1 As t1 LEFT JOIN ( table2 As t2 INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id ) ON t2.table2_id = t1.table2_id  LEFT JOIN ( table4 As t4 LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id LEFT JOIN ( table7 As t7 INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id ) ON t7.table4_id = t4.table4_id ) ON t4.table4_id = t1.table4_id LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id  WHERE t1.table1_id = 1;
+-----------+
| table1_id |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

Workaround 3) If you first run the query as in workaround 1 and then run the query that fails normally, the query doesn't fail (ha! ;-)
mysql> SELECT t1.table1_id
    -> FROM table1 As t1
    -> LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id
    -> LEFT JOIN (
    -> table2 As t2
    -> INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id
        LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
    -> ) ON t2.table2_id = t1.table2_id
    -> LEFT JOIN (
    -> table4 As t4
    -> LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id
    -> LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
    -> LEFT JOIN (
    -> table7 As t7
    -> INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id
    -> ) ON t7.table4_id = t4.table4_id
    -> ) ON t4.table4_id = t1.table4_id
    -> WHERE t1.table1_id = 1;

+-----------+
| table1_id |
+-----------+
|         1 |
+-----------+
1 row in set (0.01 sec)

mysql> SELECT t1.table1_id
    -> FROM table1 As t1
    -> LEFT JOIN (
    -> table2 As t2
    -> INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id
    -> ) ON t2.table2_id = t1.table2_id
    -> LEFT JOIN (
    -> table4 As t4
    -> LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id
    -> LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
    -> LEFT JOIN (
    -> table7 As t7
    -> INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id
        ) ON t7.table4_id = t4.table4_id
    -> ) ON t7.table4_id = t4.table4_id
    -> ) ON t4.table4_id = t1.table4_id
    -> LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id
    -> WHERE t1.table1_id = 1;
+-----------+
| table1_id |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql>

Test Database

Attachment: test_db.sql (text/plain), 8.32 KiB.

5.1 bzr affected.

[miguel@hegel dbs]$ 5.1/libexec/mysqld
080814  9:44:11  InnoDB: Started; log sequence number 0 49525
080814  9:44:11 [Note] Event Scheduler: Loaded 0 events
080814  9:44:11 [Note] 5.1/libexec/mysqld: ready for connections.
Version: '5.1.28-debug'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
mysqld: sql_parse.cc:4136: int mysql_execute_command(THD*): Assertion `thd->is_error() || thd->killed' failed.
080814  9:44:30 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388572
read_buffer_size=131072
max_used_connections=1
max_threads=151
threads_connected=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338304 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x1878fa8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x45089110 thread_stack 0x40000
5.1/libexec/mysqld(my_print_stacktrace+0x32)[0xad2560]
5.1/libexec/mysqld(handle_segfault+0x2ea)[0x68a654]
/lib64/libpthread.so.0[0x315bc0e540]
/lib64/libc.so.6(gsignal+0x35)[0x315b030ec5]
/lib64/libc.so.6(abort+0x110)[0x315b032970]
/lib64/libc.so.6(__assert_fail+0xef)[0x315b02a11f]
5.1/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x6bd8)[0x6a1f3e]
5.1/libexec/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x280)[0x6a3f1e]
5.1/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xc6b)[0x6a4d1b]
5.1/libexec/mysqld(_Z10do_commandP3THD+0x249)[0x6a6103]
5.1/libexec/mysqld(handle_one_connection+0x137)[0x69362b]
/lib64/libpthread.so.0[0x315bc06407]
/lib64/libc.so.6(clone+0x6d)[0x315b0d4b0d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x18d6678 = call lstContacts(null)
thd->thread_id=1
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
[miguel@hegel dbs]$ 5.1/libexec/mysqld
080814 10:17:45  InnoDB: Started; log sequence number 0 49525
080814 10:17:45 [Note] Event Scheduler: Loaded 0 events
080814 10:17:45 [Note] 5.1/libexec/mysqld: ready for connections.
Version: '5.1.28-debug'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
080814 10:20:03 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388572
read_buffer_size=131072
max_used_connections=1
max_threads=151
threads_connected=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 338304 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x1878fa8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x45089110 thread_stack 0x40000
5.1/libexec/mysqld(my_print_stacktrace+0x32)[0xad2560]
5.1/libexec/mysqld(handle_segfault+0x2ea)[0x68a654]
/lib64/libpthread.so.0[0x315bc0e540]
/lib64/libc.so.6(memcpy+0xa0)[0x315b07b350]
5.1/libexec/mysqld[0x70b7cd]
5.1/libexec/mysqld[0x720463]
5.1/libexec/mysqld(_ZN4JOIN8optimizeEv+0xba0)[0x7230da]
5.1/libexec/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x2a3)[0x7271f9]
5.1/libexec/mysqld(_Z13handle_selectP3THDP6st_lexP13select_resultm+0x1f7)[0x72ca1f]
5.1/libexec/mysqld[0x69a289]
5.1/libexec/mysqld(_Z21mysql_execute_commandP3THD+0x623)[0x69b989]
5.1/libexec/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x280)[0x6a3f1e]
5.1/libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xc6b)[0x6a4d1b]
5.1/libexec/mysqld(_Z10do_commandP3THD+0x249)[0x6a6103]
5.1/libexec/mysqld(handle_one_connection+0x137)[0x69362b]
/lib64/libpthread.so.0[0x315bc06407]
/lib64/libc.so.6(clone+0x6d)[0x315b0d4b0d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x18d6678 = SELECT t1.table1_id
FROM table1 As t1
LEFT JOIN (
table2 As t2
INNER JOIN table3 As t3 ON t3.table3_id = t2.table3_id
) ON t2.table2_id = t1.table2_id
LEFT JOIN (
table4 As t4
LEFT JOIN table5 As t5 ON t5.table5_id = t4.table5_id
LEFT JOIN table6 As t6 ON t6.table4_id = t4.table4_id
LEFT JOIN (
table7 As t7
INNER JOIN table8 As t8 ON t8.table8_id = t7.table8_id
) ON t7.table4_id = t4.table4_id
) ON t4.table4_id = t1.table4_id
LEFT JOIN table9 As t9 ON t9.table9_id = t1.table9_id
WHERE t1.table1_id = 1
thd->thread_id=1
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
[miguel@hegel dbs]$

Thank you for the bug report. Verified as described.

Problem is in a wrong assumption in greedy_search().
In the following piece of code :
 4686     if (size_remain <= search_depth)
 4687     {
 4688       /*
 4689         'join->best_positions' contains a complete optimal extension of the
 4690         current partial QEP.
 4691       */
 4692       DBUG_EXECUTE("opt", print_plan(join, join->tables,
 4693                                      record_count, read_time, read_time,
 4694                                      "optimal"););
 4695       DBUG_RETURN(FALSE);
 4696     }

greedy_search() assumes that if search_depth tables have been found it has the complete plan.
But if search_depth isless than the number of non-const tables this plan will not be complete : the tables that are out of search depth would not have been placed.

Here's a simplified test case : 
CREATE TABLE t1 (a int, b int, d int, i int); INSERT INTO t1 VALUES (1,1,1,1);
CREATE TABLE t2 (b int, c int); INSERT INTO t2 VALUES (1,1);
CREATE TABLE t3 (c int); INSERT INTO t3 VALUES (1);
CREATE TABLE t4 (d int, e int); INSERT INTO t4 VALUES (1,1);
CREATE TABLE t5 (g int, d int, h int); INSERT INTO t5 VALUES (1,1,1);

set optimizer_search_depth = 3;

SELECT 1
FROM t1
LEFT JOIN (
  t2 JOIN t3 ON t3.c = t2.c
) ON t2.b = t1.b
LEFT JOIN (
  t4 JOIN t5 ON t5.d = t4.d
) ON t4.d = t1.d
;

The crash is right after the call to choose_plan() (that calls greedy_search()) on the first attempt to use the new plan.

The call stack is : 
#0  0x000000307900be26 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000007b33be in write_core (sig=11) at stacktrace.c:254
#2  0x000000000061a383 in handle_segfault (sig=11) at mysqld.cc:2376
#3  <signal handler called>
#4  0x000000000069265b in get_best_combination (join=0x2428588) at sql_select.cc:5162
#5  0x00000000006a3a0c in make_join_statistics (join=0x2428588, tables=0x0, conds=0x0, 
    keyuse_array=0x2429768) at sql_select.cc:2773
#6  0x00000000006a46b6 in JOIN::optimize (this=0x2428588) at sql_select.cc:902
#7  0x00000000006a867a in mysql_select (thd=0x23dbc28, rref_pointer_array=0x23dd690, 
    tables=0x241b3f0, wild_num=0, fields=@0x23dd570, conds=0x0, og_num=0, order=0x0, 
    group=0x0, having=0x0, proc_param=0x0, select_options=2156153344, result=0x2427898, 
    unit=0x23dd0b8, select_lex=0x23dd468) at sql_select.cc:2261
#8  0x00000000006ad1c0 in handle_select (thd=0x23dbc28, lex=0x23dd028, result=0x2427898, 
    setup_tables_done_option=0) at sql_select.cc:257
#9  0x0000000000635556 in mysql_execute_command (thd=0x23dbc28) at sql_parse.cc:2741
#10 0x000000000063d75e in mysql_parse (thd=0x23dbc28, 
    inBuf=0x241b238 "SELECT 1\nFROM t1\nLEFT JOIN (\nt2 JOIN t3 ON t3.c = t2.c\n) ON t2.b = t1.b\nLEFT JOIN (\nt4 JOIN t5 ON t5.d = t4.d\n) ON t4.d = t1.d", length=126, 
    found_semicolon=0x41967e40) at sql_parse.cc:6206
#11 0x000000000063e477 in dispatch_command (command=COM_QUERY, thd=0x23dbc28, 
    packet=0x24131a9 "SELECT 1\nFROM t1\nLEFT JOIN (\nt2 JOIN t3 ON t3.c = t2.c\n) ON t2.b = t1.b\nLEFT JOIN (\nt4 JOIN t5 ON t5.d = t4.d\n) ON t4.d = t1.d\n", packet_length=128)
    at sql_parse.cc:1890
#12 0x000000000063fb83 in do_command (thd=0x23dbc28) at sql_parse.cc:1580
#13 0x000000000064008f in handle_one_connection (arg=0x23dbc28) at sql_parse.cc:1186
#14 0x000000307900729a in start_thread () from /lib64/libpthread.so.0
#15 0x00000030784e42cd in clone () from /lib64/libc.so.6

Better call stack can be seen if debug tracing is turned on:
#0  0x000000307900be26 in pthread_kill () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install gcc.x86_64 glibc.x86_64 zlib.x86_64
(gdb) where
#0  0x000000307900be26 in pthread_kill () from /lib64/libpthread.so.0
#1  0x00000000007b33be in write_core (sig=11) at stacktrace.c:254
#2  0x000000000061a383 in handle_segfault (sig=11) at mysqld.cc:2376
#3  <signal handler called>
#4  0x00000000006cdaee in print_plan (join=0x26797c8, idx=5, record_count=1, 
    read_time=2.002197265625, current_read_time=2.002197265625, info=0xae4eb1 "optimal")
    at sql_test.cc:263
#5  0x000000000069f71c in greedy_search (join=0x26797c8, remaining_tables=28, 
    search_depth=3, prune_level=1) at sql_select.cc:4692
#6  0x000000000069fc7f in choose_plan (join=0x26797c8, join_tables=30) at sql_select.cc:4401
#7  0x00000000006a394c in make_join_statistics (join=0x26797c8, tables=0x0, conds=0x0, 
    keyuse_array=0x267a9a8) at sql_select.cc:2763
#8  0x00000000006a46b6 in JOIN::optimize (this=0x26797c8) at sql_select.cc:902
#9  0x00000000006a867a in mysql_select (thd=0x262ce68, rref_pointer_array=0x262e8d0, 
    tables=0x266c630, wild_num=0, fields=@0x262e7b0, conds=0x0, og_num=0, order=0x0, 
    group=0x0, having=0x0, proc_param=0x0, select_options=2156153344, result=0x2678ad8, 
    unit=0x262e2f8, select_lex=0x262e6a8) at sql_select.cc:2261
#10 0x00000000006ad1c0 in handle_select (thd=0x262ce68, lex=0x262e268, result=0x2678ad8, 
    setup_tables_done_option=0) at sql_select.cc:257
#11 0x0000000000635556 in mysql_execute_command (thd=0x262ce68) at sql_parse.cc:2741
#12 0x000000000063d75e in mysql_parse (thd=0x262ce68, 
    inBuf=0x266c478 "SELECT 1\nFROM t1\nLEFT JOIN (\nt2 JOIN t3 ON t3.c = t2.c\n) ON t2.b = t1.b\nLEFT JOIN (\nt4 JOIN t5 ON t5.d = t4.d\n) ON t4.d = t1.d", length=126, 
    found_semicolon=0x4150ce40) at sql_parse.cc:6206
#13 0x000000000063e477 in dispatch_command (command=COM_QUERY, thd=0x262ce68, 
    packet=0x26643e9 "SELECT 1\nFROM t1\nLEFT JOIN (\nt2 JOIN t3 ON t3.c = t2.c\n) ON t2.b = t1.b\nLEFT JOIN (\nt4 JOIN t5 ON t5.d = t4.d\n) ON t4.d = t1.d\n", packet_length=128)
    at sql_parse.cc:1890
#14 0x000000000063fb83 in do_command (thd=0x262ce68) at sql_parse.cc:1580
#15 0x000000000064008f in handle_one_connection (arg=0x262ce68) at sql_parse.cc:1186
#16 0x000000307900729a in start_thread () from /lib64/libpthread.so.0
#17 0x00000030784e42cd in clone () from /lib64/libc.so.6

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57228

2708 Georgi Kodinov	2008-10-28
      Bug #38795: Automatic search depth and nested join's results in server crash
      
      The greedy optimizer was checking if it has a complete plan by comparing the remaining tables with the search depth.
      However the search depth includes the const tables (that are already placed) before the call to greedy_search()).
      And tables in the remaining_tables bitmask are only the non-constant ones.
      Fixed by adding the number of const tables to the number of remaining tables when comparing with search_depth (that has them).

Unfortunately, the fix is incorrect, as the script below shows.
The script below adds one extra row to t1, so that it is not
considered a system table. If we set search_dept = 3, we get
the crash anyway, despite there is no system table in the plan.

According to my investigation, the problem is not in the function
greedy_search(). I believe it is in best_extension_by_limited_search(),
which doesn't behave correct in the presence of OUTER JOIN.

mysql> insert into t1 values (2,2,2,2);
Query OK, 1 row affected (0.00 sec)

mysql> explain SELECT 1 FROM t1 LEFT JOIN (   t2 JOIN t3 ON t3.c = t2.c ) ON t2.b = t1.b LEFT JOIN (   t5 JOIN t4 ON t5.d = t4.d ) ON t4.d = t1.d;
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    2 |       |
|  1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL |    1 |       |
|  1 | SIMPLE      | t3    | ALL  | NULL          | NULL | NULL    | NULL |    1 |       |
|  1 | SIMPLE      | t5    | ALL  | NULL          | NULL | NULL    | NULL |    1 |       |
|  1 | SIMPLE      | t4    | ALL  | NULL          | NULL | NULL    | NULL |    1 |       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------+
5 rows in set (0.03 sec)

mysql> SET optimizer_search_depth = 3;
Query OK, 0 rows affected (0.73 sec)

mysql> SELECT 1 FROM t1 LEFT JOIN (   t2 JOIN t3 ON t3.c = t2.c ) ON t2.b = t1.b LEFT JOIN (   t5 JOIN t4 ON t5.d = t4.d ) ON t4.d = t1.d;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Below is a summary of my review:

As my previous comment above explains, the crash is unrelated to constant
tables. This can be shown if we add one extra row to table "t1", and run
the server with the suggested patch. If we run a DEBUG binary we get a
crash in print_plan().

I managed to narrow down the possible cause of the problem to the function
check_interleaving_with_nj() which is called from within
best_extension_by_limited_search().

This is what happens:
- greedy_search() calls best_extension_by_limited_search() twice to find
  the first two tables in the query plan. In our case these are t1, t2.
- greedy_search() calls best_extension_by_limited_search() a third time
  to determine the remainder of the plan. Since search depth is 3, and
  there are 3 more tables to be optimized, this must be the last call to
  best_extension_by_limited_search(), which must find a complete plan.
- the third call to best_extension_by_limited_search() finds a partial
  plan <t1,t2,t3>, and recursively calls itself to find the order of the
  last two tables t4, and t5.
- in this last recursive call, best_extension_by_limited_search() fails
  to find a complete plan because the function check_interleaving_with_nj()
  rejects both possible extensions of the partial plan. More specifically,
  check_interleaving_with_nj() returns TRUE (meaning "reject") because the
  following condition becomes TRUE:

  if (join->cur_embedding_map & ~next_tab->embedding_map)

Therefore, it seems that one (or both) of join->cur_embedding_map, and
next_tab->embedding_map are computed incorrectly.

If we call the optimizer with search_depth >= than the number of tables,
a query plan found without problems, therefore one hint is to look at how
the values above are computed with exhaustive search.

Looking some more at the bug in the light of Timour's comments I think I've found the problem at hand : join->cur_embedding_map.
The problem manifests itself as follows : 
consider the following queries :
SET optimizer_search_depth = 3; 
SELECT 1
FROM t1
LEFT JOIN (
  t2 JOIN t3 ON t3.c = t2.c
) ON t2.b = t1.b
LEFT JOIN (
  t4 JOIN t5 ON t5.d = t4.d
) ON t4.d = t1.d

We go to greedy_search(). t1 is a constant table and is already placed.
So we call best_extension_by_limited_search() with a search depth of 3.
this means that it can effectively try placing t2 and t3 on the second place after t1. 
best_exension_by_limited_search() adds t2, updates join->cur_embedding_map to reflect the fact that the current position is inside the first nested join and calls itself recursively with search depth 2. 
At this second level it checks if it can put t3 and finds that it can (since it's in the first nested join). So it adds it and updates join->cur_embedding_map to mark that it's back on top level after t3.
It then tries t4 and succeeds (t4 can be placed there). while trying it it updates join->cur_embedding_map to reflect the position inside the second nested join.
But it reaches search depth and is forced to go back. 
While going back it "unplaces" t4, t3 and t2 all the way up to the  
At the exit to greedy_search() join->cur_embedding_map is at the top level (after t1).
Now greedy_search adds t2 (the first table found by best_exension_by_limited_search() but is not updating join->cur_embedding_map to be after t2 (i.e. in the first nested join).
It then calls best_exension_by_limited_search starting after t1,t2. It goes to t3 and adds it to join->cur_embedding_map.
Next it tries t4 and t5 but can't place them because it thinks it's still in the first nested join.
As a result greedy_search() gets one table (t3) instead of 3 (t3,t4,t5) and crashes.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61230

2708 Georgi Kodinov	2008-12-10
      Bug #38795: Automatic search depth and nested join's results in server crash
      
      The greedy optimizer tracks the current level of nested joins and the position
      inside these by setting and maintaining a state that's global for the whole FROM
      clause.
      This state was correctly maintained inside the selection of the next partial plan
      table (in best_extension_by_limited_search()). 
      greedy_search() also moves the current position by adding the last partial match 
      table when there's not enough tables in the partial plan found by 
      best_extension_by_limited_search().
      This may require update of the global state variables that describe the current position
      in the plan if the last table placed by greedy_search is not a top-level join table.
      Fixed by updating the state after placing the partial plan table in greedy_search() in
      the same way this is done on entering the best_extension_by_limited_search().
      Fixed the signature of the function called to update the state : check_interleaving_with_nj

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63088

2708 Georgi Kodinov	2009-01-13
      Bug #38795: Automatic search depth and nested join's results in server crash
            
      The greedy optimizer tracks the current level of nested joins and the position
      inside these by setting and maintaining a state that's global for the whole FROM
      clause.
      This state was correctly maintained inside the selection of the next partial plan
      table (in best_extension_by_limited_search()). 
      greedy_search() also moves the current position by adding the last partial match 
      table when there's not enough tables in the partial plan found by 
      best_extension_by_limited_search().
      This may require update of the global state variables that describe the current
      position in the plan if the last table placed by greedy_search is not a top-level 
      join table.
      Fixed by updating the state after placing the partial plan table in greedy_search()
      in the same way this is done on entering the best_extension_by_limited_search().
      Fixed the signature of the function called to update the state : 
      check_interleaving_with_nj

Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:timothy.smith@sun.com-20090116165151-xtp5e4z6qsmxyvy0) (merge vers: 6.0.10-alpha) (pib:6)

Pushed into 5.0.78 (revid:joro@sun.com-20090203090422-v91rh3gmx2ulhbu9) (version source revid:chad@mysql.com-20090126173455-ia9503yxdqi30guh) (merge vers: 5.0.77) (pib:6)

Pushed into 5.1.32 (revid:joro@sun.com-20090203090549-gos3v4320vimrzg6) (version source revid:joro@sun.com-20090116104517-8bibi61rhr2428q9) (merge vers: 5.1.31) (pib:6)

Noted in 5.0.78, 5.1.32, 6.0.10 changelogs.

The greedy optimizer could cause a server crash due to improper
handling of nested outer joins.

Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090217131017-6u8qz1edkjfiobef) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)

Pushed into 5.1.32-ndb-6.4.3 (revid:tomas.ulin@sun.com-20090217134419-5ha6xg4dpedrbmau) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)

Pushed into 5.1.32-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090217134216-5699eq74ws4oxa0j) (version source revid:tomas.ulin@sun.com-20090202111723-1zzwax187rtls913) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)

Unfortunately I think the patch for this issue has introduced another bug. After some testing we still experience server crashes with some user generated queries. The test case below crashes the server on 5.1.33-community-log, but the query returns correct results on 5.0.51a-community-log.

To repeat:
1) Import the test database: see next attached file
2) Run the following query:
SELECT t1.table1_id
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table2_id = t1.table2_id
LEFT JOIN table3 t3 ON t3.table3_id = t1.table3_id
LEFT JOIN table4 t4 ON t4.table4_id = t1.table4_id
LEFT JOIN (
	table5 t5
		LEFT JOIN (
			table1 t1_2
			LEFT JOIN table6 t6_2 ON t6_2.table1_id = t1_2.table1_id
		) ON t1_2.table1_id = t5.table1_id
) ON t5.table1_id = t1.table1_id
LEFT JOIN table6 t6 ON t6.table1_id = t1.table1_id

START OUTPUT --------------------------------------------->
[root@kelly ~]# mysql test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.1.33-community-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select @@optimizer_search_depth;
+--------------------------+
| @@optimizer_search_depth |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (0.00 sec)

mysql> SELECT t1.table1_id FROM table1 t1 LEFT JOIN table2 t2 ON t2.table2_id = t1.table2_id LEFT JOIN table3 t3 ON t3.table3_id = t1.table3_id LEFT JOIN table4 t4 ON t4.table4_id = t1.table4_id LEFT JOIN ( table5 t5 LEFT JOIN ( table1 t1_2 LEFT JOIN table6 t6_2 ON t6_2.table1_id = t1_2.table1_id ) ON t1_2.table1_id = t5.table1_id ) ON t5.table1_id = t1.table1_id LEFT JOIN table6 t6 ON t6.table1_id = t1.table1_id;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>
END ------------------------------------------------------>

When run with default value for optimizer_search_depth:

START OUTPUT --------------------------------------------->
[root@kelly ~]# mysql test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.1.33-community-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select @@optimizer_search_depth;
+--------------------------+
| @@optimizer_search_depth |
+--------------------------+
|                       62 |
+--------------------------+
1 row in set (0.00 sec)

mysql> SELECT t1.table1_id FROM table1 t1 LEFT JOIN table2 t2 ON t2.table2_id = t1.table2_id LEFT JOIN table3 t3 ON t3.table3_id = t1.table3_id LEFT JOIN table4 t4 ON t4.table4_id = t1.table4_id LEFT JOIN ( table5 t5 LEFT JOIN ( table1 t1_2 LEFT JOIN table6 t6_2 ON t6_2.table1_id = t1_2.table1_id ) ON t1_2.table1_id = t5.table1_id ) ON t5.table1_id = t1.table1_id LEFT JOIN table6 t6 ON t6.table1_id = t1.table1_id;
+-----------+
| table1_id |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

mysql>
END ------------------------------------------------------>

The error log says:
090406 10:26:51 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=1
max_threads=150
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5256692 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x1b582430
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x41652f20 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x84417e]
/usr/sbin/mysqld(handle_segfault+0x322)[0x5bc292]
/lib64/libpthread.so.0[0x3bce40e4c0]
/lib64/libc.so.6(memcpy+0xd2)[0x3bcdc7b512]
/usr/sbin/mysqld[0x629f20]
/usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x4da)[0x62ae8a]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xa1)[0x633721]
/usr/sbin/mysqld(_Z13handle_selectP3THDP6st_lexP13select_resultm+0x169)[0x634109]
/usr/sbin/mysqld[0x5c66d4]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x386c)[0x5cc7dc]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x1f1)[0x5ce6a1]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xfd0)[0x5cf690]
/usr/sbin/mysqld(_Z10do_commandP3THD+0xe4)[0x5cfc24]
/usr/sbin/mysqld(handle_one_connection+0x6ef)[0x5c30cf]
/lib64/libpthread.so.0[0x3bce406367]
/lib64/libc.so.6(clone+0x6d)[0x3bcdcd2f7d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x1b5b5310 = SELECT t1.table1_id FROM table1 t1 LEFT JOIN table2 t2 ON t2.table2_id = t1.table2_id LEFT JOIN table3 t3 ON t3.table3_id = t1.table3_id LEFT JOIN table4 t4 ON t4.table4_id = t1.table4_id LEFT JOIN ( table5 t5 LEFT JOIN ( table1 t1_2 LEFT JOIN table6 t6_2 ON t6_2.table1_id = t1_2.table1_id ) ON t1_2.table1_id = t5.table1_id ) ON t5.table1_id = t1.table1_id LEFT JOIN table6 t6 ON t6.table1_id = t1.table1_id
thd->thread_id=3
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
090406 10:26:51 mysqld_safe Number of processes running now: 0
090406 10:26:51 mysqld_safe mysqld restarted
090406 10:26:51 [Warning] Changed limits: max_open_files: 65535  max_connections: 150  table_cache: 32687
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
090406 10:26:52  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
etc

Please note that we understand the query mentioned above is of a very bad design. But as queries are generated by our users who make mistakes as above, we still have to support them ;)

Test database

Attachment: db_test.sql (text/plain), 5.84 KiB.

It has been 2 weeks since I reopened this bug. Any news? Or do you wish me to open another bug report?

5.0.79 crashes too:

mysqld-debug.exe!get_best_combination()[sql_select.cc:5172]
mysqld-debug.exe!make_join_statistics()[sql_select.cc:2783]
mysqld-debug.exe!JOIN::optimize()[sql_select.cc:913]
mysqld-debug.exe!mysql_select()[sql_select.cc:2271]
mysqld-debug.exe!handle_select()[sql_select.cc:257]
mysqld-debug.exe!mysql_execute_command()[sql_parse.cc:2789]
mysqld-debug.exe!mysql_parse()[sql_parse.cc:6267]
mysqld-debug.exe!dispatch_command()[sql_parse.cc:1940]
mysqld-debug.exe!do_command()[sql_parse.cc:1628]
mysqld-debug.exe!handle_one_connection()[sql_parse.cc:1234]
mysqld-debug.exe!pthread_start()[my_winthread.c:85]
mysqld-debug.exe!_callthreadstart()[thread.c:295]
mysqld-debug.exe!_threadstart()[thread.c:277]
kernel32.dll!BaseThreadStart()

5.0.79, 5.1.33 crashes still.  Here's a testcase:

--------
set session optimizer_search_depth=0;
drop table if exists `t1`;
create table `t1` (`a` int) engine=innodb;
insert into `t1` values (1);

select t1.a
from t1 t1
left join t1 t2 on t2.a = t1.a
left join t1 t3 on t3.a = t1.a
left join t1 t4 on t4.a = t1.a
left join (t1 t5
left join (t1 t1_2 
left join t1 t6_2 on t6_2.a = t1_2.a) 
on t1_2.a = t5.a) on t5.a = t1.a
left join t1 t6 on t6.a = t1.a;
--------

See also bug #44633