Bug #52394 Segfault in JOIN_CACHE::get_offset () at sql_select.h:445
Submitted: 26 Mar 2010 14:54    Modified: 23 Nov 2010 3:23
Reporter: Patrick Crews
Status: Closed
Category: MySQL Server: Optimizer    Severity: S3 (Non-critical)
Version: mysql-6.0-codebase-bugfixing    OS: Any
Assigned to: Guilhem Bichot    CPU Architecture: Any
Tags: crashing bug, optimizer_join_cache_level, optimizer_switch, segfault

[26 Mar 2010 14:54] Patrick Crews
Description:
Crashing bug / segfault in 6.0-codebase-bugfixing with optimizer_join_cache_level=6:
NOTE:  This crash did not show up for other optimizer_join_cache_level values for the same test.

Key line: # 2010-03-26T10:46:01 #5  0x084cce4a in JOIN_CACHE::get_offset (this=0xa5d5720, ofs_sz=1, ptr=0xb5e9127 <Address 0xb5e9127 out of bounds>) at sql_select.h:445

I still have some further analysis to do; producing a simplified MTR test case is proving problematic. However, this bug is absolutely repeatable with the random query generator, as noted in the 'How to repeat' section.

Partial backtrace (full output attached separately):

# 2010-03-26T10:46:01 Thread 1 (Thread 9640):
# 2010-03-26T10:46:01 #0  0x00a98422 in __kernel_vsyscall ()
# 2010-03-26T10:46:01 #1  0x00385e93 in __pthread_kill (threadid=2883331952, signo=11) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:64
# 2010-03-26T10:46:01 #2  0x08d1f00f in my_write_core (sig=11) at stacktrace.c:328
# 2010-03-26T10:46:01 #3  0x08422222 in handle_segfault (sig=11) at mysqld.cc:2843
# 2010-03-26T10:46:01 #4  <signal handler called>
# 2010-03-26T10:46:01 #5  0x084cce4a in JOIN_CACHE::get_offset (this=0xa5d5720, ofs_sz=1, ptr=0xb5e9127 <Address 0xb5e9127 out of bounds>) at sql_select.h:445
# 2010-03-26T10:46:01 #6  0x084cd0b2 in JOIN_CACHE::get_rec_length (this=0xa5d5720, ptr=0xb5e9127 <Address 0xb5e9127 out of bounds>) at sql_select.h:624
# 2010-03-26T10:46:01 #7  0x084c77c5 in JOIN_CACHE::read_referenced_field (this=0xa5d5720, copy=0xa5d57d8, rec_ptr=0xb5e9128 <Address 0xb5e9128 out of bounds>, 
# 2010-03-26T10:46:01     len=0xabdbfc1c) at sql_join_cache.cc:1499
# 2010-03-26T10:46:01 #8  0x084ca53d in JOIN_CACHE_BKA::get_next_key (this=0xa5d5850, key=0xa5d5b60) at sql_join_cache.cc:2495
# 2010-03-26T10:46:01 #9  0x084c9726 in bka_range_seq_next (rseq=0xa5d5850, range=0xa5d5b60) at sql_join_cache.cc:2164
# 2010-03-26T10:46:01 #10 0x087014ee in handler::multi_range_read_next (this=0xa5d5ac0, range_info=0xabdbfd4c) at handler.cc:4427
# 2010-03-26T10:46:01 #11 0x0870266c in DsMrr_impl::dsmrr_fill_buffer (this=0xa424654) at handler.cc:4660
# 2010-03-26T10:46:01 #12 0x08701e17 in DsMrr_impl::dsmrr_init (this=0xa424654, h_arg=0xa4244f8, seq_funcs=0xabdbfe5c, seq_init_param=0xa5d5850, n_ranges=340, mode=128, 
# 2010-03-26T10:46:01     buf=0xa5d58d0) at handler.cc:4578
# 2010-03-26T10:46:01 #13 0x08baedf1 in ha_myisam::multi_range_read_init (this=0xa4244f8, seq=0xabdbfe5c, seq_init_param=0xa5d5850, n_ranges=340, mode=128, buf=0xa5d58d0)
# 2010-03-26T10:46:01     at ha_myisam.cc:2119
# 2010-03-26T10:46:01 #14 0x084ca066 in JOIN_CACHE_BKA::init_join_matching_records (this=0xa5d5850, seq_funcs=0xabdbfe5c, ranges=340) at sql_join_cache.cc:2379
# 2010-03-26T10:46:01 #15 0x084c9ae9 in JOIN_CACHE_BKA::join_matching_records (this=0xa5d5850, skip_last=false) at sql_join_cache.cc:2287
# 2010-03-26T10:46:01 #16 0x084c7d2c in JOIN_CACHE::join_records (this=0xa5d5850, skip_last=false) at sql_join_cache.cc:1627
# 2010-03-26T10:46:01 #17 0x084c8073 in JOIN_CACHE::join_records (this=0xa5d5720, skip_last=false) at sql_join_cache.cc:1673
# 2010-03-26T10:46:01 #18 0x0858303f in sub_select_cache (join=0xa5d0500, join_tab=0xa50f574, end_of_records=true) at sql_select.cc:16405
# 2010-03-26T10:46:01 #19 0x085834ad in sub_select (join=0xa5d0500, join_tab=0xa50f3b8, end_of_records=true) at sql_select.cc:16568
# 2010-03-26T10:46:01 #20 0x08582127 in do_select (join=0xa5d0500, fields=0x0, table=0xa50ff48, procedure=0x0) at sql_select.cc:16159
# 2010-03-26T10:46:01 #21 0x08545667 in JOIN::exec (this=0xa5d0500) at sql_select.cc:2567
# 2010-03-26T10:46:01 #22 0x085490d1 in mysql_select (thd=0xa425048, rref_pointer_array=0xa426658, tables=0xa40f958, wild_num=0, fields=..., conds=0x0, og_num=4, 
# 2010-03-26T10:46:01     order=0x0, group=0xa5cf1a0, having=0x0, proc_param=0x0, select_options=2147781376, result=0xa5cf4b8, unit=0xa426094, select_lex=0xa426554)
# 2010-03-26T10:46:01     at sql_select.cc:3184
# 2010-03-26T10:46:01 #23 0x0853944d in handle_select (thd=0xa425048, lex=0xa426038, result=0xa5cf4b8, setup_tables_done_option=0) at sql_select.cc:304
# 2010-03-26T10:46:01 #24 0x08459ca5 in execute_sqlcom_select (thd=0xa425048, all_tables=0xa40f958) at sql_parse.cc:5032
# 2010-03-26T10:46:01 #25 0x08448b0c in mysql_execute_command (thd=0xa425048) at sql_parse.cc:2295
# 2010-03-26T10:46:01 #26 0x0845dd5a in mysql_parse (thd=0xa425048, 
# 2010-03-26T10:46:01     inBuf=0xa40e9f0 "SELECT    table2 . `col_datetime_key` AS field1 , CONCAT ( table2 . `col_varchar_key` , table2 . `col_varchar_nokey` ) AS field2 , table2 . `col_datetime_key` AS field3 , table2 . `col_datetime_key` A"..., length=602, found_semicolon=0xabdc1950) at sql_parse.cc:6060
# 2010-03-26T10:46:01 #27 0x084439ad in dispatch_command (command=COM_QUERY, thd=0xa425048, 
# 2010-03-26T10:46:01     packet=0xa3fab19 " SELECT    table2 . `col_datetime_key` AS field1 , CONCAT ( table2 . `col_varchar_key` , table2 . `col_varchar_nokey` ) AS field2 , table2 . `col_datetime_key` AS field3 , table2 . `col_datetime_key` "..., packet_length=603) at sql_parse.cc:1091
# 2010-03-26T10:46:01 #28 0x084421c7 in do_command (thd=0xa425048) at sql_parse.cc:775
# 2010-03-26T10:46:01 #29 0x0843e551 in do_handle_one_connection (thd_arg=0xa425048) at sql_connect.cc:1173
# 2010-03-26T10:46:01 #30 0x0843e238 in handle_one_connection (arg=0xa425048) at sql_connect.cc:1113
# 2010-03-26T10:46:01 #31 0x0038080e in start_thread (arg=0xabdc2770) at pthread_create.c:300
# 2010-03-26T10:46:01 #32 0x002928de in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

How to repeat:
perl ./runall.pl --basedir=/mysql-6.0 --vardir=6var6 --mysqld=--init-file=jcl6.sql --mtr-build-thread=261 --threads=1 --queries=50000  --Validator=Transformer  --grammar=conf/optimizer/optimizer_no_subquery.yy

The contents of file jcl6.sql:
SET GLOBAL OPTIMIZER_SWITCH = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,firstmatch=on,loosescan=on,materialization=on,semijoin=off,mrr=on,mrr_cost_based=off,index_condition_pushdown=on';

SET GLOBAL optimizer_join_cache_level = 6;

This test will run for a bit and then crash with the backtrace noted in the description.
[26 Mar 2010 15:10] Patrick Crews
Full crash output

Attachment: bug52394_backtrace.txt (text/plain), 42.10 KiB.

[26 Mar 2010 18:34] Philip Stoev
Maybe this is a crash that requires more than one query to trigger? If that is the case, you could potentially use the mysqltest simplifier to reduce the CSV log.
[27 Mar 2010 13:52] Philip Stoev
Test case after automatic simplification of the CSV log file

Attachment: bug52394.test (application/octet-stream, text), 6.09 KiB.

[27 Mar 2010 13:53] Philip Stoev
Please find attached the test case that was produced by automatically processing the CSV log file. It seems that a single SELECT is sufficient to reproduce the crash; there is no interaction between two queries.
[2 Apr 2010 6:09] Valeriy Kravchuk
Verified just as described:

openxs@ubuntu:/home2/openxs/dbs/6.0-bugfixing/mysql-test$ ./mtr bug52394
Logging: ./mtr  bug52394
100402  9:05:38 [Note] Buffered information: Performance schema disabled (reason: start parameters).

100402  9:05:38 [Note] Plugin 'FEDERATED' is disabled.
100402  9:05:38 [Note] Plugin 'ndbcluster' is disabled.
MySQL Version 6.0.14
Checking supported features...
 - using ndbcluster when necessary, mysqld supports it
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
vardir: /home2/openxs/dbs/6.0-bugfixing/mysql-test/var
Checking leftover processes...
mysql-test-run: WARNING: Found non pid file 'backup_xpfm_compat_lctn1.bak' in '/home2/openxs/dbs/6.0-bugfixing/mysql-test/var/run'
mysql-test-run: WARNING: Found non pid file 'backup_xpfm_compat_lctn0.bak' in '/home2/openxs/dbs/6.0-bugfixing/mysql-test/var/run'
Removing old var directory...
Creating var directory '/home2/openxs/dbs/6.0-bugfixing/mysql-test/var'...
Installing system database...
Using server port 49019

==============================================================================

TEST                                      RESULT   TIME (ms)
------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
main.bug52394                            [ fail ]
        Test ended at 2010-04-02 09:05:50

CURRENT_TEST: main.bug52394

Server [mysqld.1 - pid: 5907, winpid: 5907, exit: 256] failed during test run
Server log from this test:
100402  9:05:46 [Note] Buffered information: Performance schema enabled.

100402  9:05:46 [Note] Plugin 'FEDERATED' is disabled.
100402  9:05:46 [Note] Plugin 'InnoDB' is disabled.
100402  9:05:46 [Note] Plugin 'ndbcluster' is disabled.
100402  9:05:47 [Note] Event Scheduler: Loaded 0 events
100402  9:05:47 [Note] /home2/openxs/dbs/6.0-bugfixing/libexec/mysqld: ready for connections.
Version: '6.0.14-alpha-debug-log'  socket: '/home2/openxs/dbs/6.0-bugfixing/mysql-test/var/tmp/mysqld.1.sock'  port: 13000  Source distribution
100402  9:05:48 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60134 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x90b2590
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xb34adf60 thread_stack 0x30c00
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(my_print_stacktrace+0x32) [0x87c02f1]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(handle_segfault+0x2f2) [0x830fb4c]
[0xb771e420]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN_CACHE::get_rec_length(unsigned char*)+0x22) [0x8360862]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN_CACHE::read_referenced_field(st_cache_field*, unsigned char*, unsigned int*)+0x89) [0x835e68b]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN_CACHE_BKA::get_next_key(unsigned char**)+0x210) [0x835e9b8]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld [0x835d1ea]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(handler::multi_range_read_next(char**)+0x177) [0x847d071]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(DsMrr_impl::dsmrr_fill_buffer()+0x164) [0x848019a]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(DsMrr_impl::dsmrr_init(handler*, st_range_seq_if*, void*, unsigned int, unsigned int, st_handler_buffer*)+0x3f6) [0x848086e]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(ha_myisam::multi_range_read_init(st_range_seq_if*, void*, unsigned int, unsigned int, st_handler_buffer*)+0x41) [0x86cfa27]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN_CACHE_BKA::init_join_matching_records(st_range_seq_if*, unsigned int)+0x11e) [0x835d0fa]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN_CACHE_BKA::join_matching_records(bool)+0x101) [0x835deb3]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN_CACHE::join_records(bool)+0x6c) [0x835c924]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN_CACHE::join_records(bool)+0x199) [0x835ca51]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(sub_select_cache(JOIN*, st_join_table*, bool)+0x96) [0x83ac682]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x66) [0x83ac2ae]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld [0x83b9cd3]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(JOIN::exec()+0xa52) [0x83d094e]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x30d) [0x83cc973]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(handle_select(THD*, LEX*, select_result*, unsigned long)+0x1ec) [0x83d24ae]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld [0x8321eed]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(mysql_execute_command(THD*)+0xa7a) [0x8323dcc]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x229) [0x832c81d]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x9e9) [0x832d39b]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(do_command(THD*)+0x241) [0x832e8fb]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(do_handle_one_connection(THD*)+0x150) [0x831ad30]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(handle_one_connection+0x25) [0x831adef]
/home2/openxs/dbs/6.0-bugfixing/libexec/mysqld(pfs_spawn_thread(void*)+0xb8) [0x885647a]
/lib/tls/i686/cmov/libpthread.so.0 [0xb76f34fb]
/lib/tls/i686/cmov/libc.so.6(clone+0x5e) [0xb7502e5e]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x8fc7b28 = SELECT SQL_SMALL_RESULT table2 . `col_varchar_key` AS field1 FROM (
C AS table1 INNER JOIN (
(
D AS table2 STRAIGHT_JOIN CC AS table3 ON ((
table3 . `pk` = table2 . `col_int_nokey` )
)
)
)
ON (table3 . `col_varchar_key` = table2 . `col_varchar_key` )
)
GROUP BY field1 ORDER BY field1, field1 , field1 LIMIT 1 /* TRANSFORM_OUTCOME_SINGLE_ROW */
thd->thread_id=2
...
[22 Apr 2010 13:19] Guilhem Bichot
An even more minimal test case:

SET optimizer_join_cache_level = 6;

CREATE TABLE C(a int);
INSERT INTO C VALUES(1),(2),(3),(4),(5);

CREATE TABLE D (a int(11), b varchar(1));
INSERT INTO D VALUES (6,'r'),(27,'o');

CREATE TABLE E (a int(11) primary key, b varchar(1));
INSERT INTO E VALUES
(14,'d'),(15,'z'),(16,'e'),(17,'h'),(18,'b'),(19,'s'),(20,'e'),(21,'j'),(22,'e'),(23,'f'),(24,'v'),(25,'x'),(26,'m'),(27,'c');

SELECT 1 FROM C,D,E WHERE D.a = E.a AND D.b = E.b;
drop table C,D,E;
[22 Apr 2010 14:22] Guilhem Bichot
The crash also goes away with optimizer_switch="mrr=off", but I'm not yet adding an "mrr" tag, because the bug is already tagged as BKA and would then show up twice in our lists broken down by category.
[23 Apr 2010 14:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106442

3832 Guilhem Bichot	2010-04-23
      Fix for BUG#52394 "Segfault in JOIN_CACHE::get_offset () at sql_select.h:445":
      code failed to see that there were no more records in the join buffer, so tried to read one more
      record, reading random data.
     @ sql/sql_join_cache.cc
        In the test's scenario, execution plan is D,C,E. Crash happened when reading a
        record from the last join buffer (of "(D JOIN C) JOIN E"). This is a JOIN_CACHE_BKA buffer,
        because primary key access will be used to find rows from E (see the join
        condition on E: "D.a = E.a", E.a is a primary key).
        In this JOIN_CACHE_BKA buffer, a record is encoded like this:
          length | pointer to pieces of previous table's JOIN_CACHE | fields :
        - "length" is the record's total length
        - "pointer" is because we are using "incremental buffers" (optimizer_join_cache_level=6).
        - "fields" is empty because fields from E can be found in the previous
        buffer (due to join condition). Thus the start of "fields" of record N
        is also the start of "length" of record N+1 (this matters below).
        JOIN_CACHE::last_record_pos remembers the position of the last record
        in the buffer (for "EOF" detection); more exactly, the position of the start of
        fields of the last record.
        When JOIN_CACHE_BKA::get_next_key() starts it compares "pos" (position right
        after the previously read record, i.e. position of the start of "length"
        of the maybe-existing to-be-read record) with last_rec_pos, to see if it is
        now at EOF.
        So when we have read the last record, and come to JOIN_CACHE_BKA::get_next_key()
        again, pos==last_rec_pos: pos>last_rec_pos is false, code believes that there
        is a record to read, though there isn't. It then tries to read some values in
        the record (random data), uses them to build a pointer (random pointer) to
        peek in the previous buffer (incremental buffers used), and causes
        a segmentation fault.
        The fix is to compare apples to apples: "pos" is a start of length, "last_rec_pos"
        is a start of fields, this isn't the same: rather determine the position of the
        start of fields of the maybe-existing to-be-read record, and compare *that* to
        last_rec_pos: this gives a reliable EOF detection.
        The same type of correct logic is found in JOIN_CACHE::get_record(): it builds
        the "start of fields" position of the maybe-existing record (pos+=size_of_rec_len etc),
        then calls read_all_record_fields() which compares this position with last_rec_pos.
        Why doesn't it crash without incremental buffers: in that other case, E's "fields"
        are not empty as they don't exist in a previous buffer. So in get_next_key(),
        "pos" is "size of fields" bytes after last_rec_pos, and function exits properly.
[23 Apr 2010 20:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106454

3832 Guilhem Bichot	2010-04-23
      Fix for BUG#52394 "Segfault in JOIN_CACHE::get_offset () at sql_select.h:445":
      code failed to see that there were no more records in the join buffer, so tried to read one more
      record, reading random data.
     @ sql/sql_join_cache.cc
        In the test's scenario, execution plan is D,C,E. Crash happened when reading a
        record from the last join buffer (of "(D JOIN C) JOIN E"). This is a JOIN_CACHE_BKA buffer,
        because primary key access will be used to find rows from E (see the join
        condition on E: "D.a = E.a", E.a is a primary key).
        In this JOIN_CACHE_BKA buffer, a record is encoded like this:
          length | pointer to pieces of previous table's JOIN_CACHE | fields :
        - "length" is the record's total length
        - "pointer" is because we are using "incremental buffers" (optimizer_join_cache_level=6).
        - "fields" is empty because fields from E can be found in the previous
        buffer (due to join condition). Thus the start of "fields" of record N
        is also the start of "length" of record N+1 (this matters below).
        JOIN_CACHE::last_record_pos remembers the position of the last record
        in the buffer (for "EOF" detection); more exactly, the position of the start of
        fields of the last record.
        When JOIN_CACHE_BKA::get_next_key() starts it compares "pos" (position right
        after the previously read record, i.e. position of the start of "length"
        of the maybe-existing to-be-read record) with last_rec_pos, to see if it is
        now at EOF.
        So when we have read the last record, and come to JOIN_CACHE_BKA::get_next_key()
        again, pos==last_rec_pos: pos>last_rec_pos is false, code believes that there
        is a record to read, though there isn't. It then tries to read some values in
        the record (random data), uses them to build a pointer (random pointer) to
        peek in the previous buffer (incremental buffers used), and causes
        a segmentation fault.
        The fix:
        1) compare apples to apples: "pos" is a start of length, "last_rec_pos"
        is a start of fields, this isn't the same: rather determine the position of the
        start of fields of the maybe-existing to-be-read record, and compare *that* to
        last_rec_pos: this gives a reliable EOF detection. This means that
        EOF <=> (pos + size_of_rec_len +
                  ((prev_cache != NULL) ? prev_cache->get_size_of_rec_offset() : 0))
                  > last_rec_pos                             (A)
        The same type of correct logic is found in JOIN_CACHE::get_record(): it builds
        the "start of fields" position of the maybe-existing record (pos+=size_of_rec_len etc),
        then calls read_all_record_fields() which compares this position with last_rec_pos.
        2) the correct inequality above can be simplified to
         pos >= last_record_pos      (B)
        Indeed:
        - (B) => (A) is clearly true.
        - !(B) => !(A) is true too because: if !(B), pos < last_record_pos.
        pos is the start of "length" of a thus _existing_ record.
        (pos + size_of_rec_len +
                  ((prev_cache != NULL) ? prev_cache->get_size_of_rec_offset() : 0))
        is the start of "fields" of this same record. By definition of last_rec_pos,
        this expression is thus smaller than or equal to last_rec_pos, so !(A) is true.
        3) we use (B) rather than (A) in code because when at an existing record it is
        the same amount of code, and when at EOF it is less code (a comparison, versus
        one or two additions and an if()).
        Why wasn't there any crash without incremental buffers: in that other case, E's "fields"
        are not empty as they don't exist in a previous buffer. So in get_next_key(),
        "pos" is "size of fields" bytes after last_rec_pos, ">" works, and function exits properly.
[25 Apr 2010 19:50] Olav Sandstå
Both patches are fine; I prefer the last of the two committed/proposed patches (changeset: http://lists.mysql.com/commits/106454 ).
[28 Apr 2010 4:12] Øystein Grøvlen
Evidently there is no concurrency control in this system. Tor and I updated this in parallel. Tor wins, since he has already approved the patch.
[28 Apr 2010 20:12] Guilhem Bichot
queued to 6.0-codebase-bugfixing
[28 Apr 2010 20:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106872

3845 Guilhem Bichot	2010-04-28
      Fix for BUG#52394 "Segfault in JOIN_CACHE::get_offset () at sql_select.h:445":
      code failed to see that there were no more records in the join buffer, so tried to read one more
      record, reading random data.
     @ sql/sql_join_cache.cc
        In the test's scenario, execution plan is D,C,E. Crash happened when reading a
        record from the last join buffer (of "(D JOIN C) JOIN E"). This is a JOIN_CACHE_BKA buffer,
        because primary key access will be used to find rows from E (see the join
        condition on E: "D.a = E.a", E.a is a primary key).
        In this JOIN_CACHE_BKA buffer, a record is encoded like this:
          length | pointer to pieces of previous table's JOIN_CACHE | fields :
        - "length" is the record's total length
        - "pointer" is because we are using "incremental buffers" (optimizer_join_cache_level=6).
        - "fields" is empty because fields from E can be found in the previous
        buffer (due to join condition). Thus the start of "fields" of record N
        is also the start of "length" of record N+1 (this matters below).
        JOIN_CACHE::last_record_pos remembers the position of the last record
        in the buffer (for "EOF" detection); more exactly, the position of the start of
        fields of the last record.
        When JOIN_CACHE_BKA::get_next_key() starts it compares "pos" (position right
        after the previously read record, i.e. position of the start of "length"
        of the maybe-existing to-be-read record) with last_rec_pos, to see if it is
        now at EOF.
        So when we have read the last record, and come to JOIN_CACHE_BKA::get_next_key()
        again, pos==last_rec_pos: pos>last_rec_pos is false, code believes that there
        is a record to read, though there isn't. It then tries to read some values in
        the record (random data), uses them to build a pointer (random pointer) to
        peek in the previous buffer (incremental buffers used), and causes
        a segmentation fault.
        The fix:
        1) compare apples to apples: "pos" is a start of length, "last_rec_pos"
        is a start of fields, this isn't the same: rather determine the position of the
        start of fields of the maybe-existing to-be-read record, and compare *that* to
        last_rec_pos: this gives a reliable EOF detection. This means that
        EOF <=> (pos + size_of_rec_len +
                  ((prev_cache != NULL) ? prev_cache->get_size_of_rec_offset() : 0))
                  > last_rec_pos                             (A)
        The same type of correct logic is found in JOIN_CACHE::get_record(): it builds
        the "start of fields" position of the maybe-existing record (pos+=size_of_rec_len etc),
        then calls read_all_record_fields() which compares this position with last_rec_pos.
        2) the correct inequality above can be simplified to
         pos >= last_record_pos      (B)
        Indeed:
        - (B) => (A) is clearly true.
        - !(B) => !(A) is true too because: if !(B), pos < last_record_pos.
        pos is the start of "length" of a thus _existing_ record.
        (pos + size_of_rec_len +
                  ((prev_cache != NULL) ? prev_cache->get_size_of_rec_offset() : 0))
        is the start of "fields" of this same record. By definition of last_rec_pos,
        this expression is thus smaller than or equal to last_rec_pos, so !(A) is true.
        3) we use (B) rather than (A) in code because when at an existing record it is
        the same amount of code, and when at EOF it is less code (a comparison, versus
        one or two additions and an if()).
        Why wasn't there any crash without incremental buffers: in that other case, E's "fields"
        are not empty as they don't exist in a previous buffer. So in get_next_key(),
        "pos" is "size of fields" bytes after last_rec_pos, ">" works, and function exits properly.
[7 May 2010 9:21] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100507091908-vqyhpwf2km0aokno) (version source revid:alik@sun.com-20100507091737-12vceffs11elb25g) (merge vers: 6.0.14-alpha) (pib:16)
[8 May 2010 14:49] Guilhem Bichot
backported to next-mr-opt-backporting guilhem@mysql.com-20100508134814-38ifb3dfx1jwr5kz
[8 May 2010 16:54] Paul DuBois
Noted in 6.0.14 changelog.

The server tried to read too many records from the join cache,
resulting in a crash.
[16 Aug 2010 6:34] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:05] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:23] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.