Bug #42991 invalid memory access and/or crash when using index condition pushdown + innodb
Submitted: 18 Feb 2009 19:59 Modified: 14 May 2011 1:43
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:6.0.9,6.0.12,6.0-bzr OS:Any
Assigned to: Olav Sandstå
Tags: index_condition_pushdown, optimizer_switch
Triage: Triaged: D1 (Critical)

[18 Feb 2009 19:59] Shane Bester
Description:
I had an intermittently crashing query, and intermittent valgrind errors at this callstack:  

mysqld.exe!my_strnncollsp_simple()[ctype-simple.c:166]                     
mysqld.exe!Field_blob::cmp()[field.cc:7729]                                
mysqld.exe!Field_blob::key_cmp()[field.cc:7842]                            
mysqld.exe!key_cmp()[key.cc:478]                                           
mysqld.exe!handler::compare_key2()[handler.cc:5070]                        
mysqld.exe!index_cond_func_innodb()[ha_innodb.cc:8411]                     
mysqld.exe!row_search_for_mysql()[row0sel.c:4193]                          
mysqld.exe!sortcmp()[sql_string.cc:669]                                    
mysqld.exe!srv_conc_enter_innodb()[srv0srv.c:989]                          
mysqld.exe!ha_innobase::index_next()[ha_innodb.cc:4481]                    
mysqld.exe!handler::read_range_next()[handler.cc:5027]                     
mysqld.exe!handler::multi_range_read_next()[handler.cc:4288]               
mysqld.exe!DsMrr_impl::dsmrr_next()[handler.cc:4540]                       
mysqld.exe!ha_innobase::multi_range_read_next()[ha_innodb.cc:8374]         
mysqld.exe!QUICK_RANGE_SELECT::get_next()[opt_range.cc:8539]               
mysqld.exe!rr_quick()[records.cc:322]                                      
mysqld.exe!sub_select()[sql_select.cc:14083]                               
mysqld.exe!do_select()[sql_select.cc:13762]                                
mysqld.exe!JOIN::exec()[sql_select.cc:2439]                                
mysqld.exe!mysql_select()[sql_select.cc:3043]                              
mysqld.exe!handle_select()[sql_select.cc:298]                              
mysqld.exe!execute_sqlcom_select()[sql_parse.cc:4747]                      
mysqld.exe!mysql_execute_command()[sql_parse.cc:2007]                      
mysqld.exe!mysql_parse()[sql_parse.cc:5735]                                
mysqld.exe!dispatch_command()[sql_parse.cc:1009]                           
mysqld.exe!do_command()[sql_parse.cc:690]                                  
mysqld.exe!handle_one_connection()[sql_connect.cc:1145]                    
mysqld.exe!pthread_start()[my_winthread.c:61]                              
mysqld.exe!_callthreadstartex()[threadex.c:348]                            
mysqld.exe!_threadstartex()[threadex.c:326]                                
kernel32.dll!FlsSetValue()                                                 

How to repeat:
run mysqld under valgrind.
import the attached file.
might be tricky to repeat, to try it a few times, using debug and release server...
[18 Feb 2009 20:04] Shane Bester
import a few times against mysqld under valgrind, wait for errors...

Attachment: bug42991.sql (text/x-sql), 63.73 KiB.

[18 Feb 2009 20:05] Shane Bester
full debug infos, valgrind outputs, etc.

Attachment: bug42991_debug_valgrind_infos.txt (text/plain), 10.37 KiB.

[19 Feb 2009 7:12] Sveta Smirnova
Thank you for the report.

Valgrind warnings verified as described:

==14704== Invalid read of size 1
==14704==    at 0x8834A1C: my_strnncollsp_simple (ctype-simple.c:167)
==14704==    by 0x82944C8: Field_blob::cmp(unsigned char const*, unsigned, unsigned char const*, unsigned) (field.cc:7734)
==14704==    by 0x8294999: Field_blob::key_cmp(unsigned char const*, unsigned) (field.cc:7847)
==14704==    by 0x82A1A72: key_cmp(st_key_part_info*, unsigned char const*, unsigned) (key.cc:477)
==14704==    by 0x840B112: handler::compare_key2(st_key_range*) (handler.cc:5127)
==14704==    by 0x868C958: index_cond_func_innodb (ha_innodb.cc:8543)
==14704==    by 0x86F3EFE: row_search_for_mysql (row0sel.c:4143)
==14704==    by 0x8686A5F: ha_innobase::general_fetch(unsigned char*, unsigned, unsigned) (ha_innodb.cc:4656)
==14704==    by 0x8686B55: ha_innobase::index_next(unsigned char*) (ha_innodb.cc:4692)
==14704==    by 0x840AE79: handler::read_range_next() (handler.cc:5083)
==14704==    by 0x868CA32: ha_innobase::read_range_next() (ha_innodb.cc:8578)
==14704==    by 0x8409558: handler::multi_range_read_next(char**) (handler.cc:4304)
==14704==    by 0x840A084: DsMrr_impl::dsmrr_next(char**) (handler.cc:4599)
==14704==    by 0x868C853: ha_innobase::multi_range_read_next(char**) (ha_innodb.cc:8505)
==14704==    by 0x83F12F0: QUICK_RANGE_SELECT::get_next() (opt_range.cc:8558)
==14704==    by 0x83FD590: rr_quick(READ_RECORD*) (records.cc:322)
==14704==  Address 0x28438cd0 is 96 bytes inside a block of size 16,464 free'd
==14704==    at 0x400542D: free (vg_replace_malloc.c:323)
==14704==    by 0x8710057: ut_free (ut0mem.c:252)
==14704==    by 0x86CFC9D: mem_area_free (mem0pool.c:472)
==14704==    by 0x86CF069: mem_heap_block_free (mem0mem.c:524)
==14704==    by 0x86CE29C: mem_heap_free_func (mem0mem.ic:487)
==14704==    by 0x86F1C12: row_sel_store_mysql_rec (row0sel.c:2606)
==14704==    by 0x86F3EE7: row_search_for_mysql (row0sel.c:4141)
==14704==    by 0x8686A5F: ha_innobase::general_fetch(unsigned char*, unsigned, unsigned) (ha_innodb.cc:4656)
==14704==    by 0x8686B55: ha_innobase::index_next(unsigned char*) (ha_innodb.cc:4692)
==14704==    by 0x840AE79: handler::read_range_next() (handler.cc:5083)
==14704==    by 0x868CA32: ha_innobase::read_range_next() (ha_innodb.cc:8578)
==14704==    by 0x8409558: handler::multi_range_read_next(char**) (handler.cc:4304)
==14704==    by 0x840A084: DsMrr_impl::dsmrr_next(char**) (handler.cc:4599)
==14704==    by 0x868C853: ha_innobase::multi_range_read_next(char**) (ha_innodb.cc:8505)
==14704==    by 0x83F12F0: QUICK_RANGE_SELECT::get_next() (opt_range.cc:8558)
==14704==    by 0x83FD590: rr_quick(READ_RECORD*) (records.cc:322)
[19 Jun 2009 10:13] Shane Bester
Got another crash with engine_condition_pushdown=1:

Thread 11:
Invalid write of size 4
at:  mem_pool_fill_free_list (mem0pool.c:302)
by: mem_pool_fill_free_list (mem0pool.c:286)
by: mem_area_alloc (mem0pool.c:346)
by: mem_heap_create_block (mem0mem.c:370)
by: mem_heap_add_block (mem0mem.c:465)
by: lock_rec_create (mem0mem.ic:155)
by: lock_rec_lock (lock0lock.c:1972)
by: lock_clust_rec_read_check_and_lock (lock0lock.c:5159)
by: row_search_for_mysql (row0sel.c:851)
by: ha_innobase::general_fetch (ha_innodb.cc:4671)
by: ha_innobase::rnd_next(unsigned char*) (ha_innodb.cc:4860)
by: rr_sequential(READ_RECORD*) (records.cc:390)
Address 0x10 is not stack'd, malloc'd or (recently) free'd
[8 Oct 2009 12:24] Guilhem Bichot
goes away / comes back when disabling/enabling ICP in InnoDB
[30 Nov 2009 20:38] Sergey Petrunya
If I add EXPLAIN to the testcase, I get this:

+explain select * from `table5` where  (col2 <= '6566-06-15' AND col24 <> 'd')  group by `col83` order by `col83` desc  ;
+id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
+1	SIMPLE	table5	range	PRIMARY,idx1,idx3,idx4	idx3	89	NULL	1	Using index condition; Using where; Using temporary; Using filesort

it is using idx3 which is defined as 

  KEY `idx3` (`col2`(86))

push_index_cond() and co correctly detect that "col2 <= '6566-06-15" part of the condition cannot be checked by using the index.
The "col24 <> 'd'" part does get pushed (col24 is a clustered primary key, so it is part of the index).  Since index condition is present, it will try to decode the entire index tuple. I didn't investigate what exactly goes wrong when that is done, but it can be wrong on various levels as we don't ever try decoding blob columns from index images of their prefixes.

Suggested solution right now is to disable index condition for indexes that have partially-covered columns (check key_part->flag & HA_PART_KEY_SEG for all key parts?)
[4 Feb 2010 11:39] Olav Sandstå
I have simplified the test case for reproducing this valgrind error:

CREATE TABLE t1 (
  c1 TINYTEXT,
  c2 INT NOT NULL,
  PRIMARY KEY (c2),
  KEY id1 (c1(4))
) ENGINE=InnoDB;

INSERT INTO t1 VALUES ('Anastasia', 5);
INSERT INTO t1 VALUES ('Karianne', 4);

SELECT * FROM t1 WHERE (c1 <= '6566-06-15' AND c2 <> 3);
[4 Feb 2010 11:41] Olav Sandstå
Simplied version of test case for reproducing bug

Attachment: bug42991.test (application/octet-stream, text), 671 bytes.

[11 Feb 2010 20:33] Olav Sandstå
Here is a summary of what causes valgrind to report reading of
uninitialized data (using the simplified test case as an example).

1. It uses the id1 index (defined as KEY c1(4) ) and the following Index
   condition is push down to InnoDB:

       `test`.`t1`.`c2` <> 3

   This is similar to what Sergey Petrunya have commented above about
   the original test case.

2. In build_template() in ha_innodb.cc the mysql_template array is set
   up to include which fields to read from the index record (since ICP
   is in use). Both fields of the index entry (c1(4), c2) is
   considered for inclusion but only c2 is added to the mysql_template
   array's "index part". The reason c1 is not included is due to the
   following test:

         register const ibool    index_covers_field = 
                  field->part_of_key.is_set(file->active_index);

   This is evaluated to false since the c1 field is not part of the
   key/index and because of this c1 is not included in mysql_template
   array (at least not the index part of it - it is included later).

3. When the first row is read from the table and the code at the
   idx_cond_check label in row_search_for_mysql() (in row/row0sel.c)
   reads the content of the index entry based on the mysql_template
   array:

     row_sel_store_mysql_rec(buf, prebuilt, rec,
                                  offsets, 0, prebuilt->n_index_fields);

   This only reads the c2 key part from the index entry (not the c1(4) key part)
   into the mysql record.

4. Then the actual ICP test is run:

     res= prebuilt->idx_cond_func(prebuilt->idx_cond_func_arg);

   This calls the following function (see ha_innodb.cc):

/*
  Index condition check function to be called from within Innobase.
  See note on ICP_RESULT for return values description.
*/

static uint index_cond_func_innodb(void *arg)
{
  ha_innobase *h= (ha_innobase*)arg;
  if (h->end_range)
  {
    if (h->compare_key2(h->end_range) > 0)
      return 2; /* caller should return HA_ERR_END_OF_FILE already */
  }
  return test(h->pushed_idx_cond->val_int());
}

   In this case a range read is used and before the actual index
   condition is evaluated a test whether the index-entry is within the
   range is done (h->compare_key2(h->end_range). The end of the range
   is defined as:

     c1 <= '6566-06-15'

   This is compared against the content of the "c1" field in the
   mysql_record which at this point has not yet been read into the
   mysql_record (due to this field not being included in the "index
   part" of the mysql_template array). This leads to reading
   "un-initialized data".

In this case the test case did produce the correct result but based on
the above it would likely be possible to create a test case that
produced the wrong result.

A patch that disables ICP in the case where the index entry only
contains a prefix of the complete field value will be submitted
shortly (test similar to what suggested by Sergey).
[11 Feb 2010 21:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/100041

3882 Olav.Sandstaa@sun.com	2010-02-11
      Fix for Bug#42991 invalid memory access and/or crash when using index condition pushdown + innodb
      
      The valgrind error about reading uninitialized memory occurs because
      the index used for index condition pushdown contained a prefix of a
      BLOB field. This lead to this field not being read in from neither the
      index nor the real record due to not "covering the real field" but was
      still used when the index condition pushdown expression and the end of
      range test was executed.
      
      The fix for this problem is to not accept a pushed index condition in
      the case where the index contains key parts that contains only a part
      of the real field.
      
      The test case for this bug will be integrated into the innodb_icp test
      when the patch for Bug#43360 has been pushed.
     @ mysql-test/r/bug42991.result
        Result file for test for Bug#42991.
     @ mysql-test/t/bug42991.test
        Test case for Bug#42991.
     @ storage/innobase/handler/ha_innodb.cc
        When considering if InnoDB should accept a pushed index condition add an
        extra test to check that the index does not contains key parts that
        are not completely stored in the index entry (e.g. prefix of BLOB fields).
        If this is the case then reject the pushed index condition.
[5 Mar 2010 17:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102454
[5 May 2010 11:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107499
[1 Sep 2010 9:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/117319

3232 Olav Sandstaa	2010-09-01
      Fix for Bug#42991 invalid memory access and/or crash when using index condition pushdown + innodb
                  
      The valgrind error about reading uninitialized memory occurs because
      the index used for the range scan and index condition pushdown
      contained a prefix of a BLOB (TINYTEXT) field. This lead to this field
      not being read in from the index due to not "covering the real field"
      but was still used when the the end of range test was executed.
            
      Using the attached test case as example: The optimizer decides to use
      an index on a prefix of a TINYTEXT column for doing a range scan. In
      this case InnoDB will evaluate records against the end range value. In
      the case where ICP is not in use it will do the end-range check
      against the TINYTEXT value read from the base record. But in this case
      ICP is in use and a condtion on the primary key is pushed down to
      InnoDB. When ICP is in use the end-range evaluation is done by the
      "icp code" in InnoDB and evaluated against values read from the index
      (see the code in the function index_cond_func_innodb() in
      ha_innodb.cc). Unfortunately, since the index entry only contains a
      prefix of the TINYTEXT field it is not read from the index and the
      end-range evaluation in index_cond_func_innodb() is done against
      "uninitialized data".
            
      The implemented solution for this problem is to extend the ICP
      implementation in InnoDB to also include support for reading index
      entries that are prefixes of BLOB columns and use these for doing end
      range evaluation. When setting up the mysql_template array (in
      build_template() in ha_innodb.cc) that contains which fields to read
      from index entries and base records we will now include reading index
      fields that are prefixes of columns in the base record. Since these
      only contains the prefix for the value it is necessary to read both
      the index entry and the corresponding column value from the base
      record in this case.
     @ mysql-test/include/icp_tests.inc
        Test case for Bug#42991 "invalid memory access and/or crash when using index condition pushdown + innodb".
     @ mysql-test/r/innodb_icp.result
        Result file for test case for Bug#42991 "invalid memory access and/or crash when using index condition pushdown + innodb".
     @ mysql-test/r/innodb_icp_all.result
        Result file for test case for Bug#42991 "invalid memory access and/or crash when using index condition pushdown + innodb".
     @ mysql-test/r/innodb_icp_none.result
        Result file for test case for Bug#42991 "invalid memory access and/or crash when using index condition pushdown + innodb".
     @ mysql-test/r/myisam_icp.result
        Result file for test case for Bug#42991 "invalid memory access and/or crash when using index condition pushdown + innodb".
     @ mysql-test/r/myisam_icp_all.result
        Result file for test case for Bug#42991 "invalid memory access and/or crash when using index condition pushdown + innodb".
     @ mysql-test/r/myisam_icp_none.result
        Result file for test case for Bug#42991 "invalid memory access and/or crash when using index condition pushdown + innodb".
     @ storage/innobase/dict/dict0dict.c
        Add new function for reading out the position of a specified column 
        or prefix of a column in an index entry: dict_index_get_nth_col_or_prefix_pos().
        The implementation is done by moving the code for the existing 
        dict_index_get_nth_col_pos() function to a common function and having the 
        new function (dict_index_get_nth_col_or_prefix_pos() and the existing 
        function (dict_index_get_nth_col_pos())calling this common function with a
        boolean parameter saying if it should include prefixes in the search for
        a column in the index entry.
     @ storage/innobase/handler/ha_innodb.cc
        Extend build_template() to also including adding reading of prefixes 
        of column values from index entries when ICP is in use. This is done
        by including these in the mysql_template array. Since we now might
        read both a prefix of a column value from the index entry and the
        actual value from the base record the size of the mysql_template array
        has to be doubled when ICP is in use.
     @ storage/innobase/include/dict0dict.h
        Add new function for reading out the position of a specified column 
        or prefix of a column in an index entry: dict_index_get_nth_col_or_prefix_pos().
     @ storage/innobase/include/row0mysql.h
        Add a member to the row_prebuilt_struct to store the size of
        the allocated mysql_template array.
[8 Nov 2010 8:21] Olav Sandstå
The fix for this problem has been included in the following change set:
marko.makela@oracle.com-20101025120429-m275rpb5ylbqdccs .
[8 Nov 2010 8:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123044
[8 Nov 2010 8:53] Olav Sandstå
Test case pushed to mysql-next-mr-opt-backporting with revision id:
olav.sandstaa@oracle.com-20101108084916-favb0td7g7qrajkh .
[5 Dec 2010 12:39] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[11 Dec 2010 17:41] Paul Dubois
Bug does not appear in any released 5.6.x version. No changelog entry needed.