Bug #37348 Crash in or immediately after JOIN::make_sum_func_list
Submitted: 11 Jun 2008 15:30 Modified: 17 Oct 2008 17:32
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.1/6.0-falcon OS:Any
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: regression

[11 Jun 2008 15:30] Philip Stoev
Description:
When executing a DML workload, mysqld crashed as follows:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00581c78 in pthread_kill () from /lib/libpthread.so.0
#2  0x0843eda3 in write_core (sig=11) at stacktrace.c:302
#3  0x0829b228 in handle_segfault (sig=11) at mysqld.cc:2626
#4  <signal handler called>
#5  0x00000000 in ?? ()
#6  0x0830bdee in JOIN::make_sum_func_list (this=0xa11cc30, field_list=@0xa11df68, send_fields=@0xa11df8c, before_group_by=true, recompute=true)
    at sql_select.cc:17730
#7  0x08334df5 in JOIN::exec (this=0xa11cc30) at sql_select.cc:2589
#8  0x0824f73b in subselect_single_select_engine::exec (this=0xa1186b0) at item_subselect.cc:2277
#9  0x0824c10e in Item_subselect::exec (this=0xa118610) at item_subselect.cc:280
#10 0x0824cf8b in Item_singlerow_subselect::val_int (this=0xa118610) at item_subselect.cc:629
#11 0x081e71ac in Item::val_int_result (this=0xa118610) at ../item.h:738
#12 0x081d5ace in Item_cache_int::store (this=0xa119e40, item=0xa118610) at item.cc:6647
#13 0x0824c2b1 in Item_singlerow_subselect::store (this=0xa118d48, i=0, item=0xa118610) at item_subselect.cc:541
#14 0x082825d2 in select_singlerow_subselect::send_data (this=0xa118dd8, items=@0xa11bf8c) at sql_class.cc:2123
#15 0x08309a48 in end_send (join=0xa11ac48, join_tab=0xa11c1ec, end_of_records=false) at sql_select.cc:14406
#16 0x08315b79 in evaluate_join_record (join=0xa11ac48, join_tab=0xa11c038, error=0) at sql_select.cc:13568
#17 0x08315da5 in sub_select (join=0xa11ac48, join_tab=0xa11c038, end_of_records=false) at sql_select.cc:13342
#18 0x08322d70 in do_select (join=0xa11ac48, fields=0xa11bf8c, table=0x0, procedure=0x0) at sql_select.cc:13092
#19 0x08335655 in JOIN::exec (this=0xa11ac48) at sql_select.cc:2740
#20 0x0824f73b in subselect_single_select_engine::exec (this=0xa118de8) at item_subselect.cc:2277
#21 0x0824c10e in Item_subselect::exec (this=0xa118d48) at item_subselect.cc:280
#22 0x0824cf8b in Item_singlerow_subselect::val_int (this=0xa118d48) at item_subselect.cc:629
#23 0x081ddb06 in Item::save_in_field (this=0xa118d48, field=0xa114388, no_conversions=false) at item.cc:4886
#24 0x082ea7c9 in fill_record (thd=0xa0aa1a0, fields=@0xa0ab810, values=@0xa0fa628, ignore_errors=false) at sql_base.cc:7964
#25 0x082ea8d1 in fill_record_n_invoke_before_triggers (thd=0xa0aa1a0, fields=@0xa0ab810, values=@0xa0fa628, ignore_errors=false, triggers=0x0,
    event=TRG_EVENT_INSERT) at sql_base.cc:8009
#26 0x0833fec9 in mysql_insert (thd=0xa0aa1a0, table_list=0xa0fa2d0, fields=@0xa0ab810, values_list=@0xa0ab834, update_fields=@0xa0ab828,
    update_values=@0xa0ab81c, duplic=DUP_ERROR, ignore=false) at sql_insert.cc:754
#27 0x082ae991 in mysql_execute_command (thd=0xa0aa1a0) at sql_parse.cc:2951
#28 0x082b4ec6 in mysql_parse (thd=0xa0aa1a0,
    inBuf=0xa0fa020 "INSERT INTO B ( int_key , int_nokey ) VALUES ( ( SELECT ( SELECT COUNT( int_nokey ) FROM ( SELECT COUNT( int_nokey ) FROM A AS X WHERE X . pk < 20   LIMIT 20 ) AS X WHERE X . int_nokey < 61  GROUP BY "..., length=284, found_semicolon=0xa9e29260) at sql_parse.cc:5782
#29 0x082b590f in dispatch_command (command=COM_QUERY, thd=0xa0aa1a0,
    packet=0xa1052f9 "INSERT INTO B ( int_key , int_nokey ) VALUES ( ( SELECT ( SELECT COUNT( int_nokey ) FROM ( SELECT COUNT( int_nokey ) FROM A AS X WHERE X . pk < 20   LIMIT 20 ) AS X WHERE X . int_nokey < 61  GROUP BY "..., packet_length=284) at sql_parse.cc:1059
#30 0x082b6b75 in do_command (thd=0xa0aa1a0) at sql_parse.cc:732
#31 0x082a4385 in handle_one_connection (arg=0xa0aa1a0) at sql_connect.cc:1134
#32 0x0057d32f in start_thread () from /lib/libpthread.so.0
#33 0x0049a27e in clone () from /lib/libc.so.6

The last stack frame may be corrupted, so here is the previous one:

(gdb) frame 6
#6  0x0830bdee in JOIN::make_sum_func_list (this=0xa11cc30, field_list=@0xa11df68, send_fields=@0xa11df8c, before_group_by=true, recompute=true)
    at sql_select.cc:17730
17730       if (item->type() == Item::SUM_FUNC_ITEM && !item->const_item() &&
(gdb) list
17725       DBUG_RETURN(FALSE); /* We have already initialized sum_funcs. */
17726
17727     func= sum_funcs;
17728     while ((item=it++))
17729     {
17730       if (item->type() == Item::SUM_FUNC_ITEM && !item->const_item() &&
17731           (!((Item_sum*) item)->depended_from() ||
17732            ((Item_sum *)item)->depended_from() == select_lex))
17733         *func++= (Item_sum*) item;
17734     }

How to repeat:
A simplifed test case will hopefully follow shortly.
[11 Jun 2008 20:03] Philip Stoev
Unsimplifed test case for bug 37348

Attachment: bug37348.test (application/octet-stream, text), 68.96 KiB.

[11 Jun 2008 20:05] Philip Stoev
Please find attached a very hairy test for this bug. Apologies for not being able to reduce this further -- there are other pending bugs. Let me know if simplification is required to fix this bug.
[22 Jul 2008 16:43] MySQL Verification Team
Back to verified. Crash with current source server 6.0/5.1. I wasn't able to repeat the crash with version 6.0.2-debug.
[8 Sep 2008 13:38] Sergey Petrunya
A simpler testcase

Attachment: bug37348-simpler-testcase.sql (, text), 8.65 KiB.

[8 Sep 2008 13:51] Sergey Petrunya
Stack trace for the new testcase for 5.1.30-debug:

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 0xb49d4b90 (LWP 18329)]
  0x086cff46 in _checkchunk (irem=0x8, filename=0x8811f76 "sql_base.cc", lineno=818) at safemalloc.c:470
(gdb) wher
  #0  0x086cff46 in _checkchunk (irem=0x8, filename=0x8811f76 "sql_base.cc", lineno=818) at safemalloc.c:470
  #1  0x086d0151 in _sanity (filename=0x8811f76 "sql_base.cc", lineno=818) at safemalloc.c:515
  #2  0x086cf85b in _myfree (ptr=0x8f2a168, filename=0x8811f76 "sql_base.cc", lineno=818, myflags=0) at safemalloc.c:265
  #3  0x08315453 in free_io_cache (table=0x8f32c50) at sql_base.cc:818
  #4  0x08339f58 in JOIN::cleanup (this=0x8f3d140, full=true) at sql_select.cc:6730
  #5  0x0833a1cc in JOIN::join_free (this=0x8f3d140) at sql_select.cc:6660
  #6  0x0833a7a7 in do_select (join=0x8f3d140, fields=0x8f3e290, table=0x0, procedure=0x0) at sql_select.cc:10832
  #7  0x08353dbf in JOIN::exec (this=0x8f3d140) at sql_select.cc:2182
  #8  0x0834ec14 in mysql_select (thd=0x8ee08d0, rref_pointer_array=0x8ee1d60, tables=0x8f37728, wild_num=0, fields=@0x8ee1cfc, conds=0x8f37a18, og_num=1, order=0x0, group=0x8f37ba8, having=0x0, proc_param=0x0, select_options=2147764736, result=0x8f37c40, unit=0x8ee19f0, select_lex=0x8ee1c68) at sql_select.cc:2361
  #9  0x083540f3 in handle_select (thd=0x8ee08d0, lex=0x8ee1994, result=0x8f37c40, setup_tables_done_option=0) at sql_select.cc:269
  #10 0x082c72f5 in execute_sqlcom_select (thd=0x8ee08d0, all_tables=0x8f37728) at sql_parse.cc:4826
  #11 0x082c8c0a in mysql_execute_command (thd=0x8ee08d0) at sql_parse.cc:2134
  #12 0x082d18ce in mysql_parse (thd=0x8ee08d0, inBuf=0x8f36898 "SELECT     (SELECT COUNT( int_nokey )      FROM derived1 AS X      WHERE        X.int_nokey < 61      GROUP BY pk      LIMIT 1)   FROM D AS X   WHERE X.int_key < 13    GROUP BY int_nokey LIMIT 1", length=194, found_semicolon=0xb49d425c) at sql_parse.cc:5717
  #13 0x082d24d0 in dispatch_command (command=COM_QUERY, thd=0x8ee08d0, packet=0x8f2e839 "", packet_length=194) at sql_parse.cc:1152

(gdb) p data
  $2 = 0x20 <Address 0x20 out of bounds>
[9 Sep 2008 17:15] Sergey Petrunya
An even simpler testcase

Attachment: bug37348-simpler-testcase-r2.sql (, text), 8.64 KiB.

[9 Sep 2008 17:17] Sergey Petrunya
Running the query on 5.1-valgrind produces a number of errors. The first ones are:

==18653== Invalid write of size 4
==18653==    at 0x838B67F: Copy_field::set(unsigned char*, Field*) (field_conv.cc:523)
==18653==    by 0x82CDB43: setup_copy_fields(THD*, TMP_TABLE_PARAM*, Item**, List<Item>&, List<Item>&, unsigned, List<Item>&) (sql_select.cc:
14872)
==18653==    by 0x82E35DD: JOIN::exec() (sql_select.cc:2012)
==18653==    by 0x8218CC8: subselect_single_select_engine::exec() (item_subselect.cc:1958)
==18653==    by 0x82192E4: Item_subselect::exec() (item_subselect.cc:263)
==18653==    by 0x8218308: Item_singlerow_subselect::val_int() (item_subselect.cc:567)
==18653==    by 0x81B4453: Item::send(Protocol*, String*) (item.cc:5275)
==18653==    by 0x824AEA8: select_send::send_data(List<Item>&) (sql_class.cc:1553)
==18653==    by 0x82C1115: end_send(JOIN*, st_join_table*, bool) (sql_select.cc:11910)
==18653==    by 0x82C1F0D: evaluate_join_record(JOIN*, st_join_table*, int) (sql_select.cc:11170)
==18653==    by 0x82C806C: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:11055)
==18653==    by 0x82D9D24: do_select(JOIN*, List<Item>*, st_table*, Procedure*) (sql_select.cc:10811)
==18653==    by 0x82E3DF5: JOIN::exec() (sql_select.cc:2182)
==18653==    by 0x82E4B56: mysql_select(THD*, Item***, TABLE_LIST*, unsigned, List<Item>&, Item*, unsigned, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:2361)
==18653==    by 0x82E50AE: handle_select(THD*, st_lex*, select_result*, unsigned long) (sql_select.cc:269)
==18653==    by 0x8268F0E: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:4827)
==18653==  Address 0x4ffa224 is 4 bytes after a block of size 96 alloc'd
==18653==    at 0x4022E38: malloc (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==18653==    by 0x85DDDE9: my_malloc (my_malloc.c:34)
==18653==    by 0x85DE808: alloc_root (my_alloc.c:158)
==18653==    by 0x820A1D9: sql_alloc(unsigned) (thr_malloc.cc:65)
==18653==    by 0x82CD96E: setup_copy_fields(THD*, TMP_TABLE_PARAM*, Item**, List<Item>&, List<Item>&, unsigned, List<Item>&) (sql_list.h:34)
==18653==    by 0x82E35DD: JOIN::exec() (sql_select.cc:2012)
==18653==    by 0x8218CC8: subselect_single_select_engine::exec() (item_subselect.cc:1958)
==18653==    by 0x82192E4: Item_subselect::exec() (item_subselect.cc:263)
==18653==    by 0x8218308: Item_singlerow_subselect::val_int() (item_subselect.cc:567)
==18653==    by 0x81B4453: Item::send(Protocol*, String*) (item.cc:5275)
==18653==    by 0x824AEA8: select_send::send_data(List<Item>&) (sql_class.cc:1553)
==18653==    by 0x82C1115: end_send(JOIN*, st_join_table*, bool) (sql_select.cc:11910)
==18653==    by 0x82C1F0D: evaluate_join_record(JOIN*, st_join_table*, int) (sql_select.cc:11170)
==18653==    by 0x82C806C: sub_select(JOIN*, st_join_table*, bool) (sql_select.cc:11055)
==18653==    by 0x82D9D24: do_select(JOIN*, List<Item>*, st_table*, Procedure*) (sql_select.cc:10811)
==18653==    by 0x82E3DF5: JOIN::exec() (sql_select.cc:2182)
==18653== 

I still have no idea about the cause. It seems to be some problem with complicated grouping cases.
[12 Sep 2008 12:42] Davi Arnaut
FWIW, the probably cause of the valgrind warning is that a copy (setup_copy_fields) gets incremented to past the end of the allocated array, so a write to any of its elements generates a warning. And judging by a quick read of the code, seems possible if all_fields.elements > param->field_count.
[23 Sep 2008 6:53] Georgi Kodinov
The problem here is in the combination of a subquery with a GROUP BY, aggregate function that's calculated outside of it and a GROUP BY on the outer SELECT.
The optimizer changes the aggregate functions that will be evaluated in an outer select with a special kind of Item_ref. Then it counts the expressions of different kinds in the statement. And then it determines which fields it needs to copy to calculate GROUP BY. But before doing the last count an other subquery may calculate the result of the aggregate that's calculated into it and substitute the actual aggregate with a column from the temporary table.
And this will cause inconsistency between the items counted and the types of items actually present (and hence a crash or a valgrind error)
[23 Sep 2008 9:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/54479

2688 Georgi Kodinov	2008-09-23
      Bug #37348: Crash in or immediately after JOIN::make_sum_func_list
      
      Materializing temporary tables in outer contexts can cause the code that 
      allocates the fields that need to be copied for GROUP BY to decide to
      copy more fields than initially counted.
      Fixed by checking for the special case when it's actually wrong to 
      consider an Item_field as a thing that needs copying.
[29 Sep 2008 17:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/54676

2688 Georgi Kodinov	2008-09-29
      Bug #37348: Crash in or immediately after JOIN::make_sum_func_list
            
      Materializing temporary tables in outer contexts can cause the code that 
      allocates the fields that need to be copied for GROUP BY to decide to
      copy more fields than initially counted.
      Fixed by checking for the special case when it's actually wrong to 
      consider an Item_field as a thing that needs copying.
[2 Oct 2008 9:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55031

2757 Georgi Kodinov	2008-10-02
      Bug #37348: Crash in or immediately after JOIN::make_sum_func_list
                  
      Materializing temporary tables in outer contexts can cause the code that 
      allocates the fields that need to be copied for GROUP BY to decide to
      copy more fields than initially counted.
      Fixed by checking for the special case when it's actually wrong to 
      consider an Item_field as a thing that needs copying.
[2 Oct 2008 9:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55037

2757 Georgi Kodinov	2008-10-02
      Bug #37348: Crash in or immediately after JOIN::make_sum_func_list
                  
      Materializing temporary tables in outer contexts can cause the code that 
      allocates the fields that need to be copied for GROUP BY to decide to
      copy more fields than initially counted.
      Fixed by checking the type of the original item before considering an 
      Item_field referenced through an Item_aggregate_ref as a thing that 
      needs copying.
      Added an assertion to make sure bugs that cause similar discrepancy 
      don't go undetected.
[2 Oct 2008 10:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55043

2757 Georgi Kodinov	2008-10-02
      Bug #37348: Crash in or immediately after JOIN::make_sum_func_list
      
      The optimizer pulls up aggregate functions which should be aggregated in
      an outer select. At some point it may substitute such a function for a field
      in the temporary table. The setup_copy_fields function doesn't take this
      into account and may overrun the copy_field buffer.
      
      Fixed by checking the type of the original item before considering an 
      Item_field referenced through an Item_aggregate_ref as a thing that 
      needs copying.
      Added an assertion to make sure bugs that cause similar discrepancy 
      don't go undetected.
[2 Oct 2008 14:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55097

2757 Georgi Kodinov	2008-10-02
      Bug #37348: Crash in or immediately after JOIN::make_sum_func_list
            
      The optimizer pulls up aggregate functions which should be aggregated in
      an outer select. At some point it may substitute such a function for a field
      in the temporary table. The setup_copy_fields function doesn't take this
      into account and may overrun the copy_field buffer.
            
      Fixed by filtering out the fields referenced through the specialized
      reference for aggregates (Item_aggregate_ref).
      Added an assertion to make sure bugs that cause similar discrepancy 
      don't go undetected.
[8 Oct 2008 9:25] Jon Stephens
Documented in the 5.1.29 changelog as follows:

        The combination of a subquery with a GROUP BY, an aggregate function
        calculated outside the subquery, and a GROUP BY on the outer SELECT
        could cause the server to crash.

Left status as PQ pending merges to 5.1 and 6.0 trees; this is early documentation of an upcoming push into 5.1.29.
[9 Oct 2008 17:52] Bugs System
Pushed into 5.1.30  (revid:kgeorge@mysql.com-20081002144449-31idu81k563o86rx) (version source revid:kgeorge@mysql.com-20081007082452-gk4l86zq8k53wwyo) (pib:4)
[9 Oct 2008 18:44] Paul DuBois
Setting report to NDI pending push into 6.0.x.
[17 Oct 2008 16:43] Bugs System
Pushed into 6.0.8-alpha  (revid:kgeorge@mysql.com-20081002144449-31idu81k563o86rx) (version source revid:kgeorge@mysql.com-20081007153644-uypi14yjgque9obc) (pib:5)
[17 Oct 2008 17:32] Paul DuBois
Noted in 6.0.8 changelog.
[28 Oct 2008 21:04] Bugs System
Pushed into 5.1.29-ndb-6.2.17  (revid:kgeorge@mysql.com-20081002144449-31idu81k563o86rx) (version source revid:tomas.ulin@sun.com-20081028140209-u4emkk1xphi5tkfb) (pib:5)
[28 Oct 2008 22:22] Bugs System
Pushed into 5.1.29-ndb-6.3.19  (revid:kgeorge@mysql.com-20081002144449-31idu81k563o86rx) (version source revid:tomas.ulin@sun.com-20081028194045-0353yg8cvd2c7dd1) (pib:5)
[1 Nov 2008 9:48] Bugs System
Pushed into 5.1.29-ndb-6.4.0  (revid:kgeorge@mysql.com-20081002144449-31idu81k563o86rx) (version source revid:jonas@mysql.com-20081101082305-qx5a1bj0z7i8ueys) (pib:5)
[1 Apr 2009 11:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71051

2721 Gleb Shchepa	2009-04-01
      Backport bug #37348 fix 5.1 --> 5.0.
      
      Original commentary:
      
      Bug #37348: Crash in or immediately after JOIN::make_sum_func_list
                  
      The optimizer pulls up aggregate functions which should be aggregated in
      an outer select. At some point it may substitute such a function for a field
      in the temporary table. The setup_copy_fields function doesn't take this
      into account and may overrun the copy_field buffer.
                  
      Fixed by filtering out the fields referenced through the specialized
      reference for aggregates (Item_aggregate_ref).
      Added an assertion to make sure bugs that cause similar discrepancy 
      don't go undetected.
      modified:
        mysql-test/r/func_group.result
        mysql-test/t/func_group.test
        sql/item.cc
        sql/item.h
        sql/sql_select.cc
[1 Apr 2009 11:25] Gleb Shchepa
Bugfix has been ported into 5.0-bugteam: a test case for the bug #37362 affects this bug in the 5.0.
[5 May 2009 18:52] Bugs System
Pushed into 5.0.82 (revid:davi.arnaut@sun.com-20090505184158-dvmedh8n472y8np5) (version source revid:davi.arnaut@sun.com-20090505184158-dvmedh8n472y8np5) (merge vers: 5.0.82) (pib:6)
[5 May 2009 19:40] Bugs System
Pushed into 5.1.35 (revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (version source revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (merge vers: 5.1.35) (pib:6)
[6 May 2009 14:08] Bugs System
Pushed into 6.0.12-alpha (revid:svoj@sun.com-20090506125450-yokcmvqf2g7jhujq) (version source revid:bernt.johnsen@sun.com-20090401121459-m85iwu4ghdnnr41y) (merge vers: 6.0.11-alpha) (pib:6)
[15 Jun 2009 8:27] Bugs System
Pushed into 5.1.35-ndb-6.3.26 (revid:jonas@mysql.com-20090615074202-0r5r2jmi83tww6sf) (version source revid:jonas@mysql.com-20090615070837-9pccutgc7repvb4d) (merge vers: 5.1.35-ndb-6.3.26) (pib:6)
[15 Jun 2009 9:07] Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090615074335-9hcltksp5cu5fucn) (version source revid:jonas@mysql.com-20090615072714-rmfkvrbbipd9r32c) (merge vers: 5.1.35-ndb-7.0.7) (pib:6)
[15 Jun 2009 9:47] Bugs System
Pushed into 5.1.35-ndb-6.2.19 (revid:jonas@mysql.com-20090615061520-sq7ds4yw299ggugm) (version source revid:jonas@mysql.com-20090615054654-ebgpz7elwu1xj36j) (merge vers: 5.1.35-ndb-6.2.19) (pib:6)