Bug #43768 Prepared query with nested subqueries core dumps on second execution
Submitted: 20 Mar 2009 13:58 Modified: 7 Jul 2011 14:29
Reporter: Roy Lyseng Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:6.0 OS:Any
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: optimizer_switch, semijoin, subquery

[20 Mar 2009 13:58] Roy Lyseng
Description:
A prepared query with an EXISTS subquery, and two other subqueries nested within the EXISTS fails during optimization on the second execution.
The query is from ps.test
Note that this fails also when the EXISTS clauses are converted to IN.
The query fails on branch mysql-6.0-exists2in.
The IN version will probably fail on mysql-6.0-opt-subqueries.

How to repeat:
create table t1 (
  id int(11) unsigned not null primary key auto_increment,
  partner_id varchar(35) not null,
  t1_status_id int(10) unsigned
);

insert into t1 values ("1", "partner1", "10"), ("2", "partner2", "10"),
                      ("3", "partner3", "10"), ("4", "partner4", "10");

create table t2 (
  id int(11) unsigned not null default '0',
  t1_line_id int(11) unsigned not null default '0',
  article_id varchar(20),
  sequence int(11) not null default '0',
  primary key  (id,t1_line_id)
);

insert into t2 values ("1", "1", "sup", "0"), ("2", "1", "sup", "1"),
                      ("2", "2", "sup", "2"), ("2", "3", "sup", "3"),
                      ("2", "4", "imp", "4"), ("3", "1", "sup", "0"),
                      ("4", "1", "sup", "0");

create table t3 (
  id int(11) not null default '0',
  preceeding_id int(11) not null default '0',
  primary key  (id,preceeding_id)
);

create table t4 (
  user_id varchar(50) not null,
  article_id varchar(20) not null,
  primary key  (user_id,article_id)
);

insert into t4 values("nicke", "imp");

prepare stmt from
'select distinct t1.partner_id
from t1 left join t3 on t1.id = t3.id
     left join t1 pp on pp.id = t3.preceeding_id
where
  exists (
    select *
    from t2 as pl_inner
    where pl_inner.id = t1.id
    and pl_inner.sequence <= (
      select min(sequence) from t2 pl_seqnr
      where pl_seqnr.id = t1.id
    )
    and exists (
      select * from t4
      where t4.article_id = pl_inner.article_id
      and t4.user_id = ?
    )
  )
  and t1.id = ?
group by t1.id
having count(pp.id) = 0';

set @user_id = 'nicke';
set @id = '2';
execute stmt using @user_id, @id;
execute stmt using @user_id, @id;

The following query where the EXISTS clauses are converted to IN also fails:

prepare stmt from
'select distinct t1.partner_id
from t1 left join t3 on t1.id = t3.id
     left join t1 pp on pp.id = t3.preceeding_id
where
  t1.id in (
    select pl_inner.id
    from t2 as pl_inner
    where pl_inner.sequence <= (
      select min(sequence) from t2 pl_seqnr
      where pl_seqnr.id = t1.id
    )
    and pl_inner.article_id in (
      select t4.article_id from t4
      where t4.user_id = ?
    )
  )
  and t1.id = ?
group by t1.id
having count(pp.id) = 0';
[22 Mar 2009 23:28] MySQL Verification Team
backtrace

Attachment: bkt_43768.txt (text/plain), 5.45 KiB.

[22 Mar 2009 23:29] MySQL Verification Team
Thank you for the bug report.
[23 Mar 2009 8:31] Roy Lyseng
More prepared statements that also fail - simpler cases

Attachment: ps.test (application/octet-stream, text), 6.07 KiB.

[14 Jul 2009 22:40] Patrick Crews
Tried the attached tests numerous times and was unable to successfully reproduce the crash.  The tests and variants were all successful.
[7 Jan 2010 9:48] Roy Lyseng
Some of the queries in the simplified part of this bug case still fails in the 6.0 tree. Problem is related to semijoin conversion; it goes away when disabling semijoin.
[3 Mar 2010 14:18] Roy Lyseng
The following set of statements still fail on second execution:

create table t1 (
  id int(11) unsigned not null primary key auto_increment,
  partner_id varchar(35) not null,
  t1_status_id int(10) unsigned
);

insert into t1 values ("1", "partner1", "10"), ("2", "partner2", "10"),
                      ("3", "partner3", "10"), ("4", "partner4", "10");

create table t2 (
  id int(11) unsigned not null default '0',
  t1_line_id int(11) unsigned not null default '0',
  article_id varchar(20),
  sequence int(11) not null default '0',
  primary key  (id,t1_line_id)
);

insert into t2 values ("1", "1", "sup", "0"), ("2", "1", "sup", "1"),
                      ("2", "2", "sup", "2"), ("2", "3", "sup", "3"),
                      ("2", "4", "imp", "4"), ("3", "1", "sup", "0"),
                      ("4", "1", "sup", "0");
create table t3 (
  id int(11) not null default '0',
  preceeding_id int(11) not null default '0',
  primary key  (id,preceeding_id)
);

create table t4 (
  user_id varchar(50) not null,
  article_id varchar(20) not null,
  primary key  (user_id,article_id)
);

insert into t4 values("nicke", "imp");
prepare stmt from
'select t1.partner_id
from t1
where
  t1.id in (
    select pl_inner.id
    from t2 as pl_inner
    where  pl_inner.article_id in (
      select t4.article_id from t4
      where t4.user_id = \'nicke\'
    )
  )';

execute stmt;
execute stmt;

The problem is that pullout of semijoin tables is attempted for each execution, but used_tables in the nested_join object is not kept in sync with sj_inner_tables in sj_nest.

On first invocation of pull_out_semijoin_tables(), used_tables is {t2,t4} and table t4 is pulled out. Hence, sj_inner_tables is set to {t2}.  

On second invocation of pull_out_semijoin_tables(), no tables are pulled out, and because nested_join->used_tables has not been updated since the transformation into a semijoin nest, sj_inner_tables is set equal to used_tables, ie. {t2,t4}. This set of tables is later used in optimize_semijoin_nests() to calculate the number of tables in the semijoin nest, and because table t4 has indeed been pulled out, the program performs a segmentation fault.

The simplest fix for this is probably to synchronize used_tables and sj_inner_tables, however, it may be that used_tables should be used to push back tables that are originally pulled out. But if these fields are updated synchronously, maybe we do not need sj_inner_tables anymore?

sj_inner_tables is used a couple of places to indicate that a join nest is a semijoin nest, but I guess that test can be replaced with something else.
[14 Apr 2010 14:21] Roy Lyseng
Another possible solution is to avoid classifying tables that contain at most one row as const tables when they are inside a semijoin nest. The practical implication of this change is usually that there is one row access that is pushed from the optimization phase into the execution phase.

IMHO, the main drawback with this approach is that the const table extraction becomes asymmetrical, meaning that there will be slightly more conditional testing in the optimizer.

In this scenario, pullout_semijoin_tables() is called only before the first execution of the query, and the pullout operations are irreversible.

However, with the current implementation of make_join_statistics(), there is a potential problem when we have a subquery table that is functionally dependent upon an outer table, and that contains at most one row:

 - In the first execution, pullout_semijoin_tables() pulls out the dependent
   subquery table.

 - In the second execution, the table appears to be an outer table, and normal
   const table extraction is performed.

This is not a problem that causes an execution error, but it means that the execution path will be different in the first and subsequent executions, which I find unacceptable.

A solution to this problem is to rearrange code in make_join_statistics() according to this algorithm:

1. Initialize JOIN data structures (major part of first loop)

2. Update dependencies based on join information (Warshall)

3. Make key descriptions (update_ref_and_keys())

4. Pull out semijoin tables, called only once

5. Extract tables with at most one row as const tables (exclude semijoin tables)

6. Extract dependent tables as const tables (exclude semijoin tables)

7. The rest of the function.

A running prototype indicates that there are no regressions in data results with this approach, but there are some EXPLAIN changes that need analysis.
[12 May 2010 15:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108163

3875 Roy Lyseng	2010-05-12
      Bug#43768: Prepared query with nested subqueries core dump on second execution
      
      The problem here is that pullout of semijoin tables is attempted for
      each execution, but because those tables are not "pushed back" again
      after each execution, the pullout fails on second attempt.
      
      The solution chosen here is to pullout only those semijoin tables that are
      functionally dependent upon the outer tables. This pullout operation
      can be performed only once, and, unlike the current procedure, is not
      dependent upon the data volume of the involved tables.
      
      The practical implication of this is as follows:
      
       - Only outer tables containing zero or one rows will now be extracted
         as "const tables". Thus, such tables from a semijoin nest are no
         longer accessed during optimization, and some (rare) optimizations
         are no longer possible.
      
       - In the majority of cases, there is no performance impact. Often,
         the new strategy chosen is Materialization, meaning that the row
         of these table is accessed only once and saved in local memory.
      
       - Const table analysis now has to be done in two phases:
         1) Pullout tables based on dependencies. Both outer and inner tables
         may apply, and
         2) Pullout outer tables based on data volume.
      
      In order to implement the latter point above, and assure that pullout
      of semijoin tables occurs only once, make_join_statistics() has been
      restructured slightly as follows:
      
      1. Initialize JOIN data structures
         (major part of first loop in existing implementation).
      
      2. Update dependencies based on join information
         (the Warshall algorithm).
      
      3. Make key descriptions (update_ref_and_keys()).
      
      4. Pull out semijoin tables, called only once.
      
      5. Extract tables with zero or one rows as const tables
         (consider outer tables only, no semijoin tables).
      
      6. Extract dependent tables as const tables.
         (consider outer tables only, no semijoin tables).
      
      7. The remaining parts of the function.
      
      mysql-test/r/func_group.result
        "Impossible WHERE noticed after reading const tables" no longer occurs.
      
      mysql-test/r/select_found.result
        Possible keys changed from "PRIMARY,kid" to "kid".
      
      mysql-test/r/subselect.result
        One table accessed using FirstMatch instead of table pullout.
      
      mysql-test/r/subselect_mat.result
        "Impossible WHERE noticed after reading const tables" no longer occurs.
      
      mysql-test/r/subselect_no_mat.result
        One table accessed using FirstMatch instead of table pullout.
      
      mysql-test/r/subselect_sj.result
        A number of plan changes because of extensive testing of semi-join
        tables with 0, 1 and 2 rows.
      
      mysql-test/r/subselect_sj_jcl6.result
        Ditto.
      
      mysql-test/t/subselect_sj.test
        bug#46744 now becomes a duplicate of bug#50489, and test case is moved.
      
      mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
        Test for bug#46744 moved here.
      
      sql/sql_select.cc
        pull_out_semijoin_tables()
          Removed code for pullout of const tables.
        make_join_statistics()
          Removed const table pullout from first loop.
          Simplified testing based on inner/outer/semi-join properties.
          Calls pull_out_semijoin_tables() just after dependency analysis
          Then, added loop that performs pullout based on data volume.
          Simplified testing based on inner/outer/semi-join properties.
          Assured that semijoin tables are no longer pulled out based on volume.
[12 May 2010 15:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108171

3875 Roy Lyseng	2010-05-12
      Bug#43768: Prepared query with nested subqueries core dump on second execution
      
      The problem here is that pullout of semijoin tables is attempted for
      each execution, but because those tables are not "pushed back" again
      after each execution, the pullout fails on second attempt.
      
      The solution chosen here is to pullout only those semijoin tables that are
      functionally dependent upon the outer tables. This pullout operation
      can be performed only once, and, unlike the current procedure, is not
      dependent upon the data volume of the involved tables.
      
      The practical implication of this is as follows:
      
       - Only outer tables containing zero or one rows will now be extracted
         as "const tables". Thus, such tables from a semijoin nest are no
         longer accessed during optimization, and some (rare) optimizations
         are no longer possible.
      
       - In the majority of cases, there is no performance impact. Often,
         the new strategy chosen is Materialization, meaning that the row
         of these table is accessed only once and saved in local memory.
      
       - Const table analysis now has to be done in two phases:
         1) Pullout tables based on dependencies. Both outer and inner tables
         may apply, and
         2) Pullout outer tables based on data volume.
      
      In order to implement the latter point above, and assure that pullout
      of semijoin tables occurs only once, make_join_statistics() has been
      restructured slightly as follows:
      
      1. Initialize JOIN data structures
         (major part of first loop in existing implementation).
      
      2. Update dependencies based on join information
         (the Warshall algorithm).
      
      3. Make key descriptions (update_ref_and_keys()).
      
      4. Pull out semijoin tables, called only once.
      
      5. Extract tables with zero or one rows as const tables
         (consider outer tables only, no semijoin tables).
      
      6. Extract dependent tables as const tables.
         (consider outer tables only, no semijoin tables).
      
      7. The remaining parts of the function.
      
      mysql-test/r/func_group.result
        "Impossible WHERE noticed after reading const tables" no longer occurs.
      
      mysql-test/r/select_found.result
        Possible keys changed from "PRIMARY,kid" to "kid".
      
      mysql-test/r/subselect.result
        One table accessed using FirstMatch instead of table pullout.
      
      mysql-test/r/subselect4.result
        Added test case for bug#43768
      
      mysql-test/r/subselect_mat.result
        "Impossible WHERE noticed after reading const tables" no longer occurs.
      
      mysql-test/r/subselect_no_mat.result
        One table accessed using FirstMatch instead of table pullout.
      
      mysql-test/r/subselect_sj.result
        A number of plan changes because of extensive testing of semi-join
        tables with 0, 1 and 2 rows.
      
      mysql-test/r/subselect_sj_jcl6.result
        Ditto.
      
      mysql-test/t/subselect4.test
        Added test case for bug#43768
      
      mysql-test/t/subselect_sj.test
        bug#46744 now becomes a duplicate of bug#50489, and test case is moved.
      
      mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
        Test for bug#46744 moved here.
      
      sql/sql_select.cc
        pull_out_semijoin_tables()
          Removed code for pullout of const tables.
        make_join_statistics()
          Removed const table pullout from first loop.
          Simplified testing based on inner/outer/semi-join properties.
          Calls pull_out_semijoin_tables() just after dependency analysis
          Then, added loop that performs pullout based on data volume.
          Simplified testing based on inner/outer/semi-join properties.
          Assured that semijoin tables are no longer pulled out based on volume.
[4 Jun 2010 15:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110250

3185 Roy Lyseng	2010-06-04
      Bug#43768: Prepared query with nested subqueries core dump on second execution
      
      The problem here is that pullout of semijoin tables is attempted for
      each execution, but because those tables are not "pushed back" again
      after each execution, the pullout fails on second attempt.
      
      The solution chosen here is to pullout only those semijoin tables that are
      functionally dependent upon the outer tables. This pullout operation
      can be performed only once, and, unlike the current procedure, is not
      dependent upon the data volume of the involved tables.
      We still pullout tables based on volume if this is a direct execution, though.
      
      The practical implication of this is as follows:
      
       - Only outer tables containing zero or one rows will now be extracted
         as "const tables". Thus, such tables from a semijoin nest are no
         longer accessed during optimization, and some (rare) optimizations
         are no longer possible.
      
       - In the majority of cases, there is no performance impact. Often,
         the new strategy chosen is Materialization, meaning that the row
         of these table is accessed only once and saved in local memory.
      
       - Const table analysis now has to be done in two phases:
         1) Pullout tables based on dependencies. Both outer and inner tables
         may apply, and
         2) Pullout outer tables based on data volume.
      
      In order to implement the latter point above, and assure that pullout
      of semijoin tables occurs only once, make_join_statistics() has been
      restructured slightly. The conditional logic within the function has also
      been enhanced for better readability.
      
      The logic of make_join_statistics() is now as follows:
      
      1. Initialize JOIN data structures
         (major part of first loop in existing implementation).
      
      2. Update dependencies based on join information
         (the Warshall algorithm).
      
      3. Make key descriptions (update_ref_and_keys()).
      
      4. Pull out semijoin tables, called only once.
      
      5. Extract tables with zero or one rows as const tables
         (consider outer tables only, no semijoin tables).
      
      6. Extract dependent tables as const tables.
         (consider outer tables only, no semijoin tables).
      
      7. The remaining parts of the function.
      
      mysql-test/r/select_found.result
        Possible keys changed from "PRIMARY,kid" to "kid".
      
      mysql-test/r/subselect4.result
        Added test case for bug#43768
      
      mysql-test/r/subselect_sj.result
        A number of plan changes because of extensive testing of semi-join
        tables with 0, 1 and 2 rows.
      
      mysql-test/r/subselect_sj_jcl6.result
        Ditto.
      
      mysql-test/t/subselect4.test
        Added test case for bug#43768
      
      mysql-test/t/subselect_sj.test
        bug#46744 now becomes a duplicate of bug#50489, and test case is moved.
      
      mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
        Test for bug#46744 moved here.
      
      sql/sql_select.cc
        pull_out_semijoin_tables()
          join_tab->emb_sj_nest is no longer updated. Comments updated.
        make_join_statistics()
          Removed const table pullout from first loop.
          Simplified testing based on inner/outer/semi-join properties.
          Calls pull_out_semijoin_tables() just after dependency analysis.
          Then, added loop that performs pullout based on data volume
          but excludes semijoined tables in prepared statements.
          Second call of pull_out_semijoin_tables() is needed after this.
          Simplified testing based on inner/outer/semi-join properties.
          Assured that semijoin tables are no longer pulled out based on volume.
[9 Jul 2010 13:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/113262

3211 Roy Lyseng	2010-07-09
      Bug#43768: Prepared query with nested subqueries core dump on second execution
      
      The problem here is that pullout of semijoin tables is attempted for
      each execution, but because those tables are not "pushed back" again
      after each execution, the pullout fails on second attempt.
      
      The solution chosen here is to pullout only those semijoin tables that are
      functionally dependent upon the outer tables. This pullout operation
      need to be performed only once, and, unlike the current procedure, is not
      dependent upon the row count of the involved tables.
      We still pullout tables that are classified as const tables based on row count
      if this is a direct execution, though.
      
      The practical implication of this is as follows:
      
       - Only outer tables containing zero or one rows will now be extracted
         as "const tables". Thus, such tables from a semijoin nest are no
         longer accessed during optimization, and some (rare) optimizations
         are no longer possible.
      
       - In the majority of cases, there is no performance impact. Often,
         the new strategy chosen is Materialization, meaning that the row
         of these table is accessed only once and saved in local memory.
      
       - Const table analysis now has to be done in two phases:
         1) Pullout tables based on dependencies. Both outer and inner tables
         may apply, and
         2) Pullout tables based on row count. Outer tables are always pulled out,
         inner tables only if inside a non-prepared statement.
      
      In order to implement the latter point above, and assure that pullout
      of semijoin tables occurs only once, make_join_statistics() has been
      restructured slightly. The conditional logic within the function has also
      been enhanced for better readability.
      
      The logic of make_join_statistics() is now as follows:
      
      1. Initialize JOIN data structures
         (major part of first loop in existing implementation).
      
      2. Update dependencies based on join information
         (the Warshall algorithm).
      
      3. Make key descriptions (update_ref_and_keys()).
      
      4. Pull out semijoin tables, called only once.
      
      5. Extract tables with zero or one rows as const tables
         (in prepared mode, consider outer tables only, no semijoin tables).
      
      6. Extract dependent tables as const tables.
         (in prepared mode, consider outer tables only, no semijoin tables).
      
      7. The remaining parts of the function.
      
      mysql-test/r/select_found.result
        Possible keys changed from "PRIMARY,kid" to "kid".
        Happens because analysis order is slightly changed, but harmless as
        the table is identified as "const".
      
      mysql-test/r/subselect_sj.result
        Added test case for bug#43768
        A number of plan changes because of extensive testing of semi-join
        tables with 0, 1 and 2 rows.
      
      mysql-test/r/subselect_sj_jcl6.result
        Ditto.
      
      mysql-test/t/subselect_sj.test
        Added test case for bug#43768
        bug#46744 now becomes a duplicate of bug#50489, and test case is moved.
      
      mysql-test/suite/optimizer_unfixed_bugs/t/bug46744.test
        Test for bug#46744 moved here.
      
      sql/sql_select.cc
        pull_out_semijoin_tables()
          join_tab->emb_sj_nest is no longer updated. Comments updated.
        make_join_statistics()
          Removed const table pullout from first loop.
          Simplified testing based on inner/outer/semi-join properties.
          Calls pull_out_semijoin_tables() just after dependency analysis.
          Then, added loop that performs pullout based on row count
          but excludes semijoined tables in prepared statements.
          Second call of pull_out_semijoin_tables() is needed after this.
          Simplified testing based on inner/outer/semi-join properties.
          Semijoin tables are no longer pulled out based on row count in
          prepared statements.
          Because pull_out_semijoin_tables() is no longer called for each
          execution, emb_sj_nest is now set in this function.
[2 Oct 2010 18:14] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:22] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[27 Jan 2011 12:39] Roy Lyseng
Bug#51489 is a duplicate. Remember to add reproduction case to bug fix.
[7 Jul 2011 14:29] Paul DuBois
Bug does not appear in any released version. No changelog entry needed.

CHANGESET - http://lists.mysql.com/commits/139883