Bug #68840 question about new code in make_join_statistics
Submitted: 2 Apr 2013 20:21 Modified: 3 Apr 2013 20:33
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.6.10 OS:Any
Assigned to: CPU Architecture:Any

[2 Apr 2013 20:21] Mark Callaghan
Description:
While looking for the source of perf regressions in the 5.6 optimizer I read this code in make_join_statistics. How useful is this code? HA_STATS_RECORDS_IS_EXACT isn't set for InnoDB.

  /*
    Extract const tables based on row counts, must be done for each execution.
    Tables containing exactly zero or one rows are marked as const, but
    notice the additional constraints checked below.
    Tables that are extracted have their rows read before actual execution
    starts and are placed in the beginning of the join_tab array.
    Thus, they do not take part in join order optimization process,
    which can significantly reduce the optimization time.
    The data read from these tables can also be regarded as "constant"
    throughout query execution, hence the column values can be used for
    additional constant propagation and extraction of const tables based
    on eq-ref properties.
  */
  enum enum_const_table_extraction
  {
     extract_no_table=    0,
     extract_empty_table= 1,
     extract_const_table= 2
  };

  if (join->no_const_tables)
    goto const_table_extraction_done;

  for (i= 0, s= stat; i < table_count; i++, s++)
  {
    TABLE      *const table= s->table;
    TABLE_LIST *const tables= table->pos_in_table_list;
    enum enum_const_table_extraction extract_method= extract_const_table;

#ifdef WITH_PARTITION_STORAGE_ENGINE
    const bool all_partitions_pruned_away= table->all_partitions_pruned_away;
#else
    const bool all_partitions_pruned_away= false;
#endif

    if (tables->outer_join_nest())
    {
      /*
        Table belongs to a nested join, no candidate for const table extraction.
      */
      extract_method= extract_no_table;
    }
    else if (tables->embedding && tables->embedding->sj_on_expr)
    {
      /*
        Table belongs to a semi-join.
        We do not currently pull out const tables from semi-join nests.
      */
      extract_method= extract_no_table;
    }
    else if (*s->on_expr_ref)
    {
      /* s is the only inner table of an outer join, extract empty tables */
      extract_method= extract_empty_table;
    }
    switch (extract_method)
    {
    case extract_no_table:
      break;

    case extract_empty_table:
      /* Extract tables with zero rows, but only if statistics are exact */
      if ((table->file->stats.records == 0 ||
           all_partitions_pruned_away) &&
          (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT))
        set_position(join, const_count++, s, NULL);
      break;

    case extract_const_table:
      /*
        Extract tables with zero or one rows, but do not extract tables that
         1. are dependent upon other tables, or
         2. have no exact statistics, or
         3. are full-text searched
      */ 
      if ((table->s->system ||
           table->file->stats.records <= 1 ||
           all_partitions_pruned_away) &&
          !s->dependent &&                                               // 1
          (table->file->ha_table_flags() & HA_STATS_RECORDS_IS_EXACT) && // 2
          !table->fulltext_searched)                                     // 3
        set_position(join, const_count++, s, NULL);
      break;
    }
  }

How to repeat:
read the source

Suggested fix:
Not sure we need new code added to the optimizer that is MyISAM specific
[3 Apr 2013 6:05] Roy Lyseng
Hi Mark,

it is not new code, just refactoring of old code that has existed for a long time. The achievement here was to separate out the code that deals with const tables, rather than having it as part of a more general loop. The refactoring was done as part of semi-join development, which added to the complexity of this part of the function.

We also wanted to separate dependency calculations from const table calculations, because dependency calculations are generic for a query and can be done up front, whereas const table calculations has to be redone for every execution. This argument obviously relates to prepared statements only.

I guess the possibly good thing for you is that you can #ifdef away this code block entirely.
[3 Apr 2013 13:06] Mark Callaghan
I noticed this in the context of running sysbench. The queries are of the form PK = $value. Whether or not HA_STATS_RECORDS_IS_EXACT is set, the table returns at most 1 row. Why can't the optimization be used in this case?
[3 Apr 2013 19:58] Roy Lyseng
The loop you highlighted only deals with tables that have exactly zero or one rows. The loop after that reads those rows directly from the table.

There is another loop, starting at label const_table_extraction_done that looks at predicates and identifies tables that can use ref or eq_ref access. When an equality predicate can use a unique index, such as pk=<value>, the access method is changed to CONST and the row is read immediately. Thus, the greedy optimizer will loop over zero tables and JOIN::exec() becomes almost a no-op for this simple query (it just has to pass the already retrieved column values to the client).

We can probably do more to short-cut the execution patch for such simple queries, though.
[3 Apr 2013 20:31] Mark Callaghan
Thanks for the prompt responses. While there is more overhead from the optimizer in 5.6, this bug is bogus. Please close it.