Bug #119041 Automatic statistics update causes poor execution plan
Submitted: 19 Sep 8:19    Modified: 23 Sep 9:57
Reporter: jie han (OCA)
Status: Verified    Impact on me: None
Category: MySQL Server: Optimizer    Severity: S2 (Serious)
Version: 8.0.25, 8.0.40    OS: Any
Assigned to:    CPU Architecture: Any

[19 Sep 8:19] jie han
Description:
When MySQL performs a background automatic statistics update, it acquires dict_table_stats_lock(RW_X_LATCH) and calls dict_stats_empty_index() to clear the statistics.
The call chain is:
dict_stats_update() → dict_stats_update_persistent() → dict_table_stats_lock(RW_X_LATCH) → dict_stats_analyze_index() → dict_stats_analyze_index_low() → dict_stats_empty_index() → dict_stats_save().

However, I found that when MySQL executes a query and retrieves the table statistics, the flag passed in includes HA_STATUS_NO_LOCK. This means dict_table_stats_lock(ib_table, RW_S_LATCH) is not called.
The call chain in this case is:
TABLE_LIST::fetch_number_of_rows() → table->file->info(HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK) → ha_innobase::info(uint flag) → ha_innobase::info_low → ...

Therefore, while the background thread is updating the table statistics and happens to be at the point where dict_stats_empty_index() has just cleared them, a query on the same table may be executing concurrently. That query can read empty statistics, which may lead the optimizer to choose a wrong execution plan and make a query that normally runs fine become much slower.
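To make the window concrete, here is a small standalone C++ sketch (not InnoDB code; all names are illustrative). One thread plays the role of the background recalculation: it takes an exclusive lock, zeroes a shared statistic, then rebuilds it. The other plays the role of a query calling handler::info() with HA_STATUS_NO_LOCK: it reads the statistic without taking the shared lock and can therefore observe the cleared value.

// Standalone sketch of the race (requires C++17). std::shared_mutex stands in
// for dict_table_stats_lock()/dict_table_stats_unlock(), and the atomic
// counter stands in for table->stat_sum_of_other_index_sizes.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <shared_mutex>
#include <thread>

static std::shared_mutex stats_latch;
static std::atomic<uint64_t> stat_sum_of_other_index_sizes{1000};
static std::atomic<bool> stop_flag{false};

static void stats_thread() {  // role of the background dict_stats update
  while (!stop_flag) {
    std::unique_lock<std::shared_mutex> x_latch(stats_latch);  // RW_X_LATCH
    stat_sum_of_other_index_sizes = 0;   // cleared before re-accumulation
    for (int i = 0; i < 10; ++i) {       // "analyze" each secondary index
      stat_sum_of_other_index_sizes += 100;
    }
  }
}

static void query_thread() {  // role of info_low() with HA_STATUS_NO_LOCK set
  uint64_t zero_reads = 0;
  for (int i = 0; i < 10000000; ++i) {
    // No shared lock is taken here, exactly like the HA_STATUS_NO_LOCK path.
    if (stat_sum_of_other_index_sizes == 0) {
      ++zero_reads;
    }
  }
  std::printf("unlatched reads that observed 0: %llu\n",
              static_cast<unsigned long long>(zero_reads));
}

int main() {
  std::thread writer(stats_thread);
  std::thread reader(query_thread);
  reader.join();
  stop_flag = true;
  writer.join();
  return 0;
}

Whether the reader actually catches the zero depends on timing, but because it takes no latch, nothing prevents it from doing so while the writer holds the exclusive lock.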

How to repeat:
/** Returns statistics information of the table to the MySQL interpreter, in
various fields of the handle object.
@param[in]  flag    what information is requested
@param[in]  is_analyze  True if called from "::analyze()".
@return HA_ERR_* error code or 0 */
int ha_innobase::info_low(uint flag, bool is_analyze) {
  dict_table_t *ib_table;
  ib_uint64_t n_rows;

  DBUG_TRACE;

  DEBUG_SYNC_C("ha_innobase_info_low");

  /* If we are forcing recovery at a high level, we will suppress
  statistics calculation on tables, because that may crash the
  server if an index is badly corrupted. */

  /* We do not know if MySQL can call this function before calling
  external_lock(). To be safe, update the thd of the current table
  handle. */

  update_thd(ha_thd());

  m_prebuilt->trx->op_info = (char *)"returning various info to MySQL";

  ib_table = m_prebuilt->table;
  assert(ib_table->n_ref_count > 0);

  if (flag & HA_STATUS_TIME) {
    if (is_analyze || innobase_stats_on_metadata) {
      dict_stats_upd_option_t opt;
      dberr_t ret;

      m_prebuilt->trx->op_info = "updating table statistics";

      if (dict_stats_is_persistent_enabled(ib_table)) {
        if (is_analyze) {
          opt = DICT_STATS_RECALC_PERSISTENT;
        } else {
          /* This is e.g. 'SHOW INDEXES', fetch
          the persistent stats from disk. */
          opt = DICT_STATS_FETCH_ONLY_IF_NOT_IN_MEMORY;
        }
      } else {
        opt = DICT_STATS_RECALC_TRANSIENT;
      }

      ut_ad(!mutex_own(&dict_sys->mutex));
      ret = dict_stats_update(ib_table, opt);

      if (ret != DB_SUCCESS) {
        m_prebuilt->trx->op_info = "";
        return HA_ERR_GENERIC;
      }

      m_prebuilt->trx->op_info = "returning various info to MySQL";
    }

    stats.update_time = (ulong)ib_table->update_time;
  }

  if (flag & HA_STATUS_VARIABLE) {
    ulint stat_clustered_index_size;
    ulint stat_sum_of_other_index_sizes;

    /* Here flag is HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK, so the
    dict_table_stats_lock(ib_table, RW_S_LATCH) call below is skipped. */
    if (!(flag & HA_STATUS_NO_LOCK)) {
      dict_table_stats_lock(ib_table, RW_S_LATCH);
    }

    ut_a(ib_table->stat_initialized);

    n_rows = ib_table->stat_n_rows;

    stat_clustered_index_size = ib_table->stat_clustered_index_size;

    stat_sum_of_other_index_sizes = ib_table->stat_sum_of_other_index_sizes;

    if (!(flag & HA_STATUS_NO_LOCK)) {
      dict_table_stats_unlock(ib_table, RW_S_LATCH);
    }

    /* ... (rest of ha_innobase::info_low() omitted) ... */

Suggested fix:
When retrieving statistics, is it necessary to acquire dict_table_stats_lock(ib_table, RW_S_LATCH), or is there a better approach?
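One possible shape of the first option (a sketch based only on the excerpt above, not a tested patch): copy the three counters while holding the shared latch, even when HA_STATUS_NO_LOCK is set.

  if (flag & HA_STATUS_VARIABLE) {
    ulint stat_clustered_index_size;
    ulint stat_sum_of_other_index_sizes;

    /* Sketch: take the S latch unconditionally so that this copy cannot
    interleave with dict_stats_update_persistent(), which holds the X latch
    while it clears and rebuilds the statistics. */
    dict_table_stats_lock(ib_table, RW_S_LATCH);

    n_rows = ib_table->stat_n_rows;
    stat_clustered_index_size = ib_table->stat_clustered_index_size;
    stat_sum_of_other_index_sizes = ib_table->stat_sum_of_other_index_sizes;

    dict_table_stats_unlock(ib_table, RW_S_LATCH);

    /* ... rest of the HA_STATUS_VARIABLE handling unchanged ... */
  }

The cost is that every statistics read would then wait for an in-progress recalculation, which is presumably why HA_STATUS_NO_LOCK is passed from fetch_number_of_rows() in the first place; hence the question above about a better approach.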
[19 Sep 16:14] MySQL Verification Team
Thanks for your analysis
[19 Sep 17:57] Jean-François Gagné
Maybe related: Bug#82968.
[22 Sep 6:22] jie han
I found another bug report: https://bugs.mysql.com/bug.php?id=105224
It is similar to my situation.
In our production environment, we occasionally encounter cases where SQL queries use the wrong execution plan. We have observed that, when this occurs, the table statistics happen to be updated at the same time as the slow query (as indicated by the last_update field in the innodb_index_stats/innodb_table_stats tables). This phenomenon accounts for approximately 50% of all incorrect execution plan cases in our production environment.
[23 Sep 9:57] jie han
Hello, I have found something new.
In my_base.h we have the following definition:

/*
  update the 'variable' part of the info:
  handler::records, deleted, data_file_length, index_file_length,
  check_time, mean_rec_length
*/
#define HA_STATUS_VARIABLE 16

From this we can see that HA_STATUS_VARIABLE includes the following variables:
handler::records, deleted, data_file_length, index_file_length, check_time, mean_rec_length.

These statistics are retrieved during query execution via the handler::info(uint flag) call, where flag by default is HA_STATUS_VARIABLE | HA_STATUS_NO_LOCK.

How these statistics are calculated:
Following the call chain ha_innobase::info_low → calculate_index_size_stats, they are computed in calculate_index_size_stats():

/** Calculate stats based on index size.
@param[in]  ib_table                       table object
@param[in]  n_rows                         number of rows
@param[in]  stat_clustered_index_size      clustered index size
@param[in]  stat_sum_of_other_index_sizes  sum of non-clustered index sizes
@param[in,out]  stats                      the stats structure to hold
                                           calculated values */
static void calculate_index_size_stats(const dict_table_t *ib_table,
                                       ib_uint64_t n_rows,
                                       ulint stat_clustered_index_size,
                                       ulint stat_sum_of_other_index_sizes,
                                       ha_statistics *stats) {
  const page_size_t &page_size = dict_table_page_size(ib_table);

  stats->records = static_cast<ha_rows>(n_rows);
  stats->data_file_length =
      static_cast<ulonglong>(stat_clustered_index_size) * page_size.physical();
  stats->index_file_length =
      static_cast<ulonglong>(stat_sum_of_other_index_sizes) *
      page_size.physical();
  if (stats->records == 0) {
    stats->mean_rec_length = 0;
  } else {
    stats->mean_rec_length =
        static_cast<ulong>(stats->data_file_length / stats->records);
  }
}

Here:
records comes from n_rows
data_file_length = stat_clustered_index_size × page_size
index_file_length = stat_sum_of_other_index_sizes × page_size
mean_rec_length = data_file_length / records
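For concreteness, a worked example of these formulas with made-up numbers (assuming the default 16 KiB InnoDB page size):

// Worked example of the formulas above; the numbers are illustrative only and
// the 16384-byte page size is the InnoDB default.
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t page_size = 16384;                     // page_size.physical()
  const uint64_t n_rows = 1000000;                      // stat_n_rows
  const uint64_t stat_clustered_index_size = 4096;      // pages
  const uint64_t stat_sum_of_other_index_sizes = 2048;  // pages

  const uint64_t data_file_length = stat_clustered_index_size * page_size;
  const uint64_t index_file_length = stat_sum_of_other_index_sizes * page_size;
  const uint64_t mean_rec_length = data_file_length / n_rows;

  // Prints 67108864 (64 MiB), 33554432 (32 MiB) and 67 respectively.
  std::printf("data_file_length=%llu index_file_length=%llu mean_rec_length=%llu\n",
              static_cast<unsigned long long>(data_file_length),
              static_cast<unsigned long long>(index_file_length),
              static_cast<unsigned long long>(mean_rec_length));
  return 0;
}

If a concurrent reader happens to see stat_sum_of_other_index_sizes as 0 (the window described below), index_file_length is reported as 0 instead of 32 MiB, while records and data_file_length still look normal.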

Accuracy of n_rows, stat_clustered_index_size, stat_sum_of_other_index_sizes:
These values are collected in dict_stats_update_persistent():

/** Calculates new estimates for table and index statistics.
This function is relatively slow and is used to calculate persistent
statistics that will be saved on disk.
@return DB_SUCCESS or error code */
static dberr_t dict_stats_update_persistent(dict_table_t *table) {
  dict_index_t *index;

  dict_table_stats_lock(table, RW_X_LATCH);

  /* analyze the clustered index first */
  index = table->first_index();

  dict_stats_analyze_index(index);

  ulint n_unique = dict_index_get_n_unique(index);

  table->stat_n_rows = index->stat_n_diff_key_vals[n_unique - 1];
  table->stat_clustered_index_size = index->stat_index_size;

  /* analyze other indexes from the table, if any */
  table->stat_sum_of_other_index_sizes = 0;

  for (index = index->next(); index != nullptr; index = index->next()) {
    ut_ad(!dict_index_is_ibuf(index));

    dict_stats_empty_index(index);

    if (!(table->stats_bg_flag & BG_STAT_SHOULD_QUIT)) {
      dict_stats_analyze_index(index);
    }

    table->stat_sum_of_other_index_sizes += index->stat_index_size;
  }

  /* ... (dict_table_stats_unlock(), dict_stats_save() and the return value
  are omitted in this excerpt) ... */
}

The call sequence is:
dict_stats_update_persistent → dict_stats_analyze_index → dict_stats_analyze_index_low → dict_stats_empty_index.

However:

- Even though dict_stats_empty_index() clears some index-related information, it does not affect the already collected stat_n_rows and stat_clustered_index_size; those are only updated after dict_stats_analyze_index() completes.
- Therefore, concurrent queries still observe correct stat_n_rows and stat_clustered_index_size values.
- However, stat_sum_of_other_index_sizes is set to 0 immediately, before the loop adds up each index's size.

This means that if a query runs after stat_sum_of_other_index_sizes has been set to 0 and before the loop has finished recalculating it, that query can read an incorrect index_file_length, possibly 0 (because index_file_length = stat_sum_of_other_index_sizes × page_size.physical()). This might cause the optimizer to choose a wrong execution plan.

Difference in MySQL 5.7.44
In MySQL 5.7.44, the dict_stats_update_persistent() function looks like:

dict_stats_update_persistent(dict_table_t* table) {
    /* analyze other indexes from the table, if any */
    ib_uint64_t stat_sum_of_other_index_sizes_tmp = 0;

    for (index = dict_table_get_next_index(index);
         index != NULL;
         index = dict_table_get_next_index(index)) {

        if (!(table->stats_bg_flag & BG_STAT_SHOULD_QUIT)) {
            dict_stats_analyze_index(index);
        }

        stat_sum_of_other_index_sizes_tmp
            += index->stat_index_size;
    }

    table->stat_sum_of_other_index_sizes = stat_sum_of_other_index_sizes_tmp;
}
So in 5.7:
- The temporary variable stat_sum_of_other_index_sizes_tmp is used for accumulation.
- table->stat_sum_of_other_index_sizes is only updated once at the end.
- This avoids exposing an intermediate zero or partial value to concurrent sessions.
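For reference, a minimal sketch of carrying that pattern back into the 8.0 loop shown earlier (based only on the excerpts in this report, not a tested patch; the temporary's name is illustrative):

  /* Accumulate into a local variable and publish once, as 5.7 does, so a
  concurrent unlatched reader can never observe the intermediate zero. */
  ib_uint64_t sum_of_other_index_sizes_tmp = 0;

  for (index = index->next(); index != nullptr; index = index->next()) {
    ut_ad(!dict_index_is_ibuf(index));

    dict_stats_empty_index(index);

    if (!(table->stats_bg_flag & BG_STAT_SHOULD_QUIT)) {
      dict_stats_analyze_index(index);
    }

    sum_of_other_index_sizes_tmp += index->stat_index_size;
  }

  /* Single assignment after the loop completes. */
  table->stat_sum_of_other_index_sizes = sum_of_other_index_sizes_tmp;

Even with this change an unlatched reader could still see values from a partially finished recalculation (for example, a new stat_n_rows combined with old index sizes), but it would no longer see the zero that turns index_file_length into 0.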

My Question / Observation
From the above, we can see:
- In 8.0, table->stat_sum_of_other_index_sizes is reset to 0 first, then incrementally updated inside the loop over indexes.
- This leaves a timing window where concurrent queries calling handler::info() might read a stat_sum_of_other_index_sizes of 0 before the full accumulation is completed.
- Since index_file_length = stat_sum_of_other_index_sizes × page_size, they would get an incorrect index_file_length (possibly 0), potentially causing the optimizer to mis-estimate costs and choose an inefficient execution plan.
- In 5.7 this does not happen because the value is only assigned after the loop completes (no exposure of partial values).

Questions:
- Could this change in 8.0 actually affect optimizer decisions and lead to wrong plan choices in real workloads? (Especially when persistent stats collection overlaps with query execution.)
- Would it be worth reverting this part to the 5.7-style approach with a local temporary variable, to avoid exposing intermediate/incorrect stat_sum_of_other_index_sizes?
- Or, is this change in 8.0 based on a specific design consideration that makes the exposure of intermediate values acceptable (e.g., trading correctness for some benefit)?