Bug #110668 Innodb crash recovery too slow if have wide table redo during apply phase
Submitted: 12 Apr 2023 12:36 Modified: 13 Apr 2023 6:00
Reporter: zechao zhuang Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S5 (Performance)
Version:5.7.26, 5.7.29, 8.0.32 OS:Any
Assigned to: CPU Architecture:Any
Tags: crash recovery, redo apply

[12 Apr 2023 12:36] zechao zhuang
Description:
We recently did some testing on wide tables (tables with very many columns).

When we simulated a crash-recovery scenario for a wide table, recovery was very slow. We inspected the code and profiled with the perf tool, and found that if the redo log contains a large number of modification log records for the wide table, a hot spot appears while applying these records, so each page apply takes too long and crash recovery as a whole slows down.

The hot code path in 8.0.20 is:
--recv_parse_or_apply_log_rec_body
  --mlog_parse_index
    --dict_index_add_col
      --get_col_name
        --strlen

Why can strlen() become a hot spot? Before the records are actually applied to a page, a dummy index must be built from the log record according to the length and type information in the redo. A wide table has many columns, so many more calls are needed to add columns to the dummy index. In the dict_index_add_col function, the column name must be obtained, and the cost of computing a column name grows with the column's position. This yields N * N work just to fetch column names, where N is the number of columns in the wide table.
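To make the N * N behaviour concrete, here is a minimal standalone sketch. The packed-name layout mirrors dict_table_t::col_names, but the helper names below are ours, not InnoDB's:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Column names are stored back to back as NUL-terminated strings,
// e.g. "c0\0c1\0c2\0" -- this mirrors dict_table_t::col_names.
std::string pack_col_names(const std::vector<std::string> &names) {
  std::string buf;
  for (const auto &n : names) {
    buf += n;
    buf += '\0';
  }
  return buf;
}

// Mirrors get_col_name(): reaching column col_nr requires strlen()-ing
// past every earlier name, so one lookup costs O(col_nr), and adding
// all N columns of a wide table to the dummy index costs O(N^2) total.
const char *get_col_name(const std::string &packed, size_t col_nr) {
  const char *s = packed.c_str();
  for (size_t i = 0; i < col_nr; i++) {
    s += std::strlen(s) + 1; /* skip one earlier name plus its NUL */
  }
  return s;
}
```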

The specific code is as follows:
storage/innobase/mtr/mtr0log.cc:587
mlog_parse_index()

for (i = 0; i < n; i++) {
      ulint len = mach_read_from_2(ptr);
      ptr += 2;
      /* The high-order bit of len is the NOT NULL flag;
      the rest is 0 or 0x7fff for variable-length fields,
      and 1..0x7ffe for fixed-length fields. */
      dict_mem_table_add_col(
          table, NULL, NULL,
          ((len + 1) & 0x7fff) <= 1 ? DATA_BINARY : DATA_FIXBINARY,
          len & 0x8000 ? DATA_NOT_NULL : 0, len & 0x7fff);

      /* The is_ascending flag does not matter during
      redo log apply, because we do not compare for
      "less than" or "greater than". */
      dict_index_add_col(ind, table, table->get_col(i), 0, true);
}

storage/innobase/dict/dict.cc:80
dict_index_add_col()
col_name = table->get_col_name(dict_col_get_no(col));

storage/innobase/include/dict0mem.h:2055
const char *get_col_name(ulint col_nr) const {
    ut_ad(col_nr < n_def);
    ut_ad(magic_n == DICT_TABLE_MAGIC_N);

    const char *s = col_names;
    if (s) {
      for (ulint i = 0; i < col_nr; i++) {
        s += strlen(s) + 1;
      }
    }

    return (s);
}
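As a rough, illustrative count (our arithmetic, not taken from the report): adding column i walks past the i earlier names, so building the dummy index for an N-column table performs about N(N-1)/2 strlen() calls:

```cpp
#include <cstddef>

// Total strlen() calls to build the dummy index for n_cols columns:
// column 0 costs 0 skips, column 1 costs 1, ..., column n-1 costs n-1,
// which sums to n_cols * (n_cols - 1) / 2.
constexpr std::size_t strlen_calls(std::size_t n_cols) {
  return n_cols * (n_cols - 1) / 2;
}
```

For the 1000-column repro below this is 499,500 strlen() calls per dummy-index build, repeated for every affected log record.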

How to repeat:
1. Create some wide tables, e.g. with 1000 columns per table.
2. Run a write workload (insert, update and delete operations) against these tables.
3. Kill mysqld.
4. Restart mysqld and use the perf tool to analyse the recovery phase.

The code analysis above also fully explains the problem.

Suggested fix:
When building the dummy index in the apply phase, the column names are effectively empty and carry no practical meaning, so there is no need to walk the packed name list once per column; a single fake column name can be used instead.
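In essence, the change short-circuits the name lookup when building the apply-phase dummy index. The sketch below uses our own simplified types, not the actual InnoDB structures:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

struct DummyField { std::string name; };
struct DummyIndex { std::vector<DummyField> fields; };

// Sketch of dict_index_add_col() with the proposed apply_only flag:
// during redo apply the field name is never compared, so we can reuse
// the name of column 0 and skip the O(col_nr) walk entirely.
void add_col(DummyIndex &index, const std::string &packed_names,
             size_t col_nr, bool apply_only) {
  size_t nr = apply_only ? 0 : col_nr;
  const char *s = packed_names.c_str();
  for (size_t i = 0; i < nr; i++) {
    s += std::strlen(s) + 1; /* skip one earlier name plus its NUL */
  }
  index.fields.push_back({s});
}
```

With apply_only set, every column resolves to the first packed name in constant time, so the dummy-index build drops from O(N^2) to O(N).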

The fix code is as follows:
diff --git a/storage/innobase/dict/dict.cc b/storage/innobase/dict/dict.cc
index eb3f3ff9050..9f4e390533f 100644
--- a/storage/innobase/dict/dict.cc
+++ b/storage/innobase/dict/dict.cc
@@ -45,9 +45,11 @@ external tools. */
 @param[in]     table           table
 @param[in]     col             column
 @param[in]     prefix_len      column prefix length
-@param[in]     is_ascending    true=ASC, false=DESC */
+@param[in]     is_ascending    true=ASC, false=DESC
+@param[in]     apply_only      true=page apply phase */
 void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
-                        dict_col_t *col, ulint prefix_len, bool is_ascending) {
+                        dict_col_t *col, ulint prefix_len, bool is_ascending,
+                        bool apply_only) {
   dict_field_t *field;
   const char *col_name;

@@ -57,7 +59,7 @@ void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
     dict_v_col_t *v_col = reinterpret_cast<dict_v_col_t *>(col);

     /* When v_col->v_indexes==NULL,
-    ha_innobase::commit_inplace_alter_table(commit=true)
+    ha_innobase::commit_inplace_alter_table(commit=true)
     will evict and reload the table definition, and
     v_col->v_indexes will not be NULL for the new table. */
     if (v_col->v_indexes != NULL) {
@@ -77,7 +79,15 @@ void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
   } else
 #endif /* !UNIV_LIBRARY */
   {
-    col_name = table->get_col_name(dict_col_get_no(col));
+    ulint col_nr = dict_col_get_no(col);
+
+    /* The apply phase uses a dummy index, so use a dummy column name
+    too, to avoid recomputing column names too frequently. */
+    if (apply_only) {
+      col_nr = 0;
+    }
+
+    col_name = table->get_col_name(col_nr);
   }

   index->add_field(col_name, prefix_len, is_ascending);
diff --git a/storage/innobase/dict/dict.h b/storage/innobase/dict/dict.h
index 3904a83f097..37589cec64e 100644
--- a/storage/innobase/dict/dict.h
+++ b/storage/innobase/dict/dict.h
@@ -42,8 +42,10 @@ external tools. */
 @param[in]     table           table
 @param[in]     col             column
 @param[in]     prefix_len      column prefix length
-@param[in]     is_ascending    true=ASC, false=DESC */
+@param[in]     is_ascending    true=ASC, false=DESC
+@param[in]     apply_only      true=page apply phase */
 void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
-                        dict_col_t *col, ulint prefix_len, bool is_ascending);
+                        dict_col_t *col, ulint prefix_len, bool is_ascending,
+                        bool apply_only = false);
[12 Apr 2023 13:04] MySQL Verification Team
Hi Mr. Zhuang,

Thank you for taking the time to report a problem.  Unfortunately you are not using a current release of the product you reported a problem with -- the problem might already be fixed. Please download a new version from http://www.mysql.com/downloads/

If you are able to reproduce the bug with one of the latest versions, please change the version on this bug report to the version you tested and change the status back to "Open".  Again, thank you for your continued support of MySQL.

We have carefully analysed your contribution and can report that releases 8.0.32 and 8.0.33 have been drastically changed, so the code you presented is no longer recognisable.

Furthermore, your proposed patch contains typing errors that make it unreadable. Also, yes, strlen() can be a bottleneck in many cases, but you have not attempted to eliminate it at all.

Last, but not least, version 5.7 only receives fixes for serious bugs, not performance improvements.

Unsupported.
[13 Apr 2023 6:00] zechao zhuang
I checked out MySQL at the latest release version 8.0.32 and re-examined the code logic; the upstream code does not have this problem.

The root cause is that in our fork the dummy index and table are cached, so that subsequent redo applies can reuse them and avoid creating a dummy index and table for every record. When the dummy table is reused, table->col_names is not cleared, which leads to the problem described above.
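To illustrate the reuse bug described above (the structure and names below are a simplified stand-in for our fork's cache, not the actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for the cached dummy table: col_names accumulates
// packed NUL-terminated names, and n_def counts defined columns.
struct CachedDummyTable {
  std::string col_names;
  std::size_t n_def = 0;

  void add_col(const std::string &name) {
    col_names += name;
    col_names += '\0';
    ++n_def;
  }

  // The step the cache was missing: without this, every reuse of the
  // dummy table appends another batch of names, so get_col_name()'s
  // strlen() walk gets longer with every applied log record.
  void reset() {
    col_names.clear();
    n_def = 0;
  }
};
```

Reusing the table without reset() doubles the packed name buffer on every apply, which is exactly the growing strlen() cost seen in the perf profile.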
[13 Apr 2023 11:47] MySQL Verification Team
Thank you Mr. Zhuang,

for confirming that the latest releases do not have the problem you are reporting.