Description:
We recently did some testing on wide tables (tables with a very large number of columns).
When we simulated crash recovery for such a wide table, recovery was very slow. We examined the code and profiled the server with the perf tool, and found that when the redo log contains a large number of modification records for the wide table, a hot spot appears while applying those records, so applying each page takes too long and crash recovery slows down.
The hot-spot code path in 8.0.20 is:
--recv_parse_or_apply_log_rec_body
--mlog_parse_index
--dict_index_add_col
--get_col_name
--strlen
Why does strlen() become a hot spot? Before a record can actually be applied to a page, a dummy index has to be built from the log record, based on the column lengths and types stored in the redo log. A wide table has many columns, so many more calls are needed to add columns to the dummy index. In dict_index_add_col(), the column name must be looked up, and the cost of that lookup grows with the column's position in the table. The result is O(N * N) work just to fetch column names, where N is the number of columns in the wide table: building a single dummy index scans roughly N * (N - 1) / 2 name strings.
The specific code is as follows:
storage/innobase/mtr/mtr0log.cc:587
mlog_parse_index()
  for (i = 0; i < n; i++) {
    ulint len = mach_read_from_2(ptr);
    ptr += 2;
    /* The high-order bit of len is the NOT NULL flag;
    the rest is 0 or 0x7fff for variable-length fields,
    and 1..0x7ffe for fixed-length fields. */
    dict_mem_table_add_col(
        table, NULL, NULL,
        ((len + 1) & 0x7fff) <= 1 ? DATA_BINARY : DATA_FIXBINARY,
        len & 0x8000 ? DATA_NOT_NULL : 0, len & 0x7fff);

    /* The is_ascending flag does not matter during
    redo log apply, because we do not compare for
    "less than" or "greater than". */
    dict_index_add_col(ind, table, table->get_col(i), 0, true);
  }
storage/innobase/dict/dict.cc:80
dict_index_add_col()
col_name = table->get_col_name(dict_col_get_no(col));
storage/innobase/include/dict0mem.h:2055
  const char *get_col_name(ulint col_nr) const {
    ut_ad(col_nr < n_def);
    ut_ad(magic_n == DICT_TABLE_MAGIC_N);

    const char *s = col_names;
    if (s) {
      for (ulint i = 0; i < col_nr; i++) {
        s += strlen(s) + 1;
      }
    }

    return (s);
  }
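To make the quadratic cost concrete, here is a minimal standalone sketch (not InnoDB code; build_packed_names() and scan_to_col() are made-up helpers for illustration) that mimics the packed col_names layout and counts how many strlen() walks a 1000-column dummy index build performs:

#include <cstring>
#include <iostream>
#include <string>

// Mimics dict_table_t::col_names: all column names packed back to back,
// each terminated by '\0', e.g. "c0\0c1\0c2\0...".
static std::string build_packed_names(std::size_t n_cols) {
  std::string buf;
  for (std::size_t i = 0; i < n_cols; ++i) {
    buf += "c" + std::to_string(i);
    buf += '\0';
  }
  return buf;
}

// Mimics get_col_name(): walk the packed buffer with strlen() until the
// col_nr-th name is reached.  Returns the number of strlen() calls made.
static std::size_t scan_to_col(const char *s, std::size_t col_nr) {
  for (std::size_t i = 0; i < col_nr; ++i) {
    s += std::strlen(s) + 1;
  }
  return col_nr;
}

int main() {
  const std::size_t n_cols = 1000;  // a "wide" table
  const std::string names = build_packed_names(n_cols);

  // Building one dummy index adds every column, and each add looks the
  // column name up from scratch: 0 + 1 + ... + (N - 1) scans in total.
  std::size_t total_scans = 0;
  for (std::size_t col = 0; col < n_cols; ++col) {
    total_scans += scan_to_col(names.c_str(), col);
  }

  std::cout << "columns: " << n_cols
            << ", strlen() walks for one dummy index: " << total_scans
            << std::endl;  // prints 499500, i.e. N * (N - 1) / 2
  return 0;
}

Since this cost is paid again for every redo record that carries index information, a long redo log full of wide-table changes multiplies the quadratic factor many times over.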
How to repeat:
1. Create some wide tables, e.g. with 1000 columns per table.
2. Run a write workload (insert, update and delete operations) against these tables.
3. Kill mysqld.
4. Restart mysqld and use the perf tool to analyse the recovery phase.
The code analysis above also fully explains the problem.
Suggested fix:
When building a dummy index during the apply phase, the column names are empty and have no practical meaning, so there is no need to walk the column-name list by column position for every column; a fixed dummy column name can be used instead.
The fix code is as follows:
diff --git a/storage/innobase/dict/dict.cc b/storage/innobase/dict/dict.cc
index eb3f3ff9050..9f4e390533f 100644
--- a/storage/innobase/dict/dict.cc
+++ b/storage/innobase/dict/dict.cc
@@ -45,9 +45,11 @@ external tools. */
@param[in] table table
@param[in] col column
@param[in] prefix_len column prefix length
-@param[in] is_ascending true=ASC, false=DESC */
+@param[in] is_ascending true=ASC, false=DESC
+@param[in] apply_only TRUE=in page apply phase */
void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
- dict_col_t *col, ulint prefix_len, bool is_ascending) {
+ dict_col_t *col, ulint prefix_len, bool is_ascending,
+ bool apply_only) {
dict_field_t *field;
const char *col_name;
@@ -57,7 +59,7 @@ void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
dict_v_col_t *v_col = reinterpret_cast<dict_v_col_t *>(col);
/* When v_col->v_indexes==NULL,
ha_innobase::commit_inplace_alter_table(commit=true)
will evict and reload the table definition, and
v_col->v_indexes will not be NULL for the new table. */
if (v_col->v_indexes != NULL) {
@@ -77,7 +79,15 @@ void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
} else
#endif /* !UNIV_LIBRARY */
{
- col_name = table->get_col_name(dict_col_get_no(col));
+ ulint col_nr = dict_col_get_no(col);
+
+ /* Page apply uses a dummy index, so a dummy column name is enough here;
+ this avoids recomputing the column-name offset for every column. */
+ if (apply_only) {
+ col_nr = 0;
+ }
+
+ col_name = table->get_col_name(col_nr);
}
index->add_field(col_name, prefix_len, is_ascending);
diff --git a/storage/innobase/dict/dict.h b/storage/innobase/dict/dict.h
index 3904a83f097..37589cec64e 100644
--- a/storage/innobase/dict/dict.h
+++ b/storage/innobase/dict/dict.h
@@ -42,8 +42,10 @@ external tools. */
@param[in] table table
@param[in] col column
@param[in] prefix_len column prefix length
-@param[in] is_ascending true=ASC, false=DESC */
+@param[in] is_ascending true=ASC, false=DESC
+@param[in] apply_only TRUE=in page apply phase */
void dict_index_add_col(dict_index_t *index, const dict_table_t *table,
- dict_col_t *col, ulint prefix_len, bool is_ascending);
+ dict_col_t *col, ulint prefix_len, bool is_ascending,
+ bool apply_only = false);
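Note that the diff above only changes dict_index_add_col() itself; for the optimization to take effect during recovery, mlog_parse_index() would presumably also have to pass apply_only = true when adding columns to the dummy index. A sketch of that call site (an assumption, not part of the submitted patch):

/* storage/innobase/mtr/mtr0log.cc, inside the column loop of
mlog_parse_index(): pass apply_only = true so that dict_index_add_col()
uses the dummy column name instead of scanning col_names. */
dict_index_add_col(ind, table, table->get_col(i), 0, true,
                   true /* apply_only */);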