Description:
While investigating crash recovery in MySQL 5.7, I found that a noticeable share of CPU time is spent creating a dummy index and a dummy table for each parsed redo log record.
Output of `perf record` captured during crash recovery:
+ 12.42% mysqld mysqld [.] recv_recover_page_func(unsigned long, buf_block_t*)
+ 10.94% mysqld mysqld [.] recv_add_to_hash_table(mlog_id_t, unsigned long, unsigned long, unsigned char*, unsigned char*, unsigned long, unsigned long)
- 6.12% mysqld libc-2.15.so [.] __strcmp_sse42
   - __strcmp_sse42
      - 89.89% ut_allocator<unsigned char>::get_mem_key(char const*) const [clone .isra.31]
         - mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long)
            - 61.40% mem_heap_add_block(mem_block_info_t*, unsigned long)
               - 47.96% dict_mem_index_create(char const*, char const*, unsigned long, unsigned long, unsigned long)
                    mlog_parse_index(unsigned char*, unsigned char const*, unsigned long, dict_index_t**)
                  + recv_parse_or_apply_log_rec_body(mlog_id_t, unsigned char*, unsigned char*, unsigned long, unsigned long, bool, buf_block_t*, mtr_t*)
               - 34.13% dict_mem_table_create(char const*, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)
                    mlog_parse_index(unsigned char*, unsigned char const*, unsigned long, dict_index_t**)
                  + recv_parse_or_apply_log_rec_body(mlog_id_t, unsigned char*, unsigned char*, unsigned long, unsigned long, bool, buf_block_t*, mtr_t*)
               + 15.69% mem_heap_strdup(mem_block_info_t*, char const*)
               + 2.22% rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long, mem_block_info_t**)
            - 15.48% dict_mem_table_create(char const*, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)
                 mlog_parse_index(unsigned char*, unsigned char const*, unsigned long, dict_index_t**)
               + recv_parse_or_apply_log_rec_body(mlog_id_t, unsigned char*, unsigned char*, unsigned long, unsigned long, bool, buf_block_t*, mtr_t*)
            + 13.73% btr_cur_parse_update_in_place(unsigned char*, unsigned char*, unsigned char*, page_zip_des_t*, dict_index_t*)
            - 9.39% dict_mem_index_create(char const*, char const*, unsigned long, unsigned long, unsigned long)
                 mlog_parse_index(unsigned char*, unsigned char const*, unsigned long, dict_index_t**)
               + recv_parse_or_apply_log_rec_body(mlog_id_t, unsigned char*, unsigned char*, unsigned long, unsigned long, bool, buf_block_t*, mtr_t*)
      - 10.03% ut_allocator<unsigned char>::get_mem_key(char const*) const [clone .isra.25]
           ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, char const*, bool, bool) [clone .constprop.101]
           dict_mem_table_create(char const*, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long)
           mlog_parse_index(unsigned char*, unsigned char const*, unsigned long, dict_index_t**)
         + recv_parse_or_apply_log_rec_body(mlog_id_t, unsigned char*, unsigned char*, unsigned long, unsigned long, bool, buf_block_t*, mtr_t*)
These costs are avoidable: the dummy index and dummy table can be reused, and only a few fields need to be reset before each reuse.
How to repeat:
Kill the server under a heavy workload, restart it, and profile the crash recovery with `perf record`.
Suggested fix:
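Use something like std::multimap<int, dict_index_t*> to keep the created dummy indexes (the dummy table is reachable via dummy_index->table). The key is the number of columns of the dummy index. A multimap is suggested rather than a multiset because the second template parameter of std::multiset is a comparator, not a mapped value.
A newly created dummy index is added to the container. When parsing or applying a redo log record requires a dummy index with the same number of columns, remove one from the container and reset the following fields to zero before reusing it: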
dict_index_t::n_def
dict_index_t::type
dict_index_t::n_nullable
dict_table_t::n_def
Also, a reused dummy table does not need its system columns added again; it is enough to update dict_table_t::n_def += DATA_N_SYS_COLS;
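
The suggested fix can be sketched roughly as follows. This is only an illustration under stated assumptions, not InnoDB code: DummyIndex and DummyTable are hypothetical stand-ins for dict_index_t and dict_table_t that model only the fields named above, DummyIndexPool and its methods are invented names, and the value used for DATA_N_SYS_COLS is assumed.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for dict_table_t / dict_index_t. Only the fields
// mentioned in this report are modelled.
struct DummyTable {
    unsigned n_def = 0;
};

struct DummyIndex {
    unsigned n_def = 0;
    unsigned type = 0;
    unsigned n_nullable = 0;
    DummyTable* table = nullptr;
};

// Stands in for DATA_N_SYS_COLS; the value here is an assumption.
const unsigned kDataNSysCols = 3;

// Hypothetical pool of reusable dummy indexes, keyed by column count.
class DummyIndexPool {
public:
    DummyIndexPool() = default;
    DummyIndexPool(const DummyIndexPool&) = delete;
    DummyIndexPool& operator=(const DummyIndexPool&) = delete;

    ~DummyIndexPool() {
        for (auto& entry : free_) {
            delete entry.second->table;
            delete entry.second;
        }
    }

    // Hand out a dummy index for n_cols columns. A cached one is removed
    // from the pool and reset; otherwise a new one is allocated.
    DummyIndex* acquire(unsigned n_cols) {
        auto it = free_.find(n_cols);
        if (it == free_.end()) {
            ++n_created_;
            DummyIndex* index = new DummyIndex();
            index->table = new DummyTable();
            return index;
        }
        DummyIndex* index = it->second;
        free_.erase(it);
        // Reset exactly the fields listed in the report.
        index->n_def = 0;
        index->type = 0;
        index->n_nullable = 0;
        index->table->n_def = 0;
        return index;
    }

    // Return an index to the pool once the log record has been handled.
    void release(unsigned n_cols, DummyIndex* index) {
        assert(index != nullptr && index->table != nullptr);
        free_.emplace(n_cols, index);
    }

    unsigned created() const { return n_created_; }

private:
    std::multimap<unsigned, DummyIndex*> free_;
    unsigned n_created_ = 0;
};

// For a reused table the system column slots already exist, so only the
// counter is advanced after the user columns have been defined again.
inline void account_for_system_columns(DummyTable& table) {
    table.n_def += kDataNSysCols;
}
```

In this sketch the caller would acquire an index for the column count read from the log record, define the columns, call account_for_system_columns on a reused table, and release the index afterwards instead of freeing it. Whether these four fields are the only state that must be reset in the real structures would need to be checked against the InnoDB source.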