MySQL Bugs: #56276: Re-architect index->lock mutex for less contention to improve scaling

Submitted: 26 Aug 2010 2:34
Modified: 26 Aug 2010 2:37
Reporter: Shannon Wade
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S4 (Feature request)
Version:
OS: Any
Assigned to:
CPU Architecture: Any

Description:
InnoDB protects each index with a single read-write lock (index->lock). On large multi-CPU servers with fast storage, and increasingly as those servers get more powerful, this lock becomes a point of contention. An operation that may change the index tree takes it in exclusive (X) mode, which blocks every other thread that needs it in shared (S) mode, including selects, until the modification is done.

How to repeat:
n/a

Suggested fix:
This per-index lock should be re-architected to remove this limitation and point of contention, or at least the groundwork for doing so should be laid.

The following case, from an affected customer, may be an example:

We have a case where trx_purge holds a lock that blocks other threads. The basic problem is that the thread running trx_purge holds the dictionary lock on the index (dict_index_get_lock(index)) in X mode while it is blocked on I/O (calling pread64). That limits throughput and hurts response time. Why does btr_cur_search_to_nth_level need this code?
	if (latch_mode == BTR_MODIFY_TREE) {
		mtr_x_lock(dict_index_get_lock(index), mtr);

trx_purge thread (innermost frame first):
	pread64
	os_file_pread
	os_file_read
	os_aio
	_fil_io
	buf_read_page_low
	buf_read_page
	buf_page_get_gen
	btr_cur_latch_leaves
	btr_cur_search_to_nth_level
	row_search_index_entry
	row_purge_remove_sec_if_poss_low
	row_purge_step
	que_run_threads
	trx_purge
	srv_master_thread
	start_thread
	clone

blocked thread (innermost frame first):
	pthread_cond_wait@@GLIBC_2.3.2
	os_event_wait_low
	sync_array_wait_event
	rw_lock_s_lock_spin
	btr_cur_search_to_nth_level
	row_ins_index_entry_low
	row_ins_index_entry
	row_ins_step
	row_insert_for_mysql
	ha_innobase::write_row
	handler::ha_write_row
	write_record
	mysql_insert
	mysql_execute_command
	mysql_parse
	Query_log_event::do_apply_event
	apply_event_and_update_pos
	handle_slave_sql
	start_thread
	clone

I think the X lock comes from this code in btr_cur_search_to_nth_level:

	if (latch_mode == BTR_MODIFY_TREE) {
		mtr_x_lock(dict_index_get_lock(index), mtr);

	} else if (latch_mode == BTR_CONT_MODIFY_TREE) {
		/* Do nothing */
		ut_ad(mtr_memo_contains(mtr, dict_index_get_lock(index),
					MTR_MEMO_X_LOCK));
	} else {
		mtr_s_lock(dict_index_get_lock(index), mtr);
	}

And trx_purge took the X lock because of this code:

	row_purge_remove_sec_if_poss(
	/*=========================*/
		purge_node_t*	node,	/*!< in: row purge node */
		dict_index_t*	index,	/*!< in: index */
		dtuple_t*	entry)	/*!< in: index entry */
	{
		ibool	success;
		ulint	n_tries	= 0;

		/* fputs("Purge: Removing secondary record\n", stderr); */

		success = row_purge_remove_sec_if_poss_low(node, index, entry,
							   BTR_MODIFY_LEAF);
		if (success) {

			return;
		}
	retry:
		success = row_purge_remove_sec_if_poss_low(node, index, entry,
							   BTR_MODIFY_TREE);

This means that the first call, row_purge_remove_sec_if_poss_low(..., BTR_MODIFY_LEAF), failed, so the function was called again with BTR_MODIFY_TREE. The first call fails when row_purge_remove_sec_if_poss_low returns FALSE here:

	if (!success || !old_has) {
		/* Remove the index record */

		if (mode == BTR_MODIFY_LEAF) {
			success = btr_cur_optimistic_delete(btr_cur, &mtr);
		} else {
			ut_ad(mode == BTR_MODIFY_TREE);
			btr_cur_pessimistic_delete(&err, FALSE, btr_cur,
						   RB_NONE, &mtr);
			success = err == DB_SUCCESS;
			ut_a(success || err == DB_OUT_OF_FILE_SPACE);
		}
	}
	btr_pcur_close(&pcur);
	mtr_commit(&mtr);
	return(success);

And for that to return FALSE in BTR_MODIFY_LEAF mode, btr_cur_optimistic_delete must return FALSE, and it returns the value of:

	no_compress_needed = !rec_offs_any_extern(offsets)
		&& btr_cur_can_delete_without_compress(
			cursor, rec_offs_size(offsets), mtr);

This report can be updated, and possibly closed, based on the work in MySQL 5.7.