Bug #56433 Auto-extension of InnoDB files
Submitted: 1 Sep 2010 1:49 Modified: 15 Oct 2012 13:48
Reporter: Mark Callaghan
Status: Closed
Category: MySQL Server: InnoDB Plugin storage engine Severity: S4 (Feature request)
Version: 5.1.50 OS: Any
Assigned to: Inaam Rana CPU Architecture: Any
Tags: extend, file, innodb, performance

[1 Sep 2010 1:49] Mark Callaghan
Description:
fil_extend_space_to_desired_size() holds fil_system->mutex while it extends the file. Page reads are blocked until the extension finishes because they call fil_space_get_version(), which must also lock fil_system->mutex.

This is an intermittent source of stalls on our servers. We think this is worse with innodb_file_per_table.

How to repeat:
Run poor man's profiler on a busy server when there are stalls to identify the problem.

Suggested fix:
Can index metadata structs store both the tablespace ID and the version? Today they store only the tablespace ID. If they also stored the version, fil_space_get_version() would not need to be called before a page read.
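
A minimal sketch of that idea, not InnoDB code. All names here (space_entry_t, index_meta_t, lookup_version, index_meta_init, version_for_read, read_is_stale) are hypothetical, and locking is left out on purpose: the point is only that the version is looked up once when the metadata is filled in, and the read path then uses the cached copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tablespace entry in the in-memory cache. */
typedef struct {
    unsigned long id;
    int64_t       version;  /* assumed to change when the space is dropped or re-created */
} space_entry_t;

/* Hypothetical index metadata. Per the report, only the ID is stored today. */
typedef struct {
    unsigned long space_id;
    int64_t       space_version;  /* proposed addition: cached copy of the version */
} index_meta_t;

/* Slow path: models the lookup that needs the global mutex in the server.
   Returns -1 if the space is not in the cache. */
static int64_t lookup_version(const space_entry_t *spaces, size_t n,
                              unsigned long id)
{
    for (size_t i = 0; i < n; i++) {
        if (spaces[i].id == id) {
            return spaces[i].version;
        }
    }
    return -1;
}

/* Done once, for example when the index metadata is loaded. */
static void index_meta_init(index_meta_t *m, const space_entry_t *spaces,
                            size_t n, unsigned long id)
{
    assert(m != NULL);
    m->space_id = id;
    m->space_version = lookup_version(spaces, n, id);
}

/* Read path: no lookup and no global mutex, just the cached value. */
static int64_t version_for_read(const index_meta_t *m)
{
    return m->space_version;
}

/* A later check can still detect that the space changed under the reader. */
static bool read_is_stale(const space_entry_t *space, int64_t version_used)
{
    return space->version != version_used;
}
```

The cached value can go stale if the tablespace is dropped and re-created, so whatever consumes the version still has to compare it against the current one, as read_is_stale() does here.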
[1 Sep 2010 3:51] Domas Mituzas
and a background thread for non-blocking extending please!!!!! :)
[3 Feb 2011 16:35] Mark Callaghan
Page reads call this function and must lock fil_system->mutex. When a file is being extended, those reads block.

/*******************************************************************//**
Returns the version number of a tablespace, -1 if not found.
@return version number, -1 if the tablespace does not exist in the
memory cache */
UNIV_INTERN
ib_int64_t
fil_space_get_version(
/*==================*/
        ulint   id)     /*!< in: space id */
{
        fil_space_t*    space;
        ib_int64_t      version         = -1;
        ut_ad(fil_system);

        mutex_enter(&fil_system->mutex);

        space = fil_space_get_by_id(id);
        if (space) {
                version = space->tablespace_version;
        }

        mutex_exit(&fil_system->mutex);
        return(version);
}

In one example, this is the stack for the thread that holds fil_system->mutex:

pwrite64,os_file_write,os_aio,fil_extend_space_to_desired_size,fsp_header_write_space_id,fsp_reserve_free_extents,btr_cur_pessimistic_update,row_upd_in_place_in_select,row_upd_step,row_update_for_mysql,ha_innobase::update_row,write_record,mysql_insert,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone

Many threads attempting reads then block on it:

pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,mutex_spin_wait,fil_space_get_version,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innobase::index_read,create_tmp_field,create_tmp_field,st_select_lex::print,JOIN::optimize,mysql_select,handle_select,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone

pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,mutex_spin_wait,trx_undo_assign_undo,trx_undo_report_row_operation,btr_cur_update_in_place,btr_cur_optimistic_update,row_upd_in_place_in_select,row_upd_step,row_update_for_mysql,ha_innobase::update_row,mysql_update,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
 
pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,mutex_spin_wait,fil_space_get_version,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,btr_estimate_n_rows_in_range,ha_innobase::records_in_range,TRP_ROR_INTERSECT::make_quick,TRP_ROR_INTERSECT::make_quick,TRP_ROR_INTERSECT::make_quick,TRP_ROR_INTERSECT::make_quick,TRP_ROR_INTERSECT::make_quick,SQL_SELECT::test_quick_select,st_select_lex::print,JOIN::optimize,mysql_select,handle_select,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone

 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,mutex_spin_wait,fil_space_get_version,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,btr_estimate_n_rows_in_range,ha_innobase::records_in_range,TRP_ROR_INTERSECT::make_quick,TRP_ROR_INTERSECT::make_quick,TRP_ROR_INTERSECT::make_quick,SQL_SELECT::test_quick_select,st_select_lex::print,JOIN::optimize,st_select_lex_unit::exec,mysql_union,handle_select,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone

 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,mutex_spin_wait,fil_space_get_version,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,fetch_step,row_search_for_mysql,ha_innobase::index_read,cp_buffer_from_ref,sub_select,sub_select_cache,JOIN::exec,mysql_select,handle_select,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone

Those threads don't give up their thread concurrency tickets, so other threads get stuck waiting to enter InnoDB.
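
The stacks above show the slow write happening while the mutex is held. One possible way around that, sketched below under assumptions, is to mark the space as being extended, drop the mutex for the duration of the write, and retake it only to publish the new size. This is a toy model, not InnoDB code and not necessarily what any patch on this bug does: space_model_t, the being_extended flag, model_get_version, model_extend, and demo_io are all hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one tablespace and its guarding mutex. */
typedef struct {
    pthread_mutex_t mutex;          /* stands in for fil_system->mutex */
    int64_t         version;
    unsigned long   size_in_pages;
    bool            being_extended; /* assumed flag: allows one extender at a time */
} space_model_t;

static void model_init(space_model_t *s, int64_t version, unsigned long pages)
{
    pthread_mutex_init(&s->mutex, NULL);
    s->version = version;
    s->size_in_pages = pages;
    s->being_extended = false;
}

/* Reader path: holds the mutex only long enough to copy the version. */
static int64_t model_get_version(space_model_t *s)
{
    pthread_mutex_lock(&s->mutex);
    int64_t v = s->version;
    pthread_mutex_unlock(&s->mutex);
    return v;
}

/* Callback for the slow file write; returns the number of pages added. */
typedef unsigned long (*extend_io_fn)(space_model_t *s, unsigned long pages,
                                      void *arg);

/* Returns false if another extension is already in progress. */
static bool model_extend(space_model_t *s, unsigned long pages,
                         extend_io_fn io, void *arg)
{
    assert(io != NULL);

    pthread_mutex_lock(&s->mutex);
    if (s->being_extended) {
        pthread_mutex_unlock(&s->mutex);
        return false;
    }
    s->being_extended = true;
    pthread_mutex_unlock(&s->mutex);  /* drop the mutex before the slow write */

    unsigned long added = io(s, pages, arg);

    pthread_mutex_lock(&s->mutex);    /* retake it only to publish the result */
    s->size_in_pages += added;
    s->being_extended = false;
    pthread_mutex_unlock(&s->mutex);
    return true;
}

/* Example callback: pretends to write, and issues a read in the middle.
   If model_extend() still held the mutex here, this call would never return. */
static unsigned long demo_io(space_model_t *s, unsigned long pages, void *arg)
{
    int64_t *seen = arg;
    *seen = model_get_version(s);
    return pages;
}
```

In this model a second extender simply gets false back; a real implementation would more likely wait for the flag to clear and then re-check the file size.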
[5 Feb 2011 2:02] James Day
Not giving up the tickets looks like a bug.

Background extending, so that a new chunk of space is allocated when the final empty one starts being used, or something similar, looks like a feature request.
[9 Feb 2011 21:49] Mark Callaghan
I think this patch fixes the stall

Attachment: 0001-s.patch (application/octet-stream, text), 3.59 KiB.

[13 Feb 2011 17:48] Mark Callaghan
That patch is bad. A new one is in progress.
[22 Feb 2011 1:47] Yasufumi Kinoshita
simplest fix, I think

Attachment: fix_suggestion_56433.patch (text/x-patch), 2.89 KiB.

[25 Feb 2011 8:35] Yasufumi Kinoshita
Sorry, my previous patch also needs the following change for UNIV_SYNC_DEBUG:

--- a/storage/innodb_plugin/sync/sync0sync.c    2011-02-25 14:09:57.710270419 +0900
+++ b/storage/innodb_plugin/sync/sync0sync.c    2011-02-25 14:12:20.138232965 +0900
@@ -1161,6 +1161,7 @@
        case SYNC_LOG:
        case SYNC_THR_LOCAL:
        case SYNC_ANY_LATCH:
+       case SYNC_OUTER_ANY_LATCH:
        case SYNC_TRX_SYS_HEADER:
        case SYNC_FILE_FORMAT_TAG:
        case SYNC_DOUBLEWRITE:
[25 Jul 2011 22:32] James Day
http://blogs.innodb.com/wp/2011/07/reduced-contention-during-datafile-extension/ describes an improvement to this that is in the Summer 2011 labs release. This is not a production release, just a technology demonstration.
[15 Oct 2012 13:48] Erlend Dahl
Fixed in 5.6.3