MySQL Bugs: #49463: LOCK_table and innodb are not nice when handler instances are created

Bug #49463	LOCK_table and innodb are not nice when handler instances are created
Submitted:	4 Dec 2009 16:57	Modified:	8 Jul 2010 16:11
Reporter:	Mark Callaghan	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Locking	Severity:	S4 (Feature request)
Version:	5.0,5.1,5.1 plugin	OS:	Any
Assigned to:	Konstantin Osipov	CPU Architecture:	Any
Tags:	Contribution, innodb, LOCK_table, restart, slow

Description:
LOCK_table is locked when table handler instances are created.
InnoDB samples index stats when table handler instances are created -- 8 random dives per index.
I don't think InnoDB shares stats between handler instances for the same table.

Database servers are useless for some time after startup. All threads are blocked in open_tables with the lock held by the thread sampling index stats.

Aggregated thread stacks from this. The data shows one thread getting index stats from innodb.

(# times this stack occurred, thread stack)

   1314 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
    275 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,mysql_insert,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
    262 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,close_thread_tables,dispatch_command,handle_one_connection,start_thread,clone
    112 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,mysql_update,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
     35 pthread_cond_wait@@GLIBC_2.3.2,end_thread,handle_one_connection,start_thread,clone
     32 mysql_parse
     32 dispatch_command,handle_one_connection,start_thread,clone
     27 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,mysql_execute_command
      5 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,mysql_update,mysql_execute_command
      4 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread,clone
      3 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,open_and_lock_tables,mysql_delete,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
      3 __lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,mysql_multi_update_prepare,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
      1 select,os_thread_sleep,srv_master_thread,start_thread,clone
      1 select,os_thread_sleep,srv_lock_timeout_and_monitor_thread,start_thread,clone
      1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
      1 select,handle_connections_sockets,main,select
      1 pread64,os_file_pread,os_file_read,os_aio,fil_io,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_open_at_rnd_pos,btr_estimate_number_of_different_key_vals,dict_update_statistics_low,dict_table_get_and_increment_handle_count,ha_innobase::open,handler::ha_open,openfrm,open_unireg_entry,open_table,open_tables,mysql_update,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
       1
__lll_mutex_lock_wait,_L_mutex_lock_1133,pthread_mutex_lock,open_table,open_tables,open_normal_and_derived_tables,get_all_tables,get_schema_tables_result,JOIN::exec,mysql_select,handle_select,mysql_execute_command,mysql_parse,dispatch_command,handle_one_connection,start_thread,clone
      1 do_sigwait,sigwait,signal_hand,start_thread,clone

How to repeat:
restart a busy server
profile it with the poor mans profiler (google for that)

Suggested fix:
Defer stats sampling until the first use of the instance. At that point LOCK_table is not held.

Patch : http://p.defau.lt/?7z_wf6C9Xk0b0XFyD9AaeA

By moving statistics out of ha_innobase::open() I got 13->105 opens/s speedup (patch similar to one above, my tests were with 5.1 though). 

do note, that in 5.1 it may make sense to move out auto-inc initialization - it isn't part of 5.0 open() routines. 

Anyway, this is very low hanging fruit that makes restarts of mysqld not feasible in quite a lot of situations .

The solution is not within InnoDB. 
There is no need to keep LOCK_open around ha_open().

> Defer stats sampling until the first use of the instance. 
> At that point LOCK_table is not held

The stats is important for the optimizer to make the right query plan, so we cannot delay it until it re-enters storage when executing a query, the query could generated with wrong stats info. 

However, stats could share between instances of the same table. And it ideally collected without have the LOCK_open held. We are looking into these two aspect.

There is an ongoing project of "Persistent Statistics" would help this as well, since the Optimizer can directly obtain the stats even after a reboot, without another round of stats sampling/collection.

http://bugs.mysql.com/bug.php?id=51557 really should be a duplicate of this. I liked the bug so much I filed it twice.

Patch for 5.1 at http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1/revision/3470

We can contribute this. I don't know if it is relevant given the changes in 5.5

Thanks, Mark. We hope to see if you can try the new 5.5 release with LOCK_open released before table open, and see if that resolves your problem to some degree. Or if statisticss collection during open still takes too much time during start up time, then we might want to consider moving the stats collection away from table open, and let optimizer to call info to fetch the stats when it needed. In the end, the persistent stats project might give you a complete solution.

BTW, somehow, the above link seems to be not accessible? http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/5.1/revision/3470

The link is broken for me too. But the URL is valid and I can 'bzr pull' that change.

In 5.5 is LOCK_open or whatever replaced LOCK_open held when handler::open is called

Mark,

> In 5.5 is LOCK_open or whatever replaced LOCK_open held when handler::open is called

In 5.5, LOCK_open is released before handler::open is called. However, the stats collection are still being performed in ha_innodb::open during first time access to the table (in start up), however, without holding the LOCK_open.

Discussed with Konstantin, and we will move index status collection (include ::info) away from table open. MySQL will call ::info with HA_STATUS_VARIABLE and HA_STATUS_CONST flag:

"<kostja> Jimmy: you can safely assume that whenever we want innodb with stats we'll run the extra"

This change will be committed only when MySQL is done necessary code change to deal with table open with out stats collected.

-Jimmy

Queued into 5.5.5 runtime, tagged 5.5.5
3054 Konstantin Osipov	2010-06-11
      WL#5419 "LOCK_open scalability: make tdc_refresh_version 
      an atomic counter"
      
      Split the large LOCK_open section in open_table(). 
      Do not call open_table_from_share() under LOCK_open.
      Remove thd->version.
      
      This fixes
      Bug#50589 "Server hang on a query evaluated using a temporary 
      table"
      Bug#51557 "LOCK_open and kernel_mutex are not happy together"
      Bug#49463 "LOCK_table and innodb are not nice when handler 
      instances are created".
      
      This patch has effect on storage engines that rely on
      ha_open() PSEA method being called under LOCK_open.
      In particular:
      
      1) NDB is broken and left unfixed. NDB relies on LOCK_open
      being kept as part of ha_open(), since it uses auto-discovery.
      While previously the NDB open code was race-prone, now
      it simply fails on asserts.
      
      2) HEAP engine had a race in ha_heap::open() when
      a share for the same table could be added twice
      to the list of shares, or a dangling reference to a share
      stored in HEAP handler. This patch aims to address this
      problem by 'pinning' the newly created share in the 
      internal HEAP engine share list until at least one
      handler instance is created using that share.
     @ include/heap.h
        Add members to HP_CREATE_INFO.
        Declare heap_release_share().
     @ sql/lock.cc
        Remove thd->version, use thd->open_tables->s->version instead.
     @ sql/repl_failsafe.cc
        Remove thd->version.
     @ sql/sql_base.cc
        - close_thread_table(): move handler cleanup code outside the critical section
protected by LOCK_open.
        - remove thd->version
        - split the large critical section in open_table() that
        opens a new table from share and is protected by LOCK_open
        into 2 critical sections, thus reducing the critical path.
        - make check_if_table_exists() acquire LOCK_open internally.
        - use thd->open_tables->s->version instead of thd->refresh_version to make sure
that all tables in
        thd->open_tables are in the same refresh series.
     @ sql/sql_base.h
        Add declaration for check_if_table_exists().
     @ sql/sql_class.cc
        Remove init_open_tables_state(), it's now equal to
        reset_open_tables_state().
     @ sql/sql_class.h
        Remove thd->version, THD::init_open_tables_state().
     @ sql/sql_plugin.cc
        Use table->m_needs_reopen to mark the table as stale
        rather than manipulate with thd->version, which is no more.
     @ sql/sql_udf.cc
        Use table->m_needs_reopen to mark the table as stale
        rather than manipulate with thd->version, which is no more.
     @ sql/table.h
        Remove an unused variable.
     @ sql/tztime.cc
        Use table->m_needs_reopen to mark the table as stale
        rather than manipulate with thd->version, which is no more.
     @ storage/heap/CMakeLists.txt
        Add heap tests to cmake build files.
     @ storage/heap/ha_heap.cc
        Fix a race when ha_heap::ha_open() could insert two 
        HP_SHARE objects into the internal share list or store
        a dangling reference to a share in ha_heap instance,
        or wrongly set implicit_emptied.
     @ storage/heap/hp_create.c
        Optionally pin a newly created share in the list of shares
        by increasing its open_count. This is necessary to 
        make sure that a newly created share doesn't disappear while
        a HP_INFO object is being created to reference it.
     @ storage/heap/hp_open.c
        When adding a new HP_INFO object to the list of objects
        in the heap share, make sure the open_count is not increased
        twice.
     @ storage/heap/hp_test1.c
        Adjust the test to new function signatures.
     @ storage/heap/hp_test2.c
        Adjust the test to new function signatures.

Good to read about these changes and moving stats out of open.

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100622130139-u05awgya93zvbsop) (version source revid:marko.makela@oracle.com-20100603095032-v5ptkkzt1bhz0m1d) (merge vers: 5.1.48) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100622130623-r7yhm89fz9n5t9nb) (version source revid:alik@sun.com-20100622130528-187gd949sa9b6pa6) (pib:16)

Noted in 5.5.5 changelog.

Previously, the server held a global mutex while performing file
operations such as deleting an .frm or data file, or reading table
metadata from an .frm file or index statistics from a data file. Now
the mutex is not held for these operations. Instead, the server uses
metadata locks.