Bug #50589 Server hang on a query evaluated using a temporary table
Submitted: 25 Jan 2010 12:37 Modified: 8 Jul 2010 16:11
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S2 (Serious)
Version:next-mr, next-mr-4284 OS:Any
Assigned to: Konstantin Osipov
Triage: Triaged: D2 (Serious)

[25 Jan 2010 12:37] Philip Stoev
Description:
Certain queries on PERFORMANCE_SCHEMA, when executed against a server that has --log-output=table, cause the server to temporarily hang while the query is being processed. During that time, SHOW PROCESSLIST hangs and connecting new clients is not possible.

During the hang, the backtace for SHOW PROCESSLIST is as follows:

Thread 14 (Thread 0x7fda35959950 (LWP 30790)):
#0  0x000000315b00db14 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x000000315b00921c in _L_lock_1078 () from /lib64/libpthread.so.0
#2  0x000000315b0090f0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000a06c70 in safe_mutex_lock (mp=0x103c0e0, try_lock=0 '\0', file=0xb6768d "sql_base.cc", line=2741) at thr_mutex.c:152
#4  0x000000000069b640 in inline_mysql_mutex_lock (that=0x103c0e0, src_file=0xb6768d "sql_base.cc", src_line=2741) at ../include/mysql/psi/mysql_thread.h:519
#5  0x00000000006aa9a1 in open_table (thd=0x7fda3018c488, table_list=0x7fda35957b60, mem_root=0x7fda3018eac8, refresh=0x7fda35957aa7, flags=0)
    at sql_base.cc:2741
#6  0x00000000006ab6f7 in open_ltable (thd=0x7fda3018c488, table_list=0x7fda35957b60, lock_type=TL_WRITE_CONCURRENT_INSERT, lock_flags=51)
    at sql_base.cc:4953
#7  0x00000000006ab986 in open_log_table (thd=0x7fda3018c488, one_table=0x7fda35957b60, backup=0x7fda359580d0) at sql_base.cc:8886
#8  0x000000000072480b in Log_to_csv_event_handler::log_general (this=0x7fda300017f0, thd=0x7fda3018c488, event_time=1264422845,
    user_host=0x7fda35958200 "root[root] @ localhost [127.0.0.1]", user_host_len=34, thread_id=59, command_type=0xb0e8e5 "Query", command_type_len=5,
    sql_text=0x38fcdf8 "show full processlist", sql_text_len=21, client_cs=0x10330a0) at log.cc:493
#9  0x00000000007230cd in LOGGER::general_log_write (this=0x104f800, thd=0x7fda3018c488, command=COM_QUERY, query=0x38fcdf8 "show full processlist",
    query_length=21) at log.cc:1187
#10 0x000000000072317d in general_log_write (thd=0x7fda3018c488, command=COM_QUERY, query=0x38fcdf8 "show full processlist", query_length=21) at log.cc:4811
#11 0x0000000000657e10 in dispatch_command (command=COM_QUERY, thd=0x7fda3018c488, packet=0x7fda3018ede9 "show full processlist", packet_length=21)
    at sql_parse.cc:1134
#12 0x000000000065929f in do_command (thd=0x7fda3018c488) at sql_parse.cc:810
#13 0x000000000064611e in do_handle_one_connection (thd_arg=0x7fda3018c488) at sql_connect.cc:1172
#14 0x00000000006461ed in handle_one_connection (arg=0x7fda3018c488) at sql_connect.cc:1112
#15 0x00000000009ce2a2 in pfs_spawn_thread (arg=0x7fda300a5d28) at pfs.cc:1011
#16 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#17 0x000000315a4e627d in clone () from /lib64/libc.so.6

The backtrace of the PERFORMANCE_SCHEMA looks as follows:

Thread 13 (Thread 0x7fda35c24950 (LWP 31754)):
#0  0x000000315b00dedb in write () from /lib64/libpthread.so.0
#1  0x00000000009d3cd8 in my_write (Filedes=168, Buffer=0x7fda2c2f6348 "", Count=131072, MyFlags=4) at my_write.c:41
#2  0x00000000009d7671 in my_b_flush_io_cache (info=0x7fda2c163610, need_append_buffer_lock=0) at mf_iocache.c:1760
#3  0x00000000009d6e2a in _my_b_write (info=0x7fda2c163610, Buffer=0x7fda2c061892 "", Count=2728) at mf_iocache.c:1516
#4  0x000000000096dd9b in _mi_write_static_record (info=0x7fda2c163408, record=0x7fda2c0614c8 "Ы\rOLD_PASSWORDS") at mi_statrec.c:51
#5  0x000000000097ae3d in mi_write (info=0x7fda2c163408, record=0x7fda2c0614c8 "Ы\rOLD_PASSWORDS") at mi_write.c:146
#6  0x00000000009920a0 in ha_myisam::write_row (this=0x7fda2c066508, buf=0x7fda2c0614c8 "Ы\rOLD_PASSWORDS") at ha_myisam.cc:785
#7  0x000000000078da2b in handler::ha_write_row (this=0x7fda2c066508, buf=0x7fda2c0614c8 "Ы\rOLD_PASSWORDS") at handler.cc:4722
#8  0x00000000006ca7f8 in end_write (join=0x7fda2c0d1e78, join_tab=0x7fda2c0d4328, end_of_records=false) at sql_select.cc:12521
#9  0x00000000006c6968 in evaluate_join_record (join=0x7fda2c0d1e78, join_tab=0x7fda2c0d40d0, error=0) at sql_select.cc:11532
#10 0x00000000006c6baa in sub_select (join=0x7fda2c0d1e78, join_tab=0x7fda2c0d40d0, end_of_records=false) at sql_select.cc:11413
#11 0x00000000006c6968 in evaluate_join_record (join=0x7fda2c0d1e78, join_tab=0x7fda2c0d3e78, error=0) at sql_select.cc:11532
#12 0x00000000006c6baa in sub_select (join=0x7fda2c0d1e78, join_tab=0x7fda2c0d3e78, end_of_records=false) at sql_select.cc:11413
---Type <return> to continue, or q <return> to quit---
#13 0x00000000006d7dc9 in do_select (join=0x7fda2c0d1e78, fields=0x0, table=0x7fda2c0f3108, procedure=0x0) at sql_select.cc:11159
#14 0x00000000006e832e in JOIN::exec (this=0x7fda2c0d1e78) at sql_select.cc:1865
#15 0x00000000006e4af7 in mysql_select (thd=0x7fda302f2f18, rref_pointer_array=0x7fda302f4cb0, tables=0x38cab90, wild_num=1, fields=@0x7fda302f4be8,
    conds=0x38cc128, og_num=2, order=0x7fda2c0f8c58, group=0x0, having=0x0, proc_param=0x0, select_options=2684619520, result=0x7fda2c0f91a8,
    unit=0x7fda302f4690, select_lex=0x7fda302f4ae0) at sql_select.cc:2466
#16 0x00000000006e9fa8 in handle_select (thd=0x7fda302f2f18, lex=0x7fda302f45f0, result=0x7fda2c0f91a8, setup_tables_done_option=0) at sql_select.cc:271
#17 0x000000000064c90e in execute_sqlcom_select (thd=0x7fda302f2f18, all_tables=0x38cab90) at sql_parse.cc:4953
#18 0x000000000064e6a1 in mysql_execute_command (thd=0x7fda302f2f18) at sql_parse.cc:2135
#19 0x000000000065723f in mysql_parse (thd=0x7fda302f2f18,
    inBuf=0x38ca258 "SELECT * FROM (INFORMATION_SCHEMA . `GLOBAL_VARIABLES` AS table1 RIGHT OUTER JOIN performance_schema . `EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME` AS table2 ON ( table2 . `MAX_TIMER_WAIT` = table1 "..., length=446, found_semicolon=0x7fda35c23e90) at sql_parse.cc:5989
#20 0x0000000000657ea1 in dispatch_command (command=COM_QUERY, thd=0x7fda302f2f18,
    packet=0x3ba3429 "SELECT * FROM (INFORMATION_SCHEMA . `GLOBAL_VARIABLES` AS table1 RIGHT OUTER JOIN performance_schema . `EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME` AS table2 ON ( table2 . `MAX_TIMER_WAIT` = table1 "..., packet_length=446) at sql_parse.cc:1140
#21 0x000000000065929f in do_command (thd=0x7fda302f2f18) at sql_parse.cc:810
#22 0x000000000064611e in do_handle_one_connection (thd_arg=0x7fda302f2f18) at sql_connect.cc:1172
#23 0x00000000006461ed in handle_one_connection (arg=0x7fda302f2f18) at sql_connect.cc:1112
#24 0x00000000009ce2a2 in pfs_spawn_thread (arg=0x3934208) at pfs.cc:1011
#25 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#26 0x000000315a4e627d in clone () from /lib64/libc.so.6

It is possible that the hang is related to the particular query plan used to evaluate the query (e.g. temporary tables are required). It is also possible that queries not involving performance_schema would cause the same situation, but at this time the only query that hangs the server long enough to observe the issue visually and with a debugger involves performance_schema.

How to repeat:
A test case will be uploaded shortly.
[25 Jan 2010 12:45] Philip Stoev
To reproduce:

Step 1. With the RQG , Issue the following command:

perl runall.pl --grammar=conf/WL5004_sql.yy --gendata=conf/WL5004_data.zz --basedir=/build/bzr/mysql-next-mr --mysqld=--enable-performance-schema --mysqld=--secure-file-priv=/tmp --queries=100K

This provides the background load on the server. This grammar contains a lot of DDL, but smaller-scale hangs have been observed even with a simple, DML only grammar; Logging to table is turned on by default.

The RQG will print various messages and the workload itself may cause a crash or a deadlock. You can disregard all output.

Step 2. While Step 1 is running, issue the following query using a mysql client:

 SELECT * FROM (INFORMATION_SCHEMA . `GLOBAL_VARIABLES` AS table1 RIGHT OUTER JOIN performance_schema . `EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME` AS table2 ON ( table2 . `MAX_TIMER_WAIT` = table1 . `VARIABLE_VALUE` ) ) WHERE  ( ( table2 . `SUM_TIMER_WAIT` >= table2 . `SUM_TIMER_WAIT` OR table1 . `VARIABLE_VALUE` > 0) OR table2 . `MAX_TIMER_WAIT` > table2 . `EVENT_NAME`)   ORDER BY table2 . `THREAD_ID` , table2 . `SUM_TIMER_WAIT` LIMIT 169 ;

This query will take up to 40 seconds.

Step 3. While Step 2 is running, issue SHOW PROCESSLIST in a separate mysql client. You will observe that you can not connect at all, or that SHOW PROCESSLIST will take up to 20 seconds for a command that generally executes instanteneously.
[25 Jan 2010 12:59] Philip Stoev
There is a query that causes the hang and uses performance_schema tables only

SELECT * FROM (performance_schema . `EVENTS_WAITS_SUMMARY_BY_THREAD_BY_EVENT_NAME` AS table1 RIGHT  JOIN performance_schema . `EVENTS_WAITS_HISTORY` AS table2 ON ( table2 . `FLAGS` = table1 . `MAX_TIMER_WAIT` ) ) WHERE  ( (NOT table1 . `AVG_TIMER_WAIT` <= table1 . `AVG_TIMER_WAIT` AND table2 . `OBJECT_SCHEMA` <> '2005-03-11 08:29:55') OR table2 . `THREAD_ID` >= table1 . `SUM_TIMER_WAIT`)   ORDER BY table2 . `FLAGS` , table2 . `SPINS` , table1 . `EVENT_NAME` LIMIT 151;
[25 Jan 2010 13:50] Philip Stoev
It turns out PERFORMANCE_TABLE is not required at all. If the participating tables are cloned with CREATE ... LIKE and INSERT ... SELECT, the issue is reproducible with MyISAM as well.

Also, the background load can even be --threads=1 and the issue will still show up.
[25 Jan 2010 14:15] Philip Stoev
bug50589.dump

Attachment: bug50589.dump (application/octet-stream, text), 50.92 KiB.

[25 Jan 2010 14:17] Philip Stoev
To reproduce on a tree that does not contain performance_schema, load the attached SQL dump in a separate database and then use the following SELECT

 SELECT * FROM (t1  AS table1 RIGHT OUTER JOIN
t2 AS table2 ON ( table2
. `MAX_TIMER_WAIT` = table1 . `VARIABLE_VALUE` ) ) WHERE  ( ( table2 . `SUM_TIMER_WAIT` >=
table2 . `SUM_TIMER_WAIT` OR table1 . `VARIABLE_VALUE` > 0) OR table2 . `MAX_TIMER_WAIT` >
table2 . `EVENT_NAME`)   ORDER BY table2 . `THREAD_ID` , table2 . `SUM_TIMER_WAIT` LIMIT
169 ;
[26 Jan 2010 9:08] Philip Stoev
The longest wait occurs here:

mysql> select * from EVENTS_WAITS_HISTORY_LONG where thread_id = 31 order by timer_wait desc limit 1\G
*************************** 1. row ***************************
            THREAD_ID: 31
             EVENT_ID: 174
           EVENT_NAME: wait/io/file/myisam/dfile
               SOURCE: mi_delete_table.c:69
          TIMER_START: 23409590088000
            TIMER_END: 32260511409200
           TIMER_WAIT: 8850921321200
                SPINS: NULL
        OBJECT_SCHEMA: NULL
          OBJECT_NAME: /build/bzr/mysql-next-mr/mysql-test/var/tmp/#sql_1e0f_1.MYD
          OBJECT_TYPE: FILE
OBJECT_INSTANCE_BEGIN: 139737476289808
     NESTING_EVENT_ID: NULL
            OPERATION: delete
      NUMBER_OF_BYTES: NULL
                FLAGS: 0
1 row in set (0.03 sec)

Strace shows as follows:

[pid  7748] 11:00:04.561035 unlink("/build/bzr/mysql-next-mr/mysql-test/var/tmp/#sql_1e0f_1.MYD" <unfinished ...>
[pid  7748] 11:00:13.736838 <... unlink resumed> ) = 0 <9.175728>

in other words, removing the temporary file that is being produced by the query takes many seconds, during which time locks are being held that block the rest of the server.
[26 Jan 2010 11:33] Jon Olav Hauglid
The IS query is trying to close() a temp file used in the join.
While doing this, it holds the THR_LOCK_open mutex.

While this happens, there is a chance that a statement (for example ALTER TABLE)
from the RGQ workload will try to open a table (my_open()). This will cause
it to block on THR_LOCK_open while holding LOCK_open.

In turn, this can cause SHOW PROCESSLIST to block on LOCK_open when it tries
to write the statement to the general query log.

I'm only able to reproduce this if I put the temp files on a NFS share so
accessing the temp files becomes very slow.
[26 Jan 2010 12:57] Philip Stoev
If logging to CSV is enabled, essentially all statements are affected, including SHOW PROCESSLIST, since every statement has to open the CSV log file and write to it.

If logging to CSV is not enabled, DML and DDL statements may be affected.
[16 Feb 2010 11:29] Konstantin Osipov
WL#3983
[15 Jun 2010 15:13] Konstantin Osipov
Queued into trunk-runtime, 
3054 Konstantin Osipov	2010-06-11
      WL#5419 "LOCK_open scalability: make tdc_refresh_version 
      an atomic counter"
      
      Split the large LOCK_open section in open_table(). 
      Do not call open_table_from_share() under LOCK_open.
      Remove thd->version.
      
      This fixes
      Bug#50589 "Server hang on a query evaluated using a temporary 
      table"
      Bug#51557 "LOCK_open and kernel_mutex are not happy together"
      Bug#49463 "LOCK_table and innodb are not nice when handler 
      instances are created".
      
      This patch has effect on storage engines that rely on
      ha_open() PSEA method being called under LOCK_open.
      In particular:
      
      1) NDB is broken and left unfixed. NDB relies on LOCK_open
      being kept as part of ha_open(), since it uses auto-discovery.
      While previously the NDB open code was race-prone, now
      it simply fails on asserts.
      
      2) HEAP engine had a race in ha_heap::open() when
      a share for the same table could be added twice
      to the list of shares, or a dangling reference to a share
      stored in HEAP handler. This patch aims to address this
      problem by 'pinning' the newly created share in the 
      internal HEAP engine share list until at least one
      handler instance is created using that share.
     @ include/heap.h
        Add members to HP_CREATE_INFO.
        Declare heap_release_share().
     @ sql/lock.cc
        Remove thd->version, use thd->open_tables->s->version instead.
     @ sql/repl_failsafe.cc
        Remove thd->version.
     @ sql/sql_base.cc
        - close_thread_table(): move handler cleanup code outside the critical section protected by LOCK_open.
        - remove thd->version
        - split the large critical section in open_table() that
        opens a new table from share and is protected by LOCK_open
        into 2 critical sections, thus reducing the critical path.
        - make check_if_table_exists() acquire LOCK_open internally.
        - use thd->open_tables->s->version instead of thd->refresh_version to make sure that all tables in
        thd->open_tables are in the same refresh series.
     @ sql/sql_base.h
        Add declaration for check_if_table_exists().
     @ sql/sql_class.cc
        Remove init_open_tables_state(), it's now equal to
        reset_open_tables_state().
     @ sql/sql_class.h
        Remove thd->version, THD::init_open_tables_state().
     @ sql/sql_plugin.cc
        Use table->m_needs_reopen to mark the table as stale
        rather than manipulate with thd->version, which is no more.
     @ sql/sql_udf.cc
        Use table->m_needs_reopen to mark the table as stale
        rather than manipulate with thd->version, which is no more.
     @ sql/table.h
        Remove an unused variable.
     @ sql/tztime.cc
        Use table->m_needs_reopen to mark the table as stale
        rather than manipulate with thd->version, which is no more.
     @ storage/heap/CMakeLists.txt
        Add heap tests to cmake build files.
     @ storage/heap/ha_heap.cc
        Fix a race when ha_heap::ha_open() could insert two 
        HP_SHARE objects into the internal share list or store
        a dangling reference to a share in ha_heap instance,
        or wrongly set implicit_emptied.
     @ storage/heap/hp_create.c
        Optionally pin a newly created share in the list of shares
        by increasing its open_count. This is necessary to 
        make sure that a newly created share doesn't disappear while
        a HP_INFO object is being created to reference it.
     @ storage/heap/hp_open.c
        When adding a new HP_INFO object to the list of objects
        in the heap share, make sure the open_count is not increased
        twice.
     @ storage/heap/hp_test1.c
        Adjust the test to new function signatures.
     @ storage/heap/hp_test2.c
        Adjust the test to new function signatures.
[22 Jun 2010 13:07] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100622130139-u05awgya93zvbsop) (version source revid:marko.makela@oracle.com-20100603095032-v5ptkkzt1bhz0m1d) (merge vers: 5.1.48) (pib:16)
[22 Jun 2010 13:09] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100622130623-r7yhm89fz9n5t9nb) (version source revid:alik@sun.com-20100622130528-187gd949sa9b6pa6) (pib:16)
[8 Jul 2010 16:11] Paul Dubois
Noted in 5.5.5 changelog.

Previously, the server held a global mutex while performing file
operations such as deleting an .frm or data file, or reading table
metadata from an .frm file or index statistics from a data file. Now
the mutex is not held for these operations. Instead, the server uses
metadata locks.