Bug #85304 Reduce mutex contention of fil_system->mutex
Submitted: 4 Mar 2017 8:26 Modified: 2 Sep 2017 12:52
Reporter: zhai weixiang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[4 Mar 2017 8:26] zhai weixiang
Description:
While investigating the write performance of mysql5.7, I occasionally observed some threads waits for fil_system->mutex with backtrace bellow (from pt-pmp):

  21 pthread_cond_wait,wait(os0event.cc:170),wait_low(os0event.cc:170),os_event_wait_low(os0event.cc:170),sync_array_wait_event(sync0arr.cc:475),wait(ut0mutex.ic:89),spin_and_try_lock(ut0mutex.ic:89),enter(ut0mutex.ic:89),PolicyMutex<TTASEventMutex<GenericPolicy>(ut0mutex.ic:89),fil_space_get(fil0fil.cc:427),mtr_t::lookup_user_space(mtr0mtr.cc:766),set_named_space(mtr0mtr.h:367),row_ins_clust_index_entry_low(mtr0mtr.h:367),row_ins_clust_index_entry(row0ins.cc:3299),row_ins_index_entry(row0ins.cc:3429),row_ins_index_entry_step(row0ins.cc:3429),row_ins(row0ins.cc:3429),row_ins_step(row0ins.cc:3429),row_insert_for_mysql_using_ins_graph(row0mysql.cc:1716),ha_innobase::write_row(ha_innodb.cc:8087),handler::ha_write_row(handler.cc:8047),write_record(sql_insert.cc:1871),Sql_cmd_insert::mysql_insert(sql_insert.cc:769),Sql_cmd_insert::execute(sql_insert.cc:3102),mysql_execute_command(sql_parse.cc:3764),mysql_parse(sql_parse.cc:5795),dispatch_command(sql_parse.cc:1497),do_command(sql_parse.cc:1005),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)

      1 fil_space_get_by_id(libc.so.6),fil_space_get(libc.so.6),mtr_t::lookup_user_space(mtr0mtr.cc:766),set_named_space(mtr0mtr.h:367),row_ins_clust_index_entry_low(mtr0mtr.h:367),row_ins_clust_index_entry(row0ins.cc:3299),row_ins_index_entry(row0ins.cc:3429),row_ins_index_entry_step(row0ins.cc:3429),row_ins(row0ins.cc:3429),row_ins_step(row0ins.cc:3429),row_insert_for_mysql_using_ins_graph(row0mysql.cc:1716),ha_innobase::write_row(ha_innodb.cc:8087),handler::ha_write_row(handler.cc:8047),write_record(sql_insert.cc:1871),Sql_cmd_insert::mysql_insert(sql_insert.cc:769),Sql_cmd_insert::execute(sql_insert.cc:3102),mysql_execute_command(sql_parse.cc:3764),mysql_parse(sql_parse.cc:5795),dispatch_command(sql_parse.cc:1497),do_command(sql_parse.cc:1005),handle_connection(connection_handler_per_thread.cc:300),pfs_spawn_thread(pfs.cc:2188),start_thread(libpthread.so.0),clone(libc.so.6)

So the function fil_space_get() is invoked before a mini transaction starts to make something dirty. and this lead to mutex contention if the server is under high coucnrrenty DML workloads.

 

How to repeat:
High concurrent DML workloads and check the output of pt-pmp (or performance schema)

Suggested fix:
I am not sure if it's safe to store the pointer of fil_space_t into dict_table_t. so we can directly refers to it while invoking mtr.set_named_space.  

A proof-of-concept patch will be attached later.
[4 Mar 2017 8:33] zhai weixiang
a simple patch based on 5.7.17

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: reduce-space-lookup.diff (application/octet-stream, text), 13.30 KiB.

[6 Mar 2017 5:25] Umesh Shastry
Hello Zhai,

Thank you for the report and contribution.

Thanks,
Umesh
[6 Mar 2017 6:03] Sunny Bains
This problem is on our radar. There are two parts to this fix.

 1. Remove the overhead introduced by WL#7142
 2. Reduce the overhead of fil_sys mutex in general

In 8.0.1 we have removed the WL#7142 code. The fil sys mutex is not in the path of mtr commit. For #2 stay tuned :-)
[11 Mar 2017 18:03] Mark Callaghan
Is this still relevant? https://bugs.mysql.com/bug.php?id=69276
[13 Mar 2017 7:32] Dimitri Kravtchuk
yes, the problem with fil_system mutex contention was here from a long time, and yet more amplified in 5.7, etc.. - so we really hope to get a rid of it by 8.0 GA.. - stay tuned ;-)

Rgds,
-Dimitri
[2 Sep 2017 12:52] zhai weixiang
I think this bug should have been fixed in current MySQL8.0 by this commit:
 201b2b20d110bc35ddf699754571cb0c064a3f72