Description:
When aggregate_thread_status is called on behalf of another thread, that thread can exit and free its THD between the time we check that the thread is valid and the time we call get_thd_status_var.
We then read freed memory (a use-after-free).
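The check-then-use window can be modelled outside the server. This is only a single-threaded sketch of the ordering, not server code: FakeThd, RaceOutcome and simulate_check_then_use are hypothetical names, and a std::weak_ptr stands in for the PFS-side view of the session so that the "use" step can be observed without actually touching freed memory.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for THD; only the field relevant to this report.
struct FakeThd {
  bool status_var_aggregated = false;
};

// What one simulated aggregation attempt observed.
struct RaceOutcome {
  bool check_passed;  // the validity check saw a live session
  bool still_alive;   // the session still existed when we went to use it
};

RaceOutcome simulate_check_then_use() {
  // The connection thread owns its session object.
  auto owner = std::make_shared<FakeThd>();
  // The aggregating thread only observes it (assumption: modelled as weak_ptr).
  std::weak_ptr<FakeThd> observer = owner;

  // Step 1: the aggregating thread checks that the session is valid.
  bool check_passed = !observer.expired();

  // Step 2: the connection thread exits and frees its session.
  owner.reset();

  // Step 3: the aggregating thread tries to use the session it just checked.
  std::shared_ptr<FakeThd> pinned = observer.lock();
  return RaceOutcome{check_passed, pinned != nullptr};
}
```

In the server there is nothing like weak_ptr between the check and get_thd_status_var, so step 3 dereferences the freed THD instead of getting a null result.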
How to repeat:
To make the race more likely, apply this patch:
diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index c30315d4702..88a6d92ccc9 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -1610,6 +1610,9 @@ ulong sql_rnd_with_mutex() {
 }
 
 struct System_status_var *get_thd_status_var(THD *thd, bool *aggregated) {
+  if (thd->thread_id() > 8) {
+    my_sleep(5000 * 1000);
+  }
   *aggregated = thd->status_var_aggregated;
   return &thd->status_var;
 }
Then run this test in mtr with --valgrind:
create table t1 (a int);
let $count= 4000;
while ($count)
{
  connect (con1,localhost,root,,,,,);
  eval insert into t1 values ($count);
  dec $count;
  connect (con2,localhost,root,,,,,);
  eval insert into t1 values ($count);
  dec $count;
  connection default;
  --send truncate table performance_schema.status_by_account
  disconnect con1;
  disconnect con2;
  connection default;
  --reap
}
drop table t1;
Then you can check mysqld.1.err for errors during execution:
==2597531== Thread 39:
==2597531== Invalid read of size 1
==2597531== at 0x2DDD88E: get_thd_status_var(THD*, bool*) (mysqld.cc:1616)
==2597531== by 0x4C32F4D: aggregate_thread_status(PFS_thread*, PFS_account*, PFS_user*, PFS_host*) (pfs_instr.cc:1757)
==2597531== by 0x4C4EF63: fct_reset_status_by_thread(PFS_thread*) (pfs_status.cc:86)
==2597531== by 0x4C2B879: PFS_buffer_scalable_container<PFS_thread, 256, 256, PFS_thread_array, PFS_thread_allocator>::apply_all(void (*)(PFS_thread*)) (pfs_buffer_container.h:721)
==2597531== by 0x4C4EF80: reset_status_by_thread() (pfs_status.cc:92)
==2597531== by 0x4CA0757: table_status_by_account::delete_all_rows() (table_status_by_account.cc:110)
==2597531== by 0x4C0CB7B: ha_perfschema::delete_all_rows() (ha_perfschema.cc:1699)
==2597531== by 0x4C0CBE4: ha_perfschema::truncate(dd::Table*) (ha_perfschema.cc:1706)
==2597531== by 0x330F505: handler::ha_truncate(dd::Table*) (handler.cc:4769)
==2597531== by 0x35EAE03: handler_truncate_base(THD*, TABLE_LIST*, dd::Table*) (sql_truncate.cc:213)
==2597531== by 0x35EBD79: Sql_cmd_truncate_table::truncate_base(THD*, TABLE_LIST*) (sql_truncate.cc:566)
==2597531== by 0x35EC741: Sql_cmd_truncate_table::execute(THD*) (sql_truncate.cc:740)
==2597531== by 0x2FA1869: mysql_execute_command(THD*, bool) (sql_parse.cc:4478)
==2597531== by 0x2FA41B6: mysql_parse(THD*, Parser_state*) (sql_parse.cc:5288)
==2597531== by 0x2F995D9: dispatch_command(THD*, COM_DATA const*, enum_server_command) (sql_parse.cc:1777)
==2597531== by 0x2F97B03: do_command(THD*) (sql_parse.cc:1275)
==2597531== Address 0x2865e200 is 3,536 bytes inside a block of size 12,080 free'd
==2597531== at 0x84F416D: operator delete(void*) (vg_replace_malloc.c:576)
==2597531== by 0x2EDEA48: THD::~THD() (sql_class.cc:1132)
==2597531== by 0x315AD56: handle_connection (connection_handler_per_thread.cc:326)
==2597531== by 0x4C103F3: pfs_spawn_thread (pfs.cc:2854)
==2597531== by 0x8706DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==2597531== by 0xA406EAC: clone (in /usr/lib64/libc-2.17.so)
==2597531== Block was alloc'd at
==2597531== at 0x84F3436: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:377)
==2597531== by 0x35F595D: Channel_info::create_thd() (channel_info.cc:45)
==2597531== by 0x316011A: Channel_info_local_socket::create_thd() (socket_connection.cc:166)
==2597531== by 0x315A9C1: init_new_thd(Channel_info*) (connection_handler_per_thread.cc:194)
==2597531== by 0x315AB86: handle_connection (connection_handler_per_thread.cc:263)
==2597531== by 0x4C103F3: pfs_spawn_thread (pfs.cc:2854)
==2597531== by 0x8706DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==2597531== by 0xA406EAC: clone (in /usr/lib64/libc-2.17.so)
Suggested fix:
We can move the status_var_aggregated flag into PFS instead of keeping it on the THD.
Alternatively, we can reuse the trick already used by the performance_schema.session_status (and related) tables, which find and lock the THD pointer so that it cannot be deleted while we are reading from it.
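A minimal sketch of the find-and-lock idea, under stated assumptions: FakeThd, FakeThdRegistry, lock_data and with_locked are hypothetical names for illustration, not the server's API. The point is only the ordering: the reader takes the per-session lock while it still holds the registry lock, and the thread removing a session waits on that per-session lock before freeing it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-ins; field names only loosely mirror the report.
struct FakeStatus {
  std::uint64_t questions = 0;
};

struct FakeThd {
  std::mutex lock_data;  // per-session lock that pins the object
  FakeStatus status_var;
  bool status_var_aggregated = false;
};

class FakeThdRegistry {
 public:
  void add(std::uint32_t id) {
    std::lock_guard<std::mutex> guard(registry_lock_);
    sessions_[id] = std::make_unique<FakeThd>();
  }

  // Called by the exiting connection: unlink first, then wait for readers.
  void remove(std::uint32_t id) {
    std::unique_ptr<FakeThd> victim;
    {
      std::lock_guard<std::mutex> guard(registry_lock_);
      auto it = sessions_.find(id);
      if (it == sessions_.end()) return;
      victim = std::move(it->second);
      sessions_.erase(it);
    }
    // No new reader can find the session now; wait for one already inside.
    { std::lock_guard<std::mutex> wait_for_reader(victim->lock_data); }
    // victim is destroyed here, after any reader has released lock_data.
  }

  // Runs fn on the session with its lock held; returns false if not found.
  template <typename Fn>
  bool with_locked(std::uint32_t id, Fn &&fn) {
    std::unique_lock<std::mutex> session_guard;
    FakeThd *thd = nullptr;
    {
      std::lock_guard<std::mutex> guard(registry_lock_);
      auto it = sessions_.find(id);
      if (it == sessions_.end()) return false;
      thd = it->second.get();
      // Take the session lock before dropping the registry lock.
      session_guard = std::unique_lock<std::mutex>(thd->lock_data);
    }
    fn(*thd);  // safe: remove() cannot free thd while session_guard is held
    return true;
  }

 private:
  std::mutex registry_lock_;
  std::map<std::uint32_t, std::unique_ptr<FakeThd>> sessions_;
};
```

With this shape, the aggregation code would read status_var and status_var_aggregated inside the callback passed to with_locked, and would skip the session entirely when the lookup fails.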