Bug #99180 Accessing freed memory in perfschema when aggregating status vars
Submitted: 4 Apr 2020 21:31 Modified: 13 Jan 2022 15:59
Reporter: Manuel Ung
Status: Duplicate
Category:MySQL Server: Performance Schema Severity:S2 (Serious)
Version:8.0.19 OS:Any
Assigned to: Marc ALFF CPU Architecture:Any

[4 Apr 2020 21:31] Manuel Ung
Description:
When aggregate_thread_status is called for another thread, that thread may exit and free its THD between the time we check that the thread is valid and the time we call get_thd_status_var.

This causes us to access invalid memory.
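
The check-then-use window described above can be modelled in a few lines. This is only a sketch: FakeThd and aggregate_unsafely are hypothetical stand-ins, not server code. It uses a weak_ptr so that the stale access is detected and reported rather than actually performed, and the exit_between flag simulates the connection thread exiting inside the window.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for THD; only the two fields touched by
// get_thd_status_var are modelled.
struct FakeThd {
  bool status_var_aggregated = false;
  int status_var = 0;
};

enum class Outcome { kRead, kSkippedInvalid, kWouldReadFreedMemory };

// Models the unsafe sequence: validity check, then a window, then the read.
// 'owner' plays the connection thread that owns and frees the THD;
// 'seen' plays the pointer the aggregating thread holds.
Outcome aggregate_unsafely(std::shared_ptr<FakeThd>& owner,
                           const std::weak_ptr<FakeThd>& seen,
                           bool exit_between) {
  // Time of check: the thread still looks valid.
  if (seen.expired()) return Outcome::kSkippedInvalid;

  // The window: the connection thread exits and frees its THD.
  if (exit_between) owner.reset();

  // Time of use: with a raw pointer this would be the invalid read.
  std::shared_ptr<FakeThd> thd = seen.lock();
  if (!thd) return Outcome::kWouldReadFreedMemory;
  (void)thd->status_var;
  return Outcome::kRead;
}
```

With a raw THD pointer there is no equivalent of the failed lock() call, so the last step simply reads freed memory, which is what Valgrind reports.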

How to repeat:
To make the race more likely, add:

diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index c30315d4702..88a6d92ccc9 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -1610,6 +1610,9 @@ ulong sql_rnd_with_mutex() {
 }

 struct System_status_var *get_thd_status_var(THD *thd, bool *aggregated) {
+  if (thd->thread_id() > 8) {
+    my_sleep(5000 * 1000);
+  }
   *aggregated = thd->status_var_aggregated;
   return &thd->status_var;
 }

Then run this in mtr with --valgrind:

create table t1 (a int);

let $count= 4000;
while ($count)
{
  connect (con1,localhost,root,,,,,);
  eval insert into t1 values ($count);
  dec $count;

  connect (con2,localhost,root,,,,,);
  eval insert into t1 values ($count);
  dec $count;

  connection default;
  --send truncate table performance_schema.status_by_account

  disconnect con1;
  disconnect con2;

  connection default;
  --reap
}

drop table t1;

Then you can check mysqld.1.err for errors during execution:

==2597531== Thread 39:
==2597531== Invalid read of size 1
==2597531==    at 0x2DDD88E: get_thd_status_var(THD*, bool*) (mysqld.cc:1616)
==2597531==    by 0x4C32F4D: aggregate_thread_status(PFS_thread*, PFS_account*, PFS_user*, PFS_host*) (pfs_instr.cc:1757)
==2597531==    by 0x4C4EF63: fct_reset_status_by_thread(PFS_thread*) (pfs_status.cc:86)
==2597531==    by 0x4C2B879: PFS_buffer_scalable_container<PFS_thread, 256, 256, PFS_thread_array, PFS_thread_allocator>::apply_all(void (*)(PFS_thread*)) (pfs_buffer_container.h:721)
==2597531==    by 0x4C4EF80: reset_status_by_thread() (pfs_status.cc:92)
==2597531==    by 0x4CA0757: table_status_by_account::delete_all_rows() (table_status_by_account.cc:110)
==2597531==    by 0x4C0CB7B: ha_perfschema::delete_all_rows() (ha_perfschema.cc:1699)
==2597531==    by 0x4C0CBE4: ha_perfschema::truncate(dd::Table*) (ha_perfschema.cc:1706)
==2597531==    by 0x330F505: handler::ha_truncate(dd::Table*) (handler.cc:4769)
==2597531==    by 0x35EAE03: handler_truncate_base(THD*, TABLE_LIST*, dd::Table*) (sql_truncate.cc:213)
==2597531==    by 0x35EBD79: Sql_cmd_truncate_table::truncate_base(THD*, TABLE_LIST*) (sql_truncate.cc:566)
==2597531==    by 0x35EC741: Sql_cmd_truncate_table::execute(THD*) (sql_truncate.cc:740)
==2597531==    by 0x2FA1869: mysql_execute_command(THD*, bool) (sql_parse.cc:4478)
==2597531==    by 0x2FA41B6: mysql_parse(THD*, Parser_state*) (sql_parse.cc:5288)
==2597531==    by 0x2F995D9: dispatch_command(THD*, COM_DATA const*, enum_server_command) (sql_parse.cc:1777)
==2597531==    by 0x2F97B03: do_command(THD*) (sql_parse.cc:1275)
==2597531==  Address 0x2865e200 is 3,536 bytes inside a block of size 12,080 free'd
==2597531==    at 0x84F416D: operator delete(void*) (vg_replace_malloc.c:576)
==2597531==    by 0x2EDEA48: THD::~THD() (sql_class.cc:1132)
==2597531==    by 0x315AD56: handle_connection (connection_handler_per_thread.cc:326)
==2597531==    by 0x4C103F3: pfs_spawn_thread (pfs.cc:2854)
==2597531==    by 0x8706DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==2597531==    by 0xA406EAC: clone (in /usr/lib64/libc-2.17.so)
==2597531==  Block was alloc'd at
==2597531==    at 0x84F3436: operator new(unsigned long, std::nothrow_t const&) (vg_replace_malloc.c:377)
==2597531==    by 0x35F595D: Channel_info::create_thd() (channel_info.cc:45)
==2597531==    by 0x316011A: Channel_info_local_socket::create_thd() (socket_connection.cc:166)
==2597531==    by 0x315A9C1: init_new_thd(Channel_info*) (connection_handler_per_thread.cc:194)
==2597531==    by 0x315AB86: handle_connection (connection_handler_per_thread.cc:263)
==2597531==    by 0x4C103F3: pfs_spawn_thread (pfs.cc:2854)
==2597531==    by 0x8706DD4: start_thread (in /usr/lib64/libpthread-2.17.so)
==2597531==    by 0xA406EAC: clone (in /usr/lib64/libc-2.17.so)

Suggested fix:
We can move the status_var_aggregated flag into PFS instead of keeping it on the THD.

Alternatively, we can reuse the trick already used by the performance_schema.session_status (and related) tables, which find and lock a THD pointer to ensure it does not get deleted while we're reading from it.
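
A minimal sketch of the find-and-lock idea, assuming a registry that maps thread ids to sessions and a per-session mutex. FakeThd, FakeThdRegistry, and all member names here are hypothetical illustrations, not the server's actual classes or locks. The reader takes the session lock while still holding the registry lock, and the remover waits on the session lock before destroying the object.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for THD with a per-session lock.
struct FakeThd {
  std::mutex lock_data;
  int status_var = 0;
};

class FakeThdRegistry {
 public:
  void add(unsigned id, int value) {
    std::unique_ptr<FakeThd> thd(new FakeThd());
    thd->status_var = value;
    std::lock_guard<std::mutex> guard(mu_);
    thds_[id] = std::move(thd);
  }

  // Unregisters the session, then waits for any reader that already
  // holds the session lock before the object is destroyed.
  void remove(unsigned id) {
    std::unique_ptr<FakeThd> victim;
    {
      std::lock_guard<std::mutex> guard(mu_);
      auto it = thds_.find(id);
      if (it == thds_.end()) return;
      victim = std::move(it->second);
      thds_.erase(it);
    }
    // No new reader can find the session now; drain the current one.
    { std::lock_guard<std::mutex> wait(victim->lock_data); }
    // 'victim' is destroyed here, after all readers are done.
  }

  // Find + lock: returns false if the session is already gone.
  bool read_status(unsigned id, int* out) {
    std::unique_lock<std::mutex> registry(mu_);
    auto it = thds_.find(id);
    if (it == thds_.end()) return false;
    FakeThd* thd = it->second.get();
    std::unique_lock<std::mutex> session(thd->lock_data);
    registry.unlock();  // the session lock now pins the object
    *out = thd->status_var;
    return true;
  }

 private:
  std::mutex mu_;
  std::unordered_map<unsigned, std::unique_ptr<FakeThd>> thds_;
};
```

The key ordering is that a reader never drops the registry lock before it holds the session lock, so a concurrent remove() either happens first (and the lookup fails) or blocks until the read has finished.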
[6 Apr 2020 9:02] MySQL Verification Team
Hello Manuel Ung,

Thank you for the bug report and feedback.

regards,
Umesh
[17 Apr 2020 19:24] Manuel Ung
Simply moving the status_var_aggregated flag into PFS is probably not sufficient, since we also have to read the actual status vars off the THD. This probably needs the second approach (find and lock the THD) for safety.
[13 Jan 2022 15:57] Marc ALFF
Fixed by:
  Bug #104447 Contribution by Facebook: Fix freed memory access in performance schema tab ...
in MySQL 8.0.29