Bug #91012 exchage partition leads to SIGSEGV if storage engine returns error
Submitted: 24 May 2018 14:32 Modified: 28 May 2018 15:11
Reporter: Vlad Lesin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[24 May 2018 14:32] Vlad Lesin
Description:
Look at the following code.

bool Sql_cmd_alter_table_exchange_partition::
  exchange_partition(THD *thd, TABLE_LIST *table_list, Alter_info *alter_info)
{
...
  int ha_error= part_handler->exchange_partition(part_file_name,
                                                 swap_file_name,
                                                 swap_part_id,
                                                 part_table_def,
                                                 swap_table_def);

  if (ha_error)
  {
    part_table->file->print_error(ha_error, MYF(0));
    // Close TABLE instances which marked as old earlier.
    close_all_tables_for_name(thd, swap_table->s, false, NULL);
    close_all_tables_for_name(thd, part_table->s, false, NULL);
...
    if ((part_table->file->ht->flags & HTON_SUPPORTS_ATOMIC_DDL) &&
        part_table->file->ht->post_ddl)
      part_table->file->ht->post_ddl(thd);
    (void) thd->locked_tables_list.reopen_tables(thd);
    DBUG_RETURN(true);
  }
...
}

If handler::exchange_partition(...) returns error then close_all_tables_for_name(...) is invoked, what, in turns, leads to
part_table->file zeroing out. See the following backtrace:

(gdb) 
closefrm (table=0x7fff542c13f0, free_share=true) at ./sql/table.cc:3470
3470      table->file= 0;                               /* For easier errorchecking */
(gdb) bt
#0  closefrm (table=0x7fff542c13f0, free_share=true) at ./sql/table.cc:3470
#1  0x0000000002a3dae6 in intern_close_table (table=0x7fff542c13f0) at ./sql/sql_base.cc:1165
#2  0x0000000002a3ee46 in release_or_close_table (thd=0x7fff5435d990, table=0x7fff542c13f0) at ./sql/sql_base.cc:1858
#3  0x0000000002a3f0a3 in close_thread_table (thd=0x7fff5435d990, table_ptr=0x7fff5435d9f8) at ./sql/sql_base.cc:1904
#4  0x0000000002a3e698 in close_all_tables_for_name (thd=0x7fff5435d990, key=0x7fffe7efc520 "test", key_length=8, db=0x7fffe7efc520 "test", 
    table_name=0x7fffe7efc525 "t1", remove_from_locked_tables=false, skip_table=0x0) at ./sql/sql_base.cc:1557
#5  0x0000000002a3e79c in close_all_tables_for_name (thd=0x7fff5435d990, share=0x7fff543b3fa0, remove_from_locked_tables=false, skip_table=0x0)
    at ./sql/sql_base.cc:1588
#6  0x0000000003234d9b in Sql_cmd_alter_table_exchange_partition::exchange_partition (this=0x7fff540ea230, thd=0x7fff5435d990, table_list=0x7fff540e96d8, 
    alter_info=0x7fffe7efe130) at ./sql/sql_partition_admin.cc:512
#7  0x00000000032337d4 in Sql_cmd_alter_table_exchange_partition::execute (this=0x7fff540ea230, thd=0x7fff5435d990)
    at ./sql/sql_partition_admin.cc:113
#8  0x0000000002ae5ee6 in mysql_execute_command (thd=0x7fff5435d990, first_level=true) at ./sql/sql_parse.cc:4645
#9  0x0000000002ae84ea in mysql_parse (thd=0x7fff5435d990, parser_state=0x7fffe7eff390) at ./sql/sql_parse.cc:5441
#10 0x0000000002adde8c in dispatch_command (thd=0x7fff5435d990, com_data=0x7fffe7effcf0, command=COM_QUERY)
    at ./sql/sql_parse.cc:1730
#11 0x0000000002adc850 in do_command (thd=0x7fff5435d990) at ./sql/sql_parse.cc:1310
#12 0x0000000002e64ea7 in handle_connection (arg=0x8c6be90) at ./sql/conn_handler/connection_handler_per_thread.cc:335
#13 0x000000000434794e in pfs_spawn_thread (arg=0x8cb4ca0) at ./storage/perfschema/pfs.cc:2994
#14 0x00007ffff79be6ba in start_thread (arg=0x7fffe7f00700) at pthread_create.c:333
#15 0x00007ffff5d7341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

After that null pointer is dereferenced in the following code:

    if ((part_table->file->ht->flags & HTON_SUPPORTS_ATOMIC_DDL) &&
        part_table->file->ht->post_ddl)
      part_table->file->ht->post_ddl(thd);

How to repeat:
The short way is just code review. The more complex way - create storage engine which does not implement handler::exchange_partition_low(...) method.

Suggested fix:
Invoke close_all_tables_for_name() after part_table->file dereferencing.
[25 May 2018 12:38] MySQL Verification Team
Hi,

Thank you for your report. However, we can not proceed any further without additional info:

* What 8.0 release are you using 
* What is the source code file from which you extracted the code and what are the line numbers
* What is EXACTLY the storage engine used that exhibits this problem. This is because none of SE supported by us exhibits these problems
* Can you provide us even with a rough test case with InnoDB SE that would demonstrate the problem

Thank you in advance.
[28 May 2018 13:43] Vlad Lesin
Sorry, the bug was fixed in the following commit:

commit bbbd44a5d0ab918790d2d82187beb1f161ffd1db
Author: Dmitry Lenev <dmitry.lenev@oracle.com>
Date:   Tue Jan 9 15:23:46 2018 +0300

    Fix for bug #27320682 "EXCHANGE PARTITION FAILURE LEADS TO CRASH".
[28 May 2018 15:11] MySQL Verification Team
According last comment.
[28 May 2018 15:11] MySQL Verification Team
According last comment.