Bug #114086 Item_singlerow_subselect::store may lead to memory corruption
Submitted: 21 Feb 2024 19:41 Modified: 26 Feb 2024 11:00
Reporter: Eldor Bekpulatov Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[21 Feb 2024 19:41] Eldor Bekpulatov
Description:
while we try to compare fields we need to see if we can convert them to a Item class that can be compared. If we can fix the filed, then we do so. one such example is given by this query.

SELECT * FROM t1 
WHERE (`x1`,`x2`) = (SELECT `x1`,'T' FROM t1);

Here we compare literal 'T' with `x2` column. The thing is, when the table character set is different than the one of the literal, which is dictated by the system variable character_set_client, we try to fix it by swapping the <parse_tree_node> with a Item_func_conv_charset class.  

This all works fine up until when we need to send the results back to the client, until when we try to cache some items. See the function below when we are iterating over items and call Item_singlerow_subselect::store():

bool Query_result_scalar_subquery::send_data(List<Item> &items)
{
  ...
  Item_singlerow_subselect *it= (Item_singlerow_subselect *)item;
  ...
  List_iterator_fast<Item> li(items);
  Item *val_item;
  for (uint i= 0; (val_item= li++); i++)
    it->store(i, val_item);             <<---- HERE
  ...
}

store() is defined as follwoing, where each item in row[] must be of sub-type <Item_cache>

void Item_singlerow_subselect::store(uint i, Item *item)
{
  row[i]->store(item);
  row[i]->cache_value();
}

However, when we have to swap some items out to Item_func_conv_charset using function THD::change_item_tree(), we get Item_func_conv_charset. 

see gdb:
#0  Item_singlerow_subselect::store (this=0x7f11d8934238, i=1, item=0x7f11d8006b40) at /mysql-server/sql/item_subselect.cc:1173
(gdb) p row[i]
$14 = (Item_cache *) 0x7f11d8934f28
(gdb) p *row[i]
$15 = {<Item_basic_constant> = {<Item> = {<Parse_tree_node> = {
      _vptr.Parse_tree_node = 0x2b0b3e0 <vtable for Item_func_conv_charset+16>, contextualized = true,
      transitional = false}, is_expensive_cache = -1 '\377', rsize = 0, str_value = {m_ptr = 0x7f11d800fa20 "T",
      m_length = 1, m_charset = 0x2bb15e0 <my_charset_latin1>, m_alloced_length = 0, m_is_alloced = true},
    item_name = {<Name_string> = {<Simple_cstring> = {m_str = 0x0, m_length = 0}, <No data fields>},
      m_is_autogenerated = true}, orig_name = {<Name_string> = {<Simple_cstring> = {m_str = 0x0,
          m_length = 0}, <No data fields>}, m_is_autogenerated = true}, next = 0x7f11d8934c60, max_length = 1,
    marker = 0, decimals = 31 '\037', maybe_null = 1 '\001', null_value = 0 '\000', unsigned_flag = 0 '\000',
    with_sum_func = 0 '\000', fixed = 1 '\001', collation = {collation = 0x2bb15e0 <my_charset_latin1>,
      derivation = DERIVATION_IMPLICIT, repertoire = 3}, cmp_context = 4294967295, runtime_item = false,
    derived_used = false, with_subselect = 0 '\000', with_stored_program = 0 '\000', tables_locked_cache = false,
    is_parser_item = false}, used_table_map = 0}, example = 0x7f11d8934fd0, used_table_map = 139714624703584,
cached_field = 0x8f8f8f8f8f8f8f8f, cached_field_type = 2408550145, value_cached = true}

The problem is, Item_func_conv_charset is not binary compatible with Item_cache class (at least their vtables are different). Item_func_conv_charset does not have store() and cache_value() functions, therefore we end up calling a random function at some offsets within Item_func_conv_charset class. 

see which function each version leads to:
in 5.7.40 
  row[i]->store()      -> virtual table_map get_initial_pseudo_tables() const { return 0; }
  row[i]->cache_value()-> virtual optimize_type select_optimize() const { return OPTIMIZE_NONE; }

in 8.0.36
  row[i]->store()      -> virtual uint argument_count() const { return arg_count; }
  row[i]->cache_value()-> virtual enum Functype functype() const { return UNKNOWN_FUNC; }

How to repeat:
USE test;

show variables like "character_set_client";
-- verify that its something like utf8mb4

create table t1 (
`x1` varchar(20),
`x2` char(1)
) ENGINE=InnoDB 
CHARACTER SET latin1 
COLLATE latin1_swedish_ci;

insert into t1 values ('tmp1','T');

SELECT * FROM t1 
WHERE (`x1`,`x2`) = 
    (SELECT `x1`,'T' FROM t1);

-- OPEN GDB TO EXAMINE --
gdb attach -p $(pgrep mysqld)
b THD::change_item_tree
b Query_result_scalar_subquery::send_data
b Item_singlerow_subselect::store

Suggested fix:
  void Item_singlerow_subselect::store(uint i, Item *item) {
+   if (row[i]->type() != Item::CACHE_ITEM) return;
    row[i]->store(item);
    row[i]->cache_value();
  }

This is not a fix but a patch to avoid bad paths. IF condition avoids the problematic codepath.

Reason:
- all Item types support type() function
- only Item_cache and its sub classes support store() and cache_value()
- Item_cache class overrides the virtual type() function with CACHE_ITEM
- no child class of Item_cache overrides the type() with anything else
[22 Feb 2024 11:25] MySQL Verification Team
Hi Mr. Bekpulatov,

Thank you for your bug report.

However, when we run your test case, we can not repeat it with our 8.0.36 binary.

We get the following output from the select command:

mysql> SELECT * FROM t1
    -> WHERE (`x1`,`x2`) =
    ->     (SELECT `x1`,'T' FROM t1);
+------+------+
| x1   | x2   |
+------+------+
| tmp1 | T    |
+------+------+

We do not see what is wrong in the above result.

Can't repeat.
[22 Feb 2024 18:02] Eldor Bekpulatov
Please consider retrying the setup with GDB/debugger attached at those specified breakpoints. You will see the incorrect behavior.  

This is currently a non-issue, because v-table translations between Item_func_conv_charset and Item_cache classes are leading to "lucky" outcomes. meaning that they are calling functions that return a constant and it is discarded, but if you were to introduce any random function Item_func_conv_charset that modifies the object state, it would be lead to memory corruption. Please do look into it.
[23 Feb 2024 11:39] MySQL Verification Team
Hi Mr. Bekpulatov,

Thanks for the feedback.

However, let us inform you about this forum.

We could either observe wrong results or memory corruption problems.

We got correct results and we have also built a binary with full ASAN instrumentation for the memory checking and it returned no memory problems.

If you manage to get wrong results or reports with memory-checking software, we will be happy to reconsider this report.
[26 Feb 2024 11:00] MySQL Verification Team
Hi Mr. Bekpulatov,

Upon further analysis, we concluded that your code analysis is correct and hence, we are verifying it.

This is now a verified bug affecting version 8.0 and higher.
[29 Jan 9:11] Tor Didriksen
Posted by developer:
 
UBSAN on head of current trunk still says

sql/item_subselect.cc:1133:13: runtime error: member call on address 0x7fff18ac3a50 which does not point to an object of type 'Item_cache'
0x7fff18ac3a50: note: object is of type 'Item_string'
 8f 8f 8f 8f  b0 ca 8c 09 00 00 00 00  01 8f 8f 8f 8f 8f 8f 8f  00 00 00 00 00 00 00 00  00 00 00 00
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'Item_string'
[Detaching after fork from child process 4082279]
    #0 0x00000156ca8b in Item_singlerow_subselect::store(unsigned int, Item*) sql/item_subselect.cc:1133:13
    #1 0x00000156c5aa in Query_result_scalar_subquery::send_data(THD*, mem_root_deque<Item*> const&) sql/item_subselect.cc:890:9
    #2 0x000000bafa53 in Query_expression::ExecuteIteratorQuery(THD*) sql/sql_union.cc:1136:25
    #3 0x000000bb091e in Query_expression::execute(THD*) sql/sql_union.cc:1183:10
    #4 0x000001569baf in Item_subselect::exec(THD*) sql/item_subselect.cc:733:36
    #5 0x00000157225f in Item_singlerow_subselect::bring_value() sql/item_subselect.cc:1203:8
    #6 0x0000013178aa in Arg_comparator::compare_row() sql/item_cmpfunc.cc:2099:13
    #7 0x00000131db4d in Item_func_eq::val_int() sql/item_cmpfunc.cc:2586:25
    #8 0x000001bda831 in FilterIterator::Read() sql/iterators/composite_iterators.cc:100:33
    #9 0x000000baf87e in Query_expression::ExecuteIteratorQuery(THD*) sql/sql_union.cc:1121:36
    #10 0x000000bb091e in Query_expression::execute(THD*) sql/sql_union.cc:1183:10
    #11 0x000000a04398 in Sql_cmd_dml::execute_inner(THD*) sql/sql_select.cc:1128:15
    #12 0x0000009ff59f in Sql_cmd_dml::execute(THD*) sql/sql_select.cc:790:7
    #13 0x00000088360a in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4743:29
    #14 0x000000877ef6 in dispatch_sql_command(THD*, Parser_state*, bool) sql/sql_parse.cc:5405:19
    #15 0x00000086a2d5 in dispatch_command(THD*, COM_DATA const*, enum_server_command) sql/sql_parse.cc:2137:7
    #16 0x0000008726c1 in do_command(THD*) sql/sql_parse.cc:1480:18
    #17 0x000000ddd4a3 in handle_connection(void*) sql/conn_handler/connection_handler_per_thread.cc:304:13
    #18 0x000004554ed7 in pfs_spawn_thread(void*) storage/perfschema/pfs.cc:3067:3
    #19 0x7ffff4f97147 in start_thread (/lib64/libc.so.6+0x71147) (BuildId: 8f0d04c433960a3a1d01ec9c4612545d44a9a405)
    #20 0x7ffff501b0cb in __GI___clone3 (/lib64/libc.so.6+0xf50cb) (BuildId: 8f0d04c433960a3a1d01ec9c4612545d44a9a405)
[29 Jan 9:23] Tor Didriksen
Posted by developer:
 
the suggested fix also fails for UBSAN, since we have: Item_cache **m_row{nullptr};
[29 Jan 11:22] MySQL Verification Team
Thank you, Tor.