Bug #56540 Exception (crash) in sql_show.cc during rqg_info_schema test on Windows
Submitted: 3 Sep 2010 14:27 Modified: 20 Nov 2010 17:49
Reporter: John Embretsen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Locking Severity:S1 (Critical)
Version:5.5.6-m3-Celosia OS:Windows
Assigned to: Konstantin Osipov CPU Architecture:Any
Tags: rqg_pb2

[3 Sep 2010 14:27] John Embretsen
Description:
During the Random Query Generator test "rqg_info_schema", which runs INFORMATION_SCHEMA queries in multiple clients at the same time, mysqld.exe throws an exception and a server crash is reported:

(...)
100903  9:26:42 [ERROR] G:/pb2/test/sb_1-2235555-1283494905.06/mysql-5.6.99-m5-win-x86-test/sql/debug/mysqld.exe: Can't find file: '.\z\2005@002d09@002d05@002020@003a05@003a38.frm' (errno: 22)
100903  9:27:39 - mysqld got exception 0xc0000005 ;
This could be because you hit a bug.

Backtrace from MySQL error log (mysql-next-mr-bugfixing):

mysqld.exe!?fill_schema_proc
mysqld.exe!?get_all_tables
mysqld.exe!?get_schema_tables_result
mysqld.exe!?exec
mysqld.exe!?mysql_select
mysqld.exe!?handle_select
mysqld.exe!?mysql_execute_command
mysqld.exe!?mysql_execute_command
mysqld.exe!?mysql_parse
mysqld.exe!?dispatch_command
mysqld.exe!?do_command
mysqld.exe!?do_handle_one_connection
mysqld.exe!handle_one_connection
mysqld.exe!pthread_create
mysqld.exe!my_strnncoll_gbk
mysqld.exe!my_strnncoll_gbk
kernel32.dll!GetModuleHandleA

thd->query at 0A409398=
SELECT * 
FROM (mysql . `db` AS table1 
  STRAIGHT_JOIN INFORMATION_SCHEMA . `VIEWS` AS table2 
  ON ( table2 . `CHECK_OPTION` = table1 . `Create_routine_priv` ) ) 
WHERE NOT ( table2 . `IS_UPDATABLE` = '2004-08-26 01:54:09' 
  OR table2 . `VIEW_DEFINITION` <> table1 . `User`)   
ORDER BY table1 . `Drop_priv` , 
         table1 . `Delete_priv` , 
         table1 . `Index_priv` , 
         table1 . `Drop_priv` , 
         table1 . `Execute_priv` , 
         table2 . `SECURITY_TYPE` , 
         table1 . `Insert_priv` 
LIMIT 2
thd->thread_id=18

CDB debugger says:

 FAULTING_IP: 
 mysqld!get_schema_views_record+44c [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_show.cc @ 4960]
 0046652c 8b517c          mov     edx,dword ptr [ecx+7Ch]
 
 EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
 ExceptionAddress: 0046652c (mysqld!get_schema_views_record+0x0000044c)
    ExceptionCode: c0000005 (Access violation)
   ExceptionFlags: 00000000
 NumberParameters: 2
     Parameter[0]: 00000000
     Parameter[1]: ddddde59
 Attempt to read from address ddddde59

STACK_TEXT:  
mysqld!get_schema_views_record+0x44c [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_show.cc @ 4960]
mysqld!get_all_tables+0x8d9 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_show.cc @ 3665]
mysqld!get_schema_tables_result+0x274 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_show.cc @ 6679]
mysqld!JOIN::exec+0x612 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_select.cc @ 2675]
mysqld!mysql_select+0x2a4 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_select.cc @ 3340]
mysqld!handle_select+0x171 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_select.cc @ 323]
mysqld!execute_sqlcom_select+0x2f8 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_parse.cc @ 4562]
mysqld!mysql_execute_command+0x723 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_parse.cc @ 2166]
mysqld!mysql_parse+0x177 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_parse.cc @ 5591]
mysqld!dispatch_command+0x77a [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_parse.cc @ 1130]
mysqld!do_command+0x209 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_parse.cc @ 802]
mysqld!do_handle_one_connection+0x1df [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_connect.cc @ 1192]
mysqld!handle_one_connection+0x2c [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\sql\sql_connect.cc @ 1131]
mysqld!pthread_start+0x48 [g:\pb2\build\sb_2-2232641-1283444893.79\mysql-5.6.99-m5-win-x86\mysys\my_winthread.c @ 61]
mysqld!_callthreadstartex+0x53 [f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c @ 348]
mysqld!_threadstartex+0xa4 [f:\dd\vctools\crt_bld\self_x86\crt\src\threadex.c @ 331]
kernel32!BaseThreadStart+0x34
(...)

How to repeat:
set CODE=path\to\code

bzr branch lp:randgen
cd randgen

perl runall.pl --grammar=conf/runtime/information_schema.yy --queries=100000 --threads=10 --duration=300 --reporters=Deadlock,ErrorLog,Backtrace,Shutdown --basedir=G:/pb2/scripts/tmp-qauser/mysql-next-mr-bugfixing --mysqld=--log-output=file --mysqld=--loose-table-lock-wait-timeout=1 --mysqld=--loose-lock-wait-timeout=1 --mysqld=--loose-innodb-lock-wait-timeout=1

Observed on Windows Server 2003.
[13 Sep 2010 22:49] Christopher Powers
The failure is caused by a debug assertion raised in ha_partition::index_read_idx_map():

==========
int ha_partition::index_read_idx_map(...)
{
  [...]
  get_partition_set(table, buf, index, &m_start_key, &m_part_spec);

  /* How can it be more than one partition with the current use? */
  DBUG_ASSERT(m_part_spec.start_part == m_part_spec.end_part);

  for (part= m_part_spec.start_part; part <= m_part_spec.end_part; part++)
  {
    if (bitmap_is_set(&(m_part_info->used_partitions), part))
==========

The key is assumed to reside within a range of partitions defined by m_part_spec.start_part and end_part.

get_partition_set() looks for the key within this range of partitions. If the key cannot be found, it sets start_part = end_part + 1.

In this case, start_part == end_part == 0, but get_partitions_set() does not find the key, so it sets start_part = 1 which triggers the fatal assertion.

Is this a valid assertion?
[14 Sep 2010 8:18] Mattias Jonsson
The assertion is not correct, OK to change into:
    /* How can it be more than one partition with the current use? */
    DBUG_ASSERT(m_part_spec.start_part >= m_part_spec.end_part);

i.e. changing '==' to '>='.

Note that this is a DBUG_ASSERT only, so it does not affect a production binary.
[14 Sep 2010 8:35] Mattias Jonsson
the assert is already filed as bug#55146.
[21 Sep 2010 20:50] Christopher Powers
The debug assertion is addressed in Bug#55146. With this fix applied, however, I am still able to recreate a crash in mysql-5.5-bugfixing on WinXP x64, similar to the one originally reported.

The crash appears to be caused by a race condition in get_schema_views_record(), where multiple threads are iterating through the same view.

I will continue to debug this issue, and will post status updates and debugging details as they become available.
[24 Sep 2010 18:50] Christopher Powers
The crash on Windows XP 64 has a call stack similar to that of first one reported:

get_schema_views_record(thd=0x000000000ce61ec0, tables=0x000000000f077000,...)
get_all_tables(thd=0x000000000ce61ec0, tables=0x000000000cd779f0,...)
get_schema_tables_result(join=0x000000000d3fe600, executed_place=PROCESSED_BY_JOIN_EXEC)
JOIN::exec(...)
mysql_select(...)
[...]

In this case, the query was a complex join, however, I think this is incidental: 

SELECT * FROM (mysql.time_zone_leap_second AS table1 STRAIGHT_JOIN INFORMATION_SCHEMA.VIEWS AS table2 ON (table2.TABLE_SCHEMA = table1.Correction))
WHERE table2.TABLE_CATALOG != table1.Transition_time
ORDER BY table2.IS_UPDATABLE, table1.Transition_time, table2.CHECK_OPTION, table2.CHARACTER_SET_CLIENT, table2.TABLE_NAME;

The RQG test case conducts a concurrent mix of similar queries across 10 threads, in addition to various CREATE TABLE/VIEW operations.

The crash occurs because of a corrupted Field object associated with table 'r' encountered here:

get_schema_views_record(...)
[...]
  List<Item> *fields= &tables->view->select_lex.item_list;
  List_iterator<Item> it(*fields);
  Item *item;
  Item_field *field;
  /*  check that at least one column in view is updatable  */
  while ((item= it++))
  {
    if ((field= item->filed_for_view_update()) && field->field &&
        !field->field->table->pos_in_table_list->schema_table)
    {
      updatable_view= 1;
      break;
    }
  }
[...]

Note the invalid content of 'field->field':

Local Variables
+ item   0x000000000ea29eb0 {field=0x000000000ce30050 ...}
- field  0x000000000ea29eb0 {field=0x000000000ce30050 ...}
  + Item_ident {orig_db_name="test" orig_table_name="d" orig_field_name="pk" ...}
  - field       0x000000000ce30050 {ptr=0xdddddddddddddddd null_ptr=0xdddddddddddddddd...}
    + __vfptr         0xdddddddddddddddd
    + ptr             0xdddddddddddddddd
    + null_ptr        0xdddddddddddddddd
    + table           0xdddddddddddddddd
    + orig_table      0xdddddddddddddddd
    + table_name      0xdddddddddddddddd
    + field_name      0xdddddddddddddddd
    + comment         {str=0xdddddddddddddddd length=0xdddddddddddddddd}
    + key_start       {map=0xdddddddddddddddd}
    + part_of_key     {map=0xdddddddddddddddd}
    + part_of_key_not_clustered  {map=0xdddddddddddddddd}
    + part_of_sortkey {map=0xdddddddddddddddd}
      unireg_check     0xdddddddd
      field_length     0xdddddddd
      flags            0xdddddddd
      field_index      0xdddd
      null_bit         0xdd 'Ý'
      is_created_from_null_item  true
  + result_field    0x000000000ce30050 {ptr=0xdddddddddddddddd ...
  + item_equal      0x0000000000000000 {fields={...} const_item=??? eval_item=??? ...}
    no_const_subst   false
    have_privileges  0x00000000
    any_privileges   false
+ fields         0x000000000f077ec8     List<Item> *
[...]
+ db_name        0x000000000e6bc4c8 {str= "test"...}
+ table_name     0x000000000e6bc498 {str= "r" ...}
+ definer        0x000000000e6bb800 "ÌÌÌÌÌÌÌÌÌÌÌÌÌ...ÌÌ"

The larger issue is that table 'r' is in the table list and being processed, but table 'r' is invalid, reason unknown. The tabe list is generated by get_all_tables(), which simply performs a file system enumeration of the database directory, then dutifully iterates through the resulting list of table names, assuming that each name maps to a valid table.

A clue to the source of the corrupt table 'r' is found in the RQG output:

"Query: CREATE TABLE r LIKE `aa` failed: 1356 View 'test.r' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them."

Note that after a restart and recovery, two other tables were found to be similarly corrupt.

Next actions to stake:

1. Recreate error, examine mysql query log in detail.
2. Recreate crash on Windows XP 32.

Questions:
Has the InnoDB/MySQL metadata gone out of sync?
[30 Sep 2010 23:51] Christopher Powers
Confirmed failure in mysql-5.5-bugfixing on Windows XP 32.
[2 Oct 2010 0:57] Christopher Powers
Also fails on:
  - mysql-5.5-runtime
  - mysql-next-mr-runtime

Next: Examine query log (debug trace if necessary) to determine which concurrent operation left the table in an invalid (or transitional) state.

Possibly related: Bug#49938, "Failing assertion: inode or deadlock in fsp/fsp0fsp.c", where InnoDB is doing a TRUNCATE concurrently with an I_S query.
[2 Oct 2010 1:09] Christopher Powers
Possibly related:
   Bug#50561, "ALTER PARTITIONS does not have adequate lock, breaks with concurrent I_S query"
which includes a revealing comment from Mikael R.:
   "Don't accept a complex change in ALTER TABLE partitioning code to support a 'cheating' code in 
   INFORMATION_SCHEMA, change I_S to behave as a proper citizen instead. AFAIU this is also the 
   behaviour which MDL will apply in 5.5."

Hmmm. In addition, Bug#56541, "Assertion table->db_stat failed in sql_base.cc::open_table() during I_S queries" claims to be a regression caused by the fix for Bug#50561. So...this bug could be a regression, but clearly the I_S is behaving badly.
[7 Oct 2010 10:18] Davi Arnaut
Reduced test case:

create table t1 (a int, b int);
create view v1 as select t1.a, t1.b from t1;
alter table t1 change b c int;
# set table_cache_size=0 so table is not cached.
select * from information_schema.views;
[14 Oct 2010 16:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120787

3163 Konstantin Osipov	2010-10-14
      A fix and a test case for Bug#56540 "Exception (crash) in sql_show.cc 
      during rqg_info_schema test on Windows".
      
      Ensure we do not access freed memory when filling
      information_schema.views when one of the views
      could not be properly opened.
     @ mysql-test/r/information_schema.result
        Update results - a fix for Bug#56540.
     @ mysql-test/t/information_schema.test
        Add a test case for Bug#56540
     @ sql/sql_base.cc
        Produce an error when we return an error.
        This directs get_all_tables() to the execution
        branch which doesn't involve 'process_table()'
        when we no table/view was opened.
     @ sql/sql_show.cc
        Do not try to access underlying table fields
        when opening of a view failed. The underlying
        table is closed in that case, and accessing
        its fields may lead to dereferencing a damaged 
        pointer.
[14 Oct 2010 16:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120791

3163 Konstantin Osipov	2010-10-14
      A fix and a test case for Bug#56540 "Exception (crash) in 
      sql_show.cc during rqg_info_schema test on Windows".
      
      Ensure we do not access freed memory when filling
      information_schema.views when one of the views
      could not be properly opened.
     @ mysql-test/r/information_schema.result
        Update results - a fix for Bug#56540.
     @ mysql-test/t/information_schema.test
        Add a test case for Bug#56540
     @ sql/sql_base.cc
        Push an error into the Diagnostics area
        when we return an error.
        This directs get_all_tables() to the execution
        branch which doesn't involve 'process_table()'
        when no table/view was opened.
     @ sql/sql_show.cc
        Do not try to access underlying table fields
        when opening of a view failed. The underlying
        table is closed in that case, and accessing
        its fields may lead to dereferencing a damaged 
        pointer.
[14 Oct 2010 16:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120792

3166 Konstantin Osipov	2010-10-14
      A fix and a test case for Bug#56540 "Exception (crash) in 
      sql_show.cc during rqg_info_schema test on Windows".
      
      Ensure we do not access freed memory when filling
      information_schema.views when one of the views
      could not be properly opened.
     @ mysql-test/r/information_schema.result
        Update results - a fix for Bug#56540.
     @ mysql-test/t/information_schema.test
        Add a test case for Bug#56540
     @ sql/sql_base.cc
        Push an error into the Diagnostics area
        when we return an error.
        This directs get_all_tables() to the execution
        branch which doesn't involve 'process_table()'
        when no table/view was opened.
     @ sql/sql_show.cc
        Do not try to access underlying table fields
        when opening of a view failed. The underlying
        table is closed in that case, and accessing
        its fields may lead to dereferencing a damaged 
        pointer.
[18 Oct 2010 12:16] Konstantin Osipov
Queued into 5.5.7.
[13 Nov 2010 16:14] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)
[13 Nov 2010 16:40] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[20 Nov 2010 17:49] Paul DuBois
Noted in 5.5.8 changelog.

The server could crash as a result of accessing freed memory when
populating INFORMATION_SCHEMA.VIEWS if a view could not be opened
properly.
[16 Dec 2010 22:25] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)