Bug #58003 Segfault on CHECKSUM TABLE performance_schema.EVENTS_WAITS_HISTORY_LONG EXTENDED
Submitted: 4 Nov 2010 23:32 Modified: 6 Jan 2011 1:05
Reporter: Elena Stepanova
Status: Closed
Category: MySQL Server: Performance Schema Severity: S2 (Serious)
Version: 5.6.99-m5-debug OS: Any
Assigned to: Marc ALFF CPU Architecture: Any
Tags: checksum table

[4 Nov 2010 23:32] Elena Stepanova
Description:
A very similar bug, bug#56761, was closed as fixed some time ago, but we are still getting these errors.

5.6.99-m5-debug-log
mysqld got signal 11 ;

  [1] _lwp_kill(0x0, 0xffffffffffffffff, 0xfffffffffffffff1, 0x0, 0xffffffff7f04d574, 0xa), at 0xffffffff7eec8450
  [2] my_write_core(sig = 11), line 328 in "stacktrace.c"
  [3] handle_segfault(sig = 11), line 2507 in "mysqld.cc"
  [4] __sighndlr(0xb, 0x0, 0xffffffff5663bc60, 0x1001d9030, 0x12, 0x0), at 0xffffffff7eec3538
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [5] memcpy(0x11fc84018, 0xfffffffee037bfe8, 0x19, 0x18, 0x61, 0x0), at 0xffffffff7effcaf4
=>[6] table_events_waits_common::make_row(this = 0x11fc83f70, thread_own_wait = false, pfs_thread = 0x115deeb68, wait = 0x1171644f0), line 275 in "table_events_waits.cc"
  [7] table_events_waits_history_long::rnd_next(this = 0x11fc83f70), line 781 in "table_events_waits.cc"
  [8] ha_perfschema::rnd_next(this = 0x11cc12120, buf = 0x11cc12510 "\xf0\xfe\xd2&"), line 290 in "ha_perfschema.cc"
  [9] handler::ha_rnd_next(this = 0x11cc12120, buf = 0x11cc12510 "\xf0\xfe\xd2&"), line 2197 in "handler.cc"
  [10] mysql_checksum_table(thd = 0x121961190, tables = 0x122d83d20, check_opt = 0x121963a18), line 7321 in "sql_table.cc"
  [11] mysql_execute_command(thd = 0x121961190), line 2710 in "sql_parse.cc"
  [12] mysql_parse(thd = 0x121961190, rawbuf = 0x122d83ba0 "CHECKSUM TABLE performance_schema.EVENTS_WAITS_HISTORY_LONG EXTENDED", length = 68U, parser_state = 0xffffffff5663f7f0), line 5537 in "sql_parse.cc"
  [13] dispatch_command(command = COM_QUERY, thd = 0x121961190, packet = 0x123331301 "", packet_length = 68U), line 1056 in "sql_parse.cc"
  [14] do_command(thd = 0x121961190), line 796 in "sql_parse.cc"
  [15] do_handle_one_connection(thd_arg = 0x121961190), line 745 in "sql_connect.cc"
  [16] handle_one_connection(arg = 0x121961190), line 684 in "sql_connect.cc"
  [17] pfs_spawn_thread(arg = 0x121611830), line 1078 in "pfs.cc"

thd->query at 122d83ba0 = CHECKSUM TABLE performance_schema.EVENTS_WAITS_HISTORY_LONG EXTENDED
thd->thread_id=9883
thd->killed=NOT_KILLED

(dbx) print m_row.m_object_name_length
m_row.m_object_name_length = 25U

(dbx) print m_row.m_object_name
m_row.m_object_name = "tb0_logsng2e/pb2/test/sb_1-2480006-1288750780.53/mysql-5.6.99-m5-solaris10-sparc-64bit-test/vardirs/02_load_MBR_MyISAM_var/master-data/table_logs/tb0_logs.MYDchild.MYD81.frm2#P#part3.MYD"

255:  case WAIT_CLASS_TABLE:
256:    if (wait->m_object_type == OBJECT_TYPE_TABLE)
257:    {
258:      m_row.m_object_type= "TABLE";
259:      m_row.m_object_type_length= 5;
260:    }
261:    else
262:    {
263:      m_row.m_object_type= "TEMPORARY TABLE";
264:      m_row.m_object_type_length= 15;
265:    }
266:    m_row.m_object_schema_length= wait->m_schema_name_length;
267:    if (unlikely((m_row.m_object_schema_length == 0) ||
268:                 (m_row.m_object_schema_length > sizeof(m_row.m_object_schema))))
269:      return;
270:    memcpy(m_row.m_object_schema, wait->m_schema_name, m_row.m_object_schema_length);
271:    m_row.m_object_name_length= wait->m_object_name_length;
272:    if (unlikely((m_row.m_object_name_length == 0) ||
273:                 (m_row.m_object_name_length > sizeof(m_row.m_object_name))))
274:      return;
275:    memcpy(m_row.m_object_name, wait->m_object_name, m_row.m_object_name_length);
276:    safe_class= &global_table_class;
277:    break;
278:  case WAIT_CLASS_FILE:

How to repeat:
Happens sporadically in system stress tests; based on Marc's problem analysis in bug#56761, I guess we cannot hope for a deterministic functional test case.
[5 Nov 2010 6:12] Marc ALFF
Analysis
========

Based on the new observations reported for this issue,
the current situation seems to be:

1) The crash happened in:
memcpy(m_row.m_object_name, wait->m_object_name, m_row.m_object_name_length);
for WAIT_CLASS_TABLE

2) m_row.m_object_name_length is a valid length and within range,
so the bug is not related to the object name length, which passed the sanitizing checks.

3) m_row.m_object_name is a valid region of memory, which could be printed under the debugger.

The content of m_row.m_object_name shows multiple strings printed without a terminating 0, resulting in <...>.FRM<...>.FRM<...>.FRM for example for file names.
This is actually expected, and per design, so there is nothing wrong here.

4) The crash was a signal 11 (segmentation violation).

The arguments passed to memcpy are char* pointers, which only require 1-byte alignment,
so this is not an alignment problem.

Based on this, the remaining possible root cause is that the wait->m_object_name pointer itself is corrupted, and does not point to a valid region of memory.
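
To illustrate point 3 above, here is a minimal sketch of a fixed-size, length-tracked name buffer. The NameField struct and the set_name / get_name helpers are hypothetical and are not taken from the server source; they only show why leftovers from earlier, longer names remain visible when the raw buffer is printed, while the stored length still delimits the current name correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a row field: fixed-size buffer plus explicit length.
struct NameField {
  char data[32];
  std::size_t length;
};

// Store a name without a terminating 0; only `length` marks where it ends.
// Returns false and leaves the field untouched if the name is empty or too long.
bool set_name(NameField &field, const char *name, std::size_t name_length) {
  if (name_length == 0 || name_length > sizeof(field.data)) return false;
  std::memcpy(field.data, name, name_length);
  field.length = name_length;
  return true;
}

// Read back exactly `length` bytes, ignoring whatever follows in the buffer.
std::string get_name(const NameField &field) {
  assert(field.length <= sizeof(field.data));
  return std::string(field.data, field.length);
}
```

Storing "a_much_longer_name.MYD" and then "t1.frm" in the same field leaves the tail of the first name in the buffer after the sixth byte, yet get_name() returns only "t1.frm". A debugger that prints the whole array shows the concatenated residue, as in the dbx output in the report.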

Considering that this crash has been consistently reported for TABLE EVENTS_WAITS_HISTORY_LONG, but never for TABLE EVENTS_WAITS_CURRENT,
I suspect that the root cause is located in copy_events_waits().
The code uses a memcpy, which may copy different bytes of the 64-bit wait->m_object_name pointer at different times, leaving the pointer in an unsafe state.

The solution is to sanitize the data even more, to make sure that wait->m_schema_name and wait->m_object_name are valid before dereferencing these pointers.
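
The suspected partial copy can be modeled deterministically, without an actual race. The sketch below is an assumption-laden illustration, not the code of copy_events_waits(): torn_copy is a hypothetical helper that treats a 64-bit pointer value as a uint64_t and overwrites only the first bytes_copied bytes of the old value with the new one, which is what a reader could observe if it looked at the destination in the middle of a byte-wise copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Simulate a byte-wise copy observed after only `bytes_copied` bytes were
// written: the leading bytes come from `new_value`, the rest from `old_value`.
std::uint64_t torn_copy(std::uint64_t old_value, std::uint64_t new_value,
                        std::size_t bytes_copied) {
  assert(bytes_copied <= sizeof(std::uint64_t));
  std::uint64_t result = old_value;
  std::memcpy(&result, &new_value, bytes_copied);
  return result;
}
```

With two pointer values that differ in both 32-bit halves, torn_copy(old_value, new_value, 4) yields a value equal to neither of them, whatever the byte order of the machine. Such a value has a plausible-looking bit pattern but need not point to mapped memory, which would be consistent with a SIGSEGV inside memcpy even though the length check passed.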
[11 Nov 2010 11:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/123590

3120 Marc Alff	2010-11-11
      Bug#58003 Segfault on CHECKSUM TABLE performance_schema.EVENTS_WAITS_HISTORY_LONG EXTENDED
      
      This fix is a follow-up to the fix for the similar issue 56761.
      
      When sanitizing data read from the events_waits_history_long table,
      the code also needs to sanitize the schema_name / object_name / file_name pointers,
      because such pointers could also hold invalid values.
      Checking the string length alone was necessary but not sufficient.
      
      This fix verifies that:
      - the table schema and table name used in table io events
      - the file name used in file io events
      are valid pointers before dereferencing these pointers.
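
One way to check that a pointer is valid before dereferencing it is to compare it against the bounds of the array that is supposed to own the string. The following is only a sketch under that assumption: the TableShare struct, the pool layout and sanitize_name_pointer are hypothetical names introduced for illustration and are not the functions added by the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record that owns the name buffers wait records point into.
struct TableShare {
  char schema_name[64];
  char table_name[64];
};

// Return `unsafe` if the whole range [unsafe, unsafe + length) lies inside
// the pool [pool, pool + count); otherwise return nullptr.
// The pointer is only compared as an integer here, never dereferenced.
const char *sanitize_name_pointer(const TableShare *pool, std::size_t count,
                                  const char *unsafe, std::size_t length) {
  const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(pool);
  const std::uintptr_t end = reinterpret_cast<std::uintptr_t>(pool + count);
  const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(unsafe);
  if (p < begin || p >= end) return nullptr;  // outside the owning array
  if (length > end - p) return nullptr;       // would run past the array
  return unsafe;
}
```

A caller would run this check in addition to the existing length check and skip the memcpy when the result is nullptr. The sketch only proves that the bytes to be read belong to the pool; a stricter version could also verify that the offset inside the record falls within a name field.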
[15 Nov 2010 2:13] Christopher Powers
Patch approved.
[16 Nov 2010 6:45] Marc ALFF
Patch queued into:
- mysql-5.5-bugteam
- mysql-trunk-bugfixing
[5 Dec 2010 12:38] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[11 Dec 2010 17:01] Paul DuBois
Bug not present in any 5.6.x release.

Setting report to Need Merge pending push into 5.5.x.
[16 Dec 2010 22:28] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)
[4 Jan 2011 8:17] Marc ALFF
Fix present in mysql-5.5.9, in branch mysql-5.5
[6 Jan 2011 1:05] Paul DuBois
Noted in 5.5.9 changelog.

The server could crash inside memcpy() when reading certain
Performance Schema tables.