MySQL Bugs: #37565: Crash in Field

Bug #37565	Crash in Field_blob::pack
Submitted:	21 Jun 2008 17:23	Modified:	26 May 2010 17:46
Reporter:	Philip Stoev	Email Updates:
Status:	Unsupported	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S1 (Critical)
Version:	6.0	OS:	Any
Assigned to:	Christopher Powers	CPU Architecture:	Any
Tags:	F_ONLINE ALTER

Description:
When running a workload containing various queries, mysqld crashed as follows:

#0  0x00110416 in __kernel_vsyscall ()
#1  0x00581c78 in pthread_kill () from /lib/libpthread.so.0
#2  0x08449bb1 in write_core (sig=11) at stacktrace.c:302
#3  0x082a4070 in handle_segfault (sig=11) at mysqld.cc:2626
#4  <signal handler called>
#5  0x00432f5a in memcpy () from /lib/libc.so.6
#6  0x0827ad5b in Field_blob::pack (this=0x9669ad0, to=0x976904a '╔' <repeats 200 times>..., from=0xa5a5a5a5 <Address 0xa5a5a5a5 out of bounds>,
    max_length=255, low_byte_first=true) at field.cc:7832
#7  0x083a1d18 in pack_row (table=0x958e7e0, cols=0x97ac998, row_data=0x9769020 "╔═\003", record=0x9669830 "і═\003") at rpl_record.cc:111
#8  0x08288cbc in THD::binlog_update_row (this=0xaef02238, table=0x958e7e0, is_trans=true, before_record=0x9669830 "і═\003", after_record=0x96697c8 "Ч═\003")
    at sql_class.cc:3501
#9  0x083d863d in Update_rows_log_event::binlog_row_logging_function (thd=0xaef02238, table=0x958e7e0, is_transactional=true,
    before_record=0x9669830 "і═\003", after_record=0x96697c8 "Ч═\003") at log_event.h:3652
#10 0x083cf3c3 in binlog_log_row (table=0x958e7e0, before_record=0x9669830 "і═\003", after_record=0x96697c8 "Ч═\003",
    log_func=0x83d8609 <Update_rows_log_event::binlog_row_logging_function(THD*, st_table*, bool, unsigned char const*, unsigned char const*)>)
    at handler.cc:5266
#11 0x083d02f0 in handler::ha_update_row (this=0x9669600, old_data=0x9669830 "і═\003", new_data=0x96697c8 "Ч═\003") at handler.cc:5350
#12 0x083452f2 in write_record (thd=0xaef02238, table=0x958e7e0, info=0xad444320) at sql_insert.cc:1520
#13 0x08349a3a in mysql_insert (thd=0xaef02238, table_list=0x9629c20, fields=@0xaef038a8, values_list=@0xaef038cc, update_fields=@0xaef038c0,
    update_values=@0xaef038b4, duplic=DUP_REPLACE, ignore=false) at sql_insert.cc:819
#14 0x082b7a6a in mysql_execute_command (thd=0xaef02238) at sql_parse.cc:2951
#15 0x082bdfd5 in mysql_parse (thd=0xaef02238,
    inBuf=0x9629ad0 "REPLACE INTO view1 (t1_autoinc, t1_uuid) VALUES (\n@id1,\n(SELECT t2_uuid FROM view2 WHERE t2_uuid = @uuid2)\n)", length=108,
    found_semicolon=0xad445260) at sql_parse.cc:5782
#16 0x082bea71 in dispatch_command (command=COM_QUERY, thd=0xaef02238,
    packet=0xaef50411 "REPLACE INTO view1 (t1_autoinc, t1_uuid) VALUES (\n@id1,\n(SELECT t2_uuid FROM view2 WHERE t2_uuid = @uuid2)\n)", packet_length=108)
    at sql_parse.cc:1059
#17 0x082bfd32 in do_command (thd=0xaef02238) at sql_parse.cc:732
#18 0x082ad446 in handle_one_connection (arg=0xaef02238) at sql_connect.cc:1134
#19 0x0057d32f in start_thread () from /lib/libpthread.so.0
#20 0x0049a27e in clone () from /lib/libc.so.6

The crash is here:

(gdb) list
7827        Store the actual blob data, which will occupy 'length' bytes.
7828       */
7829      if (length > 0)
7830      {
7831        get_ptr((uchar**) &from);
7832        memcpy(to+packlength, from,length);
7833      }
7834      ptr=save;                                     // Restore org row pointer
7835      DBUG_DUMP("packed", to, packlength + length);
7836      DBUG_RETURN(to+packlength+length);

(gdb) print to+packlength
$1 = (uchar *) 0x976904b '╔' <repeats 200 times>...
(gdb) print from
$2 = (const uchar *) 0xa5a5a5a5 <Address 0xa5a5a5a5 out of bounds>
(gdb) print length
$3 = 165

How to repeat:
If this crash is observed again, then a reproducible test case will be prepared. In the meantime, the core file and the mysqld binary are available.

Test case for bug 37565

Attachment: bug37565.zip (application/x-zip-compressed, text), 2.25 KiB.

To reproduce this bug, please place the .txt files in mysql-test and the .test files in mysql-test/t. Then run:

$ perl ./mysql-test-run.pl --stress --stress-init-file=bug37565_init.txt \
--stress-test-file=bug37565_run.txt --stress-test-duration=7200 \
--stress-threads=10

Some observations:

* No replication is involved, however the crash is in the code that handles the binary log;
* This test performs ADD/DROP KEY/COLUMN concurrently with INSERT/REPLACE
* This test uses ENGINE=Falcon to cause the crash. ENGINE=MyISAM causes a deadlock (to be filed as a separate bug)
* About 10 minutes of runtime are required to cause the crash. This translates into a considerable quantity of ALTER TABLEs.
* My very wild guess would be that some pointer creeps by a small ammount for every ALTER until it causes a segmentation fault.
* Further reducing the number of participating queries did not result in a shorter time-to-crash.

To reproduce under MTRv2:

$ cd mysql-test
$ touch r/bug37565_add.result
$ touch r/bug37565_drop.result
$ touch r/bug37565_init.result
$ touch r/bug37565_insert_replace.result

$ MTR_VERSION=1 perl ./mysql-test-run.pl --stress --stress-init-file=bug37565_init.txt --stress-test-file=bug37565_run.txt --stress-test-duration=7200 --stress-threads=10

Note that an unrelated Falcon crash may also happen, so please run the test case a few times.

During the analysis, I sometimes hit another failure (does not seem related to binary logging):

   Output of mysqltest:
   mysqltest: Logging to '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/r/r3/bug37565_insert_replace.log'.
 mysqltest: Results saved in '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/r/r3/bug37565_insert_replace.result'.
 mysqltest: Connecting to server localhost:3306 (socket /tmp/VGpWvelUGD/master.sock) as 'root', connection 'default', attempt 0 ...
 mysqltest: ... Connected.
 mysqltest: Start processing test commands from '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/t/bug37565_insert_replace.test' ...
 mysqltest: At line 16: query 'INSERT INTO view2 (t1_uuid) SELECT (SELECT t1_uuid FROM view1 WHERE t1_uuid = @uuid1)' failed: 2013: Lost connection to MySQL server during query

From core, I get the following stack trace:

(gdb) where
#0  0x00007f6e0e9fe1f6 in pthread_kill () from /lib/libpthread.so.0
#1  0x0000000000de7e7c in my_write_core (sig=11) at stacktrace.c:309
#2  0x00000000007717f5 in handle_segfault (sig=11) at mysqld.cc:2710
#3  <signal handler called>
#4  0x000000000068f8ac in Field::set_null (this=0x30, row_offset=0) at ../../sql/field.h:280
#5  0x0000000000a7b1e9 in StorageInterface::decodeRecord (this=0x39b1170, buf=0x39b1448 "�") at ha_falcon.cpp:3197
#6  0x0000000000a7f608 in StorageInterface::index_next (this=0x39b1170, buf=0x39b1448 "�") at ha_falcon.cpp:1893
#7  0x0000000000a7f888 in StorageInterface::index_read (this=0x39b1170, buf=0x39b1448 "�", keyBytes=0x399d9f0 "", key_len=39, find_flag=HA_READ_KEY_EXACT) at ha_falcon.cpp:1482
#8  0x00000000008d93a8 in handler::index_read_map (this=0x39b1170, buf=0x39b1448 "�", key=0x399d9f0 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at ../../sql/handler.h:1783
#9  0x00000000007fe35c in join_read_always_key (tab=0x399d750) at sql_select.cc:16960
#10 0x000000000080133d in sub_select (join=0x3968430, join_tab=0x399d750, end_of_records=false) at sql_select.cc:16280
#11 0x000000000080f0f9 in do_select (join=0x3968430, fields=0x7f6dfc3948f0, table=0x0, procedure=0x0) at sql_select.cc:15844
#12 0x00000000008290e1 in JOIN::exec (this=0x3968430) at sql_select.cc:2886
#13 0x0000000000823abb in mysql_select (thd=0x7f6dfc3928f8, rref_pointer_array=0x7f6dfc3949d0, tables=0x3676780, wild_num=0, fields=@0x7f6dfc3948f0, conds=0x3676f08, og_num=0, order=0x0, group=0x0, 
    having=0x0, proc_param=0x0, select_options=2149337600, result=0x399cbb8, unit=0x7f6dfc394380, select_lex=0x7f6dfc3947e8) at sql_select.cc:3067
#14 0x00000000008293ff in handle_select (thd=0x7f6dfc3928f8, lex=0x7f6dfc3942e0, result=0x399cbb8, setup_tables_done_option=0) at sql_select.cc:310
#15 0x0000000000783240 in execute_sqlcom_select (thd=0x7f6dfc3928f8, all_tables=0x3676780) at sql_parse.cc:4949
#16 0x0000000000784b35 in mysql_execute_command (thd=0x7f6dfc3928f8) at sql_parse.cc:2157
#17 0x000000000078d0c7 in mysql_parse (thd=0x7f6dfc3928f8, inBuf=0x3676340 "SELECT t1_uuid FROM view2 WHERE t1_uuid = @uuid1", length=48, found_semicolon=0x7f6e02db8b30) at sql_parse.cc:5964
#18 0x000000000078dcc0 in dispatch_command (command=COM_QUERY, thd=0x7f6dfc3928f8, packet=0x7f6dfc39d369 "", packet_length=48) at sql_parse.cc:1049
#19 0x000000000078f247 in do_command (thd=0x7f6dfc3928f8) at sql_parse.cc:731
#20 0x000000000077b571 in handle_one_connection (arg=0x7f6dfc3928f8) at sql_connect.cc:1146
#21 0x00007f6e0e9f93ba in start_thread () from /lib/libpthread.so.0
#22 0x00007f6e0d967fcd in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()

Related: Bug#39445 "Falcon: Update fails following online add column"

A bit of explanation:

1) Falcon maintains a mapping of fields (columns) between the server and Falcon.

2) Falcon uses the field map to encode/decode records as they are passed to/from the server.

3) The test case invokes thousands of concurrent online ADD/DROP COLUMN/KEY operations. Each ALTER forces an update to the field map.

4) The test case also issues thousands of INSERT/DELETE operations concurrent with the ALTERs. Each INSERT/DELETE requires an encode/decode using the field map.

If the field map goes out of sync, even briefly, then an error could certainly occur.

I have not been able to reproduce the failure on the latest codebase, even after increasing the concurrent threads from 10 to 30.

I can, however, tighten up the encode/decode operations to better guard against an inconsistent field map.

Christopher,

Which tree you used?

BTW, here goes the test case that i used to reproduce it some time ago:

=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc    2009-05-14 21:49:53 +0000
+++ sql/sql_select.cc    2009-05-20 16:10:13 +0000
@@ -31,7 +31,7 @@
 #include "mysql_priv.h"
 #include "sql_select.h"
 #include "sql_cursor.h"
-
+#include "debug_sync.h"
 #include <m_ctype.h>
 #include <my_bit.h>
 #include <hash.h>
@@ -3005,6 +3005,9 @@ mysql_select(THD *thd, Item ***rref_poin

   select_lex->context.resolve_in_select_list= TRUE;
   JOIN *join;
+
+  DEBUG_SYNC(thd, "mysql_select");
+
   if (select_lex->join != 0)
   {
     join= select_lex->join;

=== modified file 'sql/sql_table.cc'
--- sql/sql_table.cc    2009-05-15 08:00:35 +0000
+++ sql/sql_table.cc    2009-05-20 15:58:24 +0000
@@ -6021,6 +6021,8 @@ mysql_fast_or_online_alter_table(THD *th
                                         ha_alter_flags))
       DBUG_RETURN(1);

+    DEBUG_SYNC(thd, "mysql_phase2");
+
     /*
       Make the metadata lock available to open_table() called to
       reopen the table down the road.

Test case:

--source include/have_falcon.inc

connect (con1,localhost,root,,test,,);
connect (con2,localhost,root,,test,,);
connect (con3,localhost,root,,test,,);

CREATE TABLE inter1 (
  t1_autoinc INTEGER NOT NULL AUTO_INCREMENT,
  t1_uuid CHAR(36),
  t2_uuid CHAR(36),
  t1_blob MEDIUMTEXT,
  PRIMARY KEY (t1_autoinc),
  KEY (t1_uuid),
  KEY (t2_uuid)
) ENGINE = Falcon;

INSERT INTO inter1 VALUES (1, 'aaa', 'bbb', 'ccc');

connection con1;
set debug_sync='mysql_select SIGNAL parked WAIT_FOR go';
--send select * from inter1;

connection default;
--echo connection: default
set debug_sync= 'now WAIT_FOR parked';

connection con2;
set debug_sync='mysql_phase2 SIGNAL go WAIT_FOR done';
--send ALTER TABLE inter1 ADD COLUMN filler TINYTEXT;

connection con1;
--reap
set debug_sync= 'now SIGNAL done';

connection con2;
--reap

I have recreated the failure using the method described above, and expect to have a fix available within the next day.

This should not be sr54rc since it is a Falcon problem.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76979

2741 Christopher Powers	2009-06-23
      Bug #37565 "Crash in Field_blob::pack"
      
      Online ALTER ADD COLUMN creates a new record format that has
      more fields than the existing rows.
      
      Previously, the disparity between the old record formats and
      the newer, larger record format was handled strictly by
      StorageInterface::decodeRecord(), which set the undefined
      fields to NULL prior to returning the decoded record to
      the server. 
      
      This was generally sufficient except when the older records
      were updated. Although Table::update() assigned the new format
      to the record and Record::setEncodedRecord() resized the field
      vector, the new fields in the record were never initialized to NULL.
      
      StorageInterface::encodeRecord() now initializes undefined fields
      to NULL prior to storing the encoded record.
      modified:
        storage/falcon/ha_falcon.cpp
        storage/falcon/ha_falcon.h

[21 Jun 2008 22:56] Philip Stoev
> ENGINE=MyISAM causes a deadlock (to be filed as a separate bug)

Hi Philip, 
was a bug logged for this? Is InnoDB affected in any way?

Roel, with next-mr, I am unable to reproduce a deadlock with either MyISAM or InnoDB. My guess is that the metadata locking feature in next-mr has resolved any deadlock issues.