| Bug #37565 | Crash in Field_blob::pack | ||
|---|---|---|---|
| Submitted: | 21 Jun 2008 17:23 | Modified: | 26 May 2010 17:46 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Unsupported | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
| Version: | 6.0 | OS: | Any |
| Assigned to: | Christopher Powers | CPU Architecture: | Any |
| Tags: | F_ONLINE ALTER | ||
[21 Jun 2008 20:50]
Philip Stoev
Test case for bug 37565
Attachment: bug37565.zip (application/x-zip-compressed, text), 2.25 KiB.
[21 Jun 2008 20:56]
Philip Stoev
To reproduce this bug, please place the .txt files in mysql-test and the .test files in mysql-test/t. Then run: $ perl ./mysql-test-run.pl --stress --stress-init-file=bug37565_init.txt \ --stress-test-file=bug37565_run.txt --stress-test-duration=7200 \ --stress-threads=10 Some observations: * No replication is involved, however the crash is in the code that handles the binary log; * This test performs ADD/DROP KEY/COLUMN concurrently with INSERT/REPLACE * This test uses ENGINE=Falcon to cause the crash. ENGINE=MyISAM causes a deadlock (to be filed as a separate bug) * About 10 minutes of runtime are required to cause the crash. This translates into a considerable quantity of ALTER TABLEs. * My very wild guess would be that some pointer creeps by a small ammount for every ALTER until it causes a segmentation fault. * Further reducing the number of participating queries did not result in a shorter time-to-crash.
[29 Apr 2009 11:15]
Philip Stoev
To reproduce under MTRv2: $ cd mysql-test $ touch r/bug37565_add.result $ touch r/bug37565_drop.result $ touch r/bug37565_init.result $ touch r/bug37565_insert_replace.result $ MTR_VERSION=1 perl ./mysql-test-run.pl --stress --stress-init-file=bug37565_init.txt --stress-test-file=bug37565_run.txt --stress-test-duration=7200 --stress-threads=10 Note that an unrelated Falcon crash may also happen, so please run the test case a few times.
[13 May 2009 14:04]
Luis Soares
During the analysis, I sometimes hit another failure (does not seem related to binary logging):
Output of mysqltest:
mysqltest: Logging to '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/r/r3/bug37565_insert_replace.log'.
mysqltest: Results saved in '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/r/r3/bug37565_insert_replace.result'.
mysqltest: Connecting to server localhost:3306 (socket /tmp/VGpWvelUGD/master.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: ... Connected.
mysqltest: Start processing test commands from '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/t/bug37565_insert_replace.test' ...
mysqltest: At line 16: query 'INSERT INTO view2 (t1_uuid) SELECT (SELECT t1_uuid FROM view1 WHERE t1_uuid = @uuid1)' failed: 2013: Lost connection to MySQL server during query
From core, I get the following stack trace:
(gdb) where
#0 0x00007f6e0e9fe1f6 in pthread_kill () from /lib/libpthread.so.0
#1 0x0000000000de7e7c in my_write_core (sig=11) at stacktrace.c:309
#2 0x00000000007717f5 in handle_segfault (sig=11) at mysqld.cc:2710
#3 <signal handler called>
#4 0x000000000068f8ac in Field::set_null (this=0x30, row_offset=0) at ../../sql/field.h:280
#5 0x0000000000a7b1e9 in StorageInterface::decodeRecord (this=0x39b1170, buf=0x39b1448 "�") at ha_falcon.cpp:3197
#6 0x0000000000a7f608 in StorageInterface::index_next (this=0x39b1170, buf=0x39b1448 "�") at ha_falcon.cpp:1893
#7 0x0000000000a7f888 in StorageInterface::index_read (this=0x39b1170, buf=0x39b1448 "�", keyBytes=0x399d9f0 "", key_len=39, find_flag=HA_READ_KEY_EXACT) at ha_falcon.cpp:1482
#8 0x00000000008d93a8 in handler::index_read_map (this=0x39b1170, buf=0x39b1448 "�", key=0x399d9f0 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at ../../sql/handler.h:1783
#9 0x00000000007fe35c in join_read_always_key (tab=0x399d750) at sql_select.cc:16960
#10 0x000000000080133d in sub_select (join=0x3968430, join_tab=0x399d750, end_of_records=false) at sql_select.cc:16280
#11 0x000000000080f0f9 in do_select (join=0x3968430, fields=0x7f6dfc3948f0, table=0x0, procedure=0x0) at sql_select.cc:15844
#12 0x00000000008290e1 in JOIN::exec (this=0x3968430) at sql_select.cc:2886
#13 0x0000000000823abb in mysql_select (thd=0x7f6dfc3928f8, rref_pointer_array=0x7f6dfc3949d0, tables=0x3676780, wild_num=0, fields=@0x7f6dfc3948f0, conds=0x3676f08, og_num=0, order=0x0, group=0x0,
having=0x0, proc_param=0x0, select_options=2149337600, result=0x399cbb8, unit=0x7f6dfc394380, select_lex=0x7f6dfc3947e8) at sql_select.cc:3067
#14 0x00000000008293ff in handle_select (thd=0x7f6dfc3928f8, lex=0x7f6dfc3942e0, result=0x399cbb8, setup_tables_done_option=0) at sql_select.cc:310
#15 0x0000000000783240 in execute_sqlcom_select (thd=0x7f6dfc3928f8, all_tables=0x3676780) at sql_parse.cc:4949
#16 0x0000000000784b35 in mysql_execute_command (thd=0x7f6dfc3928f8) at sql_parse.cc:2157
#17 0x000000000078d0c7 in mysql_parse (thd=0x7f6dfc3928f8, inBuf=0x3676340 "SELECT t1_uuid FROM view2 WHERE t1_uuid = @uuid1", length=48, found_semicolon=0x7f6e02db8b30) at sql_parse.cc:5964
#18 0x000000000078dcc0 in dispatch_command (command=COM_QUERY, thd=0x7f6dfc3928f8, packet=0x7f6dfc39d369 "", packet_length=48) at sql_parse.cc:1049
#19 0x000000000078f247 in do_command (thd=0x7f6dfc3928f8) at sql_parse.cc:731
#20 0x000000000077b571 in handle_one_connection (arg=0x7f6dfc3928f8) at sql_connect.cc:1146
#21 0x00007f6e0e9f93ba in start_thread () from /lib/libpthread.so.0
#22 0x00007f6e0d967fcd in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()
[3 Jun 2009 22:15]
Christopher Powers
Related: Bug#39445 "Falcon: Update fails following online add column" A bit of explanation: 1) Falcon maintains a mapping of fields (columns) between the server and Falcon. 2) Falcon uses the field map to encode/decode records as they are passed to/from the server. 3) The test case invokes thousands of concurrent online ADD/DROP COLUMN/KEY operations. Each ALTER forces an update to the field map. 4) The test case also issues thousands of INSERT/DELETE operations concurrent with the ALTERs. Each INSERT/DELETE requires an encode/decode using the field map. If the field map goes out of sync, even briefly, then an error could certainly occur. I have not been able to reproduce the failure on the latest codebase, even after increasing the concurrent threads from 10 to 30. I can, however, tighten up the encode/decode operations to better guard against an inconsistent field map.
[3 Jun 2009 23:02]
Davi Arnaut
Christopher,
Which tree you used?
BTW, here goes the test case that i used to reproduce it some time ago:
=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc 2009-05-14 21:49:53 +0000
+++ sql/sql_select.cc 2009-05-20 16:10:13 +0000
@@ -31,7 +31,7 @@
#include "mysql_priv.h"
#include "sql_select.h"
#include "sql_cursor.h"
-
+#include "debug_sync.h"
#include <m_ctype.h>
#include <my_bit.h>
#include <hash.h>
@@ -3005,6 +3005,9 @@ mysql_select(THD *thd, Item ***rref_poin
select_lex->context.resolve_in_select_list= TRUE;
JOIN *join;
+
+ DEBUG_SYNC(thd, "mysql_select");
+
if (select_lex->join != 0)
{
join= select_lex->join;
=== modified file 'sql/sql_table.cc'
--- sql/sql_table.cc 2009-05-15 08:00:35 +0000
+++ sql/sql_table.cc 2009-05-20 15:58:24 +0000
@@ -6021,6 +6021,8 @@ mysql_fast_or_online_alter_table(THD *th
ha_alter_flags))
DBUG_RETURN(1);
+ DEBUG_SYNC(thd, "mysql_phase2");
+
/*
Make the metadata lock available to open_table() called to
reopen the table down the road.
Test case:
--source include/have_falcon.inc
connect (con1,localhost,root,,test,,);
connect (con2,localhost,root,,test,,);
connect (con3,localhost,root,,test,,);
CREATE TABLE inter1 (
t1_autoinc INTEGER NOT NULL AUTO_INCREMENT,
t1_uuid CHAR(36),
t2_uuid CHAR(36),
t1_blob MEDIUMTEXT,
PRIMARY KEY (t1_autoinc),
KEY (t1_uuid),
KEY (t2_uuid)
) ENGINE = Falcon;
INSERT INTO inter1 VALUES (1, 'aaa', 'bbb', 'ccc');
connection con1;
set debug_sync='mysql_select SIGNAL parked WAIT_FOR go';
--send select * from inter1;
connection default;
--echo connection: default
set debug_sync= 'now WAIT_FOR parked';
connection con2;
set debug_sync='mysql_phase2 SIGNAL go WAIT_FOR done';
--send ALTER TABLE inter1 ADD COLUMN filler TINYTEXT;
connection con1;
--reap
set debug_sync= 'now SIGNAL done';
connection con2;
--reap
[5 Jun 2009 5:40]
Christopher Powers
I have recreated the failure using the method described above, and expect to have a fix available within the next day.
[11 Jun 2009 15:56]
Kevin Lewis
This should not be sr54rc since it is a Falcon problem.
[24 Jun 2009 0:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76979 2741 Christopher Powers 2009-06-23 Bug #37565 "Crash in Field_blob::pack" Online ALTER ADD COLUMN creates a new record format that has more fields than the existing rows. Previously, the disparity between the old record formats and the newer, larger record format was handled strictly by StorageInterface::decodeRecord(), which set the undefined fields to NULL prior to returning the decoded record to the server. This was generally sufficient except when the older records were updated. Although Table::update() assigned the new format to the record and Record::setEncodedRecord() resized the field vector, the new fields in the record were never initialized to NULL. StorageInterface::encodeRecord() now initializes undefined fields to NULL prior to storing the encoded record. modified: storage/falcon/ha_falcon.cpp storage/falcon/ha_falcon.h
[14 Jan 2010 23:12]
Roel Van de Paar
[21 Jun 2008 22:56] Philip Stoev > ENGINE=MyISAM causes a deadlock (to be filed as a separate bug) Hi Philip, was a bug logged for this? Is InnoDB affected in any way?
[14 May 2010 8:07]
Philip Stoev
Roel, with next-mr, I am unable to reproduce a deadlock with either MyISAM or InnoDB. My guess is that the metadata locking feature in next-mr has resolved any deadlock issues.

Description: When running a workload containing various queries, mysqld crashed as follows: #0 0x00110416 in __kernel_vsyscall () #1 0x00581c78 in pthread_kill () from /lib/libpthread.so.0 #2 0x08449bb1 in write_core (sig=11) at stacktrace.c:302 #3 0x082a4070 in handle_segfault (sig=11) at mysqld.cc:2626 #4 <signal handler called> #5 0x00432f5a in memcpy () from /lib/libc.so.6 #6 0x0827ad5b in Field_blob::pack (this=0x9669ad0, to=0x976904a '╔' <repeats 200 times>..., from=0xa5a5a5a5 <Address 0xa5a5a5a5 out of bounds>, max_length=255, low_byte_first=true) at field.cc:7832 #7 0x083a1d18 in pack_row (table=0x958e7e0, cols=0x97ac998, row_data=0x9769020 "╔═\003", record=0x9669830 "і═\003") at rpl_record.cc:111 #8 0x08288cbc in THD::binlog_update_row (this=0xaef02238, table=0x958e7e0, is_trans=true, before_record=0x9669830 "і═\003", after_record=0x96697c8 "Ч═\003") at sql_class.cc:3501 #9 0x083d863d in Update_rows_log_event::binlog_row_logging_function (thd=0xaef02238, table=0x958e7e0, is_transactional=true, before_record=0x9669830 "і═\003", after_record=0x96697c8 "Ч═\003") at log_event.h:3652 #10 0x083cf3c3 in binlog_log_row (table=0x958e7e0, before_record=0x9669830 "і═\003", after_record=0x96697c8 "Ч═\003", log_func=0x83d8609 <Update_rows_log_event::binlog_row_logging_function(THD*, st_table*, bool, unsigned char const*, unsigned char const*)>) at handler.cc:5266 #11 0x083d02f0 in handler::ha_update_row (this=0x9669600, old_data=0x9669830 "і═\003", new_data=0x96697c8 "Ч═\003") at handler.cc:5350 #12 0x083452f2 in write_record (thd=0xaef02238, table=0x958e7e0, info=0xad444320) at sql_insert.cc:1520 #13 0x08349a3a in mysql_insert (thd=0xaef02238, table_list=0x9629c20, fields=@0xaef038a8, values_list=@0xaef038cc, update_fields=@0xaef038c0, update_values=@0xaef038b4, duplic=DUP_REPLACE, ignore=false) at sql_insert.cc:819 #14 0x082b7a6a in mysql_execute_command (thd=0xaef02238) at sql_parse.cc:2951 #15 0x082bdfd5 in mysql_parse (thd=0xaef02238, inBuf=0x9629ad0 "REPLACE INTO view1 (t1_autoinc, t1_uuid) VALUES (\n@id1,\n(SELECT t2_uuid FROM view2 WHERE t2_uuid = @uuid2)\n)", length=108, found_semicolon=0xad445260) at sql_parse.cc:5782 #16 0x082bea71 in dispatch_command (command=COM_QUERY, thd=0xaef02238, packet=0xaef50411 "REPLACE INTO view1 (t1_autoinc, t1_uuid) VALUES (\n@id1,\n(SELECT t2_uuid FROM view2 WHERE t2_uuid = @uuid2)\n)", packet_length=108) at sql_parse.cc:1059 #17 0x082bfd32 in do_command (thd=0xaef02238) at sql_parse.cc:732 #18 0x082ad446 in handle_one_connection (arg=0xaef02238) at sql_connect.cc:1134 #19 0x0057d32f in start_thread () from /lib/libpthread.so.0 #20 0x0049a27e in clone () from /lib/libc.so.6 The crash is here: (gdb) list 7827 Store the actual blob data, which will occupy 'length' bytes. 7828 */ 7829 if (length > 0) 7830 { 7831 get_ptr((uchar**) &from); 7832 memcpy(to+packlength, from,length); 7833 } 7834 ptr=save; // Restore org row pointer 7835 DBUG_DUMP("packed", to, packlength + length); 7836 DBUG_RETURN(to+packlength+length); (gdb) print to+packlength $1 = (uchar *) 0x976904b '╔' <repeats 200 times>... (gdb) print from $2 = (const uchar *) 0xa5a5a5a5 <Address 0xa5a5a5a5 out of bounds> (gdb) print length $3 = 165 How to repeat: If this crash is observed again, then a reproducible test case will be prepared. In the meantime, the core file and the mysqld binary are available.