| Bug #37565 | Crash in Field_blob::pack | | |
|---|---|---|---|
| Submitted: | 21 Jun 2008 17:23 | Modified: | 26 May 2010 17:46 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Unsupported | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
| Version: | 6.0 | OS: | Any |
| Assigned to: | Christopher Powers | CPU Architecture: | Any |
| Tags: | F_ONLINE ALTER | | |
[21 Jun 2008 17:23]
Philip Stoev
[21 Jun 2008 20:50]
Philip Stoev
Test case for bug 37565
Attachment: bug37565.zip (application/x-zip-compressed, text), 2.25 KiB.
[21 Jun 2008 20:56]
Philip Stoev
To reproduce this bug, please place the .txt files in mysql-test and the .test files in mysql-test/t. Then run:

$ perl ./mysql-test-run.pl --stress --stress-init-file=bug37565_init.txt \
    --stress-test-file=bug37565_run.txt --stress-test-duration=7200 \
    --stress-threads=10

Some observations:

* No replication is involved; however, the crash is in the code that handles the binary log.
* This test performs ADD/DROP KEY/COLUMN concurrently with INSERT/REPLACE.
* This test uses ENGINE=Falcon to cause the crash. ENGINE=MyISAM causes a deadlock (to be filed as a separate bug).
* About 10 minutes of runtime are required to cause the crash. This translates into a considerable number of ALTER TABLEs.
* My very wild guess would be that some pointer creeps by a small amount with every ALTER until it causes a segmentation fault.
* Further reducing the number of participating queries did not result in a shorter time-to-crash.
[29 Apr 2009 11:15]
Philip Stoev
To reproduce under MTRv2:

$ cd mysql-test
$ touch r/bug37565_add.result
$ touch r/bug37565_drop.result
$ touch r/bug37565_init.result
$ touch r/bug37565_insert_replace.result
$ MTR_VERSION=1 perl ./mysql-test-run.pl --stress --stress-init-file=bug37565_init.txt \
    --stress-test-file=bug37565_run.txt --stress-test-duration=7200 --stress-threads=10

Note that an unrelated Falcon crash may also happen, so please run the test case a few times.
[13 May 2009 14:04]
Luis Soares
During the analysis, I sometimes hit another failure (does not seem related to binary logging). Output of mysqltest:

mysqltest: Logging to '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/r/r3/bug37565_insert_replace.log'.
mysqltest: Results saved in '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/r/r3/bug37565_insert_replace.result'.
mysqltest: Connecting to server localhost:3306 (socket /tmp/VGpWvelUGD/master.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: ... Connected.
mysqltest: Start processing test commands from '/home/lsoares/Workspace/mysql-server/bugfix/b37565/6.0-rpl/mysql-test/var/stress/t/bug37565_insert_replace.test' ...
mysqltest: At line 16: query 'INSERT INTO view2 (t1_uuid) SELECT (SELECT t1_uuid FROM view1 WHERE t1_uuid = @uuid1)' failed: 2013: Lost connection to MySQL server during query

From core, I get the following stack trace:

(gdb) where
#0  0x00007f6e0e9fe1f6 in pthread_kill () from /lib/libpthread.so.0
#1  0x0000000000de7e7c in my_write_core (sig=11) at stacktrace.c:309
#2  0x00000000007717f5 in handle_segfault (sig=11) at mysqld.cc:2710
#3  <signal handler called>
#4  0x000000000068f8ac in Field::set_null (this=0x30, row_offset=0) at ../../sql/field.h:280
#5  0x0000000000a7b1e9 in StorageInterface::decodeRecord (this=0x39b1170, buf=0x39b1448 "�") at ha_falcon.cpp:3197
#6  0x0000000000a7f608 in StorageInterface::index_next (this=0x39b1170, buf=0x39b1448 "�") at ha_falcon.cpp:1893
#7  0x0000000000a7f888 in StorageInterface::index_read (this=0x39b1170, buf=0x39b1448 "�", keyBytes=0x399d9f0 "", key_len=39, find_flag=HA_READ_KEY_EXACT) at ha_falcon.cpp:1482
#8  0x00000000008d93a8 in handler::index_read_map (this=0x39b1170, buf=0x39b1448 "�", key=0x399d9f0 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at ../../sql/handler.h:1783
#9  0x00000000007fe35c in join_read_always_key (tab=0x399d750) at sql_select.cc:16960
#10 0x000000000080133d in sub_select (join=0x3968430, join_tab=0x399d750, end_of_records=false) at sql_select.cc:16280
#11 0x000000000080f0f9 in do_select (join=0x3968430, fields=0x7f6dfc3948f0, table=0x0, procedure=0x0) at sql_select.cc:15844
#12 0x00000000008290e1 in JOIN::exec (this=0x3968430) at sql_select.cc:2886
#13 0x0000000000823abb in mysql_select (thd=0x7f6dfc3928f8, rref_pointer_array=0x7f6dfc3949d0, tables=0x3676780, wild_num=0, fields=@0x7f6dfc3948f0, conds=0x3676f08, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2149337600, result=0x399cbb8, unit=0x7f6dfc394380, select_lex=0x7f6dfc3947e8) at sql_select.cc:3067
#14 0x00000000008293ff in handle_select (thd=0x7f6dfc3928f8, lex=0x7f6dfc3942e0, result=0x399cbb8, setup_tables_done_option=0) at sql_select.cc:310
#15 0x0000000000783240 in execute_sqlcom_select (thd=0x7f6dfc3928f8, all_tables=0x3676780) at sql_parse.cc:4949
#16 0x0000000000784b35 in mysql_execute_command (thd=0x7f6dfc3928f8) at sql_parse.cc:2157
#17 0x000000000078d0c7 in mysql_parse (thd=0x7f6dfc3928f8, inBuf=0x3676340 "SELECT t1_uuid FROM view2 WHERE t1_uuid = @uuid1", length=48, found_semicolon=0x7f6e02db8b30) at sql_parse.cc:5964
#18 0x000000000078dcc0 in dispatch_command (command=COM_QUERY, thd=0x7f6dfc3928f8, packet=0x7f6dfc39d369 "", packet_length=48) at sql_parse.cc:1049
#19 0x000000000078f247 in do_command (thd=0x7f6dfc3928f8) at sql_parse.cc:731
#20 0x000000000077b571 in handle_one_connection (arg=0x7f6dfc3928f8) at sql_connect.cc:1146
#21 0x00007f6e0e9f93ba in start_thread () from /lib/libpthread.so.0
#22 0x00007f6e0d967fcd in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()
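The suspicious part of the trace is frame #4: Field::set_null() is entered with this=0x30, i.e. through a near-NULL object pointer rather than a valid server Field. Below is a minimal, purely illustrative sketch of that failure mode (hypothetical layout and names, not the MySQL or Falcon sources): if the field-pointer array the handler walks during decode is older and shorter than the record format being decoded, the "pointer" it picks up past the end of the array can be a small leftover value, and the first member access through it is the segmentation fault.

// Hypothetical sketch (not the MySQL/Falcon sources). It only prints the bogus
// pointer; the commented-out call marks where the real server crashes.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct Field {
    unsigned char *null_ptr;                       // null-flag byte inside the record buffer
    unsigned char  null_bit;
    void set_null() { null_ptr[0] |= null_bit; }   // writes through 'this'
};

int main() {
    // Field array built for the pre-ALTER format: three valid entries, followed
    // in memory by unrelated data (a small integer stands in for leftover bytes).
    Field cols[3] = {};
    void *storage[4] = {&cols[0], &cols[1], &cols[2],
                        reinterpret_cast<void *>(std::uintptr_t{0x30})};

    std::size_t new_field_count = 4;               // format after online ADD COLUMN
    Field *f = static_cast<Field *>(storage[new_field_count - 1]);

    std::printf("field %zu would be decoded through Field* = %p\n",
                new_field_count - 1, static_cast<void *>(f));
    // f->set_null();   // would dereference address 0x30 and crash, like frame #4 above
    return 0;
}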
[3 Jun 2009 22:15]
Christopher Powers
Related: Bug#39445 "Falcon: Update fails following online add column"

A bit of explanation:

1) Falcon maintains a mapping of fields (columns) between the server and Falcon.
2) Falcon uses the field map to encode/decode records as they are passed to/from the server.
3) The test case invokes thousands of concurrent online ADD/DROP COLUMN/KEY operations. Each ALTER forces an update to the field map.
4) The test case also issues thousands of INSERT/DELETE operations concurrent with the ALTERs. Each INSERT/DELETE requires an encode/decode using the field map.

If the field map goes out of sync, even briefly, then an error could certainly occur.

I have not been able to reproduce the failure on the latest codebase, even after increasing the concurrent threads from 10 to 30. I can, however, tighten up the encode/decode operations to better guard against an inconsistent field map.
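To make the field map described in points 1) and 2) above a bit more concrete, here is a minimal sketch of the idea (hypothetical names and data structures, not the actual ha_falcon.cpp code): the map is rebuilt on every online ALTER and consulted for every encode/decode, so a concurrent ALTER can momentarily leave the server decoding a record format that is larger than the map built for it.

// Hypothetical sketch of the server<->Falcon field map described above; the
// real structures live in storage/falcon/ha_falcon.cpp.
#include <cstdio>
#include <vector>

struct FieldMap {
    std::vector<int> falconFieldId;   // server field number -> Falcon field id
};

// Rebuilt whenever the table definition changes (every online ADD/DROP COLUMN/KEY).
FieldMap buildFieldMap(int serverFieldCount) {
    FieldMap map;
    for (int i = 0; i < serverFieldCount; ++i)
        map.falconFieldId.push_back(i);
    return map;
}

// Consulted for every record passed to or from the server. If serverFieldCount
// grew after 'map' was built, the trailing fields have no mapping at all.
bool decodeRecord(const FieldMap &map, int serverFieldCount) {
    for (int i = 0; i < serverFieldCount; ++i) {
        if (i >= static_cast<int>(map.falconFieldId.size())) {
            std::printf("field %d has no map entry: the map is stale\n", i);
            return false;
        }
        std::printf("field %d <- Falcon field %d\n", i, map.falconFieldId[i]);
    }
    return true;
}

int main() {
    FieldMap map = buildFieldMap(4);   // built before ALTER TABLE ... ADD COLUMN
    decodeRecord(map, 5);              // decode against the post-ALTER, 5-field format
    return 0;
}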
[3 Jun 2009 23:02]
Davi Arnaut
Christopher, which tree did you use?

BTW, here is the test case that I used to reproduce it some time ago:

=== modified file 'sql/sql_select.cc'
--- sql/sql_select.cc	2009-05-14 21:49:53 +0000
+++ sql/sql_select.cc	2009-05-20 16:10:13 +0000
@@ -31,7 +31,7 @@
 #include "mysql_priv.h"
 #include "sql_select.h"
 #include "sql_cursor.h"
-
+#include "debug_sync.h"
 #include <m_ctype.h>
 #include <my_bit.h>
 #include <hash.h>
@@ -3005,6 +3005,9 @@ mysql_select(THD *thd, Item ***rref_poin
   select_lex->context.resolve_in_select_list= TRUE;
 
   JOIN *join;
+
+  DEBUG_SYNC(thd, "mysql_select");
+
   if (select_lex->join != 0)
   {
     join= select_lex->join;

=== modified file 'sql/sql_table.cc'
--- sql/sql_table.cc	2009-05-15 08:00:35 +0000
+++ sql/sql_table.cc	2009-05-20 15:58:24 +0000
@@ -6021,6 +6021,8 @@ mysql_fast_or_online_alter_table(THD *th
                                   ha_alter_flags))
     DBUG_RETURN(1);
+
+  DEBUG_SYNC(thd, "mysql_phase2");
+
   /*
     Make the metadata lock available to open_table() called to reopen the
     table down the road.

Test case:

--source include/have_falcon.inc

connect (con1,localhost,root,,test,,);
connect (con2,localhost,root,,test,,);
connect (con3,localhost,root,,test,,);

CREATE TABLE inter1 (
  t1_autoinc INTEGER NOT NULL AUTO_INCREMENT,
  t1_uuid CHAR(36),
  t2_uuid CHAR(36),
  t1_blob MEDIUMTEXT,
  PRIMARY KEY (t1_autoinc),
  KEY (t1_uuid),
  KEY (t2_uuid)
) ENGINE = Falcon;

INSERT INTO inter1 VALUES (1, 'aaa', 'bbb', 'ccc');

connection con1;
set debug_sync='mysql_select SIGNAL parked WAIT_FOR go';
--send select * from inter1;

connection default;
--echo connection: default
set debug_sync= 'now WAIT_FOR parked';

connection con2;
set debug_sync='mysql_phase2 SIGNAL go WAIT_FOR done';
--send ALTER TABLE inter1 ADD COLUMN filler TINYTEXT;

connection con1;
--reap
set debug_sync= 'now SIGNAL done';

connection con2;
--reap
[5 Jun 2009 5:40]
Christopher Powers
I have recreated the failure using the method described above, and expect to have a fix available within the next day.
[11 Jun 2009 15:56]
Kevin Lewis
This should not be sr54rc since it is a Falcon problem.
[24 Jun 2009 0:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/76979

2741 Christopher Powers  2009-06-23
     Bug #37565 "Crash in Field_blob::pack"

     Online ALTER ADD COLUMN creates a new record format that has more fields
     than the existing rows.

     Previously, the disparity between the old record formats and the newer,
     larger record format was handled strictly by StorageInterface::decodeRecord(),
     which set the undefined fields to NULL prior to returning the decoded record
     to the server.

     This was generally sufficient except when the older records were updated.
     Although Table::update() assigned the new format to the record and
     Record::setEncodedRecord() resized the field vector, the new fields in the
     record were never initialized to NULL.

     StorageInterface::encodeRecord() now initializes undefined fields to NULL
     prior to storing the encoded record.

     modified:
       storage/falcon/ha_falcon.cpp
       storage/falcon/ha_falcon.h
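As a rough illustration of what the commit message describes (hypothetical types and names, not the actual storage/falcon change): when a row stored under the old, shorter format is re-encoded under the new format, the fields the row never had must be explicitly marked NULL instead of inheriting whatever the record buffer happens to contain.

// Hypothetical sketch of the fix idea; not the real StorageInterface::encodeRecord().
#include <cstdio>
#include <cstring>

// A record buffer with room for more fields than its current format uses;
// isNull[] stands in for the per-field null flags stored with the record.
struct Record {
    static const int kMaxFields = 8;
    int  fieldCount;
    bool isNull[kMaxFields];
    int  values[kMaxFields];
};

// Re-encode 'rec' under a newer format with newFieldCount fields. Without the
// loop, the slots between the old and new count keep whatever stale bytes the
// buffer held, which is the situation the commit message describes.
void encodeRecord(Record &rec, int newFieldCount) {
    for (int i = rec.fieldCount; i < newFieldCount; ++i) {
        rec.isNull[i] = true;   // the fix: undefined fields become SQL NULL
        rec.values[i] = 0;
    }
    rec.fieldCount = newFieldCount;
}

int main() {
    Record rec;
    std::memset(&rec, 0xAA, sizeof rec);   // simulate stale memory contents
    rec.fieldCount = 4;                    // old format: 4 fields
    for (int i = 0; i < rec.fieldCount; ++i) {
        rec.isNull[i] = false;
        rec.values[i] = i;
    }

    encodeRecord(rec, 5);                  // format after online ADD COLUMN
    std::printf("new field 4 is %s\n", rec.isNull[4] ? "NULL" : "garbage");
    return 0;
}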
[14 Jan 2010 23:12]
Roel Van de Paar
[21 Jun 2008 22:56] Philip Stoev
> ENGINE=MyISAM causes a deadlock (to be filed as a separate bug)

Hi Philip, was a bug logged for this? Is InnoDB affected in any way?
[14 May 2010 8:07]
Philip Stoev
Roel, with next-mr, I am unable to reproduce a deadlock with either MyISAM or InnoDB. My guess is that the metadata locking feature in next-mr has resolved any deadlock issues.