Bug #26433 | Falcon: crash in procedure after error 1025 | ||
---|---|---|---|
Submitted: | 16 Feb 2007 0:09 | Modified: | 20 May 2007 5:16 |
Reporter: | Peter Gulutzan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 5.2.4-falcon-alpha-debug | OS: | Linux (SUSE 10.0 / 64-bit) |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[16 Feb 2007 0:09]
Peter Gulutzan
[16 Feb 2007 1:54]
Hakan Küçükyılmaz
Verified as described. The loop count on my machine could not even get to 1000 before crashing +------+---------------------------------------------------------------------+ | v | @x | +------+---------------------------------------------------------------------+ | 969 | update t1 set b = repeat(0x1096,481), a =969, c = repeat(0x22,2294) | +------+---------------------------------------------------------------------+ 1 row in set (9 min 17.00 sec) +----------------------+ | error, probably 1025 | +----------------------+ | error, probably 1025 | +----------------------+ 1 row in set (9 min 17.00 sec) +---+ | 1 | +---+ | 1 | +---+ 1 row in set (9 min 17.02 sec) +---+ | 2 | +---+ | 2 | +---+ 1 row in set (9 min 17.07 sec) +---+ | 3 | +---+ | 3 | +---+ 1 row in set (9 min 17.36 sec) ERROR 2013 (HY000): Lost connection to MySQL server during query 5.2.4-falcon-alpha-debug Best regards, Hakan
[16 Feb 2007 2:05]
Hakan Küçükyılmaz
Added test case falcon_bug_26433.test and pushed to mysql-5.1-falcon tree. Backtrace is: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 1098242992 (LWP 20927)] 0x08426d60 in Transaction::addRecord (this=0x4050d4e8, record=0x4059adec) at Transaction.cpp:300 300 *recordPtr = record; (gdb) bt #0 0x08426d60 in Transaction::addRecord (this=0x4050d4e8, record=0x4059adec) at Transaction.cpp:300 #1 0x0842229b in Table::update (this=0x404ffba4, transaction=0x4050d4e8, oldRecord=0x406dc0cc, numberFields=1, updateFields=0x4052aaec, values=0x40510794) at Table.cpp:1005 #2 0x0848412d in NUpdate::evalStatement (this=0x4050f324, statement=0x404fc744) at NUpdate.cpp:143 #3 0x084befd2 in Nfs::Statement::start (this=0x404fc744, node=0x4050f324) at Statement.cpp:449 #4 0x08486a95 in PreparedStatement::executeUpdate (this=0x404fc744) at PreparedStatement.cpp:86 #5 0x08465394 in Index::rename (this=0x40529ad0, newName=0x41759c70 "#SQL2-51B9-2$1") at Index.cpp:654 #6 0x08412097 in StorageDatabase::renameTable (this=0x404fa5d4, storageConnection=0x404fa514, table=0x40524798, tableName=0x41759ed4 "#SQL2-51B9-2", schemaName=0x41759dd4 "TEST") at StorageDatabase.cpp:549 #7 0x084172c9 in StorageTableShare::renameTable (this=0x43341a9c, storageConnection=0x404fa514, newName=0x4175a44f "./test/#sql2-51b9-2") at StorageTableShare.cpp:170 #8 0x0840b63d in NfsStorageTable::rename_table (this=0x89c8eb8, from=0x4175a64f "./test/t1", to=0x4175a44f "./test/#sql2-51b9-2") at ha_falcon.cpp:1087 #9 0x08370f5d in mysql_rename_table (base=0x893ae20, old_db=0x89c1318 "test", old_name=0x89c1148 "t1", new_db=0x89c1318 "test", new_name=0x4175b888 "#sql2-51b9-2", flags=2) at sql_table.cc:3708 #10 0x08377147 in mysql_alter_table (thd=0x896a470, new_db=0x89c1318 "test", new_name=0x89c1148 "t1", lex_create_info=0x89b989c, table_list=0x89c1170, fields=@0x89b97b4, keys=@0x89b97a8, order_num=0, order=0x0, ignore=false, alter_info=0x89b9a28, do_send_ok=true) at sql_table.cc:6525 #11 0x0827d9c3 in mysql_execute_command (thd=0x896a470) at sql_parse.cc:3213 #12 0x083dae1f in sp_instr_stmt::exec_core (this=0x89c13a8, thd=0x896a470, nextp=0x4175c444) at sp_head.cc:2576 #13 0x083dabd9 in sp_lex_keeper::reset_lex_and_exec_core (this=0x89c13d0, thd=0x896a470, nextp=0x4175c444, open_tables=false, instr=0x89c13a8) at sp_head.cc:2445 #14 0x083de9bd in sp_instr_stmt::execute (this=0x89c13a8, thd=0x896a470, nextp=0x4175c444) at sp_head.cc:2527 #15 0x083d88c9 in sp_head::execute (this=0x89943c0, thd=0x896a470) at sp_head.cc:1078 #16 0x083dd0e9 in sp_head::execute_procedure (this=0x89943c0, thd=0x896a470, args=0x896a9d0) at sp_head.cc:1711 #17 0x082822e1 in mysql_execute_command (thd=0x896a470) at sql_parse.cc:4698 #18 0x0828411c in mysql_parse (thd=0x896a470, inBuf=0x8996460 "call p1()", length=9) at sql_parse.cc:6162 #19 0x08284b27 in dispatch_command (command=COM_QUERY, thd=0x896a470, packet=0x898e401 "", packet_length=10) at sql_parse.cc:1857 #20 0x08285b8e in do_command (thd=0x896a470) at sql_parse.cc:1626 #21 0x0828603f in handle_one_connection (arg=0x896a470) at sql_parse.cc:1232 #22 0x40284297 in start_thread () from /lib/tls/libpthread.so.0 #23 0x4021937e in clone () from /lib/tls/libc.so.6 #24 0x4175dbb0 in ?? () (gdb) f 0 #0 0x08426d60 in Transaction::addRecord (this=0x4050d4e8, record=0x4059adec) at Transaction.cpp:300 300 *recordPtr = record; (gdb) p *recordPtr Cannot access memory at address 0xeeeeeeee (gdb) p record $3 = (class RecordVersion *) 0x4059adec (gdb) Regards, Hakan
[19 Feb 2007 15:09]
Hakan Küçükyılmaz
Modified test case to use a seed for rand() to get better reproducibility. DROP TABLE IF EXISTS t1; DROP PROCEDURE IF EXISTS p1; # Seed for rand() to get more predictable code path. SET @a = 707; DELIMITER // CREATE PROCEDURE p1 () BEGIN DECLARE v int default 0; DECLARE v1000 int; DECLARE v999 int; DECLARE v255 int; DECLARE v99 int; DECLARE CONTINUE HANDLER FOR SQLEXCEPTION BEGIN DECLARE vx int; SELECT 'error, probably 1025'; DROP TABLE t1; SELECT '1'; CREATE TABLE t1 ( a int, b varchar(500) character set latin1, c varchar(500) character set latin2 ); SET vx = 0; WHILE vx < 20 do INSERT INTO t1 VALUES (vx, null, null); SET vx = vx + 1; END WHILE; SELECT '2'; CREATE INDEX i1 ON t1 (b, a); CREATE INDEX i2 ON t1 (c, a); SELECT '3'; END; SELECT 'a'; CREATE TABLE t1 ( a int, b varchar(500) character set latin1, c varchar(500) character set latin2 ); SELECT 'b'; SET v = 0; WHILE v < 20 do INSERT INTO t1 VALUES (v, null, null); SET v = v + 1; END WHILE; SELECT 'c'; CREATE INDEX i1 ON t1 (b, a); CREATE INDEX i2 ON t1 (c, a); select 'd'; SET v = 0; while v < 2500 do SET v1000 = rand(@a) * 1000; SET v999 = rand(@a) * 999; SET v255 = rand(@a) * 255; SET v99 = rand(@a) * 99; SET @x = concat('update t1 set b = repeat(0x', hex(v1000),',',v999,'), a =', v,', c = repeat(0x', hex(v255), ',',v99,')'); /* SELECT v, @x; */ PREPARE stmt1 FROM @x; EXECUTE stmt1; IF v mod 2 = 0 THEN ALTER TABLE t1 MODIFY COLUMN b varchar(500) character set latin2; ALTER TABLE t1 MODIFY COLUMN c varchar(500) character set latin1; ELSE ALTER TABLE t1 MODIFY COLUMN b varchar(500) character set latin1; ALTER TABLE t1 MODIFY COLUMN c varchar(500) character set latin2; END IF; SET v = v + 1; END WHILE; END// CALL p1()//
[14 Apr 2007 1:50]
Hakan Küçükyılmaz
Does not crash anymore. Regards, Hakan
[15 Apr 2007 20:45]
Peter Gulutzan
I tried again today with the latest build ChangeSet@1.2598, 2007-04-15 It crashed after 1518 iterations.
[15 Apr 2007 21:02]
Hakan Küçükyılmaz
Can't reproduce: Test(s) which will be run though they are marked as disabled: falcon_bug_26433 : Bug#26433 2007-02-16 hakank Currently failing TEST RESULT TIME (ms) ------------------------------------------------------- falcon_bug_26433 [ pass ] 412027 ------------------------------------------------------- Stopping All Servers All 1 tests were successful. The servers where restarted 1 times Spent 412.027 seconds actually executing testcases Linux 64-bit, change set 1.2599, 2007-04-15 13:58:53-04:00, jas@fluffy.netfrastructure.com +1 -0
[16 Apr 2007 1:23]
Peter Gulutzan
It's still crashing. Linux 64-bit, change set 1.2599, 2007-04-15 13:58:53-04:00, jas@fluffy.netfrastructure.com +1 -0 Perhaps someone could try to run the original script for several hours.
[16 Apr 2007 8:23]
Hakan Küçükyılmaz
Crashes now with random seed of 909. Program received signal SIGSEGV, Segmentation fault. Index::getDeferredIndex (this=0x2aaaaacfd418, transaction=0x2aaaaacbc4b8) at Index.cpp:222 222 if ((deferredIndex->index == this) && (deferredI ndex->virtualOffset == 0)) (gdb) bt #0 Index::getDeferredIndex (this=0x2aaaaacfd418, transaction=0x2aaaaacbc4b8) at Index.cpp:222 #1 0x00000000007cba10 in Index::insert (this=0x2aaaaacfd418, key=0x4087d280, recordNumber=18463, transaction=0x2aaaaacbc4b8) at Index.cpp:189 #2 0x00000000007cba7e in Index::update (this=0x2aaaaacfd418, oldRecord=0x2aaaadb74210, record=0x2aaaadb75800, transaction=0x2aaaaacbc4b8) at Index.cpp:499 #3 0x000000000079d46b in Table::update (this=0x2aaaaacfce38, transaction=0x2aaaaacbc4b8, oldRecord=0x2aaaadb74210, numberFields=1, updateFields=0x2aaaaad366b8, values=0x2aaaaacb7888) at Table.cpp:1021 #4 0x00000000007e27bb in NUpdate::evalStatement (this=0x2aaaaad400c8, statement=0x2aaaaad48bc8) at NUpdate.cpp:143 #5 0x00000000008127ac in Nfs::Statement::start (this=0x2aaaaad48bc8, node=0x2aaaaad400c8) at Statement.cpp:451 #6 0x00000000007e469d in PreparedStatement::executeUpdate ( this=0x2aaaaacfd418) at PreparedStatement.cpp:86 #7 0x00000000007cc15e in Index::rename (this=0x2aaaaad4a678, newName=0x4087dac0 "T1$0") at Index.cpp:700 #8 0x00000000007901f6 in StorageDatabase::renameTable ( this=<value optimized out>, storageConnection=0x2aaaaacf6c08, table=0x2aaaaad37e00, tableName=0x2aaaaacaf374 "T1", schemaName=0x2aaaaad37bcc "TEST") at StorageDatabase.cpp:595 (gdb) f 0 #0 Index::getDeferredIndex (this=0x2aaaaacfd418, transaction=0x2aaaaacbc4b8) at Index.cpp:222 222 if ((deferredIndex->index == this) && (deferredIndex->virtualOffset == 0)) (gdb) l 217 { 218 for (deferredIndex = transaction->deferredIndexes; 219 deferredIndex; 220 deferredIndex = deferredIndex->nextInTransaction) 221 { 222 if ((deferredIndex->index == this) && (deferredIndex->virtualOffset == 0)) 223 break; 224 } 225 } 226 (gdb) p this $1 = (Index * const) 0x2aaaaacfd418 (gdb) p deferredIndex $2 = (DeferredIndex *) 0xeeeeeeeeeeeeeeee (gdb)
[21 Apr 2007 4:08]
Kevin Lewis
I finally figured out why the transaction was being removed in the middle of being used. It was the system transaction and the scavenger thread was calling commitSystemTransaction while another thread was doing a renameTable(). Specifically, Index::rename() was not protected with a shared lock on Database::syncSysConnection. The other two rename functions in renameTable, Table::rename() and Sequence::rename(), were being protected. Instead of adding that lock on Index::rename(), I put the lock up higher, in StorageDatabase::renameTable(). Then I deleted the lock from Table::rename() and Sequence::rename(). This allows the entire StorageDatabase::renameTable() to be done in a single transaction. In addition, I am using incrementing the Transaction::useCount when the transaction is assigned to a Statement object, and decrementing it when that object is destroyed. Then at the bottom of Transaction::commit, where unused Transactions are destroyed, I don't allow the transaction to be destroyed until the useCount gets back down to 1. In this way, the Transaction is not destroyed while there are any other references to it, at least from a Statement object.
[30 Apr 2007 10:20]
Hakan Küçükyılmaz
Passes now with random seed of 909, too. Best regards, Hakan
[30 Apr 2007 10:25]
Hakan Küçükyılmaz
Passes now with random seed of 909, too. Best regards, Hakan
[30 Apr 2007 14:08]
MC Brown
A note has been added to the 6.0.0 changelog.
[2 May 2007 10:05]
Hakan Küçükyılmaz
Peter, I followed your advise and run the your initial test. It does not crash the server anymore but it get's slower and slower: +-------+-------------------------------------------------------------------------+ | v | @x | +-------+-------------------------------------------------------------------------+ | 38650 | update tq set s1 = repeat(0x2395,291), s0=38650, s2 = repeat(0xB9,7458) | +-------+-------------------------------------------------------------------------+ 1 row in set (1 day 1 hour 9 min 59.74 sec) +-------+-------------------------------------------------------------------------+ | v | @x | +-------+-------------------------------------------------------------------------+ | 38651 | update tq set s1 = repeat(0x15C6,549), s0=38651, s2 = repeat(0x13,7264) | +-------+-------------------------------------------------------------------------+ 1 row in set (1 day 1 hour 10 min 4.20 sec) +-------+------------------------------------------------------------------------+ | v | @x | +-------+------------------------------------------------------------------------+ | 38652 | update tq set s1 = repeat(0xFEC,858), s0=38652, s2 = repeat(0x12,7768) | +-------+------------------------------------------------------------------------+ 1 row in set (1 day 1 hour 10 min 8.67 sec) +-------+------------------------------------------------------------------------+ | v | @x | +-------+------------------------------------------------------------------------+ | 38653 | update tq set s1 = repeat(0x1A48,33), s0=38653, s2 = repeat(0x25,6356) | +-------+------------------------------------------------------------------------+ Segmentation fault After one day mysql client crashed. If you think the performance or the client crash is a bug, feel free to open a new bug about it. Best regards, Hakan
[2 May 2007 18:59]
Peter Gulutzan
I now encounter a 'recovery failure' problem using this procedure. How to repeat: Create procedure and call it as originally described. Let it run for at least 5 seconds. Elsewhere, find out the mysqld process number and kill it, e.g. " linux:/home/pgulutzan # ps -A | grep mysqld 12471 pts/1 00:00:02 mysqld linux:/home/pgulutzan # kill -9 12471 " Restart mysqld. Restart mysql client. Say 'use (databasename)' and/or 'select * from tq;'. I tried that 3 times, and got a crash 3 times. There are already bugs about recovery failure, e.g. Bug#25017, but I like this example because I think/hope it will be easy to reproduce.
[9 May 2007 16:44]
Kevin Lewis
This bug has been fixed, mutated, and re-opened so many times I don't knwo what the problem is. There seems to be a complaint about a performance problem and a recovery issue. Please open individual bugs for each issue.
[20 May 2007 5:17]
MC Brown
Already documented in the 6.0.0 changelog.