Bug #25017 Falcon: crash recovery failure with alterations
Submitted: 12 Dec 2006 18:52 Modified: 13 Nov 2007 18:11
Reporter: Peter Gulutzan Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:5.1.13-falcon-alpha-debug OS:Linux (SUSE 10.0 / 64-bit)
Assigned to: CPU Architecture:Any

[12 Dec 2006 18:52] Peter Gulutzan
Description:
If I force the server to go down abruptly,
then start it again, the database is
corrupt -- attempts to select from it cause
crashes, or (more rarely) cause 'table not
found' errors.

Thus the effect is similar to bug#22153 "Crash recovery
failure" but the test case is quite different.

For this failure, I have a procedure
that contains alters and inserts. I
have to run the procedure for a minute on
three connections, before killing the server.

How to repeat:
I used five terminal programs (consoles).

On console 0, start the mysqld server.

On console 2 and 3 and 4, start the mysql client
as usual, and 'use' to the same database.

On console 2, say:

delimiter //
create table t (s1 int primary key, s2 int) engine=falcon//
create procedure p ()
begin
  declare v int default 0;
  declare continue handler for sqlexception select 'error';
  while v < 10000 do
    alter table t partition by hash (s1);
    select v;
    insert into t values (v,null);
    alter table t remove partitioning;
    set v = v + 1;
    end while;
  end//
delimiter ;

On console 2 and 3 and 4, say:
call p();

Let the three mysql client jobs run the procedure for about a minute.

On console 1, say:

On console 1, say:
su root
ps -A | grep mysqld
... This tells me the process number of mysqld

kill -9 X
... where X is the process number of mysqld

This simulates a crash, but a fairly mild one,
since the operating-system buffers aren't cleared.

On console 0, start the mysqld server again.

On console 2, say
select count(*) from t;

On my machine, the above SELECT fails reliably.
[13 Dec 2006 16:48] Peter Gulutzan
For
#25017 Falcon: crash recovery failure with alterations

I fiddled with the test case until I had something
that failed (for me) five times in a row. Either I
saw a crash, or I saw a 'table not found' error.

The new test case is:

delimiter //
delete from mysql.proc//
create database d50//
use d50//
create table ty (s1 int) engine=myisam//
create procedure py ()
 begin
  declare v int default 0;
  drop table ty;
  create table ty (s1 int, s2 varchar(0)) engine=falcon;
  create index iy on ty (s1);
  while v < 10000 do
   select v; insert into ty values (v,repeat('a',v));
   set @v = concat('alter table ty modify column s2 varchar(',v,')');
   prepare stmt1 from @v;
   execute stmt1;
   update ty set s1 = s1 - rand()*10000;
   set v = v + 1;
  end while;
 end//
call py()//

/*
   Wait till v > 200.
   Say 'kill -9' on other console. */
   Start mysqld again.
*/

delimiter //
create database d51//
use d51//
create table ty (s1 int) engine=myisam//
create procedure py ()
 begin
  declare v int default 0;
  drop table ty;
  create table ty (s1 int, s2 varchar(0)) engine=falcon;
  create index iy on ty (s1);
  while v < 10000 do
   select v; insert into ty values (v,repeat('a',v));
   set @v = concat('alter table ty modify column s2 varchar(',v,')');
   prepare stmt1 from @v;
   execute stmt1;
   update ty set s1 = s1 - rand()*10000;
   set v = v + 1;
  end while;
 end//
call py()//

/*
   Wait till v > 200.
   Say 'kill -9' on other console. */
   Start mysqld again.
*/

select count(s2) from d50.ty//
select count(s2) from d51.ty//
drop table d50.ty//
drop table d51.ty//
[27 Feb 2007 15:20] MySQL Verification Team
Thank you for the bug report. Verified with last test case.
[20 Mar 2007 14:59] Jim Starkey
Problem was in the delete data section code executing a drop table.  In specific, if the server was stopped during the section page tree deletion, it would restart at recovery time, leading to bad results.  The fix was to log a slot change (page number to zero) before deleting a section page.

Nice bug, very well documented, and easily reproduceable.  I greatly appreciate the time and effort that went into reducing a rare event to predictable script.  Thanks.
[15 Apr 2007 19:42] Hakan Küçükyılmaz
Not fixed. I get following error:

delimiter //
delete from mysql.proc//
create database d50//
use d50//
create table ty (s1 int) engine=myisam//
create procedure py ()
 begin
  declare v int default 0;
  drop table ty;
  create table ty (s1 int, s2 varchar(0)) engine=falcon;
  create index iy on ty (s1);
  while v < 10000 do
   select v; insert into ty values (v,repeat('a',v));
   set @v = concat('alter table ty modify column s2 varchar(',v,')');
   prepare stmt1 from @v;
   execute stmt1;
   update ty set s1 = s1 - rand()*10000;
   set v = v + 1;
  end while;
 end//
call py()//

/*
   Wait till v > 200.
   Say 'kill -9' on other console.
   Start mysqld again.
*/

delimiter //
create database d51//
use d51//
create table ty (s1 int) engine=myisam//
create procedure py ()
 begin
  declare v int default 0;
  drop table ty;
  create table ty (s1 int, s2 varchar(0)) engine=falcon;
  create index iy on ty (s1);
  while v < 10000 do
   select v; insert into ty values (v,repeat('a',v));
   set @v = concat('alter table ty modify column s2 varchar(',v,')');
   prepare stmt1 from @v;
   execute stmt1;
   update ty set s1 = s1 - rand()*10000;
   set v = v + 1;
  end while;
 end//
call py()//

/*
   Wait till v > 200.
   Say 'kill -9' on other console.
   Start mysqld again.
*/

select count(s2) from d50.ty//
[21:39] root@d51>select count(s2) from d50.ty//
ERROR 1030 (HY000): Got error 157 from storage engine
[31 May 2007 17:36] Kevin Lewis
This is similar to Bug#28351
[26 Jun 2007 18:29] Jim Starkey
Rather three clients in parallel for a full minute, twice.  Successful
recoveries each time.  The test was run on a four processor SuSE 64 bit
Linux.
[26 Jun 2007 20:55] Hakan Küçükyılmaz
Still fails. Hit's assertion. I used following procedure with three clients and killed mysqld.

delimiter //
create table t (s1 int primary key, s2 int) engine=falcon//
create procedure p ()
begin
  declare v int default 0;
  declare continue handler for sqlexception select 'error';
  while v < 10000 do
    alter table t partition by hash (s1);
    select v;
    insert into t values (v,null);
    alter table t remove partitioning;
    set v = v + 1;
    end while;
  end//
delimiter ;

[Switching to LWP 20367]
0x00002b516d717fed in ?? ()

(gdb) f 4
#4  0x00000000007a6f45 in ~SyncObject (this=0x2b516ede5128) at SyncObject.cpp:119
119             ASSERT(lockState == 0);
(gdb) p lockState

(gdb) bt
#0  0x00002b516d717fed in ?? ()
#1  0x00000000007d3af2 in Error::debugBreak () at Error.cpp:92
#2  0x00000000007d3be8 in Error::error (string=<value optimized out>) at Error.cpp:69
#3  0x00000000007d3c48 in Error::assertionFailed (fileName=0x4 <Address 0x4 out of bounds>, line=20367)
    at Error.cpp:76
#4  0x00000000007a6f45 in ~SyncObject (this=0x2b516ede5128) at SyncObject.cpp:119
#5  0x000000000082e34f in ~Bdb (this=0x2b516ede50c0) at BDB.cpp:65
#6  0x0000000000830180 in ~Cache (this=0x2b516ef80198) at Cache.cpp:117
#7  0x00000000007c96eb in ~Database (this=0x2b516ed7eb98) at Database.cpp:572
#8  0x00000000007c02b5 in Connection::getDatabase (this=0x2b516ef7f9b0, dbName=0x2b516ef7f6ac "falcon_master",
    dbFileName=0x4007e1d0 "/data/mysql-sap/falcon_master.fts", threads=0x2b516ef7f718) at Connection.cpp:1656
#9  0x00000000007c0f51 in Connection::openDatabase (this=0x2b516ef7f9b0, dbName=0x2b516ef7f6ac "falcon_master",
    filename=<value optimized out>, account=0x9f29fc "mysql", password=0x9f29fc "mysql", address=0x0,
    parent=0x2b516ef7f718) at Connection.cpp:928
#10 0x000000000079f7d3 in StorageDatabase::getOpenConnection (this=0x2b516ef7f538) at StorageDatabase.cpp:127
#11 0x00000000007a0ff7 in StorageHandler::initialize (this=0x2b516ed7e048) at StorageHandler.cpp:913
#12 0x00000000007a1f82 in StorageHandler::getStorageConnection (this=0x2b516ed7e048, tableShare=0x2b516ef7f1c8,
    mySqlThread=0xf4b618, mySqlThdId=1, createFlag=OpenDatabase, tableSpaceMode=0) at StorageHandler.cpp:576
#13 0x000000000079abba in StorageInterface::open (this=0xfa3828, name=0xf967f8 "./test/t",
    mode=<value optimized out>, test_if_locked=<value optimized out>) at ha_falcon.cpp:402
#14 0x00000000006ef81a in handler::ha_open (this=0xfa3828, table_arg=0xf95bb8, name=0xf967f8 "./test/t",
    mode=2, test_if_locked=<value optimized out>) at handler.cc:1555
#15 0x0000000000660225 in open_table_from_share (thd=0xf4b618, share=0xf96478, alias=<value optimized out>,
    db_stat=39, prgflag=<value optimized out>, ha_open_flags=0, outparam=0xf95bb8, is_create_table=false)
    at table.cc:1625
#16 0x00000000006572e8 in open_unireg_entry (thd=0xf4b618, entry=0xf95bb8, table_list=0x40080d20,
    alias=0xfa5440 "t", cache_key=0x40080330 "test", cache_key_length=7, mem_root=0x40080560, flags=0)
    at sql_base.cc:3301
[7 Sep 2007 16:42] Kevin Lewis
WL#3726 is targeted for MySQL 5.2.
[12 Oct 2007 17:24] Kevin Lewis
Miguel, Can you re-verify this against mysql-6.0-falcon?
[13 Nov 2007 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[13 Nov 2007 18:11] Peter Gulutzan
I tried this again today, using the last script by Hakan Kuecuekyilmaz.
I cannot repeat the buggy behaviour.