Bug #33177 Table creation fails after error 305 and tablespace change
Submitted: 12 Dec 2007 17:10 Modified: 15 May 2009 17:04
Reporter: Giuseppe Maxia Email Updates:
Status: Closed Impact on me: None
Category:MySQL Server: Falcon storage engine Severity:S2 (Serious)
Version:6.0.4 OS:Any (Linux and Mac OS X)
Assigned to: Christopher Powers
Tags: CREATE TABLE, F_MEMORY, Tablespace
Triage: Triaged: D3 (Medium)

[12 Dec 2007 17:10] Giuseppe Maxia
Description:
After the following actions:
* an error of memory exhaustion
* a removal of a table

Falcon can't re-create the removed table in a different tablespace: the operation fails with error 1005 (errno 156).

For example:

insert into t2 select * from t1

Query OK, 500000 rows affected (18.06 sec)
Records: 500000  Duplicates: 0  Warnings: 0

alter table t2 drop primary key
ERROR 1296 (HY000): Got error 305 'record memory is exhausted' from Falcon

alter table t2 add key (c4)
ERROR 1005 (HY000): Can't create table 'test.#sql-656_1' (errno: 156)

How to repeat:
use test;
set global falcon_record_memory_max=1024*1024*128; 

drop tablespace ts1 engine=falcon;

create tablespace ts1 add datafile 'ts1' engine=falcon;

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  c1 int(11) NOT NULL,
  c2 char(3) NOT NULL,
  c3 char(3) NOT NULL,
  c4 date NOT NULL,
  c5 char(5) NOT NULL,
  c6 char(8) NOT NULL,
  c7 double NOT NULL,
  c8 datetime NOT NULL,
  c9 double NOT NULL,
  c10 datetime NOT NULL,
  c11 char(3) NOT NULL,
  c12 char(3) NOT NULL,
  c13 int(11) NOT NULL,
  c14 char(1) DEFAULT 'n',
  PRIMARY KEY (c4,c1,c3,c2,c5,c11,c10,c12)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

delimiter //
drop procedure if exists fill_table //
create procedure fill_table(nrecords int)
deterministic
begin
    declare i int default 0;
    FILL:
    loop
        set i = i + 1;
        if i > nrecords then
            leave FILL;
        end if;
        insert into t1 values (i, 'xxx', 'xxx', 
                @dt:= date('1900-01-01' + interval i second),
                'xxxx', 'xxxxxxxx', i, @dt, 
                i/2, @dt - interval 1 hour, 'xxx', 'xxx', 1000, 'n'); 
    end loop; 
end//

delimiter ;

# increase the number of records until you get a record-memory-exhausted error
call fill_table(500000);

create table t2 like t1;
alter table t2 engine=falcon;
insert into t2 select * from t1;
alter table t2 drop primary key;
alter table t2 add key (c4);
truncate t2;
drop table t2;

create table t2 like t1;
alter table t2 engine=falcon tablespace ts1;
insert into t2 select * from t1;
alter table t2 drop primary key;
alter table t2 add key (c4);
truncate t2;
drop table t2;

create table t2 like t1;
alter table t2 engine=falcon;
insert into t2 select * from t1;
alter table t2 drop primary key;
alter table t2 add key (c4);
truncate t2;
drop table t2;

drop table t1;
[12 Dec 2007 19:20] Giuseppe Maxia
Verified also on Mac OS X 10.4 and 10.5.
[22 Apr 2008 22:03] Kevin Lewis
Sergey Vojtovich wrote:

The problem occurs during the "alter table t2 drop primary key" query:
- the alter table code creates a temporary Falcon table (to avoid confusion: here and below I use the term "temporary table", but as far as I can see, from the engine's point of view it is a regular table);
- the alter table code reads data from the real table and writes it to the temporary table;
- at some point (on write_row) Falcon hits the record memory limit and returns an error;
- the alter table code detects this error, stops copying data and attempts to remove the temporary table;
- when Falcon drops a table, it removes related information from the system tables (e.g. system.fields);
- to remove information from a system table, it must allocate a record buffer;
- as we're out of record memory, the allocation fails => Falcon fails to drop the table.

As a result the temporary table is not dropped.

If I understand this problem correctly, it is a kind of chicken-and-egg issue. I believe it could be solved by not applying the falcon_record_memory_max limit to system table handling.
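The chicken-and-egg sequence above, and the proposed exemption for system tables, can be sketched as a toy model. This is not Falcon code; the class and method names here are illustrative only:

```python
# Toy model of the failure sequence: a bounded record pool stands in for
# falcon_record_memory_max, and dropping a table needs one more allocation.

class RecordMemoryExhausted(Exception):
    pass

class RecordPool:
    """Bounded allocator standing in for the Falcon record cache."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def alloc(self, size, system_table=False):
        # Proposed fix: allocations on behalf of system tables
        # (e.g. system.fields) bypass the record memory limit.
        if not system_table and self.used + size > self.limit:
            raise RecordMemoryExhausted("error 305: record memory is exhausted")
        self.used += size

pool = RecordPool(limit=100)
pool.used = 100                      # the big table copy filled the cache

# Without the exemption, dropping the temp table fails (chicken and egg):
try:
    pool.alloc(1)                    # record buffer for the system.fields update
except RecordMemoryExhausted as e:
    print(e)

# With the exemption, cleanup can proceed even when the cache is full:
pool.alloc(1, system_table=True)
print("temporary table dropped")
```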
[25 Apr 2008 15:50] Kevin Lewis
Jim says he wants to look at this problem.  Why can't the second drop table complete?  The forced scavenge ought to work.
[25 Nov 2008 16:59] Kevin Lewis
Sergey,  I think the core issue here is not that an error 305 occurs from a drop table, because that is an offline alter.  But it seems like the server and Falcon get mixed up about what tables still exist.  You have fixed errors like this before with a well placed rollback.  So maybe you can have a look at this.
[14 Jan 2009 18:19] Kevin Lewis
Sergey Vojtovich wrote:

I still think that I'm not a person who could fix this one. Below are some
explanations.

> > [25 Apr 2008 17:50] Kevin Lewis
> >
> > Jim says he wants to look at this problem.  Why can't the second drop table
> > complete?  The forced scavenge ought to work.

If I understand correctly, the scavenger is intended to release unused blocks.
But when an offline ALTER TABLE is issued, it needs memory for all records. And
if the table copy requires more memory than record_memory_max, we will end up
with an error.

So the scavenger seems to be useless here.

Anyway, an error during ALTER TABLE is absolutely acceptable.

> > [25 Nov 2008 17:59] Kevin Lewis
> >
> > Sergey,  I think the core issue here is not that an error 305 occurs from a
> > drop table, because that is an offline alter.  But it seems like the server
> > and Falcon get mixed up about what tables still exist.  You have fixed errors
> > like this before with a well placed rollback.  So maybe you can have a look
> > at this.
Yes and no. First of all, just to clarify, this error occurs not from the DROP
TABLE statement, but from the ALTER TABLE when it is attempting to copy
data.

If we get this error from DROP TABLE, it will be detected by the server. In
this case the server will refuse to drop: it preserves the .frm file,
keeps the table, and returns an error.

With offline ALTER TABLE it is slightly different. The server creates the
non-temporary altered table in Falcon with a special name, like
'#sql-656_1'. There is no .frm file for this table. This table is a kind of
non-temporary temporary table (sorry for the confusing term).

When copying data to this table, at some point Falcon runs out of record
memory and returns an error. The server detects this error, aborts the
ALTER TABLE process and starts cleaning up.

Everything is fine until this point. The problem occurs during cleanup,
when the server attempts to remove the table with the special name. At
this point the server doesn't expect that table deletion may fail. But Falcon
cannot delete this table, because it is out of record memory even for
operations on "system" tables. As a result the table named '#sql-656_1' is
preserved.

When doing the next offline ALTER TABLE, the server attempts to create
another table with the same "special" name, '#sql-656_1'. And it fails
because this table still exists in Falcon.

Now, here is how I think it may be solved:
- probably chill/thaw may help; if I understand correctly, this writes pages
  to disk when we're running out of record memory.
- probably do not apply the record memory limit to system tables.
- (server) do not reset the counter for special table names; that is, with the
  second alter table create a table named '#sql-656_2'. Which is evil:
  I don't think we want to preserve trash in Falcon datafiles.
- (server) keep trying to drop the table until it gets dropped. Dumb; may be
  a dead loop.
- (server) keep a list of table names that we failed to drop during ALTER TABLE
  and drop them later. Dumb as above.

Regards,
Sergey
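The counter-based naming option above could look roughly like this. The '#sql-656_1' base name is taken from the discussion; the counter suffix and the helper function are hypothetical:

```python
# Sketch: never reuse a leftover special name. Append a monotonically
# increasing counter and skip any name that still exists in the engine.

import itertools

def altered_table_names(base):
    """Yield candidate special names: base_0, base_1, ..."""
    for n in itertools.count():
        yield "%s_%d" % (base, n)

existing = {"#sql-656_1_0"}          # orphan left behind by a failed cleanup
name = next(n for n in altered_table_names("#sql-656_1")
            if n not in existing)
print(name)                          # first free special name
```

The downside, as noted above, is that the orphaned tables remain in the Falcon datafiles until something cleans them up.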
[14 Jan 2009 18:29] Kevin Lewis
Chris has found an incompatibility between anything that holds SyncSysDDL and the scavenger.  At the top of Database::scavenge() is a call to updateCardinalities().  This function gets a shared lock on SyncSysDDL, which makes the scavenger wait for any DDL activity to finish before records can be scavenged. This prevents the creation of ANY large indexes. 

He is currently implementing a solution in which updateCardinalities() is moved into its own thread with its own schedule.  It needs to be independent of the scavenger, because it is not related.  That was just a convenient place to put it.

That activity might make this bug moot, by preventing an error 305 during almost all DDLs.  BUT, what if it does still occur?  I still do not understand the chicken-and-egg issue.  If an error occurs on a drop of a temporary table, why can't the server retry it or avoid using that temporary name again?
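The decoupling described above, running updateCardinalities() on its own schedule instead of inside the scavenger, can be sketched as a periodic worker thread. All names here are illustrative, not Falcon internals:

```python
# Sketch: a dedicated thread calls a function on a fixed interval, so the
# scavenger no longer has to acquire SyncSysDDL and can run concurrently.

import threading
import time

def start_periodic(fn, interval, stop_event):
    """Call fn every `interval` seconds until stop_event is set."""
    def loop():
        while not stop_event.wait(interval):
            fn()
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread

runs = []
stop = threading.Event()
worker = start_periodic(lambda: runs.append(1), 0.01, stop)

# Meanwhile a scavenger thread could reclaim records without waiting
# for DDL activity to release its lock.
time.sleep(0.05)
stop.set()
worker.join()
print("cardinality updates run:", len(runs))
```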
[15 Jan 2009 13:28] Sergey Vojtovich
Another option is to change the temporary table naming rules in ALTER TABLE. Currently it's something like '#sql-' process id '_' thread id. Either add a counter to this name, or keep creating tables with different temporary names until one succeeds.

This would also require a trash detector in the engine, as ALTER TABLE may leave unreferenced tables. It could be executed on engine startup/shutdown and remove all tables whose names start with '#sql'.

I believe this would be more than just a bug fix.
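The startup "trash detector" suggested above might be sketched like this. The table list and the drop callback are stand-ins, not engine APIs:

```python
# Sketch: at engine startup or shutdown, drop any leftover tables whose
# names start with '#sql', i.e. orphans from an aborted ALTER TABLE.

def purge_orphaned_temp_tables(tables, drop):
    """Drop every '#sql*' table left behind by a failed ALTER TABLE."""
    for name in list(tables):        # copy, since drop() mutates the set
        if name.startswith("#sql"):
            drop(name)

tables = {"t1", "t2", "#sql-656_1"}
purge_orphaned_temp_tables(tables, tables.discard)
print(sorted(tables))                # orphan is gone, real tables remain
```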
[3 Mar 2009 7:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68093

3046 Christopher Powers	2009-03-03
      Bug #42651 "Regression: falcon_bug_22169-big started to fail with error 305"
      Bug #33177 "Table creation fails after error 305 and tablespace change"
      Bug #32838 "Falcon; error 1296 : Got error 305 'record memory is exhausted'"
      
      The fix for these bugs is the first of several improvements
      to Falcon's memory management (Worklog TBD).
      
      Falcon out-of-memory errors are caused by a combination of things.
      Recent improvements to the Scavenger and to the Backlogging subsystem
      (Bug#42592) have contributed to the resolution of these bugs, however,
      certain operations can still fill the record cache to the point where
      scavenging is ineffective.
      
      Scavenging efficiency will be greatly improved by allocating record
      data and metadata separately. The record cache now stores only
      actual record data, and Record and RecordVersion objects (metadata)
      are allocated from separate memory pools.
      
      The metadata memory pools are completely homogeneous, with no memory
      fragmentation. The record cache will also be far less fragmented,
      because large blocks of record data will no longer be interspersed
      with very small blocks of object data.
      
      Decoupling the data and metadata will also greatly reduce the number of
      out-of-memory conditions--typically seen during large inserts and
      updates--because the memory pools are allowed to grow independently.
      
      These memory pools may fluctuate considerably during massive transactions,
      depending upon the record makeup and type of operation. This fluctuation,
      however, only emphasizes the value of managing these memory pools
      separately.
      
      One side-effect of this change is that, while the record cache max size
      remains fixed, the record metadata caches can grow unbounded. Although
      this is not unprecedented (Falcon's general purpose memory pool has
      always been unbounded), one remaining challenge is to ensure that
      the Falcon memory manager releases resources back to the system as
      soon as possible.
[2 Apr 2009 17:38] Bugs System
Pushed into 6.0.11-alpha (revid:hky@sun.com-20090402144811-yc5kp8g0rjnhz7vy) (version source revid:christopher.powers@sun.com-20090303070929-ig36zlo3luoxrm2t) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 17:04] MC Brown
A note has been added to the 6.0.11 changelog: 

If Falcon runs out of memory while inserting records and you then try to alter the affected table, you may get a 'record memory is exhausted' error, and the table can no longer be used or accessed.