Bug #25665 Creating primary key index on Falcon table uses too much disk space
Submitted: 17 Jan 2007 4:33 Modified: 27 Apr 2009 2:58
Reporter: Mark Callaghan Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:5.2 Falcon OS:Any
Assigned to: Christopher Powers CPU Architecture:Any
Tags: disk, F_INDEX, space

[17 Jan 2007 4:33] Mark Callaghan
Description:
Insert 10M rows into a Falcon table with a primary key index define and the Falcon datafile is ~420MB. Repeat the test without the primary key index defined and then create the index after all inserts are done. At the end, the Falcon datafile is ~800MB.

How to repeat:
Test table without the PK index. The first part of the test requires a PK index.
CREATE TABLE C (
  ChId bigint(20) NOT NULL default '0',
  Timestamp datetime NOT NULL default '0000-00-00 00:00:00',
  CuId int(11) NOT NULL default '0',
  CaId int(11) NOT NULL default '0',
  AId int(11) NOT NULL default '0',
  UId int(11) NOT NULL default '0',
  IpAddress varchar(15) NOT NULL default ''
) engine=Falcon;

Generate data with this python
import random
 
a=[]
for i in xrange(5000000): a.append(i)
random.shuffle(a)
 
for i in a:
  j = i + 1000000
  print "%d,2006-01-01,%d,%d,%d,%d,12.34.56.78" % (i,j,j,j,j)

Load the test data with: 
load data infile 'o.gen' into table C fields terminated by ',';

To repeat:
0) remove Falcon database files
1) start database
2) create table with primary key index defined on ChId
3) load test data
4) wait for Falcon serial log to flush to Falcon data file
5) look at size of Falcon serial log (should be ~417MB)
6) shutdown

0) remove Falcon database files
1) start database
2) create table without primary key index defined
3) load test data
4) alter table C add primary key (ChId);
5) wait for Falcon serial log to flush to Falcon data file
6) look at size of Falcon serial log (should be > 800MB)

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 5.2.0-falcon-alpha |
+--------------------+

Server built using --without-debug
[17 Jan 2007 18:54] Kevin Lewis
This is to be expected given the way in which the MySQL server does an alter table.  It creates a new table, copies the records, and then deletes the old table.  In falcon, both of these tables are found in the same OS file, which increases in size but does not decrease.  So the end result is an OS file that is about 50% unused.  Innodb would have this same problem.  But MyISAM does not, since it deletes the OS file containing the old table version.
[17 Jan 2007 19:00] Kevin Lewis
There is a pending task for Falcon for the next release.
[17 Jan 2007 19:11] Kevin Lewis
Sorry, the pending task is for Falcon to be able to drop and add an index within the storage engine.
[17 Jan 2007 19:14] Mark Callaghan
Thanks for the explanation, but I am confused. Why does Falcon need to copy (reinsert) all rows from the table to create an index? Falcon tables are not index organized.
[17 Jan 2007 19:22] Kevin Lewis
Falcon does not do it this way, the MySQL server does.  The storage engine gets calls to create a new table with the index, read first record from old table, write record to new table, read next record form old table, write ... , and finally, drop the old table.
[17 Jan 2007 19:31] Mark Callaghan
Are you claiming this is done for all indexes or only for primary key indexes?
[17 Jan 2007 19:42] Kevin Lewis
All indexes.  This is the way MySQL implements CREATE INDEX for those storage engine which do not yet support the call directly.  Falcon will support that in the near future.
[26 Mar 2009 17:34] Hakan Küçükyılmaz
Mark,

with the latest Falcon (to be released as source in few days) I get following scenario.

I used your Python script and modified it to generate 10 mio rows.

Settings:
falcon_record_memory_max = 400M
falcon_page_cache_size = 1000M

hakan@lu0011:~/work/mysql/falcon$ more 25665.py
import random

a=[]
for i in xrange(10000000): a.append(i)
random.shuffle(a)

for i in a:
        j = i + 1000000
        print "%d,2006-01-01,%d,%d,%d,%d,12.34.56.78" % (i,j,j,j,j)

CREATE TABLE t1 (
  ChId bigint(20) NOT NULL default '0',
  Timestamp datetime NOT NULL default '0000-00-00 00:00:00',
  CuId int(11) NOT NULL default '0',
  CaId int(11) NOT NULL default '0',
  AId int(11) NOT NULL default '0',
  UId int(11) NOT NULL default '0',
  IpAddress varchar(15) NOT NULL default ''
) Engine Falcon;

[18:15] root@test>LOAD DATA INFILE '/home/hakan/work/mysql/falcon/10.gen' INTO TABLE t1 FIELDS TERMINATED BY ',';
Query OK, 10000000 rows affected (1 min 31.23 sec)
Records: 10000000  Deleted: 0  Skipped: 0  Warnings: 0

-- shutdown mysqld

root@lu0011:/data/mysql# du -h falcon_*
301M    falcon_master.fl1
169M    falcon_master.fl2
900K    falcon_master.fts
32K     falcon_temporary.fts
537M    falcon_user.fts

-- start mysqld

[18:30] root@test>ALTER TABLE t1 ADD PRIMARY KEY (ChId);
Query OK, 0 rows affected (2 min 43.86 sec)
Records: 0  Duplicates: 0  Warnings: 0

-- shutdown mysqld

root@lu0011:/data/mysql# du -h falcon_*
302M    falcon_master.fl1
190M    falcon_master.fl2
924K    falcon_master.fts
32K     falcon_temporary.fts
604M    falcon_user.fts

I think this is a lot better than before, because we support ONLINE ALTER now in Falcon. What do you think?
[26 Mar 2009 17:44] Hakan Küçükyılmaz
Loading 10 mio rows into the table with the primary key defined gives:

CREATE TABLE t1 (
  ChId bigint(20) NOT NULL default '0',
  Timestamp datetime NOT NULL default '0000-00-00 00:00:00',
  CuId int(11) NOT NULL default '0',
  CaId int(11) NOT NULL default '0',
  AId int(11) NOT NULL default '0',
  UId int(11) NOT NULL default '0',
  IpAddress varchar(15) NOT NULL default '',
  PRIMARY KEY (ChId)
) Engine Falcon;

[18:36] root@test>LOAD DATA INFILE '/home/hakan/work/mysql/falcon/10.gen' INTO TABLE t1 FIELDS TERMINATED BY ',';
Query OK, 10000000 rows affected (3 min 58.60 sec)
Records: 10000000  Deleted: 0  Skipped: 0  Warnings: 0

-- shutdown mysqld

root@lu0011:/data/mysql# du -h falcon_*
139M    falcon_master.fl1
97M     falcon_master.fl2
900K    falcon_master.fts
32K     falcon_temporary.fts
638M    falcon_user.fts
[26 Apr 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[27 Apr 2009 2:53] Kevin Lewis
Moving to Documenting.  Now that Falcon supports this online, ALTER TABLE t1 ADD PRIMARY KEY, it does not create a second table and thus waste a lot of space.