Bug #41840 Unbounded tablespace growth during recovery
Submitted: 3 Jan 2009 13:57 Modified: 15 May 2009 14:11
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0-falcon-team OS:Any
Assigned to: Vladislav Vaintroub CPU Architecture:Any
Tags: F_RECOVERY
Triage: Triaged: D2 (Serious)

[3 Jan 2009 13:57] Philip Stoev
Description:
When recovering from the attached tablespace and logs, the tablespace proceeds to grow to many times its original size. The recovery thread is in the following position:

#0  0x000000315b00e6d7 in fsync () from /lib64/libpthread.so.0
#1  0x00000000008bcfac in IO::pwrite (buffer=<value optimized out>, length=<value optimized out>, offset=<value optimized out>, this=<value optimized out>)
    at IO.cpp:651
#2  IO::writePages (this=0x7f862de94108, pageNumber=<value optimized out>, length=<value optimized out>, data=<value optimized out>, type=1) at IO.cpp:364
#3  0x0000000000883480 in Cache::writePage (this=0x7f862e0b2ff0, bdb=0x7f862e0c8bd8, type=1) at Cache.cpp:556
#4  0x000000000088395c in Cache::findBuffer (this=0x7f862e0b2ff0, dbb=0x7f862de94100, pageNumber=67226, lockType=Exclusive) at Cache.cpp:445
---Type <return> to continue, or q <return> to quit---
#5  0x0000000000883b5b in Cache::fakePage (this=0x7f862e0b2ff0, dbb=0x7f862de94100, pageNumber=67226, type=PAGE_data_overflow, transId=117987)
    at Cache.cpp:365
#6  0x000000000089c131 in Dbb::allocPage (this=0x6, pageType=769507328, transId=2) at Dbb.cpp:249
#7  0x00000000008e63ab in Section::storeTail (this=0x7f862a4c2780, stream=0x7fff375a2080, maxRecord=15, pLength=0x7fff375a1f6c, transId=117987,
    earlyWrite=false) at Section.cpp:755
#8  0x0000000000924607 in DataPage::updateRecord (this=0x7f862dc44000, section=0x7f862a4c2780, lineNumber=111, stream=0x7fff375a2080, transId=117987,
    earlyWrite=false) at DataPage.cpp:118
#9  0x00000000008e7831 in Section::updateRecord (this=0x7f862a4c2780, recordNumber=<value optimized out>, stream=0x7fff375a2080, transId=117987,
    earlyWrite=false) at Section.cpp:564
#10 0x000000000089be85 in Dbb::updateRecord (this=0x7f862de94100, sectionId=<value optimized out>, recordId=13430, stream=0x7fff375a2080, transId=117987,
    earlyWrite=false) at Dbb.cpp:350
#11 0x000000000090d762 in SRLUpdateRecords::redo (this=0x7fff375a2b98) at SRLUpdateRecords.cpp:337
#12 0x00000000008f2cb7 in SerialLog::recover (this=0x7f862e1ccb48) at SerialLog.cpp:368
#13 0x0000000000897202 in Database::openDatabase (this=0x7f862e08abb8, filename=0x7fff375a2fd0 "/build/bugs/master-data-copy/falcon_master.fts")
    at Database.cpp:733
#14 0x000000000088b463 in Connection::getDatabase (this=0x7f862de89700, dbName=0x7f862de8937c "FALCON_MASTER",
    dbFileName=0x7fff375a2fd0 "/build/bugs/master-data-copy/falcon_master.fts", threads=0x7f862de893d8) at Connection.cpp:1651
#15 0x000000000088cab4 in Connection::openDatabase (this=0x7f862de89700, dbName=0x7f862de8937c "FALCON_MASTER", filename=<value optimized out>,
    account=0xba8331 "mysql", password=0xba8331 "mysql", address=0x0, parent=0x7f862de893d8) at Connection.cpp:934
#16 0x0000000000850e27 in StorageDatabase::getOpenConnection (this=0x7f862de891f0) at StorageDatabase.cpp:136
#17 0x0000000000853a06 in StorageHandler::initialize (this=0x7f862e08a040) at StorageHandler.cpp:987
#18 0x0000000000848c76 in StorageInterface::falcon_init (p=<value optimized out>) at ha_falcon.cpp:245
#19 0x00000000007535af in ha_initialize_handlerton (plugin=0x350b778) at handler.cc:440
#20 0x00000000007daa02 in plugin_initialize (plugin=0x6) at sql_plugin.cc:1006
#21 0x00000000007dd974 in plugin_init (argc=0x10ac688, argv=0x34eab78, flags=<value optimized out>) at sql_plugin.cc:1213
#22 0x0000000000657f79 in init_server_components () at mysqld.cc:4112
#23 0x000000000065b008 in main (argc=<value optimized out>, argv=<value optimized out>) at mysqld.cc:4574

How to repeat:
The tablespace will be provided shortly.
[25 Jan 2009 18:59] Philip Stoev
Here is the log of one such recovery:

# 20:48:05 New table space FALCON_USER, id 1, type 0, filename falcon_user.fts
# 20:48:05 New table space FALCON_TEMPORARY, id 2, type 0, filename falcon_temporary.fts
# 20:48:05 Recovering database /tmp/vardir/master-data_recovery/falcon_master.fts ...
# 20:48:06 /nFirst recovery block is 5696
# 20:48:06 Last recovery block is 7582
# 20:48:06 Recovery read block is 6886
# 20:48:06 Recovery phase 1...
# 20:48:06 Processed:    50913
# 20:48:06
# 20:48:06 Recovery phase 2...
# 20:48:06 Processed:    50913
# 20:48:06
# 20:48:06 Recovery phase 3...
# 20:48:06 === Scavenge Cycle 1 - FALCON_MASTER - 0 seconds
# 20:48:06 4: Activity on /tmp/vardir/master-data_recovery/falcon_master.fts: 10 fetches, 5 reads, 0 writes, 1 flushWrites
# 20:48:06 4: Activity on /tmp/vardir/master-data_recovery/falcon_temporary.fts: 0 fetches, 1 reads, 0 writes, 1 flushWrites
# 20:48:06 4: Activity on /tmp/vardir/master-data_recovery/falcon_user.fts: 8878 fetches, 32 reads, 0 writes, 113 flushWrites
# 20:55:26 Processed:    50913
# 20:55:26 Recovery complete
# 20:55:26 443: Cache flush: 106 pages, 32 writes in 0 seconds (106 pps)
# 20:55:26 443: Commit System Transaction 2
# 20:55:26 443: Commit System Transaction 4
# 20:55:26 443: CommitNoUpdates transaction 6
# 20:55:26 === Scavenge Cycle 2 - FALCON_MASTER - 0 seconds
# 20:55:26 443: Transactions: 3 committed, 0 rolled back, 1 active, 8 available, 2 post-commit, oldest 0 seconds
# 20:55:26 443: SerialLog: 0 reads, 4 writes, 0 transactions, 0 completed, 0 stalls, 2 blocks, 1 windows
# 20:55:26 443: Activity on /tmp/vardir/master-data_recovery/falcon_master.fts: 2668 fetches, 71 reads, 0 writes, 9 flushWrites
# 20:55:26 443: Activity on /tmp/vardir/master-data_recovery/falcon_user.fts: 897340 fetches, 1155 reads, 0 writes, 38880 flushWrites

Notice "897340 fetches, 1155 reads, 0 writes, 38880 flushWrites". Is this OK?
[25 Jan 2009 19:31] Vladislav Vaintroub
You did not say what you have done before you killed mysqld. The answer to "is this OK?" is then maybe it is ok and it depends on your workload. If you have not killed the app but just stopped the activity of the client application, it alsocould take hours before mysqld stops writing. This happens when gopher threads lag behind the client threads, which is presumably in your case. recovery has to recover ~51000 log records, so it might take both time and space
[26 Jan 2009 16:08] Philip Stoev
This was observed when running this RQG command:

$ perl runall.pl \
  --mysqld=--falcon-page-size=8K \
  --vardir=/tmp/vardir \
  --rows=1000 \
  --threads=32 \
  --mask=8 \
  --queries=1000000 \
  --duration=300 \
  --basedir=/build/bzr/6.0-falcon-team \
  --engine=Falcon \
  --grammar=conf/combinations.yy \
  --gendata=conf/combinations.zz \
  --validator=DatabaseConsistency \
  --reporter=Deadlock,ErrorLog,Backtrace,Recovery \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--log-output=none

Note that the --mask parameter means that only DELETEs are being generated.

Data directory before recovery looks like this:

-rw-rw----  1 philips philips  75492352 2009-01-03 18:29 falcon_master.fl1
-rw-rw----  1 philips philips  86007808 2009-01-03 18:29 falcon_master.fl2
-rw-rw----  1 philips philips   1835008 2009-01-03 18:29 falcon_master.fts
-rw-rw----  1 philips philips     65536 2009-01-03 18:29 falcon_temporary.fts
-rw-rw----  1 philips philips 144130048 2009-01-03 18:29 falcon_user.fts

the tablespace is 144 mb, log files are about 80mb each. After recovery:

-rw-rw----  1 philips philips   75492352 2009-01-03 18:28 falcon_master.fl1
-rw-rw----  1 philips philips   86007808 2009-01-03 18:28 falcon_master.fl2
-rw-rw----  1 philips philips    1835008 2009-01-03 18:28 falcon_master.fts
-rw-rw----  1 philips philips      65536 2009-01-03 18:29 falcon_temporary.fts
-rw-rw----  1 philips philips 3103191040 2009-01-03 18:29 falcon_user.fts

Log files unchanged, the falcon_user tablespace is now 3.1 gigabytes.
[26 Jan 2009 16:29] Philip Stoev
Another result from a grammar that uses transaction, update, select, replace, delete:

$ perl runall.pl \
  --mysqld=--falcon-page-size=8K \
  --vardir=/tmp/vardir \
  --rows=100 \
  --threads=16 \
  --mask=59 \
  --queries=1000000 \
  --duration=300 \
  --basedir=/build/bzr/6.0-falcon-team \
  --engine=Falcon \
  --grammar=conf/combinations.yy \
  --gendata=conf/combinations.zz \
  --reporter=Deadlock,ErrorLog,Backtrace,Recovery \
  --mysqld=--loose-falcon-lock-wait-timeout=1 \
  --mysqld=--log-output=none

Before recovery, the tablespace is 309 mb.
 -rw-rw---- 1 philips philips  73779712 2009-01-03 17:15 falcon_master.fl1
 -rw-rw---- 1 philips philips  70248960 2009-01-03 17:15 falcon_master.fl2
 -rw-rw---- 1 philips philips   1835008 2009-01-03 17:15 falcon_master.fts
 -rw-rw---- 1 philips philips     65536 2009-01-03 17:15 falcon_temporary.fts
 -rw-rw---- 1 philips philips 309608448 2009-01-03 17:15 falcon_user.fts

After recovfery, it is 2.9 gigabytes

-rw-rw---- 1 philips philips   73779712 2009-01-03 17:14 falcon_master.fl1
-rw-rw---- 1 philips philips   70248960 2009-01-03 17:14 falcon_master.fl2
-rw-rw---- 1 philips philips    1835008 2009-01-03 17:14 falcon_master.fts
-rw-rw---- 1 philips philips      65536 2009-01-03 17:15 falcon_temporary.fts
-rw-rw---- 1 philips philips 2973466624 2009-01-03 17:15 falcon_user.fts
[20 Apr 2009 18:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72525

3129 Vladislav Vaintroub	2009-04-20
      Bug #41840 Unbounded tablespace growth during recovery
      Sometimes, after recovery the tablespaces get huge, orders of
      magnitude larger than original tablespace.
      
      The problem was that page overflow pages were not deleted 
      during recovery. If a record was modified several times during 
      recovery, the overflow pages that were created during recovery
      also leaked.
      
      The fix is to remove overflow pages , when it is possible.
      - remove all SRLOverflowPages found in course of recovery
      - just like during normal runtime,when record gets modified in 
      recovery,  remove overflow pages if the removal is safe.
      Safe means,  that page referenced is a valid dataoverflow page.
      Validity is guaranteed, if page is either not touched by recovery, 
      or the page was allocated during recovery and not yet freed.
[21 Apr 2009 10:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72569

3131 Vladislav Vaintroub	2009-04-21
      Addendum to Bug #41840 Unbounded tablespace growth during recovery
      
      Problem: blob recovery might access invalid (free) page
      The reason is that page allocations are not recorded during recovery and recovery might assume that the last state (incarnation) of the page was "free", even if a piece of blob was written to it.
      Solution is to record blob pages (data or overflow pages or written with earlyWrite flag) but do nothing in recovery except incrementing the page incarnation. This way the page cannot not be accidentially freed.
[21 Apr 2009 10:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72572

3131 Vladislav Vaintroub	2009-04-21
      Addendum to Bug #41840 Unbounded tablespace growth during recovery
      
      Problem: blob recovery might access invalid (free) page
      The reason is that page allocations are not recorded during recovery and recovery might assume that the last state (incarnation) of the page was "free", even if a piece of blob was written to it.
      Solution is to record blob pages (data or overflow pages or written with earlyWrite flag) but do nothing in recovery except incrementing the page incarnation. This way the page cannot not be accidentially freed.
[23 Apr 2009 7:22] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090423071213-afmyrzvolemph7mz) (version source revid:hky@sun.com-20090421195958-j33v1cuo3yer9niu) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 14:11] MC Brown
An entry has been added to the 6.0.11 changelog: 

When creating an index on a Falcon table with a very large dataset, mysqld would crash.