Bug #39672 Possible memory leak running ALTER TABLE
Submitted: 26 Sep 2008 14:29 Modified: 9 Jan 2009 13:47
Reporter: Hakan Küçükyılmaz Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Kevin Lewis CPU Architecture:Any
Tags: F_SCAVENGER

[26 Sep 2008 14:29] Hakan Küçükyılmaz
Description:
Possible memory leak running ALTER TABLE.

How to repeat:
CREATE TABLE t1 (a int) Engine MyISAM;
INSERT INTO t1 VALUES (1);
INSERT INTO t1 SELECT * FROM t1;
----
-- Repeat until you have > 10 million rows.
----
ALTER TABLE t1 Engine FALCON;
-- Watch memory usage

Suggested fix:
Don't use that much memory.
[26 Sep 2008 16:44] Hakan Küçükyılmaz
[17:01] root@test>CREATE TABLE t1 (a int) Engine MyISAM;
Query OK, 0 rows affected (0.00 sec)

[17:02] root@test>INSERT INTO t1 VALUES (1);
Query OK, 1 row affected (0.00 sec)

[17:02] root@test>INSERT INTO t1 SELECT * FROM t1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

[17:05] root@test>INSERT INTO t1 SELECT * FROM t1;
Query OK, 67108864 rows affected (2 min 14.94 sec)
Records: 67108864  Duplicates: 0  Warnings: 0

[17:07] root@test>select count(*) from t1;
+-----------+
| count(*)  |
+-----------+
| 134217728 |
+-----------+
1 row in set (0.13 sec)

[17:08] root@test>ALTER TABLE t1 Engine Falcon;
Query OK, 134217728 rows affected (1 hour 25 min 58.85 sec)
Records: 134217728  Duplicates: 0  Warnings: 0

At this point memory usage went up to 1.2 GB
[26 Sep 2008 16:44] Hakan Küçükyılmaz
Output of 
hakan@lu0011:~$ while (true); do echo "select sum(space_in_use) from information_schema.falcon_system_memory_detail" | mysql -uroot -P9306 -h127.0.0.1 test >> alter_table_memory_leak.txt; sleep 30; done

sum(space_in_use)
85538056
...
...
sum(space_in_use)
125119776
sum(space_in_use)
127448864
sum(space_in_use)
132565232
sum(space_in_use)
135357064
sum(space_in_use)
139317816
sum(space_in_use)
141328888
[26 Sep 2008 16:47] Hakan Küçükyılmaz
[18:46] root@test>select file, line, space_in_use from information_schema.falcon_system_memory_detail order by space_in_use desc;
+------------------------+------+--------------+
| file                   | line | space_in_use |
+------------------------+------+--------------+
| RecordGroup.cpp        |   99 |    650566496 |
| Bitmap.cpp             |  675 |     22019424 |
| SerialLog.cpp          |  190 |     20972064 |
| SectorCache.cpp        |   37 |     20054048 |
| RecordGroup.cpp        |  101 |     11495488 |
| Cache.cpp              |  147 |      4198432 |
| Cache.cpp              |  923 |      2048064 |
| Bitmap.cpp             |  529 |      1097232 |
| DeferredIndex.cpp      |   98 |       524928 |
| Cache.cpp              |  135 |       417832 |
| Index.cpp              |  277 |       198848 |
| Inversion.cpp          |   56 |       160032 |
| Database.cpp           | 1042 |        80264 |
| Cache.cpp              |  105 |        73768 |
| SectorCache.cpp        |   39 |        58600 |
| Dbb.cpp                |  197 |        24024 |
| StorageConnection.cpp  |   94 |        23056 |
| Table.cpp              |  203 |        20064 |
| Table.cpp              |  192 |        18768 |
| Database.cpp           |  505 |        17920 |
| Dbb.cpp                |  381 |        13224 |
| SymbolManager.cpp      |   54 |        10080 |
| Cache.cpp              |  113 |         8408 |
| DenseArray.h           |   36 |         6528 |
| SparseArray.h          |   95 |         6480 |
| Connection.cpp         | 1060 |         4872 |
| Table.cpp              |  843 |         4800 |
| Format.cpp             |   54 |         4584 |
[26 Sep 2008 17:13] Hakan Küçükyılmaz
I used default settings for this test.

./mysql-test-run.pl --start-and-exit
[26 Sep 2008 22:37] Kevin Lewis
In the test Hakan gives, the ALTER statement is creating a Falcon fine with 134217728 records.  By far, the largest 'memory leak' is in RecordGroup.cpp, line 99.  This code allocates a new RecordLeaf object which is used to find records. These objects form a btree where each node contains 100 slots.  The upper nodes are called RecordGroups.  The slot is a pointer, which is 8 bytes on a 64 bit machine and 4 bytes otherwise.   

The listing from information_schema.falcon_system_memory_detail shows 650,566,496 bytes in use, which is 4.8 bytes per record.  On a 64 bit computer, this would be about 1.2 Gb.

I have confirmed this steady memory growth in my own tests.  I did not run it that long, but on my system, the memory allocated at RecordGroup.cpp, line 99 is just over 4 bytes per record entered. 

I also confirmed that the scavenger is working regularly and that the load is broken down into 10,000 record transactions.  The scavenger will traverse all the RecordGroup objects after each scavenge cycle or forced scavenge to look for RecordLeaf objects that are empty.  Then it will delete that object.  This is in fact working.  In my own test, I had a MyISAM file of only 2,097,152 records.  In order to refer to that many records, there must have been at least 20972 RecordLeaf objects.  But at the completion of the ALTER, there were only 16110 RecordLeaf objects totaling 8,506,080 bytes in use.  

So I conclude that the scavenger is working correctly.  But this type of test is leaving too many RecordLeaf objects that are not empty, and thus cannot be scavenged.    I need to investigate this a little further to see if there can be anything done about this.

Note that this test is a worst case scenario.  Tiny records which allow the most number of records in the record cache.  And each of these must be tracked by a RecordLeaf object which has 100 other slots.
[28 Sep 2008 6:55] Hakan Küçükyılmaz
Running falcon_bug_34351_C show steady grow in memory usage. To reproduce please run it like:

 ./mysql-test-run.pl --force --skip-ndb --suite=falcon_team falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C falcon_bug_34351_C
[9 Oct 2008 4:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/55860

2859 Kevin Lewis	2008-10-08
      Bug#39672 - Using BitMap::nextSet does not work correctly when you
      clear the previous set bit. It skips a bit every loop.  This caused 
      half of the RecordLeafs to remain in memory.  So use nextSet(0).
      Also, do not spread out the numbers in the emptySections bitmap by
      setting record numbers.  Use section numbers for a tighter bitmap.
[16 Oct 2008 1:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56320

2865 Kevin Lewis	2008-10-15
      Bug#39672 - Using Bitmap::nextSet() failed after searching
      from last bit cleard and crossing multiples of 1024.
      
      To show the problem, I added a unit test that can be 
      activated by "SET GLOBAL FALCON_DEBUG_TRACE=64".  The unit
      test contains 4 different tests that print "** Error" when
      something goes wrong.
[16 Oct 2008 2:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56325

2866 Kevin Lewis	2008-10-15
      Bug#39672 - Using BitMap::nextSet does not work correctly
      when you clear the previous set bit.  This was fixed by
      the other changeset for this bug.  This caused half of 
      the RecordLeafs to remain in memory.  In order to process 
      the bitmap more reliably and make this loop like others 
      in the Falcon code, we use nextSet(0).  
      
      Also, do not spread out the numbers in the emptySections 
      bitmap by setting record numbers.  Use section numbers 
      for a tighter bitmap.
      
      Also, add some reporting features to analyze the RecordLeaf
      objects and their contents.
      
      Also, use RecordSection::anyActiveRecords() instead of
      RecordSection::countActiveRecords() when the count is not 
      important. This will improve speed.
[16 Oct 2008 20:13] Kevin Lewis
Jim provided the fix to the Bitmap class with this comment;
---------------------
Last week Kevin claimed he saw a bug in Bitmap.cpp.  He did, but his description missed the edge case.  This raises two questions.  One is what was the bug.  The other is why it went undiscovered for so long.

The actual bug is in Bitmap::nextSet.  The bitmap is actually a fixed width, variable depth free of bitmap segments, each containing 32 words of 32 bit.  The operating starts by decomposing the start bit number into a vector of indexes, where index[0] is the bit number, index[1] is clump (i.e. word) number, index[2] is the slot into a first level vector of bitmap segments pointers, etc.  The bug is around line 247:

for (index = indexes[lvl]; index < BITMAP_VECTOR_SIZE && !vec [index]; +index)
    ;

where the method is looking for a non-null pointer to a bitmap segment.  In specific, the bug is the failure to reset the bit number and clump number to zero before checking the next bitmap segment.  The code should have been something like:

for (index = indexes[lvl]; index < BITMAP_VECTOR_SIZE && !vec [index]; ++index)
    for (int n = 0; n < lvl; ++n)
        indexes[n] = 0;

The reason why this ugliness hasn't reared it head in the past is a slight different in bitmap searching code.  Other code has typically been of the form:

for (int n = 0; (n = bitmap->nextSet(n)) >= 0; ++n)
    yada yada yada

where next bit number of bumped to avoid a wasted nano-second skipped over an already processed number (also to avoid infinite loop in the non-destructive case).  Chris, I believe, code his loop something like:

   for (int n = 0; (n = bitmap->nextSet(n)) >= 0;)
       ...
       bitmap->clear(n);

Now, if Chris had done a minisculy more efficient coding and / or Kevin had been less diligent in running bugs to earth, we wouldn't have caught this.
--------------------

Also, Jim approved the concept of using sectorNumbers instead of recordNumbers to track bits in Table::emptySections.
[28 Oct 2008 8:10] Bugs System
Pushed into 6.0.8-alpha  (revid:klewis@mysql.com-20081016024008-rltsr5gmmn7pvzxh) (version source revid:hky@sun.com-20081016142045-objh98j6xbjd7xz0) (pib:5)
[9 Jan 2009 13:47] MC Brown
Internal only; No documentation needed.