Bug #36700 | Running falcon_bug_28048 shows increasing memory usage and run time | ||
---|---|---|---|
Submitted: | 13 May 2008 21:26 | Modified: | 7 May 2009 16:20 |
Reporter: | Hakan Küçükyılmaz | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | OS: | Linux | |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_SCAVENGER, timeout |
[13 May 2008 21:26]
Hakan Küçükyılmaz
[13 May 2008 21:27]
Hakan Küçükyılmaz
oprofile report running this test case: root@lu0011:/usr/local/src/oprofile-0.9.3# /usr/local/bin/opreport --merge=tgid --symbols --threshold 0.5 /home/hakan/work/mysql/mysql-6.0-falcon-team/sql/mysqld warning: [vdso] (tgid:22655 range:0x7fffc13fd000-0x7fffc13ff000) could not be found. CPU: Core 2, speed 2333 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples % symbol name 1700706 19.7339 SyncObject::lock(Sync*, LockType, int) 1627218 18.8812 Record::addRef() 882786 10.2433 SyncObject::unlock(Sync*, LockType) 605266 7.0231 Record::poke() 518096 6.0117 Table::fetch(int) 487969 5.6621 RecordVersion::fetchVersionRecursive(Transaction*) 283922 3.2945 Context::fetchIndexed(Nfs::Statement*) 281339 3.2645 RecordVersion::fetchVersion(Transaction*) 267140 3.0997 RecordVersion::getTransactionId() 266918 3.0971 IndexRootPage::scanIndex(Dbb*, int, int, IndexKey*, IndexKey*, int, unsigned int, Bitmap*) 246691 2.8624 Record::release() 210897 2.4471 RecordLeaf::fetch(int) 191311 2.2199 Bitmap::nextSet(int) 171830 1.9938 Bitmap::set(int) 150635 1.7479 Transaction::getRelativeState(Record*, unsigned int) 142155 1.6495 Sync::lock(LockType) 134698 1.5629 Sync::~Sync() 96583 1.1207 Sync::Sync(SynchronizationObject*, char const*) 77957 0.9046 Bitmap::decompose(int, unsigned int*) 77457 0.8988 Table::checkUniqueRecordVersion(int, Index*, Transaction*, RecordVersion*, Sync*)
[19 May 2008 22:56]
MySQL Verification Team
Thank you for the bug report.
[21 May 2008 15:59]
Kevin Lewis
Olav Sandstrå, Please look into this.
[14 Jul 2008 16:01]
Hakan Küçükyılmaz
Still shows increasing run time and memory usage. Also leads to time outs on Pushbuild Windows machines. I moved the test to falcon_team suite. Setting to P1, because it is a test case regression.
[14 Jul 2008 16:48]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/49697 2745 Hakan Kuecuekyilmaz 2008-07-14 Moved out falcon_bug_28048.test to falcon_team suite due to Bug#36700.
[14 Jul 2008 17:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/49700 2745 Hakan Kuecuekyilmaz 2008-07-14 Moved out falcon_bug_28048.test to falcon_team suite due to Bug#36700.
[26 Sep 2008 16:08]
John Embretsen
This test still shows increasing run time and memory usage. Keeping the test in falcon_team suite until things improve.
[17 Nov 2008 14:44]
Kevin Lewis
This bug shows the same increase in the size of the record cache as Bug#34893. That is due to the inability of the Falcon scavenger to prune old invisible records from the record cache until the scavenger threshold is reached. I am working on that now. This bug has the additional complaint that the performance of the test gets worse over time. This may be due to the proliferation of unscavenged and un-garbage-collected stuff hanging around, especially in the system tables, since the file is created and dropped thousands of times. So once the scavenger is pruning invisible records more efficiently, I will re-test the performance aspect.
[5 Dec 2008 19:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/60770 2927 Kevin Lewis 2008-12-05 Bug#36700 & Bug#34893 Improved Scavenger. This patch is the first collection of the changes needed to improve the scavenger. They include; 1) Separate the two tasks involved in scavenging into A. Pruning old invisible record versions. 1. As part of this task, a survey is taken of records in age groups so that the next step can be done in one pass B. Retiring least-recently-used records from the record cache 2) Watch how much records space is added to the record cache and start a new age group at regular intervals. 3) Put the scavenger into a separate thread that can be signaled when needed. Future and known issues; (need input) 1) Adjust the default scavenge threshold and floor to a higher percentage of the record cache. 2) Adjust the current dependency upon the memory manager to report how much memory is allocated at an address. Each record should know this. 3) During the scavenge, the current age group is not incremented since the memory is being both scavenged and added to by client threads. So those age groups can have 2 to 3 times more records in them.
[7 Jan 2009 15:43]
Kevin Lewis
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/62571 2958 Kevin Lewis 2009-01-07 Bug#34893, Bug#36700, Bug#40342, Bug#40801 New scavenger code, See WL#4606, WL#4477 & WL#4478 for more info. This is the second full patch of the new scavenger. 1) Separate the two tasks involved in scavenging into A. Pruning old invisible record versions. 1. As part of this task, a survey is taken of records in age groups so that the next step can be done in one pass B. Retiring least-recently-used records from the record cache 2) Watch how much record space is added to the record cache and start a new age group at regular intervals. 3) Put the scavenger into a separate thread that can be signaled when needed. This patch contains many improvements over the last patch including; 1) forcedRecordScavenge is now signalScavenge and the loop is increased to 5 times attempting to get memory from the record cache. 2) Unused code is deleted 3) Integrated with new dependency manager code so that it now uses Transaction::commitId to determine if a record is old enough to be scavenged.
[11 Jan 2009 10:47]
Hakan Küçükyılmaz
I still see a steady increase in run time and memory usage. Saving snapshot of installed databases ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- falcon.falcon_bug_28048 [ pass ] 4573 falcon.falcon_bug_28048 [ pass ] 6349 falcon.falcon_bug_28048 [ pass ] 11333 falcon.falcon_bug_28048 [ pass ] 6474 ... ... ... falcon.falcon_bug_28048 [ pass ] 18362 falcon.falcon_bug_28048 [ pass ] 18749 falcon.falcon_bug_28048 [ pass ] 19029 falcon.falcon_bug_28048 [ pass ] 18059 falcon.falcon_bug_28048 [ pass ] 18863 falcon.falcon_bug_28048 [ pass ] 19005 Memory usage started at around 200MB and ended at around 400MB.
[13 Jan 2009 8:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63064 2973 Kevin Lewis 2009-01-13 Bug#36700 - Found while running the new scavenger with falcon_bug_28048. The variable neededByAnyActiveTrans in RecordVersion::retire() was set incorrectly. So retireRecords did not work correctly.
[13 Jan 2009 8:51]
Kevin Lewis
Hakan, I ran this test 50 times and the run times did not increase; TEST RESULT TIME (ms) ------------------------------------------------------- falcon.falcon_bug_28048 [ pass ] 39984 falcon.falcon_bug_28048 [ pass ] 39750 falcon.falcon_bug_28048 [ pass ] 37937 falcon.falcon_bug_28048 [ pass ] 36094 falcon.falcon_bug_28048 [ pass ] 37485 falcon.falcon_bug_28048 [ pass ] 35469 falcon.falcon_bug_28048 [ pass ] 35937 falcon.falcon_bug_28048 [ pass ] 37171 falcon.falcon_bug_28048 [ pass ] 35734 falcon.falcon_bug_28048 [ pass ] 35297 falcon.falcon_bug_28048 [ pass ] 38235 falcon.falcon_bug_28048 [ pass ] 34547 falcon.falcon_bug_28048 [ pass ] 34891 falcon.falcon_bug_28048 [ pass ] 34547 falcon.falcon_bug_28048 [ pass ] 34266 falcon.falcon_bug_28048 [ pass ] 34781 falcon.falcon_bug_28048 [ pass ] 35594 falcon.falcon_bug_28048 [ pass ] 34313 falcon.falcon_bug_28048 [ pass ] 36281 falcon.falcon_bug_28048 [ pass ] 35328 falcon.falcon_bug_28048 [ pass ] 35172 falcon.falcon_bug_28048 [ pass ] 36203 falcon.falcon_bug_28048 [ pass ] 36156 falcon.falcon_bug_28048 [ pass ] 36171 falcon.falcon_bug_28048 [ pass ] 35812 falcon.falcon_bug_28048 [ pass ] 35938 falcon.falcon_bug_28048 [ pass ] 35672 falcon.falcon_bug_28048 [ pass ] 36422 falcon.falcon_bug_28048 [ pass ] 36703 falcon.falcon_bug_28048 [ pass ] 36531 falcon.falcon_bug_28048 [ pass ] 36360 falcon.falcon_bug_28048 [ pass ] 34625 falcon.falcon_bug_28048 [ pass ] 37468 falcon.falcon_bug_28048 [ pass ] 35109 falcon.falcon_bug_28048 [ pass ] 35297 falcon.falcon_bug_28048 [ pass ] 35656 falcon.falcon_bug_28048 [ pass ] 34657 falcon.falcon_bug_28048 [ pass ] 35906 falcon.falcon_bug_28048 [ pass ] 36078 falcon.falcon_bug_28048 [ pass ] 36093 falcon.falcon_bug_28048 [ pass ] 35484 falcon.falcon_bug_28048 [ pass ] 35625 falcon.falcon_bug_28048 [ pass ] 35891 falcon.falcon_bug_28048 [ pass ] 35297 falcon.falcon_bug_28048 [ pass ] 36109 falcon.falcon_bug_28048 [ pass ] 38047 falcon.falcon_bug_28048 [ pass ] 36656 falcon.falcon_bug_28048 [ pass ] 36640 falcon.falcon_bug_28048 [ pass ] 37657 falcon.falcon_bug_28048 [ pass ] 37735 ------------------------------------------------------- Stopping All Servers All 50 tests were successful. The servers were restarted 1 times Spent 1806.811 of 1851 seconds executing testcases Even though the run times held steady, the memory consumed by mysqld increased from 42Mb to 83.7 Mb during the 50 runs. 38Mb of that was due to new RecordVersion and new RecordLeaf objects which reflect an increased record cache. FILE LINE OBJECTS_IN_USE SPACE_IN_USE Table.cpp 3,620 343,901 38,529,536 RecordGroup.cpp 99 3,438 1,815,264 After debugging a call to Database::pruneRecords() and its call to RecordScavenge::inventoryRecord(), I found that all these unscavenged RecordVersion objects are deleted records in the FILES table. Each run of this test creates table t1 1000 times and drops it 1000 times. It looks like the scavenger is not getting rid of old deleted records during the 'pruning' stage. But it will scavenge these records during the 'retire' stage when the falcon_record_scavenge_threshold is reached. (and the patch just committed in this bug is used). So this increase in record cache size is due to old deleted records in the FILES system table. It is not a high priority problem because these records do get retired eventually. The low priority task/improvement is to scavenge these records during pruneRecords(). But this task may not be worth doing. We need to consider whether pruneRecords() should be able to retire deleted records whose transactions are no longer visible to any active transactions. Hakan, please retry this with falcon-team to see if the run times still increase for you. If so, how many runs do you need to do? Also, set the record cache to about 25Mb. As noted above, 50 runs will create about 38Mb of these deleted records. A 25Mb record cache will cause Database:retireRecords() to scavenge them. You should see the memory use by mysqld stabilize after around 22 runs if falcon_record_scavenge_floor=70. Use this to track the size of these records; use information_schema; select * from FALCON_RECORD_CACHE_DETAIL;
[13 Jan 2009 9:54]
Hakan Küçükyılmaz
The increase in run time shows after 100 - 150 runs ... I will test your latest patch right now.
[13 Jan 2009 13:23]
Hakan Küçükyılmaz
I tested with non-debug build and I run > 300 times with ./mysql-test-run.pl --force --mysqld=--falcon_record_memory_max=25MB --suite=falcon falcon_bug_28048 ... Memory usage was stable around 300 - 302 MB and run time was between 5 - 7 seconds. Using debug build and falcon_record_memory_max=25MB memory usage is stable around 228 - 230 MB and run time between 13 - 16 seconds.
[13 Feb 2009 7:25]
Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:olav@sun.com-20090113103017-41jbad7qlvlwpwxh) (merge vers: 6.0.10-alpha) (pib:6)
[7 May 2009 16:20]
MC Brown
Internal/test fix. No changelog entry required.