Bug #25966 | 2MB per second endless memory consumption after LOCK TABLE ... WRITE | ||
---|---|---|---|
Submitted: | 31 Jan 2007 7:03 | Modified: | 16 Mar 2007 17:52 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
Version: | <=5.0.36, 5.1 | OS: | Windows (windows, linux) |
Assigned to: | Dmitry Lenev | CPU Architecture: | Any |
Tags: | bfsm_2007_02_15, bfsm_2007_03_01, lock table write, memory leak |
[31 Jan 2007 7:03]
Shane Bester
[31 Jan 2007 8:29]
MySQL Verification Team
now I find it harder to repeat each time. However it is a bug, as I've encountered it a few times in the past. A clue about the memory consumption: When the problem happened, in the processlist there was 1 SELECT query whose state was continuously changing between values of 'Locked', 'NULL', and 'Opening tables'. So I think it was that query waiting for release of write-lock, allocating memory in a loop probably.
[11 Feb 2007 18:14]
MySQL Verification Team
verified testcase for 5.0BK on windows + linux to cause 1 - 3MB/s memory leak ----------------------------------------------------------------------------- Note: the merge isn't needed, as we've seen this without merge. However, I found it easier to reproduce with the merge in place. #connection 1 drop table if exists t1; drop table if exists m1; create table t1(id int) engine=myisam; create table m1(id int) engine=merge union=(t1) insert_method=last; #connection 2 lock table t1 write; #connection 1 insert into m1 values (1); #connection 2 repair table t1; #now watch memory using task manager or top command. also, check processlist of mysqld, the state of the insert query will be changing from Opening tables to NULL to Locked....
[11 Feb 2007 21:15]
MySQL Verification Team
Additional information: I reproduced this bug on the following versions (didn't have them all installed locally so couldn't test every one) at least 5.1.11, 5.0.36, 5.0.27, 5.0.26, 5.0.24a, 5.0.22, 5.0.13-rc are affected by this memory leak bug. 4.1.21 didn't leak memory, but the cpu was at 100%
[13 Feb 2007 6:43]
MySQL Verification Team
5.0.6 doesn't run out of memory, instead it acts like 4.1.21 and just floors the CPU.
[4 Mar 2007 22:50]
MySQL Verification Team
some debug info. repeated on small tables, using debug build
Attachment: small_repeated_20070304.txt (text/plain), 154.04 KiB.
[4 Mar 2007 22:58]
MySQL Verification Team
dmitri, i caught debug build in debugger. but it didn't leak memory. only floored the cpu in the same code as release binary leaked memory. maybe it's a safemalloc thing - but the problem still shows. the code ends up looping in that for(;;) loop in open_and_lock_tables() for the one thread. See attached as much debug info I could get. I saved a 'core file' using visual studio - hopefully you have access to a windows machine to study it. I repeated this once in 4 hours with release build, and once in 2 two hours with debug build. i will upload the corefile which contains all the local variables for each thread.
[5 Mar 2007 17:14]
Dmitry Lenev
Further investigation shows that the problem which was originally reported occurs in situations when thread which serviced some connection to which KILL command was applied (or its internal equivalent) is reused for servicing other connection (via thread cache) and this new connection has to wait for some table lock. One possible scenario (which probably happens in customer's case) is when thread which provides binlog dump to replication slave is implicitly/automatically killed when the same slave reconnects and starts pulling data through different thread/connection. Possible work around for this problem is to disable thread cache.
[6 Mar 2007 10:38]
Dmitry Lenev
Since the problem that was originally reported (one that involves explicit or implicit KILL + thread cache) and the problem demonstrated by the test case involving REPAIR, LOCK TABLES and merge table, allbeit show different symptoms, have different root causes, I've reported the latter separately as bug #26867. Please use http://bugs.mysql.com/bug.php?id=26867 for its further discussion and to track its progress.
[7 Mar 2007 7:04]
MySQL Verification Team
we have a confirmation that disabling thread_cache_size makes the testcase no longer reproducible on one server.
[13 Mar 2007 21:32]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/21850 ChangeSet@1.2479, 2007-03-14 00:35:18+03:00, dlenev@mockturtle.local +3 -0 Fix for bug #25966 "2MB per second endless memory consumption after LOCK TABLE ... WRITE". When thread which serviced some connection to which KILL command was applied (or its internal equivalent) was reused for servicing some other connection (via thread cache), this new connection started to hog CPU and memory once it had to wait for some table lock. One possible scenario which exposed this problem was when thread which provided binlog dump to replication slave was implicitly/automatically killed when the same slave reconnected and started pulling data through different thread/connection. This problem also occured if one simply killed particular query instead of whole connection and when connection in which query was killed had to wait for some table lock. This problem was caused by the fact that THD::mysys_var::abort member, which is actually is not member of THD but is a sort of thread-specific variable and which indicates that waiting operations on mysys layer should be aborted (this includes waiting for table locks), was set by kill operation but was never reset back. So this value was "inherited" by the following statements or even other connections (which reused the same physical thread). Indeed having this flag set without actual kill operation (and thus without setting THD::killed flag) broke logic on SQL-layer and caused CPU and memory hogging. This patch tries to fix this problem by properly resetting this member. There is no test-case associated with this patch since it is hard to test for memory/CPU hogging conditions in our test-suite.
[15 Mar 2007 7:56]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/21975 ChangeSet@1.2479, 2007-03-15 10:55:03+03:00, dlenev@mockturtle.local +3 -0 Fix for bug #25966 "2MB per second endless memory consumption after LOCK TABLE ... WRITE". Memory and CPU hogging occured when connection which had to wait for table lock was serviced by thread which previously serviced connection that was killed (note that connections can reuse threads if thread cache is enabled). One possible scenario which exposed this problem was when thread which provided binlog dump to replication slave was implicitly/automatically killed when the same slave reconnected and started pulling data through different thread/connection. The problem also occured when one killed particular query in connection (using KILL QUERY) and later this connection had to wait for some table lock. This problem was caused by the fact that thread-specific mysys_var::abort variable, which indicates that waiting operations on mysys layer should be aborted (this includes waiting for table locks), was set by kill operation but was never reset back. So this value was "inherited" by the following statements or even other connections (which reused the same physical thread). Such discrepancy between this variable and THD::killed flag broke logic on SQL-layer and caused CPU and memory hogging. This patch tries to fix this problem by properly resetting this member. There is no test-case associated with this patch since it is hard to test for memory/CPU hogging conditions in our test-suite.
[15 Mar 2007 8:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/21977 ChangeSet@1.2618, 2007-03-15 11:30:17+03:00, dlenev@mockturtle.local +1 -0 Fix for bug #25966 "2MB per second endless memory consumption after LOCK TABLE ... WRITE". CPU hogging occured when connection which had to wait for table lock was serviced by thread which previously serviced connection that was killed (note that connections can reuse threads if thread cache is enabled). One possible scenario which exposed this problem was when thread which provided binlog dump to replication slave was implicitly/automatically killed when the same slave reconnected and started pulling data through different thread/connection. In 5.* versions memory hogging was added to CPU hogging. Moreover in those versions the problem also occured when one killed particular query in connection (using KILL QUERY) and later this connection had to wait for some table lock. This problem was caused by the fact that thread-specific mysys_var::abort variable, which indicates that waiting operations on mysys layer should be aborted (this includes waiting for table locks), was set by kill operation but was never reset back. So this value was "inherited" by the following statements or even other connections (which reused the same physical thread). Such discrepancy between this variable and THD::killed flag broke logic on SQL-layer and caused CPU and memory hogging. This patch tries to fix this problem by properly resetting this member. There is no test-case associated with this patch since it is hard to test for memory/CPU hogging conditions in our test-suite.
[15 Mar 2007 8:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/21978 ChangeSet@1.2479, 2007-03-15 11:51:35+03:00, dlenev@mockturtle.local +3 -0 Fix for bug #25966 "2MB per second endless memory consumption after LOCK TABLE ... WRITE". Memory and CPU hogging occured when connection which had to wait for table lock was serviced by thread which previously serviced connection that was killed (note that connections can reuse threads if thread cache is enabled). One possible scenario which exposed this problem was when thread which provided binlog dump to replication slave was implicitly/automatically killed when the same slave reconnected and started pulling data through different thread/connection. The problem also occured when one killed particular query in connection (using KILL QUERY) and later this connection had to wait for some table lock. This problem was caused by the fact that thread-specific mysys_var::abort variable, which indicates that waiting operations on mysys layer should be aborted (this includes waiting for table locks), was set by kill operation but was never reset back. So this value was "inherited" by the following statements or even other connections (which reused the same physical thread). Such discrepancy between this variable and THD::killed flag broke logic on SQL-layer and caused CPU and memory hogging. This patch tries to fix this problem by properly resetting this member. There is no test-case associated with this patch since it is hard to test for memory/CPU hogging conditions in our test-suite.
[15 Mar 2007 21:17]
Dmitry Lenev
Fix pushed into trees marked as 5.0.38, 5.1.17. It was not pushed into 4.1 yet.
[16 Mar 2007 16:54]
Paul DuBois
Noted in 5.0.38, 5.1.17 changelogs. If a thread previously serviced a connection that was killed, excessive memory and CPU use by the thread occurred if it later serviced a connection that had to wait for a table lock. Resetting report to Patch queued pending the push to 4.1.x.
[16 Mar 2007 17:06]
Dmitry Lenev
Fix was pushed into tree tagged as 4.1.23
[16 Mar 2007 17:52]
Paul DuBois
Noted in 4.1.23 changelog.