Bug #44884 | KILL of FLUSH TABLES does not remove TDC version lock on tables in use | |
---|---|---|---
Submitted: | 14 May 2009 18:50 | |
Reporter: | Shawn Green | Email Updates: |
Status: | Verified | Impact on me: |
Category: | MySQL Server: Locking | Severity: | S3 (Non-critical)
Version: | 5.0.51a, 5.0.66a, 5.1.31sp1 | OS: | Any
Assigned to: | | CPU Architecture: | Any
[14 May 2009 18:50]
Shawn Green
[2 Feb 2010 1:14]
Erik Jones
Hello, I have just run up against this issue and have a simpler way to replicate it that only requires 3 sessions.

Prep: load a large table; we'll call it big_table below.

Steps to reproduce:

Session 1: SELECT COUNT(*) FROM big_table;
Session 2: FLUSH TABLES WITH READ LOCK; -- this will hang, as expected
Session 3: SELECT * FROM big_table LIMIT 1; -- this will hang, as expected
Session 2: <ctrl-c>

At this point the SELECT query in Session 3 should complete, since the blocking FLUSH TABLES WITH READ LOCK request is gone, but it does not until the query in Session 1 completes or is killed.
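The behaviour above can be sketched with a toy model of the table definition cache. This is an illustrative simulation, not MySQL source code; all names (`Share`, `open_table`, `flush_tables`, etc.) are invented for this sketch:

```python
# Toy model of the table definition cache (TDC) version mechanism.
# Illustrative sketch only -- not MySQL's actual implementation.

class Share:
    def __init__(self, version):
        self.version = version   # TDC version the share was created under
        self.refs = 0            # how many sessions have the table open

tdc_version = 1                  # global table cache version
cache = {}                       # table name -> Share

def open_table(name):
    """Open a table; returns None (i.e. the caller would block) if stale."""
    share = cache.get(name)
    if share is None:
        share = cache[name] = Share(tdc_version)
    if share.version < tdc_version:
        return None              # stale share still cached: must wait
    share.refs += 1
    return share

def close_table(name):
    share = cache[name]
    share.refs -= 1
    # A stale share is expelled once its last user is gone.
    if share.refs == 0 and share.version < tdc_version:
        del cache[name]

def flush_tables():
    """First step of FLUSH TABLES: the instant version bump."""
    global tdc_version
    tdc_version += 1             # invalidates every cached share at once
    # The second step (waiting for old shares to go away) is omitted:
    # that is exactly what KILL aborts, and aborting it does not undo
    # the bump above.

# Session 1: long SELECT keeps big_table open.
s1 = open_table("big_table")
# Session 2: FLUSH TABLES bumps the version, waits... and is killed.
flush_tables()
# Session 3: still blocked even though the flusher is gone.
assert open_table("big_table") is None
# Only when Session 1 finishes is the stale share expelled...
close_table("big_table")
# ...and a fresh share with the current version can be opened.
assert open_table("big_table") is not None
```

The final two assertions mirror the repro: Session 3 stays blocked until Session 1's query completes or is killed.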
[16 Mar 2010 17:00]
Konstantin Osipov
The synopsis is wrong. There is no such thing as a "status flag". FLUSH TABLES increases the table cache version; as such, the operation itself is instant. It then waits for all tables with the old version to go away, which can take a very long time. KILLing a blocked FLUSH TABLES will abort the wait -- but to no effect: the flush has already been initiated. Note that FLUSH TABLES <list> WITH READ LOCK in 5.5.3 does not have this effect.
[5 May 2010 4:18]
Roel Van de Paar
Any updates?
[5 May 2010 7:53]
Konstantin Osipov
This will not be fixed in 5.5.
[21 May 2010 15:15]
Konstantin Osipov
The first thing FLUSH TABLES does is mark TABLE_SHARE::version of all tables in the cache as invalid. This is done by increasing the global table cache version. Tables with an old TABLE_SHARE::version cannot be used by new connections/transactions. Killing FLUSH TABLES does not undo the invalidation of TABLE_SHARE::version for tables in use: we can't know how many invalidations have happened since, and thus how far we would need to undo. When the table is no longer in use, it is removed from the cache and receives a new version.
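The "how far we need to undo" point can be shown in toy terms (a hypothetical sketch; per-table invalidation history is the thing that does not exist):

```python
# Sketch: why a killed FLUSH cannot roll the version back.
# Each FLUSH TABLES bumps one global counter; no per-table history is kept.

tdc_version = 1
big_table_share_version = 1      # share opened before any flush

tdc_version += 1                 # FLUSH TABLES #1 (this one gets KILLed)
tdc_version += 1                 # FLUSH TABLES #2 from another session

# "Undoing" the killed flush by decrementing the counter would still
# leave the share stale (1 < 2), and could wrongly re-validate shares
# that flush #2 meant to invalidate. Hence no rollback is attempted.
undone = tdc_version - 1
assert big_table_share_version < undone
```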
[25 May 2010 3:11]
James Day
Workaround: KILL the long-running query or queries that are preventing other queries from starting. Pick the one with the longest run time, KILL it, and repeat until the problem goes away.

Some possible bits of work related to this:

1. Some indication in SHOW PROCESSLIST output for FLUSH TABLES saying "waiting for older queries to end" may be helpful and could be referenced in the manual to reduce confusion. This might be sufficient to eliminate trouble reports, though not the trouble itself.
2. I suppose we could introduce a maximum wait time and KILL queries started before the FLUSH TABLES once that wait has expired.
3. I suppose that flushing only old entries and allowing new entries to be created immediately would be a better long-term solution.
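The "KILL the longest-running query" step can be scripted. A hedged sketch: the tuple layout below follows SHOW PROCESSLIST column order (Id, User, Host, db, Command, Time, State, Info), the rows are made up, and `pick_victim` is an invented helper, not a MySQL feature:

```python
def pick_victim(processlist):
    """Return the Id of the longest-running active query, or None.

    `processlist` is a list of (Id, User, Host, db, Command, Time,
    State, Info) tuples as shown by SHOW PROCESSLIST. Idle threads and
    the FLUSH TABLES thread itself are skipped: the point is to kill
    the old queries that the flush is waiting on, not the flush.
    """
    candidates = [
        row for row in processlist
        if row[4] == "Query"                        # an active statement
        and row[7] is not None
        and not row[7].upper().startswith("FLUSH")  # not the flusher
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[5])[0]  # longest Time wins

# Made-up processlist rows for illustration:
rows = [
    (12, "app", "host:1", "db", "Query", 640, "Sending data",
     "SELECT COUNT(*) FROM big_table"),
    (15, "dba", "host:2", "db", "Query", 30, "Flushing tables",
     "FLUSH TABLES WITH READ LOCK"),
    (17, "app", "host:3", "db", "Sleep", 5, "", None),
]
assert pick_victim(rows) == 12   # issue KILL 12, then re-check the list
```

Repeating "pick, KILL, re-check" until `pick_victim` finds nothing old left matches the workaround described above.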
[26 May 2010 11:46]
Konstantin Osipov
The bug is a side effect of the FLUSH TABLES implementation, which works as intended. Essentially, the FLUSH TABLES statement should be seen as a combination of two steps: a "FLUSH TABLES NOWAIT", which is instant and marks all tables for invalidation, and then waiting for the invalidated tables to disappear. A table marked for invalidation cannot be used in any connection.

The request in this bug report is, essentially, to undo the effects of the first step when the second step is killed. This is quite hard to accomplish, since the two steps are not performed in one atomic block.

We're looking at changing the implementation of FLUSH TABLES to only mark tables for flush, and then gradually expel flushed tables one by one. At this point it's unclear how to expel even a single table without disturbing concurrent transactions -- i.e. in a non-blocking fashion. Even when we solve that, however, the solution may have an impact on the performance of highly loaded, highly concurrent servers, and uncover some as-yet-unknown side effects, especially since currently FLUSH TABLES leads to ER_LOCK_DEADLOCK errors in transactional environments (i.e. it simply aborts transactions).
[26 May 2010 11:47]
Konstantin Osipov
(Explaining reasons for setting high risk).
[26 May 2010 14:00]
James Day
What would the effort and work required be to change what shows up in the processlist to tell people what is happening? That should be enough to prevent most reports of this problem, even though it won't actually solve it. It would be a good and probably easy first step, and the rest can be done later.
[12 Oct 2018 6:54]
MySQL Verification Team
Is Bug#65777 a duplicate of this bug?