Description:
The problem happens often during a stress test involving concurrent DML/DDL flow. The steps described below, however, are executed strictly sequentially.
- thread A locks a table with READ lock;
- thread B connects to the server;
- thread B attempts to ALTER (rename) the table and starts waiting for the lock;
- after a while, thread A kills thread B;
- thread A waits until thread B is shown as 'Killed' in I_S.processlist;
- thread A unlocks the table;
=> the table gets renamed by thread B.
It does not always happen, so it is apparently a race condition of some kind.
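The steps above can be sketched as the following two-session sequence. This is only an illustration, not the actual test case: the table names t1 and t1_renamed and the connection id 42 are hypothetical placeholders, and plain KILL (which terminates the connection) is an assumption, since the report does not say which KILL variant is used.

```sql
-- Connection A: take a READ lock on the table (t1 is a hypothetical name)
LOCK TABLES t1 READ;

-- Connection B: note its own connection id, then attempt the rename.
-- The ALTER is expected to block, waiting for A's lock.
SELECT CONNECTION_ID();
ALTER TABLE t1 RENAME TO t1_renamed;

-- Connection A: kill B. 42 stands in for the id reported by connection B.
KILL 42;

-- Connection A: repeat until B's row is shown as 'Killed'
SELECT ID, COMMAND, STATE
  FROM INFORMATION_SCHEMA.PROCESSLIST
 WHERE ID = 42;

-- Connection A: release the lock
UNLOCK TABLES;

-- Connection A: check which name the table has now.
-- In the failing runs it shows up as t1_renamed.
SHOW TABLES LIKE 't1%';
```

Since the failure is intermittent, a single manual run of this sequence may well not reproduce it.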
The manual says about the kill flag (http://dev.mysql.com/doc/refman/5.5/en/kill.html):
"During ALTER TABLE, the kill flag is checked before each block of rows are read from the original table. If the kill flag was set, the statement is aborted and the temporary table is deleted."
Thread A waits until thread B is flagged as 'Killed' before unlocking the table, and thread B should not even have started the job, because the table was already locked by the time it connected. I therefore assume something is going wrong.
Notes:
More activity happens in parallel in the test case, but it does not reveal any obvious problem:
- thread C behaves in the same way as thread B, only it attempts to add a column to the table rather than rename it; it never finishes the operation;
- thread D performs some unrelated DDL activity (e.g. creates/drops another table).
In the original test case, the table that gets renamed is a child of a merge table: thread A actually locks the parent table, while thread B attempts to rename the child. It will be investigated whether this configuration is important for reproducing the problem, or whether it also happens with regular locked tables.
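For illustration, the merge-table configuration might look roughly like the sketch below. The table names, the single INT column, and the INSERT_METHOD setting are assumptions, not taken from the original test case.

```sql
-- Hypothetical child table and a MERGE parent over it
CREATE TABLE t_child (a INT) ENGINE=MyISAM;
CREATE TABLE t_parent (a INT) ENGINE=MERGE UNION=(t_child) INSERT_METHOD=LAST;

-- Connection A: lock the parent, not the child
LOCK TABLES t_parent READ;

-- Connection B: attempt to rename the child
ALTER TABLE t_child RENAME TO t_child_renamed;
```

The kill, wait, and unlock steps then proceed as listed in the description.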
How to repeat:
The test case (simplified stress test) will be provided shortly.