Bug #40265 Falcon: Concurrent online DROP INDEX of the same key causes MySQL assertion
Submitted: 22 Oct 2008 21:54 Modified: 9 Jan 2009 15:03
Reporter: Christopher Powers
Status: Closed
Category:Server: Falcon Severity:S3 (Non-critical)
Version:6.0.7 OS:Any
Assigned to: Christopher Powers Target Version:6.0.8
Tags: F_ONLINE ALTER
Triage: Needs Triage: D2 (Serious)

[22 Oct 2008 21:54] Christopher Powers
Description:
Concurrent online DROP INDEX operations on the same key can result in an assertion in the
server.

Online drop index is a two-phase operation consisting of two calls into Falcon from the
server:

1. Check if the index exists (check_if_supported_alter)
2. Delete the index (alter_table_phase1)

If Step 1 fails, the server resorts to an offline operation.
If Step 1 succeeds, the server performs Step 2.
If Step 2 fails, the server triggers an assertion.
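The server-side control flow above can be sketched as follows. This is a minimal model, not the server's code: `EngineHooks`, `AlterSupport`, `DropPath` and `serverDropIndex` are hypothetical stand-ins for the two engine callbacks (check_if_supported_alter, alter_table_phase1), and the offline path is not modeled. It shows why a Step 2 error is fatal: the failure lands directly in an `assert`.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical, simplified stand-ins for the two engine callbacks; the real
// handler API uses different names and signatures.
enum class AlterSupport { Online, OfflineOnly };

struct EngineHooks {
    std::function<AlterSupport(const std::string&)> checkIfSupportedAlter; // Step 1
    std::function<int(const std::string&)> alterTablePhase1;              // Step 2, 0 = success
};

enum class DropPath { Offline, Online };

// Models the server's handling of an online DROP INDEX:
//  - Step 1 reports "not online-capable" -> fall back to the offline path.
//  - Step 1 succeeds                     -> Step 2 must succeed, or the assertion fires.
DropPath serverDropIndex(const EngineHooks& engine, const std::string& index) {
    if (engine.checkIfSupportedAlter(index) != AlterSupport::Online) {
        return DropPath::Offline;  // offline rebuild is not modeled here
    }
    int rc = engine.alterTablePhase1(index);
    assert(rc == 0 && "Step 2 is do-or-die: any engine error trips the assertion");
    (void)rc;  // silence the unused-variable warning when NDEBUG disables assert
    return DropPath::Online;
}
```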

For online DROP INDEX, Step 1 is 'check if the index exists' and Step 2 is 'delete the
index'. 

When multiple clients attempt to drop the same key, Step 1 may return 'success' for all of
them.

For Step 2, only one client will succeed. Falcon returns an error for the other clients,
resulting in an assertion.
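The race can be replayed deterministically with a small model. In the sketch below, `IndexCatalog` and `replayRace` are hypothetical: each catalog call is atomic on its own, but nothing ties a successful existence check to the later drop. The replay uses the worst-case interleaving in which every client finishes Step 1 before any client starts Step 2.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory index catalog. exists() and drop() are each atomic,
// but there is a gap between them that another client can use.
class IndexCatalog {
public:
    void create(const std::string& n) { std::lock_guard<std::mutex> g(m_); names_.insert(n); }
    bool exists(const std::string& n) { std::lock_guard<std::mutex> g(m_); return names_.count(n) != 0; }
    bool drop(const std::string& n)   { std::lock_guard<std::mutex> g(m_); return names_.erase(n) == 1; }
private:
    std::mutex m_;
    std::set<std::string> names_;
};

struct ClientResult { bool passedStep1; bool passedStep2; };

// Every client runs Step 1 before any client runs Step 2.
std::vector<ClientResult> replayRace(IndexCatalog& cat, const std::string& index, int clients) {
    std::vector<ClientResult> results(clients);
    for (auto& r : results) r.passedStep1 = cat.exists(index);                 // all see the index
    for (auto& r : results) r.passedStep2 = r.passedStep1 && cat.drop(index);  // only the first drop succeeds
    return results;
}
```

With 10 clients, all 10 pass Step 1 and exactly one passes Step 2; in the real server, each of the other nine would hit the assertion.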

How to repeat:
1. Install random query generator: 
https://inside.mysql.com/wiki/QARandomQueryGenerationTutorial

2. Run SystemQA falcon_online_alter

runall.pl \
   --basedir=<mysql directory> \
   --engine=Falcon \
   --grammar=conf/falcon_online_alter.yy \
   --threads=10 \
   --queries=100000

A failed DROP INDEX triggers an assertion in the server, not in Falcon.

Suggested fix:
If an index is not found during an online DROP INDEX, then do not return an error to the
server.

Specifically, if StorageDatabase::dropIndex() returns StorageErrorNoIndex, then ignore
the error.

Also, rebuild the server/Falcon index map only when drop index is successful, i.e.
StorageTableShare::deleteIndex() should only be called for a 'no error' return code.
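The suggested fix can be sketched as follows. Only the names StorageDatabase::dropIndex(), StorageErrorNoIndex and StorageTableShare::deleteIndex() come from this report; `FakeDatabase`, `FakeTableShare`, `onlineDropIndex`, the other error codes and their values are hypothetical simplifications.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical error codes; only the StorageErrorNoIndex name comes from the report.
enum StorageError { StorageErrorNone = 0, StorageErrorNoIndex, StorageErrorOther };

// Simplified stand-in for StorageDatabase.
struct FakeDatabase {
    std::set<std::string> indexes;
    StorageError dropIndex(const std::string& n) {
        return indexes.erase(n) ? StorageErrorNone : StorageErrorNoIndex;
    }
};

// Simplified stand-in for StorageTableShare; counts index-map rebuilds.
struct FakeTableShare {
    int indexMapRebuilds = 0;
    void deleteIndex(const std::string&) { ++indexMapRebuilds; }
};

// Online DROP INDEX path with the suggested fix applied.
int onlineDropIndex(FakeDatabase& db, FakeTableShare& share, const std::string& n) {
    StorageError rc = db.dropIndex(n);
    if (rc == StorageErrorNone) {
        share.deleteIndex(n);  // rebuild the index map only after a real drop
        return 0;
    }
    if (rc == StorageErrorNoIndex) {
        return 0;              // another client already dropped it: not an error
    }
    return rc;                 // any other failure is still reported to the server
}
```

Two back-to-back drops of the same key both return 0, but the index map is rebuilt only once.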
[22 Oct 2008 22:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56846

2877 Christopher Powers	2008-10-22
      Bug#40265, "Falcon: Concurrent online DROP INDEX of the same key causes MySQL
assertion"
      
      Improve handling of concurrent online drop index of the same key.
[22 Oct 2008 22:50] Kevin Lewis
Wouldn't it be better to somehow lock the index to delete on the first call so that
subsequent calls to step 1 (Check if the index exists) do not succeed? This way only one
client will attempt step 2 (Delete the index).
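The lock-on-first-call alternative can be sketched with a hypothetical `ClaimingCatalog` (not Falcon code). Step 1 atomically checks for the index and claims it, so only one client proceeds to Step 2. The sketch also shows the extra state this design needs: a claim must be released if the server abandons the ALTER between the two calls, or the index stays claimed forever.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical catalog illustrating "claim the index in Step 1".
class ClaimingCatalog {
public:
    void create(const std::string& n) { std::lock_guard<std::mutex> g(m_); names_.insert(n); }

    // Step 1: succeeds for at most one caller per index; later callers get
    // false and would take the offline path.
    bool checkAndClaim(const std::string& n) {
        std::lock_guard<std::mutex> g(m_);
        if (names_.count(n) == 0 || claimed_.count(n) != 0) return false;
        claimed_.insert(n);
        return true;
    }

    // Step 2: only a claimant gets here, so the drop cannot lose a race.
    bool drop(const std::string& n) {
        std::lock_guard<std::mutex> g(m_);
        if (claimed_.erase(n) == 0) return false;  // caller never claimed it
        return names_.erase(n) == 1;
    }

    // Must be called if the ALTER is abandoned between Step 1 and Step 2.
    void releaseClaim(const std::string& n) {
        std::lock_guard<std::mutex> g(m_);
        claimed_.erase(n);
    }
private:
    std::mutex m_;
    std::set<std::string> names_, claimed_;
};
```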
[22 Oct 2008 22:52] Christopher Powers
A bit more explanation:

This is really to address an internal condition in Falcon--kind of a special case.

Online or offline, if the index really does not exist, then the MySQL server will return
an SQL error before calling Falcon.

Online, if the index exists and the client gets past the "does the index exist" query
from the server (check_if_supported_alter), then we get called again with "delete the
index" (alter_table_phase1).

alter_table_phase1() is a do-or-die operation. If it fails for any reason, then the
server asserts. I don't know why--I asked once--but that's what it does.

In this case, alter_table_phase1() ultimately lands in StorageDatabase::dropIndex(),
which issues an internal SQL command to do the work. The SQL command fails if the index
can't be found (Table::findIndex, I think) and returns StorageErrorNoIndex.

I figured that if (1) we know this is an online operation, and (2) we've gotten this far,
and (3) the error is StorageErrorNoIndex, then the drop index request was legit and we
were simply outpaced by another client.

Any error other than StorageErrorNoIndex would result in a failure.

The tricky part of the online alter API is the gap between check_if_supported_alter() and
alter_table_phase1().

The alternative to failing silently (in this case) is to maintain some kind of state
between the two calls--flags or somesuch--which seemed risky and more brittle.
[23 Oct 2008 3:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56853

2878 Christopher Powers	2008-10-22
      Bug#40265, "Falcon: Concurrent online DROP INDEX of the same key causes MySQL
assertion"
      
      Use StorageInterface::alter_table_phase2() to drop index rather than phase1()
      Removed check for primary key in StorageInterface::addIndex() and dropIndex().
[28 Oct 2008 9:10] Bugs System
Pushed into 6.0.8-alpha  (revid:cpowers@mysql.com-20081023012155-b33f43khx53x3ljv)
(version source revid:cpowers@mysql.com-20081023012155-b33f43khx53x3ljv) (pib:5)
[9 Jan 2009 15:03] MC Brown
A note has been added to the 6.0.8 changelog: 

Running concurrent online DROP INDEX operations on the same key of a Falcon table
could fail with an assertion.