Bug #40265 Falcon: Concurrent online DROP INDEX of the same key causes MySQL assertion
Submitted: 22 Oct 2008 19:54
Modified: 9 Jan 2009 14:03
Reporter: Christopher Powers
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S3 (Non-critical)
Version: 6.0.7
OS: Any
Assigned to: Christopher Powers
CPU Architecture: Any
Tags: F_ONLINE ALTER
Triage: Needs Triage: D2 (Serious)

[22 Oct 2008 19:54] Christopher Powers
Description:
Concurrent online DROP INDEX operations on the same key can result in an assertion in the server.

Online drop index is a two-phase operation consisting of two calls into Falcon from the server:

1. Check if the index exists (check_if_supported_alter)
2. Delete the index (alter_table_phase1)

If Step 1 fails, the server resorts to an offline operation.
If Step 1 succeeds, the server performs Step 2.
If Step 2 fails, the server triggers an assertion.

For online DROP INDEX, Step 1 is 'check if the index exists' and Step 2 is 'delete the index'. 

When multiple clients attempt to drop the same key, Step 1 may report 'success' for every client.

For Step 2, only one client will succeed. Falcon returns an error for the other clients, resulting in an assertion.
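
The interleaving described above can be reproduced in a small toy model. This is only a sketch: ToyEngine, check_index_exists, drop_index and run_interleaved are hypothetical names made up for illustration and do not correspond to Falcon or server code. It forces every client through Step 1 before any client runs Step 2.

```python
# Toy model of the race; all names here are hypothetical, not Falcon's API.

class ToyEngine:
    def __init__(self, indexes):
        self.indexes = set(indexes)

    def check_index_exists(self, name):
        # Step 1: models the "does the index exist" check.
        return name in self.indexes

    def drop_index(self, name):
        # Step 2: models the delete; fails if the index is already gone.
        if name not in self.indexes:
            return "no_index"
        self.indexes.remove(name)
        return "ok"


def run_interleaved(engine, name, clients=2):
    # Every client completes Step 1 before any client runs Step 2,
    # which is the interleaving that exposes the problem.
    step1 = [engine.check_index_exists(name) for _ in range(clients)]
    step2 = [engine.drop_index(name) for _ in range(clients)]
    return step1, step2


step1, step2 = run_interleaved(ToyEngine({"k1"}), "k1")
print(step1)  # [True, True]
print(step2)  # ['ok', 'no_index']
```

In this model the second client's 'no_index' result stands in for the error that, in the real server, leads to the assertion.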

How to repeat:
1. Install random query generator: 
https://inside.mysql.com/wiki/QARandomQueryGenerationTutorial

2. Run SystemQA falcon_online_alter

runall.pl \
   --basedir=<mysql directory> \
   --engine=Falcon \
   --grammar=conf/falcon_online_alter.yy \
   --threads=10 \
   --queries=100000

A failed drop index triggers an assertion in the server, not in Falcon.

Suggested fix:
If an index is not found during an online DROP INDEX, then do not return an error to the server.

Specifically, if StorageDatabase::dropIndex() returns StorageErrorNoIndex, then ignore the error.

Also, rebuild the server/Falcon index map only when drop index is successful, i.e. StorageTableShare::deleteIndex() should only be called for a 'no error' return code.
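
A minimal sketch of the suggested handling, written as a toy model rather than the actual patch. The constants, ToyTable, storage_drop_index and online_drop_index are hypothetical stand-ins; storage_drop_index plays the role of StorageDatabase::dropIndex(), and the map_rebuilds counter stands in for the call to StorageTableShare::deleteIndex().

```python
# Hypothetical model of the suggested fix; the constants and class are
# illustrative stand-ins, not Falcon's real definitions.
OK = 0
ERROR_NO_INDEX = 1
ERROR_OTHER = 2


class ToyTable:
    def __init__(self, indexes):
        self.indexes = set(indexes)
        self.map_rebuilds = 0  # counts server/engine index-map rebuilds

    def storage_drop_index(self, name):
        # Stand-in for StorageDatabase::dropIndex().
        if name not in self.indexes:
            return ERROR_NO_INDEX
        self.indexes.remove(name)
        return OK

    def online_drop_index(self, name):
        ret = self.storage_drop_index(name)
        if ret == OK:
            # Rebuild the index map only when the drop really happened.
            self.map_rebuilds += 1
            return OK
        if ret == ERROR_NO_INDEX:
            # Another client already dropped it: report success.
            return OK
        return ret  # any other error still propagates


table = ToyTable({"k1"})
results = [table.online_drop_index("k1") for _ in range(2)]
```

Both calls report success here, while the index map is rebuilt only once, by the call that actually removed the index.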
[22 Oct 2008 20:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56846

2877 Christopher Powers	2008-10-22
      Bug#40265, "Falcon: Concurrent online DROP INDEX of the same key causes MySQL assertion"
      
      Improve handling of concurrent online drop index of the same key.
[22 Oct 2008 20:50] Kevin Lewis
Wouldn't it be better to somehow lock the index to delete on the first call so that subsequent calls to step 1 (Check if the index exists) do not succeed?  This way only one client will attempt step 2 (Delete the index).
[22 Oct 2008 20:52] Christopher Powers
A bit more explanation:

This is really to address an internal condition in Falcon--kind of a special case.

Online or offline, if the index really does not exist, then the MySQL server will return an SQL error before calling Falcon.

Online, if the index exists and the client gets past the "does the index exist" query from the server (check_if_supported_alter), then we get called again with "delete the index" (alter_table_phase1).

alter_table_phase1() is a do-or-die operation. If it fails for any reason, then the server asserts. I don't know why--I asked once--but that's what it does.

In this case, alter_table_phase1() ultimately lands in StorageDatabase::dropIndex(), which issues an internal SQL command to do the work. The SQL command fails if the index can't be found (Table::findIndex, I think) and returns StorageErrorNoIndex.

I figured that if (1) we know this is an online operation, and (2) we've gotten this far, and (3) the error is StorageErrorNoIndex, then the drop index request was legit and we were simply outpaced by another client.

Any error other than StorageErrorNoIndex would result in a failure.

The tricky part of the online alter API is the gap between check_if_supported_alter() and alter_table_phase1().

The alternative to failing silently (in this case) is to maintain some kind of state between the two calls--flags or somesuch--which seemed risky and more brittle.
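
For comparison, here is a rough sketch of what keeping state between the two calls might look like. It is an illustration only: ClaimingEngine, check_and_claim and drop_claimed are hypothetical names, and a real version would also have to make the check-and-claim step atomic.

```python
class ClaimingEngine:
    def __init__(self, indexes):
        self.indexes = set(indexes)
        self.claimed = set()  # state carried between the two calls

    def check_and_claim(self, name):
        # Step 1: succeed only for the first client to claim the index.
        if name not in self.indexes or name in self.claimed:
            return False
        self.claimed.add(name)
        return True

    def drop_claimed(self, name):
        # Step 2: only reached by the client that holds the claim.
        self.indexes.remove(name)
        self.claimed.discard(name)


engine = ClaimingEngine({"k1", "k2"})
first = engine.check_and_claim("k1")
second = engine.check_and_claim("k1")
engine.drop_claimed("k1")

# Brittleness: a client that claims "k2" but never reaches Step 2
# leaves the claim in place and blocks later online drops.
engine.check_and_claim("k2")
blocked = engine.check_and_claim("k2")
```

The second half of the sketch shows the risk: if a client claims an index and never reaches the second call, the stale claim has to be cleaned up somewhere, or later online drops of that index keep failing the first check.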
[23 Oct 2008 1:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/56853

2878 Christopher Powers	2008-10-22
      Bug#40265, "Falcon: Concurrent online DROP INDEX of the same key causes MySQL assertion"
      
      Use StorageInterface::alter_table_phase2() to drop index rather than phase1()
      Removed check for primary key in StorageInterface::addIndex() and dropIndex().
[28 Oct 2008 8:10] Bugs System
Pushed into 6.0.8-alpha  (revid:cpowers@mysql.com-20081023012155-b33f43khx53x3ljv) (version source revid:cpowers@mysql.com-20081023012155-b33f43khx53x3ljv) (pib:5)
[9 Jan 2009 14:03] MC Brown
A note has been added to the 6.0.8 changelog: 

Running an online DROP INDEX operation on an index using the same key on a Falcon table would fail with an assertion.