Bug #38043 Deadlock between server and falcon when executing concurrent ALTER + DML
Submitted: 11 Jul 2008 10:50 Modified: 30 Sep 2008 20:36
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S1 (Critical)
Version:6.0-falcon OS:Any
Assigned to: Christopher Powers CPU Architecture:Any

[11 Jul 2008 10:50] Philip Stoev
Description:
When executing a workload involing ALTER ADd/DROP key and DML, the server hanged with the following output of Falcon stalled():

Stalled threads
  Thread 0xa8602290 (-1475023984) sleep=1, grant=0, locks=3, who 0, parent=(nil)
    Pending StorageTableShare::setSequenceValue state 0 (1) syncObject 0xa8855978
  Thread 0xa8604140 (-1476396144) sleep=0, grant=1, locks=1, who 1, parent=(nil)
Stalled synchronization objects:
  SyncObject a8855978: state -1, readers 0, monitor 0, waiters 1
    Exclusive thread a8604140 (-1476396144), type 1;
    Waiting thread a8602290 (-1475023984), type 1; StorageTableShare::setSequenceValue

Processlist looks like this:

*************************** 2. row ***************************
     Id: 4
   User: root
   Host: localhost:53253
     db: test
Command: Query
   Time: 418
  State: Opening tables
   Info: UPDATE C AS X SET int_nokey = 8 WHERE X . int_key < 211 LIMIT 1
*************************** 3. row ***************************
     Id: 5
   User: root
   Host: localhost:53254
     db: test
Command: Query
   Time: 417
  State: Opening tables
   Info: INSERT INTO D ( int_nokey , pk ) VALUES ( 87 , 75 )
*************************** 4. row ***************************
     Id: 6
   User: root
   Host: localhost:53255
     db: test
Command: Query
   Time: 419
  State: closing tables
   Info: SELECT AVG( Y . int_key ) FROM E AS X LEFT JOIN E AS Y ON ( X . int_key = Y . int_nokey ) WHERE X .
*************************** 5. row ***************************
     Id: 7
   User: root
   Host: localhost:53256
     db: test
Command: Query
   Time: 417
  State: Opening tables
   Info: ALTER  TABLE C DROP KEY c
*************************** 6. row ***************************
     Id: 8
   User: root
   Host: localhost:53257
     db: test
Command: Query
   Time: 418
  State: Waiting for table
   Info: UPDATE D AS X SET int_key = 114 WHERE X . pk < 132 LIMIT 0
*************************** 7. row ***************************
     Id: 9
   User: root
   Host: localhost:53258
     db: test
Command: Query
   Time: 417
  State: update
   Info: INSERT INTO D ( pk , int_key ) VALUES ( 150 , 0 )
*************************** 8. row ***************************
     Id: 10
   User: root
   Host: localhost:53259
     db: test
Command: Query
   Time: 417
  State: Opening tables
   Info: INSERT INTO D ( pk , int_key ) VALUES ( 105 , 32 )
*************************** 9. row ***************************
     Id: 11
   User: root
   Host: localhost:53260
     db: test
Command: Query
   Time: 418
  State: Waiting for table
   Info: ALTER  TABLE D ADD KEY p ( int_nokey )
*************************** 10. row ***************************
     Id: 12
   User: root
   Host: localhost:53261
     db: test
Command: Query
   Time: 417
  State: closing tables
   Info: UPDATE E AS X SET int_key = 5 WHERE X . pk < 173 LIMIT 1
*************************** 11. row ***************************
     Id: 13
   User: root
   Host: localhost:53262
     db: test
Command: Query
   Time: 417
  State: closing tables
   Info: SELECT 237 FROM B AS X LEFT JOIN B AS Y ON ( X . pk = Y . int_nokey ) WHERE X . int_key < 249 GROUP

A threads dump will be attached shortly.

How to repeat:
A test case will be provided if this situation is observed again.
[11 Jul 2008 10:51] Philip Stoev
Output from kill -HUP:

Current locks:
lock: 0xa7da2298:

lock: 0xa7d720e0:

lock: 0xa7d53368:

lock: 0xa7d8ebc8: write
write     : 0x9819830 (9:5);

lock: 0xb72da7f0:

lock: 0xb7197a50:

lock: 0xb7191d10:

lock: 0x97d31cc:

lock: 0x97c8d6c:

lock: 0x97be134:

lock: 0x97b4f94:

lock: 0x9788bac:

lock: 0x979cb14:

lock: 0x9797124:

lock: 0x978fac4:
[11 Jul 2008 10:54] Philip Stoev
Thread stacks for bug 38043

Attachment: bug38043.threads (text/plain), 30.34 KiB.

[11 Jul 2008 11:26] Philip Stoev
This was just repeated with just two threads, processlist is as follows:

     Id: 3
   User: root
   Host: localhost:34224
     db: test
Command: Query
   Time: 150
  State: Opening tables
   Info: SELECT AVG( Y . int_key ) FROM C AS X LEFT JOIN D AS Y ON ( X . int_nokey = Y . int_nokey ) WHERE X
*************************** 3. row ***************************
     Id: 4
   User: root
   Host: localhost:34225
     db: test
Command: Query
   Time: 150
  State: closing tables
   Info: UPDATE E AS X SET pk = 75 WHERE X . pk < 81 LIMIT 6

Notice that none of the query is an ALTER. I suspect that a previous ALTER resulted in a table rename which caused table locking to go bad.

The falcon thread is waiting in StorageInterface::setIndexes, as per the attached thread dumps. The mysql thread is in close_open_tables().
[11 Jul 2008 11:31] Philip Stoev
Grammar file for bug 38043

Attachment: bug38043.yy (application/octet-stream, text), 1.14 KiB.

[11 Jul 2008 11:35] Philip Stoev
To reproduce this issue, please clone the mysql-test-extra-6.0 tree and then execute:

$ cd mysql-test-extra-6.0/mysql-test/gentest
$ perl runall.pl \
  --basedir=/path/to/mysql-6.0-falcon \
  --engine=falcon \
  --grammar=path/to/bug38043.yy \
  --mysqld=--log-output=file \
  --threads=2

the mysql-6.0 tree does not appear to be affected.
[11 Jul 2008 17:25] Kevin Lewis
Chris,  This also seems to be related to WL#4048 Online Add/Drop Index
[14 Jul 2008 5:29] Christopher Powers
Reproduced with SystemQA testcase.

There are two problems:

1. Deadlocks are caused by a pre-existing bug in StorageInterface::setIndexes(), which fails to release an exclusive lock on StorageTableShare::syncObject if StorageTable::setIndex returned an error.

2. Memory corruption is caused by StorageTableShare::getIndex().

Historically, getIndex() updated StorageTableShare::indexes[] with the information of an index being accessed for the first time. The index array was assumed to be of a fixed size, and was allocated during StorageTableShare::open().

For online Add/Drop Index (WL#4048), getIndex() was modified to resize the indexes array if it determined that a new index had been added. The resize operation is not sufficiently protected during concurrent access to the table share, and should instead be performed immediately after createIndex().
[14 Jul 2008 5:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49643

2741 Christopher Powers	2008-07-14
      Bug#38043: Deadlock between server and falcon
      
      Temporarily disable online alter until StorageTableShare::getIndex() is fixed.
[14 Jul 2008 5:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49645

2742 Christopher Powers	2008-07-14
      Bug#38043: Deadlock between server and falcon
      
      Restore warning in results file until online alter fixed
[25 Jul 2008 4:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50471

2745 Christopher Powers	2008-07-24
      Bug#38039   Assertion lockState == 0 failed in StorageTableShare::deleteTable SyncObject
      Bug#38041   Bizarre errors when ALTER ADD/DROP KEY on Falcon tables
      Bug#38043   Deadlock between server and falcon when executing concurrent ALTER + DML
      
      Improvements to online add/drop index:
      
      - StorageTableShare::indexes[] is now a resizeable DenseArray
      - Added sync lock protection to get/set index
      - Call add/drop index from StorageTable rather than StorageTableShare
      - Rename internal Falcon indexes to preserve index name/id mapping after drop
      - Immediately update StorageTableShare::indexes[] after add/drop rather than wait for subsequent open()
[25 Jul 2008 4:48] Christopher Powers
Short version of the testcase passes. Long version (--threads=10 --queries=1000000) results in error described in Bug#38044.
[18 Aug 2008 5:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51820

2787 Christopher Powers	2008-08-18
      Bug#38044 Falcon crash in StorageTable::compareKey at StorageTable.cpp:412
      Bug#38043 Deadlock between server and falcon when executing concurrent ALTER + DML
      Bug#38039 Assertion lockState == 0 failed in StorageTableShare::deleteTable SyncObject
      Bug#38041 Bizarre errors when ALTER ADD/DROP KEY on Falcon tables
      
      Improved index synchronization in the StorageInterface:
      - Falcon internal indexes no longer dependent upon server index id
      - Client connections lock StorageTableShare::syncIndexes to protect indexes across calls into StorageInterface
      - Changes to StorageTableShare::indexes are now protected by exclusive lock
[18 Aug 2008 16:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51859

2787 Christopher Powers	2008-08-18
      Bug#38044 Falcon crash in StorageTable::compareKey at StorageTable.cpp:412
      Bug#38043 Deadlock between server and falcon when executing concurrent ALTER + DML
      Bug#38039 Assertion lockState == 0 failed in StorageTableShare::deleteTable SyncObject
      Bug#38041 Bizarre errors when ALTER ADD/DROP KEY on Falcon tables
      
      Improved index synchronization in the StorageInterface:
      - Falcon internal indexes no longer dependent upon server index id
      - Client connections lock StorageTableShare::syncIndexes to protect indexes across calls into StorageInterface
      - Changes to StorageTableShare::indexes are now protected by exclusive lock
[18 Aug 2008 17:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51863

2787 Christopher Powers	2008-08-18
      Bug#38044 Falcon crash in StorageTable::compareKey at StorageTable.cpp:412
      Bug#38043 Deadlock between server and falcon when executing concurrent ALTER + DML
      Bug#38039 Assertion lockState == 0 failed in StorageTableShare::deleteTable SyncObject
      Bug#38041 Bizarre errors when ALTER ADD/DROP KEY on Falcon tables
      
      Improved index synchronization in the StorageInterface:
      - Falcon internal indexes no longer dependent upon server index id
      - Client connections lock StorageTableShare::syncIndexes to protect indexes across calls into StorageInterface
      - Changes to StorageTableShare::indexes are now protected by exclusive lock
[20 Aug 2008 8:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/52015

2773 Narayanan V	2008-08-20 [merge]
      WL#4380
      
      Merging mysql-5.1-sea -> mysql-6.0-sea
[28 Aug 2008 4:37] Bugs System
Pushed into 6.0.7-alpha  (revid:cpowers@mysql.com-20080818054529-s9swke0xd15rx1fu) (version source revid:vvaintroub@mysql.com-20080827094944-sh582y3m7duhzrpj) (pib:3)
[14 Sep 2008 4:07] Bugs System
Pushed into 6.0.6-alpha  (revid:cpowers@mysql.com-20080818054529-s9swke0xd15rx1fu) (version source revid:hakan@mysql.com-20080716175219-8unfm2ikarselqpw) (pib:3)
[30 Sep 2008 20:36] Jon Stephens
Documented in the 6.0.6 changelog as follows:

        Executing ALTER TABLE and DML statements concurrently on Falcon tables
        could cause the server to hang.