Bug #34634 Concurrent pagecache_delete() and eviction of the same page cause crash
Submitted: 18 Feb 2008 10:11
Modified: 5 Mar 2008 8:18
Reporter: Guilhem Bichot
Status: Closed
Category: MySQL Server: Maria storage engine
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Oleksandr Byelkin
CPU Architecture: Any

[18 Feb 2008 10:11] Guilhem Bichot
Description:
Suppose the block-record code (thread1) calls pagecache_delete() on
page X, for example during a maria_delete().
At the same moment, thread2 is evicting X: it has registered a request,
marked X with PCBLOCK_IN_SWITCH, written X to disk, and is now
blocked on the page cache's mutex.
pagecache_delete() then proceeds and calls free_block(), which
segfaults because it calls unlink_block() on a block that is not in
the LRU.
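
The interleaving can be modelled in a few lines of standalone C. This is only a sketch, not the page cache code: every demo_* name is hypothetical, the LRU is simplified to a doubly linked ring, and it assumes that a block with a registered request has already left the ring. Where the real unlink_block() would dereference a NULL pointer, the model returns -1 instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PCBLOCK_IN_SWITCH; the value is arbitrary. */
enum { DEMO_IN_SWITCH = 1 << 0 };

typedef struct demo_block {
  struct demo_block *next_used;  /* NULL while the block is outside the LRU ring */
  struct demo_block *prev_used;
  unsigned status;
  unsigned requests;
} demo_block;

typedef struct demo_cache {
  demo_block *used_last;         /* entry point of the ring, NULL when empty */
} demo_cache;

/* Insert a block into the circular LRU ring. */
void demo_link_block(demo_cache *cache, demo_block *block)
{
  if (cache->used_last == NULL) {
    block->next_used = block->prev_used = block;
  } else {
    demo_block *last = cache->used_last;
    block->next_used = last->next_used;
    block->prev_used = last;
    last->next_used->prev_used = block;
    last->next_used = block;
  }
  cache->used_last = block;
}

/* Remove a block from the ring.
   Returns -1 at the point where unchecked code would dereference NULL. */
int demo_unlink_block(demo_cache *cache, demo_block *block)
{
  if (block->next_used == NULL)
    return -1;                          /* block is not in the ring */
  if (block->next_used == block) {
    cache->used_last = NULL;            /* it was the only block */
  } else {
    block->next_used->prev_used = block->prev_used;
    block->prev_used->next_used = block->next_used;
    if (cache->used_last == block)
      cache->used_last = block->prev_used;
  }
  block->next_used = block->prev_used = NULL;
  return 0;
}

/* thread2's steps up to the point where it waits for the cache mutex. */
void demo_begin_eviction(demo_cache *cache, demo_block *block)
{
  block->requests++;                    /* registered a request */
  demo_unlink_block(cache, block);      /* assumed: requested blocks leave the ring */
  block->status |= DEMO_IN_SWITCH;
}

/* thread1's delete path without any status check. */
int demo_delete_unguarded(demo_cache *cache, demo_block *block)
{
  return demo_unlink_block(cache, block);
}
```

Calling demo_begin_eviction() on a linked block and then demo_delete_unguarded() on the same block returns -1, which is the point where the real code segfaults.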

How to repeat:
Will attach a test case.
[18 Feb 2008 10:12] Guilhem Bichot
testcase to crash pagecache

Attachment: bug34634.tar.bz2 (application/x-bzip2, text), 3.14 KiB.

[18 Feb 2008 10:14] Guilhem Bichot
Download bug34634.tar.bz2 from the "Files" section, apply the diff to ma_pagecache.c, replace the ma_pagecache_single.c in your tree with the one from the tar.bz2, recompile, and run "ma_pagecache_single_1k-t --debug". It should segfault. The debug trace contains "BUGINFO" tags that show what is going wrong.
[22 Feb 2008 8:47] Oleksandr Byelkin
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2008/02/22 10:42:29+02:00 bell@88-214-96-85.dialup.umc.net.ua 
#   Fixed problem of deleting blocks which are removing at the moment.
# 
# storage/maria/ma_pagecache.c
#   2008/02/22 10:42:19+02:00 bell@88-214-96-85.dialup.umc.net.ua +33 -1
#   Avoid deleting blocks which already chosen for deleting.
# 
diff -Nru a/storage/maria/ma_pagecache.c b/storage/maria/ma_pagecache.c
--- a/storage/maria/ma_pagecache.c      2008-02-22 10:45:48 +02:00
+++ b/storage/maria/ma_pagecache.c      2008-02-22 10:45:48 +02:00
@@ -1774,7 +1774,7 @@
   PAGECACHE_HASH_LINK *hash_link;
   PAGECACHE_BLOCK_LINK *block;
   int error= 0;
-  int page_status;
+  int page_status, old_block_status;
 
   DBUG_ENTER("find_block");
   KEYCACHE_THREAD_TRACE("find_block:begin");
@@ -1909,7 +1909,12 @@
       /* Resubmit the request */
       goto restart;
     }
+    old_block_status= block->status;
     block->status&= ~PCBLOCK_IN_SWITCH;
+    if (old_block_status & PCBLOCK_IN_SWITCH)
+      DBUG_PRINT("info", ("BUGINFO block loses PCBLOCK_IN_SWITCH; "
+                          "status %d changes to %d",
+                          old_block_status, block->status));
   }
   else
   {
@@ -2009,9 +2014,13 @@
            ! (block->status & PCBLOCK_IN_SWITCH) )
         {
          /* this is a primary request for a new page */
+          int old_block_status= block->status;
           DBUG_ASSERT(block->wlocks == 0);
           DBUG_ASSERT(block->pins == 0);
           block->status|= PCBLOCK_IN_SWITCH;
+          DBUG_PRINT("info", ("BUGINFO block got PCBLOCK_IN_SWITCH; "
+                              "status %d changes to %d",
+                              old_block_status, block->status));
 
           KEYCACHE_DBUG_PRINT("find_block",
                               ("got block %u for new page",
@@ -3225,6 +3234,15 @@
     if (!pagecache->can_be_used)
       goto end;
 
+    DBUG_ASSERT((block->status & PCBLOCK_IN_SWITCH) == 0);
+    if (block->status & PCBLOCK_REASSIGNED)
+    {
+      DBUG_PRINT("info", ("Block 0x%0lx already is reassigned",
+                          (ulong) block));
+      /* The block (will be | is) flushed and we can't prevent it */
+      error= !flush;
+      goto end;
+    }
     if (make_lock_and_pin(pagecache, block, lock, pin))
     {
       /*
@@ -3336,6 +3354,15 @@
       pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
       DBUG_RETURN(0);
     }
+    if (block->status & (PCBLOCK_REASSIGNED | PCBLOCK_IN_SWITCH))
+    {
+      DBUG_PRINT("info", ("Block 0x%0lx already is reassigned or in switch",
+                          (ulong) block));
+      /* The block (will be | is) flushed and we can't prevent it */
+      error= !flush;
+      page_link->requests--;
+      goto end;
+    }
     block= page_link->block;
     /* See NOTE for pagecache_unlock about registering requests. */
     if (pin == PAGECACHE_PIN)
@@ -3348,6 +3375,7 @@
         lock is released, we will try to get the block again.
       */
       pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
+      page_link->requests--;
       DBUG_PRINT("info", ("restarting..."));
       goto restart;
     }
@@ -3750,10 +3778,14 @@
   KEYCACHE_THREAD_TRACE("free block");
   KEYCACHE_DBUG_PRINT("free_block",
                       ("block is freed"));
+  if (block->requests > 1)
+    DBUG_PRINT("info",("BUGINFO block->requests is %d", block->requests));
   unreg_request(pagecache, block, 0);
   block->hash_link= NULL;
 
   /* Remove the free block from the LRU ring. */
+  if (block->next_used == NULL)
+    DBUG_PRINT("info",("BUGINFO block->next_used is NULL"));
   unlink_block(pagecache, block);
   if (block->temperature == PCBLOCK_WARM)
     pagecache->warm_blocks--;
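
For readers who want the idea without the surrounding code, the guard added to the delete path can be sketched in isolation. This is a simplified model under stated assumptions, not the patched function: the sk_* names and flag values are hypothetical, locking is omitted, and only the two behaviours visible in the diff are kept (back out the registered request, and report an error only when the caller asked for deletion without a flush).

```c
#include <assert.h>

/* Hypothetical stand-ins for PCBLOCK_IN_SWITCH and PCBLOCK_REASSIGNED. */
enum { SK_IN_SWITCH = 1 << 0, SK_REASSIGNED = 1 << 1 };

typedef struct sk_hash_link { unsigned requests; } sk_hash_link;
typedef struct sk_block { unsigned status; int freed; } sk_block;

/* Returns 0 on success and 1 on error, mirroring "error= !flush". */
int sk_delete_page(sk_hash_link *link, sk_block *block, int flush)
{
  link->requests++;                  /* register interest in the page */

  if (block->status & (SK_IN_SWITCH | SK_REASSIGNED)) {
    /* Another thread is already evicting the block; it will be (or has
       been) flushed and that cannot be prevented. Back out our request. */
    link->requests--;
    return !flush;                   /* an error only if no flush was wanted */
  }

  block->freed = 1;                  /* stands in for the real free_block() */
  link->requests--;
  return 0;
}
```

With a block marked SK_IN_SWITCH, sk_delete_page(&link, &block, 1) reports success because the eviction flushes the page anyway, while sk_delete_page(&link, &block, 0) reports an error and leaves the block untouched.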
[25 Feb 2008 9:48] Guilhem Bichot
review sent by mail
[3 Mar 2008 8:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43294

ChangeSet@1.2616, 2008-03-03 10:25:19+02:00, bell@desktop.sanja.is.com.ua +1 -0
  Fixed problem of deleting blocks which are being evicted at
  the moment. (BUG#34634)
[4 Mar 2008 12:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43366

ChangeSet@1.2616, 2008-03-04 13:58:50+02:00, bell@desktop.sanja.is.com.ua +1 -0
  Fixed problem of deleting blocks which are being evicted at
  the moment. (BUG#34634)
  Fixed potential bug in pinning schema.
[4 Mar 2008 12:05] Guilhem Bichot
Patch is approved. Another related problem found by Sanja remains (for the case of make_lock_and_pin() failing), so I leave the bug in "patch pending" state.
[4 Mar 2008 14:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43390

ChangeSet@1.2616, 2008-03-04 16:38:29+02:00, bell@desktop.sanja.is.com.ua +1 -0
  Fixed problem of deleting blocks which are being evicted at
  the moment. (BUG#34634)
  Fixed potential bug in pinning schema.
[4 Mar 2008 21:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43430

ChangeSet@1.2616, 2008-03-04 23:12:19+02:00, bell@desktop.sanja.is.com.ua +1 -0
  Fixed problem of deleting blocks which are being evicted at
  the moment. (BUG#34634)
  Fixed potential bug in pinning schema.