Bug #42279 Race condition in btr_search_drop_page_hash_when_freed()
Submitted: 22 Jan 2009 20:09 Modified: 19 Jun 2010 17:50
Reporter: Marko Mäkelä
Status: Closed
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.0, 5.1, 4.1 OS:Any
Assigned to: Satya B CPU Architecture:Any
Tags: innodb race crash
Triage: Triaged: D1 (Critical)

[22 Jan 2009 20:09] Marko Mäkelä
Description:
In the function btr_search_drop_page_hash_when_freed(), buf_page_get_gen() can return NULL and cause a crash.

This is because it is possible that the page is evicted from the buffer pool between buf_page_peek_if_search_hashed() and buf_page_get_gen(), because the buffer pool mutex is released between these two calls.
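
The interleaving described above can be replayed deterministically in a single-threaded toy model. This is only a sketch: every name in it (toy_page_t, toy_peek_if_hashed(), toy_evict(), toy_get_if_resident(), toy_replay_race()) is hypothetical and not part of InnoDB, and the eviction that another thread would perform is simply called inline at the point where the buffer pool mutex is not held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are InnoDB's real API. */
typedef struct {
    int resident;   /* 1 if the page is currently in the buffer pool */
    int hashed;     /* 1 if the page has adaptive hash index entries */
} toy_page_t;

/* Step 1: a "peek" whose answer is only valid at the instant it is given. */
static int toy_peek_if_hashed(const toy_page_t *slot)
{
    return slot->resident && slot->hashed;
}

/* What another thread may do while the pool mutex is not held. */
static void toy_evict(toy_page_t *slot)
{
    slot->resident = 0;
    slot->hashed = 0;
}

/* Step 2: a "get" that yields NULL when the page is no longer resident. */
static toy_page_t *toy_get_if_resident(toy_page_t *slot)
{
    return slot->resident ? slot : NULL;
}

/* Replays the bad interleaving. Returns 1 if the get came back NULL
   even though the peek had just answered "yes". */
static int toy_replay_race(void)
{
    toy_page_t slot = { 1, 1 };

    if (!toy_peek_if_hashed(&slot)) {
        return 0;
    }

    /* The mutex is released here, so an eviction can slip in. */
    toy_evict(&slot);

    return toy_get_if_resident(&slot) == NULL;
}
```

In this model toy_replay_race() returns 1: the answer from the peek is already stale by the time the get runs, so any caller that dereferences the result without checking it would crash.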

How to repeat:
Create and drop tables very frequently in such a way that the working set does not fit in the buffer pool.

Suggested fix:
Make the two function calls while holding the buffer pool mutex, or prepare for a NULL return value from buf_page_get_gen().
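
A minimal sketch of the second option (tolerating a NULL return), written against a toy model rather than the real buffer pool. All names here are hypothetical; the evict_in_window parameter exists only so that the eviction can be forced between the two steps, and the model assumes that an evicted page has no hash entries left to drop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are InnoDB's real API. */
typedef struct {
    int resident;   /* 1 if the page is currently in the buffer pool */
    int hashed;     /* 1 if the page has adaptive hash index entries */
} toy_page_t;

/* A "get" that yields NULL when the page is no longer resident. */
static toy_page_t *toy_get_if_resident(toy_page_t *slot)
{
    return slot->resident ? slot : NULL;
}

/* Drops the hash entries of a page that is being freed.
   Returns 1 if entries were dropped, 0 if there was nothing to do.
   evict_in_window simulates another thread evicting the page between
   the peek and the get. */
static int toy_drop_hash_when_freed(toy_page_t *slot, int evict_in_window)
{
    toy_page_t *page;

    /* Peek: is the page resident and hashed right now? */
    if (!(slot->resident && slot->hashed)) {
        return 0;
    }

    /* Window in which the pool mutex is not held. */
    if (evict_in_window) {
        slot->resident = 0;
        slot->hashed = 0;   /* assumption: eviction discards the entries */
    }

    page = toy_get_if_resident(slot);
    if (page == NULL) {
        /* The page was evicted in the window; nothing left to drop. */
        return 0;
    }

    page->hashed = 0;
    return 1;
}
```

With evict_in_window set to 0 the function drops the entries and returns 1; with it set to 1 the function returns 0 instead of dereferencing a NULL pointer. The first option (holding the mutex across both calls) would close the window altogether, at the cost of a longer critical section.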
[13 Mar 2009 19:51] Timothy Smith
Pushed to 5.1.33; Docs please return to "Patch approved" waiting for a 6.0 snapshot.

  Applying InnoDB snapshot 5.1-ss4350, part 1.  Fixes
  
  Bug #42279    Race condition in btr_search_drop_page_hash_when_freed()
  
  Detailed revision comments:
  
  r4032 | marko | 2009-01-23 15:43:51 +0200 (Fri, 23 Jan 2009) | 10 lines
  branches/5.1: Merge r4031 from branches/5.0:
  
  btr_search_drop_page_hash_when_freed(): Check if buf_page_get_gen()
  returns NULL.  The page may have been evicted from the buffer pool
  between buf_page_peek_if_search_hashed() and buf_page_get_gen(),
  because the buffer pool mutex will be released between these two calls.
  (Bug #42279)
  
  rb://82 approved by Heikki Tuuri
[15 Mar 2009 0:20] Paul Dubois
Noted in 5.1.33 changelog.

The InnoDB btr_search_drop_page_hash_when_freed() function had a race
condition. 

Setting report to Patch Approved pending push into 6.0.x.
[24 Apr 2009 11:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72777

2731 Satya B	2009-04-24
      Applying InnoDB snapshot 5.0-ss4900 part 1, Fixes BUG#42279
      
      1) BUG#42279 - Race condition in btr_search_drop_page_hash_when_freed()
      
      Detailed revision comments:
      
      r4031 | marko | 2009-01-23 15:33:46 +0200 (Fri, 23 Jan 2009) | 8 lines
      branches/5.0: btr_search_drop_page_hash_when_freed(): Check if
      buf_page_get_gen() returns NULL.  The page may have been evicted
      from the buffer pool between buf_page_peek_if_search_hashed() and
      buf_page_get_gen(), because the buffer pool mutex will be released
      between these two calls. (Bug #42279)
      
      rb://82 approved by Heikki Tuuri
      modified:
        innobase/btr/btr0sea.c
[24 Apr 2009 12:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72782

2870 Satya B	2009-04-24 [merge]
      NULL MERGE of innodb-5.0-ss4900 into 5.1 branch. Note BUG#42279 
      is pushed along with BUG#43309.
      
      Forgot to add BUG#4229 in the first paragraph in the commit 
      message
[5 May 2009 18:52] Bugs System
Pushed into 5.0.82 (revid:davi.arnaut@sun.com-20090505184158-dvmedh8n472y8np5) (version source revid:davi.arnaut@sun.com-20090505184158-dvmedh8n472y8np5) (merge vers: 5.0.82) (pib:6)
[5 May 2009 19:41] Bugs System
Pushed into 5.1.35 (revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (version source revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (merge vers: 5.1.35) (pib:6)
[6 May 2009 14:06] Bugs System
Pushed into 6.0.12-alpha (revid:svoj@sun.com-20090506125450-yokcmvqf2g7jhujq) (version source revid:satya.bn@sun.com-20090424121640-zg2txzmrfqj20ep0) (merge vers: 6.0.11-alpha) (pib:6)
[13 May 2009 23:31] Paul Dubois
Noted in 5.0.82, 6.0.12 changelogs. (Was already in 5.1.33 changelog.)
[15 Jun 2009 8:27] Bugs System
Pushed into 5.1.35-ndb-6.3.26 (revid:jonas@mysql.com-20090615074202-0r5r2jmi83tww6sf) (version source revid:jonas@mysql.com-20090615070837-9pccutgc7repvb4d) (merge vers: 5.1.35-ndb-6.3.26) (pib:6)
[15 Jun 2009 9:07] Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090615074335-9hcltksp5cu5fucn) (version source revid:jonas@mysql.com-20090615072714-rmfkvrbbipd9r32c) (merge vers: 5.1.35-ndb-7.0.7) (pib:6)
[15 Jun 2009 9:48] Bugs System
Pushed into 5.1.35-ndb-6.2.19 (revid:jonas@mysql.com-20090615061520-sq7ds4yw299ggugm) (version source revid:jonas@mysql.com-20090615054654-ebgpz7elwu1xj36j) (merge vers: 5.1.35-ndb-6.2.19) (pib:6)
[12 Oct 2009 21:58] Martin Dimitrov
I am a PhD student whose research involves automated debugging, and I would like to ask the developers for some help and clarification on this bug.

I believe that the description on how to reproduce this bug is incorrect. I believe that it is not possible to reproduce this bug by rapidly dropping and creating tables from multiple threads. 

The explanation given for the bug is that the buffer pool mutex is not held between buf_page_peek_if_search_hashed() and buf_page_get_gen(). Thus, an interleaving thread may evict some pages from the buffer pool.

The example given in "How to repeat:" says to rapidly drop and create tables such that the working set does not fit in the buffer pool. 
However, dropping and creating tables is synchronized, because both mysql_rm_table() and mysql_create_table() hold LOCK_open. Thus, it is not possible to interleave a CREATE TABLE call so that it displaces buffer pool pages in the middle of a DROP TABLE call. I attempted to reproduce this bug as suggested, by inserting a sleep just after buf_page_peek_if_search_hashed(), but I was unsuccessful, since my CREATE TABLE statements were always blocked by DROP TABLE statements.

If my understanding is correct, what would be an alternative way to trigger this bug? 
Or is this data race infeasible because of other synchronization (such as LOCK_open in the case of drop/create table)?

Thank you very much for the help
Martin
[13 Oct 2009 8:53] Marko Mäkelä
Hi Martin,

You wrote that you believe that it is not possible to reproduce this bug by rapidly dropping and creating tables from multiple threads. You may be right. This bug was triggered by running a large number of test scripts concurrently. I did not try to create an isolated test case, because the bug was so obvious. The function btr_search_drop_page_hash_when_freed(), where this bug occurred, is invoked whenever a B-tree index page that resides in the buffer pool is freed, that is, when the page in the tablespace file is freed for future allocations or when the entire tablespace is dropped. Index pages can be freed when records are being purged. Note that DELETE merely sets a delete-mark flag; the records are actually purged later by trx_purge().
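
The difference between delete-marking and purging can be illustrated with a small toy model. This is a sketch under stated assumptions, not InnoDB code: toy_rec_t, toy_index_page_t, toy_delete_mark() and toy_purge() are hypothetical names, and the model ignores undo logs and transaction visibility entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_CAPACITY 4

/* Hypothetical toy model; none of these names are InnoDB's real API. */
typedef struct {
    int key;
    int delete_marked;   /* set by DELETE, consumed by purge */
} toy_rec_t;

typedef struct {
    toy_rec_t recs[TOY_PAGE_CAPACITY];
    size_t    n_recs;
} toy_index_page_t;

/* DELETE: only flags the record. Returns 1 if a live record with
   this key was found, 0 otherwise. The page keeps its size. */
static int toy_delete_mark(toy_index_page_t *page, int key)
{
    size_t i;

    for (i = 0; i < page->n_recs; i++) {
        if (page->recs[i].key == key && !page->recs[i].delete_marked) {
            page->recs[i].delete_marked = 1;
            return 1;
        }
    }
    return 0;
}

/* Purge: physically removes delete-marked records. Returns 1 if the
   page is now empty, i.e. it could be freed at this point. */
static int toy_purge(toy_index_page_t *page)
{
    size_t i;
    size_t kept = 0;

    for (i = 0; i < page->n_recs; i++) {
        if (!page->recs[i].delete_marked) {
            page->recs[kept++] = page->recs[i];
        }
    }
    page->n_recs = kept;
    return kept == 0;
}
```

In this model a page only becomes a candidate for freeing inside toy_purge(), never inside toy_delete_mark(), which mirrors the point that the page-freeing code path can be reached by background purge activity and not only by DROP TABLE.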

Which research group do you belong to? My doctoral thesis at Helsinki University of Technology was about model checking by state space enumeration. I joined InnoDB development six years ago, but sadly have not been able to do any computer-assisted verification.
[13 Oct 2009 13:41] Martin Dimitrov
Marko, 

Thanks for the help and the prompt response. Based on your explanation, I will make another attempt to reproduce this bug (using different queries, not just drop/create table).

I am at the University of Central Florida, and my research area is computer architecture. Lately, however, I have been working on debugging, with a focus on dynamic (as opposed to static) techniques. Currently I am researching concurrency-related bugs: data races, deadlocks, or any other bugs triggered in the presence of concurrency (such as bug 42279).

I find this bug database extremely useful, since developers usually post: a test case, an explanation of the bug, a patch, etc - all the information that one needs to study bugs. 

Thanks, 
Martin
[14 Oct 2009 10:24] Marko Mäkelä
Hi Martin,
I wish you great progress with your research. Bug #47814 could be interesting from an academic point of view: a diagnostic routine that is being invoked on a lock wait timeout gets stuck in a lock wait. :-)
[14 Oct 2009 14:27] Martin Dimitrov
Yes, this is interesting. Thank you. And the way to reproduce 47814 was also pretty creative.
[5 May 2010 15:03] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 16:58] Paul Dubois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[28 May 2010 5:59] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:28] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:56] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 22:36] Paul Dubois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[15 Jun 2010 8:08] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (merge vers: 5.1.47) (pib:16)
[15 Jun 2010 8:24] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (pib:16)
[17 Jun 2010 12:02] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:44] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:29] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)