Bug #58549 Race condition in buf_LRU_drop_page_hash_for_tablespace() and compressed tables
Submitted: 29 Nov 2010 8:55
Modified: 26 Mar 2012 23:49
Reporter: Marko Mäkelä
Status: Closed
Category: MySQL Server: InnoDB Plugin storage engine
Severity: S3 (Non-critical)
Version: 5.1 plugin, 5.5+
OS: Any
Assigned to: Marko Mäkelä
Tags: compression, DISCARD TABLESPACE, drop table, race condition, row_format=compressed
Triage: Triaged: D1 (Critical) / R1 (None/Negligible) / E1 (None/Negligible)

[29 Nov 2010 8:55] Marko Mäkelä
Description:
It looks like I introduced a race condition when merging this fix to the InnoDB Plugin:
---
r2742 | inaam | 2008-10-08 22:02:15 +0300 | 11 lines

branches/5.1:

Fix Bug#39939 DROP TABLE/DISCARD TABLESPACE takes long time in
buf_LRU_invalidate_tablespace()

Improve implementation of buf_LRU_invalidate_tablespace by attempting
hash index drop in batches instead of doing it one by one.
---
The problem is this comparison in the code:

		if (bpage && !buf_page_in_file(bpage)) {

For example, when buf_page_get_gen() does buf_relocate() to create an uncompressed page frame for a block that only existed in compressed format in the buffer pool, neither it nor buf_buddy_free() will reset the bpage->state of the compressed-only block. Thus, the test in buf_LRU_drop_page_hash_for_tablespace() will mistakenly treat the freed compressed-only block descriptor as the real deal.
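
To illustrate the stale-state problem, here is a minimal toy model in C. It is only a sketch: every `toy_` name, the `freed` flag, and the three-value state enum are hypothetical stand-ins, not InnoDB's real types or functions. It shows only that a check based on the state field alone cannot tell a live descriptor from a freed one whose state was never reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page states; NOT InnoDB's real enum. */
enum toy_state { TOY_NOT_USED, TOY_ZIP_PAGE, TOY_FILE_PAGE };

struct toy_page {
    enum toy_state state;
    bool           freed; /* hypothetical: descriptor was handed back to the allocator */
};

/* Stand-in for the idea behind buf_page_in_file(): decides from state alone. */
static bool toy_page_in_file(const struct toy_page *p)
{
    return p->state == TOY_ZIP_PAGE || p->state == TOY_FILE_PAGE;
}

/* Models relocating a compressed-only page to an uncompressed frame:
   the old descriptor is freed, but its state field is left untouched. */
static void toy_relocate(struct toy_page *old_desc, struct toy_page *new_block)
{
    new_block->state = TOY_FILE_PAGE;
    new_block->freed = false;
    old_desc->freed  = true; /* state stays TOY_ZIP_PAGE: this is the bug */
}

/* The comparison from the report, applied to the toy types.
   Returns true when the check flags bpage as no longer valid. */
static bool toy_check_rejects(const struct toy_page *bpage)
{
    return bpage != NULL && !toy_page_in_file(bpage);
}
```

In this model, after `toy_relocate()` the old descriptor has `freed == true`, yet `toy_check_rejects()` still returns false for it, so a scan relying on that check would keep using the freed descriptor.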

How to repeat:
Something like this:

CREATE TABLE t1 (...) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
-- insert many rows
CREATE TABLE t2 (...) ENGINE=InnoDB;
-- insert many rows (forcing discard of the uncompressed frames of t1)
DROP TABLE t2;
-- while the above is executing, access those pages of t1 that are in the buffer pool
-- in compressed format only, to force them into the uncompressed pool

Suggested fix:
Make the test stricter. After buf_pool->mutex has been released, the bpage cannot be trusted at all if it was in the compressed pool. Thus, before the potential release of the buf_pool->mutex, read the prev_bpage->state. If it was BUF_BLOCK_ZIP_PAGE or BUF_BLOCK_ZIP_DIRTY, consider always restarting the scan. Alternatively, do not advance to prev_bpage, but keep the pointer at bpage, and safely read its LRU list predecessor on the next loop round.
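
The first variant (restart the scan when the predecessor was a compressed-only page) can be sketched roughly as follows. This is a toy model, not the actual patch: the `lru_` types, the hook parameter, and `lru_scan_step()` are hypothetical names invented for illustration, and the mutex is only imagined. The point is that the predecessor's state is captured before the release and the pointer is never dereferenced afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; NOT InnoDB's real enum. */
enum lru_state { LRU_ZIP_PAGE, LRU_ZIP_DIRTY, LRU_FILE_PAGE };

struct lru_page {
    enum lru_state   state;
    struct lru_page *prev; /* LRU list predecessor */
};

/* Hypothetical hook: stands in for releasing and reacquiring the pool mutex. */
typedef void (*lru_release_hook)(void);

static void lru_noop_release(void) { /* placeholder for tests */ }

static bool lru_is_zip_only(const struct lru_page *p)
{
    return p->state == LRU_ZIP_PAGE || p->state == LRU_ZIP_DIRTY;
}

/* One step of a tail-to-head scan. Returns the page to visit next:
   the predecessor, or the list tail when the scan must restart.
   Pass NULL as the hook when the mutex is not released in this step. */
static struct lru_page *lru_scan_step(struct lru_page *lru_tail,
                                      struct lru_page *bpage,
                                      lru_release_hook release_and_reacquire)
{
    struct lru_page *prev = bpage->prev;
    /* Read the predecessor's state while the (imagined) mutex is still held. */
    bool prev_was_zip = (prev != NULL) && lru_is_zip_only(prev);

    if (release_and_reacquire != NULL) {
        release_and_reacquire(); /* prev may have been relocated or freed here */
        if (prev_was_zip) {
            return lru_tail;     /* do not touch prev again; restart the scan */
        }
    }
    return prev;
}
```

The sketch covers only the restart decision. It does not model how a real scan makes progress after a restart, nor the alternative of staying at bpage and re-reading its predecessor on the next loop round.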

[23 Nov 2011 5:56] Nizameddin Ordulu
Marko: Where is the patch for this? I understand that revid 3562 contains the fix for this but it also contains lots of other stuff.

[29 Nov 2011 7:15] Marko Mäkelä
The patch for this bug was pushed in a changeset that was part of a merge:

revno: 3562
revision-id: kent.boortz@oracle.com-20110703154737-d27i4ypu2a0ran21

            revno: 3351.14.352
            revision-id: marko.makela@oracle.com-20110228115118-ogs3ib1eaz9bsgkt
            parent: vasil.dimov@oracle.com-20110225095018-jgmv1pnuprrjzat1
            committer: Marko Mäkelä <marko.makela@oracle.com>
            branch nick: 5.1-innodb
            timestamp: Mon 2011-02-28 13:51:18 +0200
            message:
              Bug #58549 Race condition in buf_LRU_drop_page_hash_for_tablespace()
              and compressed tables

The following commands work for me:

bzr log -r3351.14.352
bzr diff -c3351.14.352

[26 Mar 2012 23:49] John Russell
Added to changelog for 5.1.59: 

A DROP TABLE or DROP INDEX statement for an InnoDB table on a busy 
server could cause a crash or corrupt data in the buffer pool, if the 
buffer pool contained data from an InnoDB compressed table that was 
being accessed at the same time. (The crash could occur whether or 
not the table being dropped used compression.)