Bug #61191 question about page_zip_available
Submitted: 16 May 2011 20:53 Modified: 10 Jan 2012 3:52
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S3 (Non-critical)
Version:5.1.52 OS:Any
Assigned to: Marko Mäkelä CPU Architecture:Any
Tags: compression, innodb

[16 May 2011 20:53] Mark Callaghan
Description:
This is a question about InnoDB plugin code. We are curious whether this is an obscure bug.

This is a bug about an edge case where page_zip_available() estimates the
log size after an update, but it doesn¹t take into account the 1 byte that
delete record occupies on the modification log.

In btr_cur_optimistic_update() around line 2004 there is the following
call to btr_cur_insert_if_possible() followed by an assertion and a
comment:

2006  rec = btr_cur_insert_if_possible(cursor, new_entry, 0/*n_ext*/, mtr);
2007  ut_a(rec); /* <- We calculated above the insert would fit */

btr_cur_insert_if_possible() tries to insert the record into the page
without splitting the page.
Following snippet is part of the calculation that's supposed to make sure
that the insert would fit (again inside btr_cur_optimistic_update()):

1930  if (UNIV_LIKELY_NULL(page_zip)
1931      && !btr_cur_update_alloc_zip(page_zip, block, index,
1932           new_rec_size, TRUE, mtr)) {
1933    err = DB_ZIP_OVERFLOW;
1934    goto err_exit;
1935  }

The important thing here is the call to btr_cur_update_alloc_zip(). This
function calls page_zip_available() internally to determine whether there
is enough space in the modification log. Here is the related part from
page_zip_available():

return(UNIV_LIKELY(length
+ trailer_len
+ page_zip->m_end
< page_zip_get_size(page_zip)));

The bug that I see occurs when the left hand side of the above comparison
is 1 less than the right hand. That roughly corresponds to having only 1
byte left in the modification log. The problem is that we need at least 2
more bytes because we also log the delete record on modification log and
that increases page_zip->m_end by 1 or 2. This happens during the call to
page_cur_delete_rec() between the calls to btr_cur_update_alloc_zip() and
btr_cur_insert_if_possible(). The fix is to modify page_zip_available() to
return the following if the record is being created:

Inside page_zip_available() in page0zip.ic:

331  total_len = length + trailer_len + page_zip->m_end;
332  if (UNIV_UNLIKELY(create)) {
333    /* When a record is created there will also be a delete record on
334       modification log */
335    total_len += 2;
336  }
337  return(UNIV_LIKELY(total_len < page_zip_get_size(page_zip)));

How to repeat:
read the code
[17 May 2011 6:25] Marko Mäkelä
First, I will provide some background before I answer the question.

A compressed InnoDB page consists of a page header (which is a copy of the uncompressed page header), zlib compressed record data, some free space, and page directory, containing 16 bits for every user record on the page.

On a leaf page of a clustered index tree, some space immediately before the page directory is reserved for system fields. For each record, we would store the DB_TRX_ID and DB_ROLL_PTR there. We would also store the BLOB pointers there.

Between the end of the zlib compressed record data stream and the page directory (or the area reserved for uncompressed fields) is a modification log. Originally, I intended to log record deletions (delete-marking and purging) in the modification log. This did not turn out to work well. I moved the delete-mark bit to a flag in the page directory, so that the delete-mark bit can be set or reset in-place, without writing anything to the modification log. Records are purged by editing some page header fields and shuffling the page directory at the end of the compressed page.

Only inserts and updates of compressed fields should grow the modification log and eventually require recompression or page splits. Deletes, purges and editing the system fields DB_TRX_ID, DB_ROLL_PTR, and BLOB pointers, all happen in-place.

You have probably encountered an outdated comment. I will investigate it and provide an answer in my next comment.
[17 May 2011 7:26] Marko Mäkelä
Where would page_cur_delete_rec() grow the modification log? The functions page_header_set_ptr(), page_header_set_field() and page_dir_slot_set_n_owned() do not touch the log.

Is it the page_zip_clear_rec() that you had in mind? It is an optional step in page_zip_dir_delete(). If the modification log runs out, no clearing will not be performed. The idea of this clearing is to allow the page to compress better when we are attempting a simple recompress without page_zip_reorganize() or btr_page_reorganize().

There is an apparent problem with page_zip_clear_rec(). In btr_cur_optimistic_update() we do this:

	page_cur_delete_rec(page_cursor, index, offsets, mtr);
...
	rec = btr_cur_insert_if_possible(cursor, new_entry, 0/*n_ext*/, mtr);
	ut_a(rec); /* <- We calculated above the insert would fit */

The problem is that page_cur_delete_rec() could fill the modification log while doing page_zip_clear_rec(), requiring recompression for the btr_cur_insert_if_possible(). In a pathological case, the data could fail to recompress.

The fix would be to pass a flag to page_cur_delete_rec() to tell whether to attempt the page_zip_clear_rec(), or to make page_zip_clear_rec() leave the modification log alone. I will attach a patch for the latter. I would be interested in your feedback on the performance impact.
[17 May 2011 8:28] Marko Mäkelä
Only clear system fields in page_zip_clear_rec()

Attachment: bug61191.patch (text/x-diff), 6.04 KiB.

[17 May 2011 20:16] Nizameddin Ordulu
Hi Marko,

I am the guy who found this bug. 

You are right that it is actually page_zip_clear_rec() that grows the modification log. What you describe as a pathological example is the actual bug that I am seeing. I will patch your change and test it and let you know if it works or not. Can you suggest any small tests that could stress test your change? Thanks a lot for the fix.

thanks,
nizam
[17 May 2011 21:42] Nizameddin Ordulu
Marko: there is another similar edge-case bug that I encountered and filed here: http://bugs.mysql.com/bug.php?id=61208. Can you please take a look at that too?
[18 May 2011 6:52] Marko Mäkelä
Thank you, Nizameddin!

The patch that I attached has not been reviewed or tested yet, other than me running mysql-test-run --suite innodb_plugin with UNIV_ZIP_DEBUG enabled. UNIV_ZIP_DEBUG enables the page_zip_validate() checks, ensuring that the compressed page always corresponds to the uncompressed page.

I have not been involved with testing the compression lately. As you have noticed, the worst case should be when the compression rate varies wildly.

We recently got a BLOB testing framework, which would probably have found Bug #55284 earlier if it had been in place. That script could be changed so that it would vary the compressibility of the data. I am not sure how well this approach could work alone.

For white-box testing, one possibility is to randomly vary the deflateInit2() parameter in page_zip_compress between Z_DEFAULT_COMPRESSION and Z_NO_COMPRESSION.

Do you have any test scripts that you could share?
[9 Jun 2011 20:45] Nizameddin Ordulu
Marko, unfortunately, I don't have any scripts or even any idea how we can reproduce this bug, however I have been running mysql with your patch for over a week and this bug hasn't been hit so far so it seems that your patch fixed this bug.
[9 Aug 2011 17:19] Mark Callaghan
This bug was also making xtrabackup fail intermittently for compressed tables so we suspect the fix is needed for InnoDB hot backup
[9 Aug 2011 17:57] Mark Callaghan
I misunderstood our problem -- we had the fix for this in mysqld but not in xtrabackup and that caused problems. So I don't think this is a problem in general.
[9 Sep 2011 18:30] Nizameddin Ordulu
Marko: I think this optimization (logging row deletions on the modification log) was masking another bug. That bug surfaced when I applied your patch. Please see bug 62395, thanks!
[9 Sep 2011 20:19] Nizameddin Ordulu
Referring to your earlier comment:
"
Only inserts and updates of compressed fields should grow the modification log and
eventually require recompression or page splits. Deletes, purges and editing the system fields DB_TRX_ID, DB_ROLL_PTR, and BLOB pointers, all happen in-place.
"

The problem in bug 62395 seems to be that obsolete blob pointers surface when a new row with a small blob field is inserted in the same slot of a deleted row that contained a long blob (thus had a blob pointer instead of the blob field.)
[28 Sep 2011 12:15] Calvin Sun
The fix was pushed into 5.1.59, 5.5.16, and 5.6.3.
[10 Jan 2012 3:52] John Russell
This fix is code cleanup only, no functional change, therefore no changelog entry.