Bug #63815 Reorganization May Make a Page Incompressible
Submitted: 21 Dec 2011 2:51    Modified: 18 Oct 2012 22:06
Reporter: Nizameddin Ordulu    Email Updates:
Status: Closed                 Impact on me: None
Category: MySQL Server: InnoDB Plugin storage engine    Severity: S1 (Critical)
Version:                       OS: Any
Assigned to:                   CPU Architecture: Any
Tags: btr_cur_optimistic_update, btr_cur_update_alloc_zip, compression, innodb

[21 Dec 2011 2:51] Nizameddin Ordulu
Description:
Marko,

This is a duplicate of bug 61456, but I wanted to start fresh because I wasn't sure what you were referring to in some of your earlier comments on that bug.

We have some servers deployed with compression and innodb_compression_level=1. The assertion below in btr_cur_optimistic_update() causes a crash:

2075  if (!(flags & BTR_KEEP_SYS_FLAG)) {
2076    row_upd_index_entry_sys_field(new_entry, index, DATA_ROLL_PTR,
2077                roll_ptr);
2078    row_upd_index_entry_sys_field(new_entry, index, DATA_TRX_ID,
2079                trx->id);
2080  }
2081
2082  /* There are no externally stored columns in new_entry */
2083  rec = btr_cur_insert_if_possible(cursor, new_entry, 0/*n_ext*/, mtr);
2084  ut_a(rec); /* <- We calculated above the insert would fit */
2085
2086  if (page_zip && !dict_index_is_clust(index)
2087      && page_is_leaf(page)) {
2088    /* Update the free bits in the insert buffer. */
2089    ibuf_update_free_bits_zip(block, mtr);
2090  }

The problem is the same: reorganized pages may fail to compress. To be more specific, I think the failure occurs when the following conditions are met before entering btr_cur_optimistic_update():
- the compressed page has space in the modification log, so btr_cur_update_alloc_zip() returns TRUE (as does page_zip_available());
- once reorganized, the page becomes incompressible.

None of the checks in btr_cur_optimistic_update() accounts for the above case, so when btr_cur_insert_if_possible() is called, the page is reorganized, the reorganized page fails to compress, and the function returns NULL. This causes the assertion to be hit.
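
As a toy illustration of this mechanism (plain zlib, not InnoDB code or InnoDB's page format; the byte pattern and the fixed-seed shuffle merely stand in for "the same records in a different physical layout"), the standalone program below deflates the same 16 KiB of data at level 1 in two layouts and prints the compressed sizes. The shuffled layout compresses far worse even though it contains exactly the same bytes, which is why a size estimate that held before a reorganization can stop holding afterwards.

/* Standalone sketch: same data, two layouts, very different deflate sizes.
   Build with: gcc layout_demo.c -lz */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SIZE 16384

/* Deflate buf[] at the given level and return the compressed size (0 on error). */
static unsigned long deflated_size(const unsigned char* buf, int level)
{
	static unsigned char out[2 * PAGE_SIZE];
	z_stream strm;

	memset(&strm, 0, sizeof(strm));
	if (deflateInit2(&strm, level, Z_DEFLATED, 15,
			 MAX_MEM_LEVEL, Z_DEFAULT_STRATEGY) != Z_OK) {
		return 0;
	}
	strm.next_in = (unsigned char*) buf;
	strm.avail_in = PAGE_SIZE;
	strm.next_out = out;
	strm.avail_out = sizeof(out);
	if (deflate(&strm, Z_FINISH) != Z_STREAM_END) {
		deflateEnd(&strm);
		return 0;
	}
	deflateEnd(&strm);
	return strm.total_out;
}

int main(void)
{
	unsigned char page[PAGE_SIZE];
	unsigned char reorg[PAGE_SIZE];
	size_t i;

	/* Original layout: a regular pattern that deflate compresses very well. */
	for (i = 0; i < PAGE_SIZE; i++) {
		page[i] = (unsigned char) i;
	}

	/* "Reorganized" layout: exactly the same bytes, shuffled with a fixed
	   seed; deflate can barely compress this arrangement. */
	memcpy(reorg, page, PAGE_SIZE);
	srand(1);
	for (i = PAGE_SIZE - 1; i > 0; i--) {
		size_t j = (size_t) rand() % (i + 1);
		unsigned char tmp = reorg[i];
		reorg[i] = reorg[j];
		reorg[j] = tmp;
	}

	printf("original layout:    %lu bytes compressed\n",
	       deflated_size(page, 1));
	printf("reorganized layout: %lu bytes compressed\n",
	       deflated_size(reorg, 1));
	return 0;
}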

I fixed this by forcing btr_cur_optimistic_update() to obey the following rule:
- btr_cur_optimistic_update() should never attempt to recompress or reorganize the page of a compressed table.
Compression is an expensive operation, so IMO the cases in which the page must be recompressed (including when a reorganization forces recompression) should be handled by btr_cur_pessimistic_update().

I also removed the call to btr_cur_update_alloc_zip() and replaced it with a call to page_zip_available(), because the former may recompress the page.

I also changed btr_cur_optimistic_insert() because you confirmed that change in bug 61456.

The diff is as follows:

diff --git a/storage/innodb_plugin/btr/btr0cur.c b/storage/innodb_plugin/btr/btr0cur.c
index cd820e3..e68df06 100644
--- a/storage/innodb_plugin/btr/btr0cur.c
+++ b/storage/innodb_plugin/btr/btr0cur.c
@@ -1226,7 +1226,8 @@ fail_err:
 
                if (UNIV_UNLIKELY(reorg)) {
                        ut_a(zip_size);
-                       ut_a(*rec);
+                       if (!*rec)
+                               goto fail;
                }
        }
 
@@ -1998,8 +1999,8 @@ any_extern:
 #endif /* UNIV_ZIP_DEBUG */
 
        if (UNIV_LIKELY_NULL(page_zip)
-           && !btr_cur_update_alloc_zip(page_zip, block, index,
-                                        new_rec_size, TRUE, mtr)) {
+           && !page_zip_available(page_zip, dict_index_is_clust(index),
+                                  new_rec_size, TRUE)) {
                err = DB_ZIP_OVERFLOW;
                goto err_exit;
        }
@@ -2023,7 +2024,8 @@ any_extern:
        }
 
        max_size = old_rec_size
-               + page_get_max_insert_size_after_reorganize(page, 1);
+               + (page_zip ? page_get_max_insert_size(page, 1)
+                           : page_get_max_insert_size_after_reorganize(page, 1));
 
        if (!(((max_size >= BTR_CUR_PAGE_REORGANIZE_LIMIT)
               && (max_size >= new_rec_size))

Can you tell me:

-Whether the bug makes sense.
-Whether this fix is correct.
-This fix causes more calls to btr_cur_pessimistic_update(). What are the implications of this in terms of mutex contention and performance? The only cases that used to be handled by btr_cur_optimistic_update() and are now handled by btr_cur_pessimistic_update() are those that involve page compression. Because compression is already an expensive operation, I think performance will not suffer much, but I would like to hear your opinion.
-How can I test for these kinds of effects of compression? I am willing to introduce debug variables into the source code.

How to repeat:
Not reproducible.

Suggested fix:
diff --git a/storage/innodb_plugin/btr/btr0cur.c b/storage/innodb_plugin/btr/btr0cur.c
index cd820e3..e68df06 100644
--- a/storage/innodb_plugin/btr/btr0cur.c
+++ b/storage/innodb_plugin/btr/btr0cur.c
@@ -1226,7 +1226,8 @@ fail_err:
 
                if (UNIV_UNLIKELY(reorg)) {
                        ut_a(zip_size);
-                       ut_a(*rec);
+                       if (!*rec)
+                               goto fail;
                }
        }
 
@@ -1998,8 +1999,8 @@ any_extern:
 #endif /* UNIV_ZIP_DEBUG */
 
        if (UNIV_LIKELY_NULL(page_zip)
-           && !btr_cur_update_alloc_zip(page_zip, block, index,
-                                        new_rec_size, TRUE, mtr)) {
+           && !page_zip_available(page_zip, dict_index_is_clust(index),
+                                  new_rec_size, TRUE)) {
                err = DB_ZIP_OVERFLOW;
                goto err_exit;
        }
@@ -2023,7 +2024,8 @@ any_extern:
        }
 
        max_size = old_rec_size
-               + page_get_max_insert_size_after_reorganize(page, 1);
+               + (page_zip ? page_get_max_insert_size(page, 1)
+                           : page_get_max_insert_size_after_reorganize(page, 1));
 
        if (!(((max_size >= BTR_CUR_PAGE_REORGANIZE_LIMIT)
               && (max_size >= new_rec_size))
[21 Dec 2011 7:47] Marko Mäkelä
Nizameddin,

Sorry, I have been busy developing a new feature. Weeks ago, I developed a patch that should have addressed this bug. Unfortunately, it was not error-free, and I have not been able to work on bugs since then. The good news is that the feature is almost complete, and I should be able to work on bugs in January.

You seem to be on the right track, but I am not sure that your patch fixes all cases. The estimates from page_zip_available() are only valid as long as the page is not reorganized or recompressed. We should not use the estimates when they are not safe.

In my patch (rb:777), I was also trying to avoid unnecessary page reorganization, to reduce unnecessary redo logging and CPU consumption. For example, page_copy_rec_list_start() and page_copy_rec_list_end() should reorganize the uncompressed page before starting the bulk operation, and finally attempt to compress the entire page. Ideally, each operation should reorganize or recompress a given page at most once.

While testing my patch, we had some crash recovery failures that I have not fully analyzed yet.
[21 Dec 2011 7:59] Marko Mäkelä
Answers to your questions:

-This fix causes more calls to btr_cur_pessimistic_update(). What are the implications of this in terms of mutex contention and performance?

The pessimistic operations will acquire an exclusive lock on the non-leaf levels of the index tree (index->lock). This could reduce concurrency. It is probably not good to hold such a mutex when compressing pages.

It might not be that bad, though. We also hold an x-lock on index->lock when writing off-page columns. This is mainly because BLOB pages are allocated from the same file segment as clustered index leaf pages, and the allocation structure is protected by index->lock.

-How can I test for these kinds of effects of compression? I am willing to introduce debug variables into the source code.

One idea that I have not tested yet is to vary the compression effort: in the deflateInit2() call in page0zip.c, randomly vary the zlib compression level. That should cause variance in the compression ratio. Something like this might do the trick:

	err = deflateInit2(&c_stream,
			   rand() % 10,
			   Z_DEFLATED, UNIV_PAGE_SIZE_SHIFT,
			   MAX_MEM_LEVEL, Z_DEFAULT_STRATEGY);

You might even try a biased distribution, such as rand() & 3 ? 9 : 0.
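
For reference, here is a standalone sketch of the effect (ordinary zlib outside InnoDB; the buffer contents are arbitrary test data, and windowBits 15 is used here where InnoDB passes UNIV_PAGE_SIZE_SHIFT). It deflates the same 16 KiB block at every level from 0 to 9 and prints the compressed sizes; level 0 stores the block uncompressed and higher levels usually compress it further, so randomizing the level introduces exactly the kind of variance in compression ratio described above.

/* Standalone sketch: compressed size of the same block at zlib levels 0-9.
   Build with: gcc level_demo.c -lz */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

#define PAGE_SIZE 16384

int main(void)
{
	unsigned char page[PAGE_SIZE];
	unsigned char out[2 * PAGE_SIZE];
	int level;
	size_t i;

	/* Low-entropy dummy data (4 bits per byte) so there is something to compress. */
	srand(42);
	for (i = 0; i < PAGE_SIZE; i++) {
		page[i] = (unsigned char) (rand() & 0x0f);
	}

	for (level = 0; level <= 9; level++) {
		z_stream c_stream;

		memset(&c_stream, 0, sizeof(c_stream));
		if (deflateInit2(&c_stream, level, Z_DEFLATED, 15,
				 MAX_MEM_LEVEL, Z_DEFAULT_STRATEGY) != Z_OK) {
			return 1;
		}

		c_stream.next_in = page;
		c_stream.avail_in = PAGE_SIZE;
		c_stream.next_out = out;
		c_stream.avail_out = sizeof(out);

		if (deflate(&c_stream, Z_FINISH) != Z_STREAM_END) {
			deflateEnd(&c_stream);
			return 1;
		}

		printf("level %d: %lu compressed bytes\n",
		       level, c_stream.total_out);

		deflateEnd(&c_stream);
	}

	return 0;
}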
[26 Dec 2011 1:06] Nizameddin Ordulu
make page_zip_compress() return FALSE randomly.

Attachment: simulate_comp_failures.patch (application/octet-stream, text), 9.62 KiB.

[26 Dec 2011 2:52] Nizameddin Ordulu
Marko,

I created a patch to make page_zip_compress() return FALSE randomly in order to find the bugs related to compression failures. The mtr test simulate_comp_failures.test is attached. I list below the bugs I found by running this test (some of them were already mentioned, but I list them anyway for completeness).

In btr_cur_optimistic_insert():
1219  /* Now, try the insert */
1220
1221  {
1222    const rec_t* page_cursor_rec = page_cur_get_rec(page_cursor);
1223    *rec = page_cur_tuple_insert(page_cursor, entry, index,
1224               n_ext, mtr);
1225    reorg = page_cursor_rec != page_cur_get_rec(page_cursor);
1226
1227    if (UNIV_UNLIKELY(reorg)) {
1228      ut_a(zip_size);
1229    /* It's possible for rec to be NULL if the page is compressed.
1230         This is because a reorganized page may become incompressible */
1231      if (!*rec)
1232        goto fail;
1233    }
1234  }

In btr_cur_optimistic_update():
2028  /* We do not attempt to reorganize if the page is compressed.
2029     This is because the page may fail to compress after reorganization */
2030  max_size = old_rec_size
2031    + (page_zip ? page_get_max_insert_size(page, 1)
2032                : page_get_max_insert_size_after_reorganize(page, 1));

In btr_cur_pessimistic_update():

@@ -2384,7 +2390,10 @@ make_external:
                err = DB_SUCCESS;
                goto return_after_reservations;
        } else {
-               ut_a(optim_err != DB_UNDERFLOW);
+               /* If the page is compressed, it is possible for btr_cur_optimistic_update()
+                  to return DB_UNDERFLOW and btr_cur_insert_if_possible() to return FALSE.
+                  See http://bugs.mysql.com/bug.php?id=61208 */
+               ut_a(page_zip || optim_err != DB_UNDERFLOW);
 
                /* Out of space: reset the free bits. */
                if (!dict_index_is_clust(index)

The following fixes a bug where ret_pos = 0 before the call to page_zip_reorganize() inside page_copy_rec_list_start(). There is a comment inside page_copy_rec_list_start(), which I quote here:

 796      /* Before trying to reorganize the page,
 797      store the number of preceding records on the page. */
 798      ulint ret_pos
 799        = page_rec_get_n_recs_before(ret);
 800      /* Before copying, "ret" was the predecessor
 801      of the predefined supremum record.  If it was
 802      the predefined infimum record, then it would
 803      still be the infimum.  Thus, the assertion
 804      ut_a(ret_pos > 0) would fail here. */
 805
 806      if (UNIV_UNLIKELY
 807          (!page_zip_reorganize(new_block, index, mtr))) {

When ret_pos is 0, the following lines of code cause InnoDB to crash:

 817        /* The page was reorganized:
 818        Seek to ret_pos. */
 819        do {
 820          ret = rec_get_next_ptr(ret, TRUE);
 821        } while (--ret_pos);

This is because ret_pos is decremented before it is tested: ret_pos is an unsigned integer, so decrementing it from 0 wraps around to a huge value, and line 820 keeps executing until ret becomes NULL, after which a SIGSEGV happens.
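
To see the difference between the two loop forms in isolation, here is a minimal standalone sketch (plain C, not InnoDB code); the artificial guard exists only so that the runaway loop terminates in the demo:

/* Standalone sketch of the do/while vs. while loop behaviour when the
   unsigned counter starts at zero. */
#include <stdio.h>
#include <stddef.h>

/* Mimics the original "do { ... } while (--ret_pos);" form.  With
   ret_pos == 0 the unsigned decrement wraps around, so the body keeps
   running (capped here by 'guard'); in page_copy_rec_list_start() this
   means rec_get_next_ptr() walks past the end of the record list. */
static size_t steps_buggy(size_t ret_pos, size_t guard)
{
	size_t steps = 0;

	do {
		if (++steps > guard) {
			break;	/* stop the runaway loop for the demo */
		}
	} while (--ret_pos);

	return steps;
}

/* Mimics the fixed "while (ret_pos) { ...; --ret_pos; }" form, which
   never enters the body when ret_pos == 0. */
static size_t steps_fixed(size_t ret_pos)
{
	size_t steps = 0;

	while (ret_pos) {
		++steps;
		--ret_pos;
	}

	return steps;
}

int main(void)
{
	printf("ret_pos=0: buggy form runs %zu steps (capped), fixed form runs %zu\n",
	       steps_buggy(0, 10), steps_fixed(0));
	printf("ret_pos=3: buggy form runs %zu steps, fixed form runs %zu\n",
	       steps_buggy(3, 10), steps_fixed(3));
	return 0;
}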

The following fixes that:
-
-                               do {
+                               while (ret_pos) {
+                                       ut_a(ret);
                                        ret = rec_get_next_ptr(ret, TRUE);
-                               } while (--ret_pos);
+                                       --ret_pos;
+                               }
[26 Dec 2011 3:13] Nizameddin Ordulu
New patch to fix all the bugs mentioned above:

diff --git a/storage/innodb_plugin/btr/btr0cur.c b/storage/innodb_plugin/btr/btr0cur.c
index cd820e3..9112cf1 100644
--- a/storage/innodb_plugin/btr/btr0cur.c
+++ b/storage/innodb_plugin/btr/btr0cur.c
@@ -1226,7 +1226,10 @@ fail_err:
 
                if (UNIV_UNLIKELY(reorg)) {
                        ut_a(zip_size);
-                       ut_a(*rec);
+                       /* It's possible for rec to be NULL if the page is compressed.
+                          This is because a reorganized page may become incompressible */
+                       if (!*rec)
+                               goto fail;
                }
        }
 
@@ -2022,8 +2025,11 @@ any_extern:
                goto err_exit;
        }
 
+       /* We do not attempt to reorganize if the page is compressed.
+          This is because the page may fail to compress after reorganization */
        max_size = old_rec_size
-               + page_get_max_insert_size_after_reorganize(page, 1);
+               + (page_zip ? page_get_max_insert_size(page, 1)
+                           : page_get_max_insert_size_after_reorganize(page, 1));
 
        if (!(((max_size >= BTR_CUR_PAGE_REORGANIZE_LIMIT)
               && (max_size >= new_rec_size))
@@ -2384,7 +2390,10 @@ make_external:
                err = DB_SUCCESS;
                goto return_after_reservations;
        } else {
-               ut_a(optim_err != DB_UNDERFLOW);
+               /* If the page is compressed, it is possible for btr_cur_optimistic_update()
+                  to return DB_UNDERFLOW and btr_cur_insert_if_possible() to return FALSE.
+                  See http://bugs.mysql.com/bug.php?id=61208 */
+               ut_a(page_zip || optim_err != DB_UNDERFLOW);
 
                /* Out of space: reset the free bits. */
                if (!dict_index_is_clust(index)

diff --git a/storage/innodb_plugin/page/page0page.c b/storage/innodb_plugin/page/page0page.c
index 91ac5e8..b3d9036 100644
--- a/storage/innodb_plugin/page/page0page.c
+++ b/storage/innodb_plugin/page/page0page.c
@@ -817,10 +817,12 @@ page_copy_rec_list_start(
                                /* The page was reorganized:
                                Seek to ret_pos. */
                                ret = new_page + PAGE_NEW_INFIMUM;
-
-                               do {
+                               /* ret_pos may be zero */
+                               while (ret_pos) {
+                                       ut_a(ret);
                                        ret = rec_get_next_ptr(ret, TRUE);
-                               } while (--ret_pos);
+                                       --ret_pos;
+                               }
                        }
                }
        }
[26 Dec 2011 3:16] Nizameddin Ordulu
fix the bugs

Attachment: 63815.patch (application/octet-stream, text), 2.40 KiB.

[24 Jan 2012 14:29] Marko Mäkelä
Nizameddin,

I think that your idea of making page_zip_compress() return FALSE randomly is a good and simple one. We should add that fault injection to our testing framework.

I did start a patch for these bugs last October (rb:777), but unfortunately I have been spending most of my time on completing a major feature since then. I am afraid that there are two bugs ahead of this one in my queue.

Your replacement of ut_a(*rec) with if (!*rec) goto fail looks OK to me. It corresponds to what I did in my incomplete patch.

This one is also correct, but my patch would try a little harder to avoid the costly pessimistic operation. Pessimistic operations (tree structure changes) generate more redo log and increase the index tree x-latch hold time (index->lock).

+	/* We do not attempt to reorganize if the page is compressed.
+	   This is because the page may fail to compress after reorganization */
 	max_size = old_rec_size
-		+ page_get_max_insert_size_after_reorganize(page, 1);
+		+ (page_zip ? page_get_max_insert_size(page, 1)
+		            : page_get_max_insert_size_after_reorganize(page, 1));

The relaxation of ut_a(optim_err != DB_UNDERFLOW) with || page_zip is correct as well.

The fix for page_copy_rec_list_start() is correct as well. A preceding comment actually says that ret_pos>0 does not necessarily hold.
[31 Jan 2012 21:25] Nizameddin Ordulu
Marko, 

Can you explain what your patch does to avoid calling btr_cur_pessimistic_update() and attach the related part of the code?

thanks,
nizam
[2 Feb 2012 12:54] Marko Mäkelä
Nizam, I worked on the patch last October. Now I have finally completed a large feature and will be able to work on demanding bug fixes again. I will try to complete my patch next week; then I can tell you.
[18 Oct 2012 22:06] John Russell
I added this bug number to the changelog entry for Bug#14554000, rather than creating a new changelog item.