MySQL Bugs: #61456: Optimistic Insert/Update uses wrong heuristics for compressed page size

Bug #61456	Optimistic Insert/Update uses wrong heuristics for compressed page size
Submitted:	9 Jun 2011 7:40	Modified:	18 Oct 2012 22:40
Reporter:	Nizameddin Ordulu	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB Plugin storage engine	Severity:	S5 (Performance)
Version:	5.1.52	OS:	Any
Assigned to:	Marko Mäkelä	CPU Architecture:	Any
Tags:	compression, innodb, quicklz

Description:
I have seen this bug occur when the compression is quicklz, however, it's theoretically possible for it to occur when compression is zlib. The bug was hit insidte btr_cur_optimistic_update() but it could have been hit inside btr_cur_optimistic_insert() too. I explain the reason I found for the bug for both btr_cur_optimistic_insert and btr_cur_optimistic_update below.
Inside btr_cur_optimistic_update() following code determines whether the insertion of the record can succeed without splitting the page: 

max_size = page_get_max_insert_size_after_reorganize(page, 1);

....

  if (dict_index_is_clust(index)
      && (page_get_n_recs(page) >= 2)
      && UNIV_LIKELY(leaf)
      && (dict_index_get_space_reserve() + rec_size > max_size)
      && (btr_page_get_split_rec_to_right(cursor, &dummy_rec)
    || btr_page_get_split_rec_to_left(cursor, &dummy_rec))) {
fail:
    err = DB_FAIL;
fail_err:

    if (big_rec_vec) {
      dtuple_convert_back_big_rec(index, entry, big_rec_vec);
    }

    return(err);
  }

  if (UNIV_UNLIKELY(max_size < BTR_CUR_PAGE_REORGANIZE_LIMIT
        || max_size < rec_size)
      && UNIV_LIKELY(page_get_n_recs(page) > 1)
      && page_get_max_insert_size(page, 1) < rec_size) {

    goto fail;
  }

If the above works fine and doesn't goto 'fail:', the insertion is performed by the following code: 
  
  if (UNIV_UNLIKELY(reorg)) {
    ut_a(zip_size);
    ut_a(*rec); // <-- problematic assertion
  }

This code assumes that after the reorganization there will be at least max_size space, however after reorganization the page may no longer be compressable in which case page_cur_tuple_insert() would return NULL. 

My fix for this is to add the following before ut_a(*rec)
      if (!*rec && page_compression_method == COMP_METHOD_QUICKLZ) {
        goto fail;
      }
      ut_a(*rec);

Similarly in btr_cur_optimistic_update() following checks are performed on the size of the record before the actual update:

  if (UNIV_LIKELY_NULL(page_zip)
      && !btr_cur_update_alloc_zip(page_zip, block, index,
           new_rec_size, TRUE, mtr)) {
    err = DB_ZIP_OVERFLOW;
    goto err_exit;
  }

  if (UNIV_UNLIKELY(new_rec_size
        >= (page_get_free_space_of_empty(page_is_comp(page))
            / 2))) {

    err = DB_OVERFLOW;
    goto err_exit;
  }

  if (UNIV_UNLIKELY(page_get_data_size(page)
        - old_rec_size + new_rec_size
        < BTR_CUR_PAGE_COMPRESS_LIMIT)) {

    /* The page would become too empty */

    err = DB_UNDERFLOW;
    goto err_exit;
  }

  max_size = old_rec_size
    + page_get_max_insert_size_after_reorganize(page, 1);

  if (!(((max_size >= BTR_CUR_PAGE_REORGANIZE_LIMIT)
         && (max_size >= new_rec_size))
        || (page_get_n_recs(page) <= 1))) {

    /* There was not enough space, or it did not pay to
    reorganize: for simplicity, we decide what to do assuming a reorganization is needed, though it might not be necessary */

    err = DB_OVERFLOW;
    goto err_exit;
  }

Again, InnoDB assumes that the record would fit if it's there is enough space after reorganization. If reorganization causes the page to become uncompressable then the above code fails at the following assertion:

  rec = btr_cur_insert_if_possible(cursor, new_entry, 0/*n_ext*/, mtr);
  ut_a(rec); /* <- We calculated above the insert would fit */

In fact the above comment (placed here from the original) is exactly wrong because the above code doesn't take into account the possible compressed page size growth after reorganization. My fix for this is to add check that makes btr_optimistic_update() fail if there is not enough space for the record before reorganization: 

+       if (UNIV_LIKELY_NULL(page_zip)
+           && page_compression_method == COMP_METHOD_QUICKLZ) {
+               max_size = old_rec_size + page_get_max_insert_size(page, 1);
+               if (max_size < new_rec_size) {
+                       err = DB_ZIP_OVERFLOW;
+                       goto err_exit;
+               }
+       }
+
        /* Do lock checking and undo logging */
        err = btr_cur_upd_lock_and_undo(flags, cursor, update, cmpl_info,
                                        thr, mtr, &roll_ptr);

I also made the following change in ibuf_index_page_calc_free_zip() for the same reason:

@@ -197,8 +197,15 @@ ibuf_index_page_calc_free_zip(
        ut_ad(zip_size == buf_block_get_zip_size(block));
        ut_ad(zip_size);
 
-       max_ins_size = page_get_max_insert_size_after_reorganize(
-               buf_block_get_frame(block), 1);
+       if (page_compression_method == COMP_METHOD_QUICKLZ) {
+               max_ins_size = page_get_max_insert_size(buf_block_get_frame(block), 1);
+       } else {
+               max_ins_size = page_get_max_insert_size_after_reorganize(
+                       buf_block_get_frame(block), 1);
+       }

That is, if compression is quicklz, the free space is estimated using page_get_max_insert_size() instead of page_get_max_insert_size_after_reorganize().

I hit this bug only after running a compression-enabled server for a week.
Can you tell me:
* if I understand this correctly?
* whether this solution makes sense from a correctness standpoint?
* whether it makes sense from a performance standpoint? Would it degrade the perf dramatically?

How to repeat:
read the code

Suggested fix:
fix included in description of the problem.

Also if you can suggest a better solution, please do.

Sorry Nizameddin, I have been busy with other tasks. Unfortunately I am afraid that I cannot look at this before late August, due to vacation and a deadline. It looks like you are on the right track. It looks like I forgot to push the Bug#61208 fix before my vacation, which starts now.

Yes, this is a real bug. When I designed the InnoDB compression, I did not make any assumptions about compression ratio. It looks like I forgot to review the high-level logic in btr0cur.c after completing the implementation.

As you have probably noticed, there is also the "last effort" code path that involves page_zip_reorganize(). It is possible that InnoDB is attempting both btr_page_reorganize() and page_zip_reorganize() for the same insert or update. This should be checked and prevented.

Marko: Is there an easy way to recover from these estimation errors? I have just opened another bug that occurred with zlib compression: http://bugs.mysql.com/bug.php?id=62190. I don't necessarily want to fix the estimation of whether the record will fit on the page, but I would like to fallback to sth like btr_cur_pessimistic_update() if btr_cur_optimistic_update() fails. Can you suggest a way to do this?

Thanks Nizameddin, I can confirm the first bug and your suggested fix. Instead of ut_a(*rec), we should just if (!*rec) goto fail. While we checked that there is enough space in the uncompressed page, we cannot know if the compression succeeds.

Do you have an actual failure of this assertion in btr_cur_optimistic_update()?
	ut_a(rec); /* <- We calculated above the insert would fit */
btr_cur_optimistic_update() should be correct, because it uses btr_cur_update_alloc_zip() for an exact estimate. If there is not enough space in the modification log, it should return DB_ZIP_OVERFLOW.

The code path on which this assertion is deletes the original record and inserts an updated record. The delete does not consume place in the compressed page, as it is done in-place. The insert should succeed without reorganizing or recompressing the page, unless there is a bug in the btr_cur_update_alloc_zip() check.

We could try harder in btr_cur_optimistic_update() and skip the btr_cur_update_alloc_zip() check there, so that we can avoid some double work in btr_cur_pessimistic_update().

Also, I see that btr_cur_insert_if_possible() should attempt reorganization only if compression is not being used. Reorganization will already have been attempted in a failed attempt to insert into a compressed page.

One last thing: I think that we should start testing InnoDB compression with a randomly-varying compression ratio. When a debug flag is set in debug builds, choose the compression level randomly, alternating between minimum (no compression) and maximum level. This should give better test coverage than attempts to vary the compressibility of user records.

Sorry, I missed some of your explanation.

OK, you seem to be right, btr_cur_optimistic_update() is assuming that it can reorganize the page. The btr_cur_update_alloc_zip() does not take this into account.

Also, btr_cur_update_alloc_zip() may be attempting to recompress the page without reorganizing it. This can waste CPU. The fix is to get rid of the btr_cur_update_alloc_zip() call here. Even in btr_cur_update_in_place() I think that we would be better off calling page_zip_available() directly instead of btr_cur_update_alloc_zip().

You are also right about ibuf_index_page_calc_free_zip(). Any operation that involves page reorganization can fail on a compressed page. The result from page_zip_available() is only valid if we avoid reorganization and recompressing.

After reading all of ibuf_index_page_calc_free_zip(), I see no bug there. It computes the bits based on the maximum size [with reorganize] and page_zip_max_ins_size() [without reorganize], whichever is greater.

I would add a debug assertion to ibuf_insert_to_index_page_low() that ensures that compressed pages will never be reorganized there, because the free bits calculation does not allow reorganization. Also, all btr_page_reorganize() calls should be removed there for compressed pages, because page_cur_tuple_insert() already invokes page_zip_reorganize() if all else fails.

I'm going to associate this bug number with the same changelog entry as bug#14554000 rather than creating a new changelog entry with similar symptoms.

This was internal bug#12845774, fixed in MySQL 5.1.66.