Bug #78672 assert fails in fil_io during linkbench with transparent innodb compression
Submitted: 1 Oct 2015 15:03 Modified: 5 Oct 2015 18:23
Reporter: Mark Callaghan Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7.8 OS:Any
Assigned to: CPU Architecture:Any

[1 Oct 2015 15:03] Mark Callaghan
Description:
I get this error while running linkbench with transparent compression. I don't get it with old-style compression. Alas, the value of 'err' is not written to the db error log.

2015-09-30 13:11:56 0x7f279d127700  InnoDB: Assertion failure in thread 139808115685120 in file fil0fil.cc line 5393
InnoDB: Failing assertion: req_type.is_dblwr_recover() || err == DB_SUCCESS
InnoDB: We intentionally generate a memory trap.

I added logging for the value of 'err' when the assert fails and it is 101 -> DB_IO_DECOMPRESS_FAIL.

How to repeat:
I ran linkbench and reproduce with:
* 5.7.8 RC
* XFS and ext4
* kernel 3.16.0-30-generic
* Ubuntu 14.04 LTS
* server has 4 cores, 8G RAM, 100G SSD
* linkbench config changed to use maxid1=10,000,001, loaders=4, requesters=4, maxtime=3600
* linkbench pattern is do load step, then repeat 1-hour query steps. I get the error in the first hour for ext4 and in less than 20 hours for xfs

My cmake script is:
prefix=$1
cmake \
      -DBUILD_CONFIG=mysql_release \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo \
      -DCMAKE_INSTALL_PREFIX:PATH=$prefix \
      -DWITH_SSL="bundled" \
      -DWITH_ZLIB="bundled" \
      -DMYSQL_DATADIR="${prefix}/data" \
      -DMYSQL_UNIX_ADDR="${prefix}/var/mysql.sock" \
      -DMYSQL_USER="mysql" \
      -DENABLED_LOCAL_INFILE=1 \
      -DMYSQL_MAINTAINER_MODE=1 \
      -DWITH_PERF_SCHEMA_STORAGE_ENGINE=1 \
      -DWITH_BOOST=/home/mdcallag/b/boost_1_58_0 \
      .

Suggested fix:
First, log the value of err to make this easier to debug when the assert fails
[1 Oct 2015 15:03] Mark Callaghan
My DDL is:
drop database if exists linkdb;
create database linkdb;
use linkdb;

    CREATE TABLE `linktable` (
      `id1` bigint(20) unsigned NOT NULL DEFAULT '0',
      `id1_type` int(10) unsigned NOT NULL DEFAULT '0',
      `id2` bigint(20) unsigned NOT NULL DEFAULT '0',
      `id2_type` int(10) unsigned NOT NULL DEFAULT '0',
      `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
      `visibility` tinyint(3) NOT NULL DEFAULT '0',
      `data` varchar(255) NOT NULL DEFAULT '',
      `time` bigint(20) unsigned NOT NULL DEFAULT '0',
      `version` int(11) unsigned NOT NULL DEFAULT '0',
      PRIMARY KEY (`link_type`,`id1`,`id2`),
      KEY `id1_type` (`id1`,`link_type`,`visibility`,`time`,`id2`,`version`,`data`)
    ) ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1;

    CREATE TABLE `counttable` (
      `id` bigint(20) unsigned NOT NULL DEFAULT '0',
      `id_type` int(10) unsigned NOT NULL DEFAULT '0',
      `link_type` bigint(20) unsigned NOT NULL DEFAULT '0',
      `count` int(10) unsigned NOT NULL DEFAULT '0',
      `time` bigint(20) unsigned NOT NULL DEFAULT '0',
      `version` bigint(20) unsigned NOT NULL DEFAULT '0',
      PRIMARY KEY (`id`,`link_type`)
    ) ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1;
    
    CREATE TABLE `nodetable` (
        `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
        `type` int(10) unsigned NOT NULL,
        `version` bigint(20) unsigned NOT NULL,
        `time` int(10) unsigned NOT NULL,
        `data` mediumtext NOT NULL,
        primary key(`id`)
        ) ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1;
[1 Oct 2015 15:04] Mark Callaghan
Stack traces:

/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f19691d1340]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7f1968612cc9]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f19686160d8]
/home/mdcallag/b/orig578/bin/mysqld[0x7407de]
/home/mdcallag/b/orig578/bin/mysqld(_Z6fil_ioRK9IORequestbRK9page_id_tRK11page_size_tmmPvS8_+0x6e4)[0x1162bb4]
/home/mdcallag/b/orig578/bin/mysqld[0x1119640]
/home/mdcallag/b/orig578/bin/mysqld(_Z13buf_read_pageRK9page_id_tRK11page_size_t+0x3c)[0x111a6ac]
/home/mdcallag/b/orig578/bin/mysqld(_Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tb+0x413)[0x10e7fc3]
/home/mdcallag/b/orig578/bin/mysqld(_Z27btr_cur_search_to_nth_levelP12dict_index_tmPK8dtuple_t15page_cur_mode_tmP9btr_cur_tmPKcmP5mtr_t+0x6ac)[0x10cc7ec]
/home/mdcallag/b/orig578/bin/mysqld(_Z29row_ins_clust_index_entry_lowmmP12dict_index_tmP8dtuple_tmP9que_thr_tb+0x252)[0xfe8e22]
/home/mdcallag/b/orig578/bin/mysqld(_Z25row_ins_clust_index_entryP12dict_index_tP8dtuple_tP9que_thr_tmb+0xa4)[0xfea264]
/home/mdcallag/b/orig578/bin/mysqld(_Z12row_ins_stepP9que_thr_t+0x30a)[0xfea6fa]
/home/mdcallag/b/orig578/bin/mysqld[0xffb64e]
/home/mdcallag/b/orig578/bin/mysqld(_ZN11ha_innobase9write_rowEPh+0x2d2)[0xf273b2]
/home/mdcallag/b/orig578/bin/mysqld(_ZN7handler12ha_write_rowEPh+0x10d)[0x7ba8ed]
/home/mdcallag/b/orig578/bin/mysqld(_Z12write_recordP3THDP5TABLEP9COPY_INFOS4_+0x158)[0xd85a48]
/home/mdcallag/b/orig578/bin/mysqld(_ZN14Sql_cmd_insert12mysql_insertEP3THDP10TABLE_LIST+0x96a)[0xd874aa]
/home/mdcallag/b/orig578/bin/mysqld(_ZN14Sql_cmd_insert7executeEP3THD+0xc2)[0xd87c32]
/home/mdcallag/b/orig578/bin/mysqld(_Z21mysql_execute_commandP3THD+0x518)[0xc24f68]
/home/mdcallag/b/orig578/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x378)[0xc2afd8]
/home/mdcallag/b/orig578/bin/mysqld(_Z16dispatch_commandP3THDP8COM_DATA19enum_server_command+0x17ca)[0xc2c82a]
/home/mdcallag/b/orig578/bin/mysqld(_Z10do_commandP3THD+0x16f)[0xc2d80f]
/home/mdcallag/b/orig578/bin/mysqld(handle_connection+0x270)[0xce40c0]
/home/mdcallag/b/orig578/bin/mysqld(pfs_spawn_thread+0x1a0)[0xea4090]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f19691c9182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f19686d647d]

/home/mdcallag/b/orig578/bin/mysqld(my_print_stacktrace+0x2c)[0xe33cac]
/home/mdcallag/b/orig578/bin/mysqld(handle_fatal_signal+0x451)[0x76f6b1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f28a2f2f340]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7f28a2370cc9]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f28a23740d8]
/home/mdcallag/b/orig578/bin/mysqld[0x7407de]
/home/mdcallag/b/orig578/bin/mysqld(_Z6fil_ioRK9IORequestbRK9page_id_tRK11page_size_tmmPvS8_+0x6e4)[0x1162bb4]
/home/mdcallag/b/orig578/bin/mysqld[0x1119640]
/home/mdcallag/b/orig578/bin/mysqld(_Z13buf_read_pageRK9page_id_tRK11page_size_t+0x3c)[0x111a6ac]
/home/mdcallag/b/orig578/bin/mysqld(_Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tb+0x413)[0x10e7fc3]
/home/mdcallag/b/orig578/bin/mysqld[0x114baf7]
/home/mdcallag/b/orig578/bin/mysqld[0x1153aee]
/home/mdcallag/b/orig578/bin/mysqld(_Z17dict_stats_updateP12dict_table_t23dict_stats_upd_option_t+0x135d)[0x1156e6d]
/home/mdcallag/b/orig578/bin/mysqld(dict_stats_thread+0x9fc)[0x115956c]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f28a2f27182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f28a243447d]
[1 Oct 2015 15:07] Mark Callaghan
my.cnf for the test

Attachment: my.cnf.578 (application/octet-stream, text), 2.66 KiB.

[1 Oct 2015 15:08] Mark Callaghan
Reproduction requires many disk reads during the test. I configure linkbench to have a database several times larger than the InnoDB buffer pool. In this case, 2G buffer pool and 10G database. To estimate linkbench database size with compression in bytes, multiply value of maxid1 by 1000
[2 Oct 2015 2:06] Mark Callaghan
The problem is permanent. Reading that page after crash recover leads to the same crash.
[3 Oct 2015 12:22] Umesh Shastry
Hello Mark,

Thank you for the report.

Thanks,
Umesh
[3 Oct 2015 12:24] Umesh Shastry
test results

Attachment: 78672.results (application/octet-stream, text), 30.49 KiB.

[4 Oct 2015 23:01] Sunny Bains
The config disables the double write buffer:

innodb_doublewrite=0
# innodb_page_size=8K
innodb_buffer_pool_size=2G

1. This config setting cannot detect a torn write. The fact that the old compression doesn't get a torn write on an 8K page is pure luck. In theory it is susceptible to the same problem. With transparent page compression the problem is much easier to reproduce with a 32K page size, for obvious reasons.

2. On NVMFS + FusionIO HW you can disable the double write buffer because it provides atomic writes. AFAIK, EXT4, XFS and BTRFS don't provide atomic writes. Therefore, the double write buffer  should be enabled, with or without the transparent page compression.
[4 Oct 2015 23:07] Sunny Bains
The only thing I'm missing here is whether a crash was involved. If there was no crash involved then the double write buffer setting is irrelevant, at least for a read before the crash.
[5 Oct 2015 4:23] Umesh Shastry
Didn't notice any crash on xfs(with maxtime 3600 and later default):

[umshastr@hod03]/export/umesh/linkbench-master: bin/linkbench -r
.
.
INFO 2015-10-04 15:10:25,287 [Thread-3]: ThreadID = 3 DELETE_NODE totalOps = 86366 totalErrors = 0 ops = 868 sampleDuration = 999189ms samples = 868 mean = 12891.52 min = 155 25% = 6206 50% = 11152 75% = 17038 90% = 25951 95% = 32541 99% = 45756 max = 113034
INFO 2015-10-04 15:10:25,287 [Thread-3]: ThreadID = 3 GET_NODE totalOps = 1111931 totalErrors = 0 ops = 11383 sampleDuration = 999189ms samples = 10000 mean = 12277.17 min = 120 25% = 5656 50% = 10564 75% = 16690 90% = 24615 95% = 31162 99% = 47615 max = 195655
INFO 2015-10-04 15:10:25,287 [Thread-3]: ThreadID = 3 total requests = 8595457 requests/second = 85 found = 46148 not found = 52369 history queries = 1946/4358664
INFO 2015-10-04 15:10:25,289 [main]: ADD_NODE count = 884524  p25 = [0.2,0.3]ms  p50 = [0.2,0.3]ms  p75 = [0.2,0.3]ms  p95 = [0.3,0.4]ms  p99 = [2,3]ms  max = 483.734ms  mean = 0.373ms
INFO 2015-10-04 15:10:25,289 [main]: UPDATE_NODE count = 2532939  p25 = [6,7]ms  p50 = [11,12]ms  p75 = [18,19]ms  p95 = [36,37]ms  p99 = [57,58]ms  max = 624.624ms  mean = 13.991ms
INFO 2015-10-04 15:10:25,289 [main]: DELETE_NODE count = 347117  p25 = [6,7]ms  p50 = [10,11]ms  p75 = [17,18]ms  p95 = [32,33]ms  p99 = [49,50]ms  max = 375.484ms  mean = 12.924ms
INFO 2015-10-04 15:10:25,290 [main]: GET_NODE count = 4445488  p25 = [5,6]ms  p50 = [10,11]ms  p75 = [16,17]ms  p95 = [31,32]ms  p99 = [48,49]ms  max = 669.06ms  mean = 12.431ms
INFO 2015-10-04 15:10:25,290 [main]: ADD_LINK count = 3092623  p25 = [8,9]ms  p50 = [21,22]ms  p75 = [34,35]ms  p95 = [59,60]ms  p99 = [84,85]ms  max = 562.296ms  mean = 23.555ms
INFO 2015-10-04 15:10:25,290 [main]: DELETE_LINK count = 1029435  p25 = [5,6]ms  p50 = [17,18]ms  p75 = [30,31]ms  p95 = [58,59]ms  p99 = [83,84]ms  max = 527.884ms  mean = 21.132ms
INFO 2015-10-04 15:10:25,290 [main]: UPDATE_LINK count = 2754881  p25 = [8,9]ms  p50 = [21,22]ms  p75 = [35,36]ms  p95 = [63,64]ms  p99 = [91,92]ms  max = 1169.624ms  mean = 24.399ms
INFO 2015-10-04 15:10:25,291 [main]: COUNT_LINK count = 1680726  p25 = [0.1,0.2]ms  p50 = [6,7]ms  p75 = [12,13]ms  p95 = [26,27]ms  p99 = [42,43]ms  max = 338.408ms  mean = 8.107ms
INFO 2015-10-04 15:10:25,291 [main]: MULTIGET_LINK count = 181457  p25 = [0.2,0.3]ms  p50 = [0.3,0.4]ms  p75 = [15,16]ms  p95 = [34,35]ms  p99 = [55,56]ms  max = 417.181ms  mean = 8.82ms
INFO 2015-10-04 15:10:25,291 [main]: GET_LINKS_LIST count = 17432274  p25 = [0.2,0.3]ms  p50 = [0.3,0.4]ms  p75 = [11,12]ms  p95 = [28,29]ms  p99 = [45,46]ms  max = 1852.828ms  mean = 7.117ms
INFO 2015-10-04 15:10:25,291 [main]: REQUEST PHASE COMPLETED. 34381464 requests done in 100001 seconds. Requests/second = 343
[umshastr@hod03]/export/umesh/linkbench-master:

re-ran with default settings(requester/loader and maxtime) to see if I see any crash on xfs.
[5 Oct 2015 18:23] Mark Callaghan
The only crashes are from assert failures from innodb. The first is from fil_io as described in this bug.
[6 Oct 2015 5:24] Sunny Bains
1. I've been unable to crash 5.7.9 on OEL7 with EXT4. I've run the test over 5 times now. 

2. XFS testing remains

#1 could be due to some bugs fixed after the 5.7.8 cut off.
[12 Oct 2015 12:53] Umesh Shastry
Noticed that 5.7 latest source is crashing as in 5.7.8, attaching the worklog shortly.
[12 Oct 2015 12:54] Umesh Shastry
Performed similar steps with 5.7.10 as done in earlier case for 5.7.8

Attachment: 78672_5.7.10.results (application/octet-stream, text), 100.28 KiB.