Bug #78672 | assert fails in fil_io during linkbench with transparent innodb compression | ||
---|---|---|---|
Submitted: | 1 Oct 2015 15:03 | Modified: | 5 Oct 2015 18:23 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.7.8 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[1 Oct 2015 15:03]
Mark Callaghan
[1 Oct 2015 15:03]
Mark Callaghan
My DDL is: drop database if exists linkdb; create database linkdb; use linkdb; CREATE TABLE `linktable` ( `id1` bigint(20) unsigned NOT NULL DEFAULT '0', `id1_type` int(10) unsigned NOT NULL DEFAULT '0', `id2` bigint(20) unsigned NOT NULL DEFAULT '0', `id2_type` int(10) unsigned NOT NULL DEFAULT '0', `link_type` bigint(20) unsigned NOT NULL DEFAULT '0', `visibility` tinyint(3) NOT NULL DEFAULT '0', `data` varchar(255) NOT NULL DEFAULT '', `time` bigint(20) unsigned NOT NULL DEFAULT '0', `version` int(11) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`link_type`,`id1`,`id2`), KEY `id1_type` (`id1`,`link_type`,`visibility`,`time`,`id2`,`version`,`data`) ) ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1; CREATE TABLE `counttable` ( `id` bigint(20) unsigned NOT NULL DEFAULT '0', `id_type` int(10) unsigned NOT NULL DEFAULT '0', `link_type` bigint(20) unsigned NOT NULL DEFAULT '0', `count` int(10) unsigned NOT NULL DEFAULT '0', `time` bigint(20) unsigned NOT NULL DEFAULT '0', `version` bigint(20) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`id`,`link_type`) ) ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1; CREATE TABLE `nodetable` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `type` int(10) unsigned NOT NULL, `version` bigint(20) unsigned NOT NULL, `time` int(10) unsigned NOT NULL, `data` mediumtext NOT NULL, primary key(`id`) ) ENGINE=InnoDB COMPRESSION="zlib" DEFAULT CHARSET=latin1;
[1 Oct 2015 15:04]
Mark Callaghan
Stack traces: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f19691d1340] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7f1968612cc9] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f19686160d8] /home/mdcallag/b/orig578/bin/mysqld[0x7407de] /home/mdcallag/b/orig578/bin/mysqld(_Z6fil_ioRK9IORequestbRK9page_id_tRK11page_size_tmmPvS8_+0x6e4)[0x1162bb4] /home/mdcallag/b/orig578/bin/mysqld[0x1119640] /home/mdcallag/b/orig578/bin/mysqld(_Z13buf_read_pageRK9page_id_tRK11page_size_t+0x3c)[0x111a6ac] /home/mdcallag/b/orig578/bin/mysqld(_Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tb+0x413)[0x10e7fc3] /home/mdcallag/b/orig578/bin/mysqld(_Z27btr_cur_search_to_nth_levelP12dict_index_tmPK8dtuple_t15page_cur_mode_tmP9btr_cur_tmPKcmP5mtr_t+0x6ac)[0x10cc7ec] /home/mdcallag/b/orig578/bin/mysqld(_Z29row_ins_clust_index_entry_lowmmP12dict_index_tmP8dtuple_tmP9que_thr_tb+0x252)[0xfe8e22] /home/mdcallag/b/orig578/bin/mysqld(_Z25row_ins_clust_index_entryP12dict_index_tP8dtuple_tP9que_thr_tmb+0xa4)[0xfea264] /home/mdcallag/b/orig578/bin/mysqld(_Z12row_ins_stepP9que_thr_t+0x30a)[0xfea6fa] /home/mdcallag/b/orig578/bin/mysqld[0xffb64e] /home/mdcallag/b/orig578/bin/mysqld(_ZN11ha_innobase9write_rowEPh+0x2d2)[0xf273b2] /home/mdcallag/b/orig578/bin/mysqld(_ZN7handler12ha_write_rowEPh+0x10d)[0x7ba8ed] /home/mdcallag/b/orig578/bin/mysqld(_Z12write_recordP3THDP5TABLEP9COPY_INFOS4_+0x158)[0xd85a48] /home/mdcallag/b/orig578/bin/mysqld(_ZN14Sql_cmd_insert12mysql_insertEP3THDP10TABLE_LIST+0x96a)[0xd874aa] /home/mdcallag/b/orig578/bin/mysqld(_ZN14Sql_cmd_insert7executeEP3THD+0xc2)[0xd87c32] /home/mdcallag/b/orig578/bin/mysqld(_Z21mysql_execute_commandP3THD+0x518)[0xc24f68] /home/mdcallag/b/orig578/bin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x378)[0xc2afd8] /home/mdcallag/b/orig578/bin/mysqld(_Z16dispatch_commandP3THDP8COM_DATA19enum_server_command+0x17ca)[0xc2c82a] /home/mdcallag/b/orig578/bin/mysqld(_Z10do_commandP3THD+0x16f)[0xc2d80f] /home/mdcallag/b/orig578/bin/mysqld(handle_connection+0x270)[0xce40c0] /home/mdcallag/b/orig578/bin/mysqld(pfs_spawn_thread+0x1a0)[0xea4090] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f19691c9182] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f19686d647d] /home/mdcallag/b/orig578/bin/mysqld(my_print_stacktrace+0x2c)[0xe33cac] /home/mdcallag/b/orig578/bin/mysqld(handle_fatal_signal+0x451)[0x76f6b1] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f28a2f2f340] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7f28a2370cc9] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f28a23740d8] /home/mdcallag/b/orig578/bin/mysqld[0x7407de] /home/mdcallag/b/orig578/bin/mysqld(_Z6fil_ioRK9IORequestbRK9page_id_tRK11page_size_tmmPvS8_+0x6e4)[0x1162bb4] /home/mdcallag/b/orig578/bin/mysqld[0x1119640] /home/mdcallag/b/orig578/bin/mysqld(_Z13buf_read_pageRK9page_id_tRK11page_size_t+0x3c)[0x111a6ac] /home/mdcallag/b/orig578/bin/mysqld(_Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tb+0x413)[0x10e7fc3] /home/mdcallag/b/orig578/bin/mysqld[0x114baf7] /home/mdcallag/b/orig578/bin/mysqld[0x1153aee] /home/mdcallag/b/orig578/bin/mysqld(_Z17dict_stats_updateP12dict_table_t23dict_stats_upd_option_t+0x135d)[0x1156e6d] /home/mdcallag/b/orig578/bin/mysqld(dict_stats_thread+0x9fc)[0x115956c] /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f28a2f27182] /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f28a243447d]
[1 Oct 2015 15:07]
Mark Callaghan
my.cnf for the test
Attachment: my.cnf.578 (application/octet-stream, text), 2.66 KiB.
[1 Oct 2015 15:08]
Mark Callaghan
Reproduction requires many disk reads during the test. I configure linkbench to have a database several times larger than the InnoDB buffer pool. In this case, 2G buffer pool and 10G database. To estimate linkbench database size with compression in bytes, multiply value of maxid1 by 1000
[2 Oct 2015 2:06]
Mark Callaghan
The problem is permanent. Reading that page after crash recover leads to the same crash.
[3 Oct 2015 12:22]
MySQL Verification Team
Hello Mark, Thank you for the report. Thanks, Umesh
[3 Oct 2015 12:24]
MySQL Verification Team
test results
Attachment: 78672.results (application/octet-stream, text), 30.49 KiB.
[4 Oct 2015 23:01]
Sunny Bains
The config disables the double write buffer: innodb_doublewrite=0 # innodb_page_size=8K innodb_buffer_pool_size=2G 1. This config setting cannot detect a torn write. The fact that the old compression doesn't get a torn write on an 8K page is pure luck. In theory it is susceptible to the same problem. With transparent page compression the problem is much easier to reproduce with a 32K page size, for obvious reasons. 2. On NVMFS + FusionIO HW you can disable the double write buffer because it provides atomic writes. AFAIK, EXT4, XFS and BTRFS don't provide atomic writes. Therefore, the double write buffer should be enabled, with or without the transparent page compression.
[4 Oct 2015 23:07]
Sunny Bains
The only thing I'm missing here is whether a crash was involved. If there was no crash involved then the double write buffer setting is irrelevant, at least for a read before the crash.
[5 Oct 2015 4:23]
MySQL Verification Team
Didn't notice any crash on xfs(with maxtime 3600 and later default): [umshastr@hod03]/export/umesh/linkbench-master: bin/linkbench -r . . INFO 2015-10-04 15:10:25,287 [Thread-3]: ThreadID = 3 DELETE_NODE totalOps = 86366 totalErrors = 0 ops = 868 sampleDuration = 999189ms samples = 868 mean = 12891.52 min = 155 25% = 6206 50% = 11152 75% = 17038 90% = 25951 95% = 32541 99% = 45756 max = 113034 INFO 2015-10-04 15:10:25,287 [Thread-3]: ThreadID = 3 GET_NODE totalOps = 1111931 totalErrors = 0 ops = 11383 sampleDuration = 999189ms samples = 10000 mean = 12277.17 min = 120 25% = 5656 50% = 10564 75% = 16690 90% = 24615 95% = 31162 99% = 47615 max = 195655 INFO 2015-10-04 15:10:25,287 [Thread-3]: ThreadID = 3 total requests = 8595457 requests/second = 85 found = 46148 not found = 52369 history queries = 1946/4358664 INFO 2015-10-04 15:10:25,289 [main]: ADD_NODE count = 884524 p25 = [0.2,0.3]ms p50 = [0.2,0.3]ms p75 = [0.2,0.3]ms p95 = [0.3,0.4]ms p99 = [2,3]ms max = 483.734ms mean = 0.373ms INFO 2015-10-04 15:10:25,289 [main]: UPDATE_NODE count = 2532939 p25 = [6,7]ms p50 = [11,12]ms p75 = [18,19]ms p95 = [36,37]ms p99 = [57,58]ms max = 624.624ms mean = 13.991ms INFO 2015-10-04 15:10:25,289 [main]: DELETE_NODE count = 347117 p25 = [6,7]ms p50 = [10,11]ms p75 = [17,18]ms p95 = [32,33]ms p99 = [49,50]ms max = 375.484ms mean = 12.924ms INFO 2015-10-04 15:10:25,290 [main]: GET_NODE count = 4445488 p25 = [5,6]ms p50 = [10,11]ms p75 = [16,17]ms p95 = [31,32]ms p99 = [48,49]ms max = 669.06ms mean = 12.431ms INFO 2015-10-04 15:10:25,290 [main]: ADD_LINK count = 3092623 p25 = [8,9]ms p50 = [21,22]ms p75 = [34,35]ms p95 = [59,60]ms p99 = [84,85]ms max = 562.296ms mean = 23.555ms INFO 2015-10-04 15:10:25,290 [main]: DELETE_LINK count = 1029435 p25 = [5,6]ms p50 = [17,18]ms p75 = [30,31]ms p95 = [58,59]ms p99 = [83,84]ms max = 527.884ms mean = 21.132ms INFO 2015-10-04 15:10:25,290 [main]: UPDATE_LINK count = 2754881 p25 = [8,9]ms p50 = [21,22]ms p75 = [35,36]ms p95 = [63,64]ms p99 = [91,92]ms max = 1169.624ms mean = 24.399ms INFO 2015-10-04 15:10:25,291 [main]: COUNT_LINK count = 1680726 p25 = [0.1,0.2]ms p50 = [6,7]ms p75 = [12,13]ms p95 = [26,27]ms p99 = [42,43]ms max = 338.408ms mean = 8.107ms INFO 2015-10-04 15:10:25,291 [main]: MULTIGET_LINK count = 181457 p25 = [0.2,0.3]ms p50 = [0.3,0.4]ms p75 = [15,16]ms p95 = [34,35]ms p99 = [55,56]ms max = 417.181ms mean = 8.82ms INFO 2015-10-04 15:10:25,291 [main]: GET_LINKS_LIST count = 17432274 p25 = [0.2,0.3]ms p50 = [0.3,0.4]ms p75 = [11,12]ms p95 = [28,29]ms p99 = [45,46]ms max = 1852.828ms mean = 7.117ms INFO 2015-10-04 15:10:25,291 [main]: REQUEST PHASE COMPLETED. 34381464 requests done in 100001 seconds. Requests/second = 343 [umshastr@hod03]/export/umesh/linkbench-master: re-ran with default settings(requester/loader and maxtime) to see if I see any crash on xfs.
[5 Oct 2015 18:23]
Mark Callaghan
The only crashes are from assert failures from innodb. The first is from fil_io as described in this bug.
[6 Oct 2015 5:24]
Sunny Bains
1. I've been unable to crash 5.7.9 on OEL7 with EXT4. I've run the test over 5 times now. 2. XFS testing remains #1 could be due to some bugs fixed after the 5.7.8 cut off.
[12 Oct 2015 12:53]
MySQL Verification Team
Noticed that 5.7 latest source is crashing as in 5.7.8, attaching the worklog shortly.
[12 Oct 2015 12:54]
MySQL Verification Team
Performed similar steps with 5.7.10 as done in earlier case for 5.7.8
Attachment: 78672_5.7.10.results (application/octet-stream, text), 100.28 KiB.