Bug #34076 repeatable crash in buf_flush_buffered_writes
Submitted: 26 Jan 2008 7:33 Modified: 29 May 2008 9:48
Reporter: River Tarnell Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.0-bk OS:Solaris
Assigned to: Vasil Dimov CPU Architecture:Any

[26 Jan 2008 7:33] River Tarnell
Description:
While importing an SQL dump into a newly initialised server instance, a repeatable crash in InnoDB is observed.  

Two stack traces:

  ---- called from signal handler with signal 11 (SIGSEGV) ------
=>[8] buf_flush_buffered_writes(), line 282 in "buf0flu.c"
  [9] buf_flush_post_to_doublewrite_buf(block = 0x2b6165f38), line 429 in "buf0flu.c"
  [10] buf_flush_write_block_low(block = 0x2b6165f38), line 513 in "buf0flu.c"
  [11] buf_flush_try_page(space = 0, offset = 79229U, flush_type = 3U), line 607 in "buf0flu.c"
  [12] buf_flush_try_neighbors(space = 0, offset = 79178U, flush_type = 3U), line 789 in "buf0flu.c"
  [13] buf_flush_batch(flush_type = 3U, min_n = 18446744073709551614U, lsn_limit = RECORD), line 904 in "buf0flu.c"
  [14] log_preflush_pool_modified_pages(new_oldest = RECORD, sync = 0), line 1589 in "log0log.c"
  [15] log_checkpoint_margin(), line 2100 in "log0log.c"
  [16] log_check_margins(), line 3020 in "log0log.c"
  [17] log_free_check(), line 400 in "log0log.ic"
  [18] row_ins_index_entry_low(mode = 2U, index = 0x143d0b8, entry = 0x14440b8, ext_vec = (nil), n_ext_vec = 0, thr = 0x1450a30), line 1978 in "row0ins.c"
  [19] row_ins_index_entry(index = 0x143d0b8, entry = 0x14440b8, ext_vec = (nil), n_ext_vec = 0, thr = 0x1450a30), line 2168 in "row0ins.c"
  [20] row_ins_index_entry_step(node = 0x1450130, thr = 0x1450a30), line 2246 in "row0ins.c"
  [21] row_ins(node = 0x1450130, thr = 0x1450a30), line 2378 in "row0ins.c"
  [22] row_ins_step(thr = 0x1450a30), line 2482 in "row0ins.c"
  [23] row_insert_for_mysql(mysql_rec = 0x2fb24a268 "\xfc", prebuilt = 0x14370b8), line 1150 in "row0mysql.c"
  [24] ha_innobase::write_row(this = 0x2fb1d9b10, record = 0x2fb24a268 "\xfc"), line 3287 in "ha_innodb.cc"
  [25] write_record(thd = 0x2fb1ac010, table = 0x2fb1d6050, info = 0xfffffd7ffcffcdc0), line 1501 in "sql_insert.cc"
  [26] mysql_insert(thd = 0x2fb1ac010, table_list = 0x2fb1f7080, fields = CLASS, values_list = CLASS, update_fields = CLASS, update_values = CLASS, duplic = DUP_ERROR, ignore = false), line 821 in "sql_insert.cc"
  [27] mysql_execute_command(thd = 0x2fb1ac010), line 3597 in "sql_parse.cc"
  [28] mysql_parse(thd = 0x2fb1ac010, inBuf = 0x2fda69020 "INSERT INTO `archive` VALUES (0,'Disaster_risk_reduction','','[[WP:AWB|AWB assisted]] clean up + tagging',178836,'Bluemoose','20060115161252',0,'',35280309,35140541,0,NULL,NULL),(0,'Disaster_risk_reduction','','Changing {{wikify}} to {{wikify-date|January 2006}}',112114,'Pearle','20060403055313',1,'',46703628,46501037,0,NULL,NULL),(0,'Disaster_risk_reduction','','{{copyvio | url=http://acds.co.za/dvnPhdnew2.pdf}}See page 22 (Section 1.3.1) ~~~~',716829,'Copysan','20060512211131',0,'',52898304,52655892,0,NUL" ..., length = 1039226U, found_semicolon = 0xfffffd7ffcfff368), line 6174 in "sql_parse.cc"
  [29] dispatch_command(command = COM_QUERY, thd = 0x2fb1ac010, packet = 0x2fc948011 "INSERT INTO `archive` VALUES (0,'Disaster_risk_reduction','','[[WP:AWB|AWB assisted]] clean up + tagging',178836,'Bluemoose','20060115161252',0,'',35280309,35140541,0,NULL,NULL),(0,'Disaster_risk_reduction','','Changing {{wikify}} to {{wikify-date|January 2006}}',112114,'Pearle','20060403055313',1,'',46703628,46501037,0,NULL,NULL),(0,'Disaster_risk_reduction','','{{copyvio | url=http://acds.co.za/dvnPhdnew2.pdf}}See page 22 (Section 1.3.1) ~~~~',716829,'Copysan','20060512211131',0,'',52898304,52655892,0,NUL" ..., packet_length = 1039227U), line 1889 in "sql_parse.cc"
  [30] do_command(thd = 0x2fb1ac010), line 1595 in "sql_parse.cc"
  [31] handle_one_connection(arg = 0x2fb1f1010), line 1201 in "sql_parse.cc"
  [32] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffedc0edb 
  [33] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffedc1110 

  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [8] buf_flush_buffered_writes(), line 282 in "buf0flu.c"
  [9] buf_flush_post_to_doublewrite_buf(block = 0x2cc3d4530), line 429 in "buf0flu.c"
  [10] buf_flush_write_block_low(block = 0x2cc3d4530), line 513 in "buf0flu.c"
  [11] buf_flush_try_page(space = 0, offset = 642975U, flush_type = 3U), line 607 in "buf0flu.c"
  [12] buf_flush_try_neighbors(space = 0, offset = 642944U, flush_type = 3U), line 789 in "buf0flu.c"
  [13] buf_flush_batch(flush_type = 3U, min_n = 18446744073709551614U, lsn_limit = RECORD), line 904 in "buf0flu.c"
  [14] log_preflush_pool_modified_pages(new_oldest = RECORD, sync = 0), line 1589 in "log0log.c"
  [15] log_checkpoint_margin(), line 2100 in "log0log.c"
  [16] log_check_margins(), line 3020 in "log0log.c"
  [17] log_free_check(), line 400 in "log0log.ic"
  [18] row_ins_index_entry_low(mode = 2U, index = 0x2fb1f5aa0, entry = 0x2fb476f00, ext_vec = (nil), n_ext_vec = 0, thr = 0x2fb83fb70), line 1978 in "row0ins.c"
  [19] row_ins_index_entry(index = 0x2fb1f5aa0, entry = 0x2fb476f00, ext_vec = (nil), n_ext_vec = 0, thr = 0x2fb83fb70), line 2168 in "row0ins.c"
  [20] row_ins_index_entry_step(node = 0x2fc0ba118, thr = 0x2fb83fb70), line 2246 in "row0ins.c"
  [21] row_ins(node = 0x2fc0ba118, thr = 0x2fb83fb70), line 2378 in "row0ins.c"
  [22] row_ins_step(thr = 0x2fb83fb70), line 2482 in "row0ins.c"
  [23] row_insert_for_mysql(mysql_rec = 0x2fc342228 "\xfd\xff5^Bde", prebuilt = 0x2fcad2120), line 1150 in "row0mysql.c"
  [24] ha_innobase::write_row(this = 0x2fd160f50, record = 0x2fc342228 "\xfd\xff5^Bde"), line 3287 in "ha_innodb.cc"
  [25] write_record(thd = 0x2fb44e010, table = 0x2fdc46550, info = 0xfffffd7ff2dfcdc0), line 1501 in "sql_insert.cc"
  [26] mysql_insert(thd = 0x2fb44e010, table_list = 0x2fb3ff088, fields = CLASS, values_list = CLASS, update_fields = CLASS, update_values = CLASS, duplic = DUP_ERROR, ignore = false), line 821 in "sql_insert.cc"
  [27] mysql_execute_command(thd = 0x2fb44e010), line 3597 in "sql_parse.cc"
  [28] mysql_parse(thd = 0x2fb44e010, inBuf = 0x2fc446020 "INSERT INTO `pub_CSV_test3_old5` VALUES ('de','Kau-ur',NULL,'Kau-ur',NULL,NULL,NULL,NULL,NULL,'Kaur',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,13.6991666666667,-15.3327777777778,'city',2107,0,'GM','C','','',3270,'city m','[[Kategori','','',''),('de','Finnetunnel',NULL,'Finnetunnel',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,51.1755555555556,11.5322222222222,'landmark',0,0,'TH','','','',3270,'tunnel','[[Kategori','','',''),('de','Stadlauer Ostbahnbr\xc3\x83\xc2\xbccke',NULL,'Stadlauer " ..., length = 1043626U, found_semicolon = 0xfffffd7ff2dff368), line 6174 in "sql_parse.cc"
  [29] dispatch_command(command = COM_QUERY, thd = 0x2fb44e010, packet = 0x2fe968011 "INSERT INTO `pub_CSV_test3_old5` VALUES ('de','Kau-ur',NULL,'Kau-ur',NULL,NULL,NULL,NULL,NULL,'Kaur',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,13.6991666666667,-15.3327777777778,'city',2107,0,'GM','C','','',3270,'city m','[[Kategori','','',''),('de','Finnetunnel',NULL,'Finnetunnel',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,51.1755555555556,11.5322222222222,'landmark',0,0,'TH','','','',3270,'tunnel','[[Kategori','','',''),('de','Stadlauer Ostbahnbr\xc3\x83\xc2\xbccke',NULL,'Stadlauer " ..., packet_length = 1043627U), line 1889 in "sql_parse.cc"
  [30] do_command(thd = 0x2fb44e010), line 1595 in "sql_parse.cc"
  [31] handle_one_connection(arg = 0x2fb5aa010), line 1201 in "sql_parse.cc"
  [32] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffedc0edb 
  [33] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffedc1110 

I would suspect a memory or disk problem, except that the crash occurs at exactly the same point each time, and before the page is written to disk.

I have saved both core files in case more information is needed.

How to repeat:
N/A

Suggested fix:
N/A
[26 Jan 2008 7:45] River Tarnell
Forgot to mention the exact error from the server log:

 InnoDB: Next record offset nonsensical 0 for rec 99
 080126  7:09:18  InnoDB: Page dump in ascii and hex (16384 bytes):
[...]
 080126  7:09:18  InnoDB: Page checksum 3498417198, prior-to-4.0.14-form checksum 1659323021
 InnoDB: stored checksum 3498417198, prior-to-4.0.14-form stored checksum 1659323021
 InnoDB: Page lsn 2 2390998991, low 4 bytes of lsn at page end 2390998991
 InnoDB: Page number (if stored to page already) 642845,
 InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
 InnoDB: Page may be an index page where index id is 0 0
 080126  7:09:18  InnoDB: Apparent corruption of an index page n:o 642845 in space 0
 InnoDB: to be written to data file. We intentionally crash server
 InnoDB: to prevent corrupt data from ending up in data
 InnoDB: files.
 080126  7:09:18InnoDB: Assertion failure in thread 49 in file buf0flu.c line 282
[26 Jan 2008 7:59] Valeriy Kravchuk
Thank you for a problem report. Some ideas on how to repeat would be useful, though. Please, send the resuls of SHOW CREATE TABLE and SHOW TABLE STATUS for the `pub_CSV_test3_old5` and `archive` tables.
[26 Jan 2008 8:08] River Tarnell
I'm not sure exactly how to repeat it.  The dump files being imported are rather large (many GBs), but I can upload them to ftp.mysql.com if that would help.

Here are the table definitions:

 CREATE TABLE `pub_CSV_test3_old5` (
   `lang` varchar(10) collate utf8_bin NOT NULL,
   `Titel` varchar(180) collate utf8_bin NOT NULL,
   `Titel_en` varchar(90) collate utf8_bin default NULL,
   `Titel_de` varchar(90) collate utf8_bin default NULL,
   `Titel_es` varchar(90) collate utf8_bin default NULL,
   `Titel_fr` varchar(90) collate utf8_bin default NULL,
   `Titel_it` varchar(90) collate utf8_bin default NULL,
   `Titel_ja` varchar(160) collate utf8_bin default NULL,
   `Titel_nl` varchar(90) collate utf8_bin default NULL,
   `Titel_pl` varchar(90) collate utf8_bin default NULL,
   `Titel_pt` varchar(90) collate utf8_bin default NULL,
   `Titel_ru` varchar(200) collate utf8_bin default NULL,
   `Titel_sv` varchar(90) collate utf8_bin default NULL,
   `Titel_fi` varchar(160) collate utf8_bin default NULL,
   `Titel_no` varchar(160) collate utf8_bin default NULL,
   `Titel_eo` varchar(160) collate utf8_bin default NULL,
   `Titel_sk` varchar(160) collate utf8_bin default NULL,
   `Titel_da` varchar(160) collate utf8_bin default NULL,
   `Titel_cs` varchar(160) collate utf8_bin default NULL,
   `lat` double NOT NULL default '0',
   `lon` double NOT NULL default '0',
   `type` varchar(10) collate utf8_bin NOT NULL,
   `pop` double NOT NULL default '0',
   `Height` double NOT NULL default '0',
   `Country` varchar(10) collate utf8_bin NOT NULL,
   `Subregion` varchar(10) collate utf8_bin NOT NULL,
   `Scale` varchar(10) collate utf8_bin NOT NULL,
   `dim` varchar(10) collate utf8_bin NOT NULL,
   `psize` double NOT NULL,
   `style` varchar(12) collate utf8_bin NOT NULL,
   `t` varchar(10) collate utf8_bin NOT NULL,
   `image` varchar(160) collate utf8_bin NOT NULL,
   `imagejpg` varchar(160) collate utf8_bin NOT NULL,
   `name` varchar(160) collate utf8_bin NOT NULL,
   UNIQUE KEY `Titel_en` (`Titel_en`),
   UNIQUE KEY `Titel_de` (`Titel_de`),
   UNIQUE KEY `Titel_es` (`Titel_es`),
   UNIQUE KEY `Titel_fr` (`Titel_fr`),
   UNIQUE KEY `Titel_it` (`Titel_it`),
   UNIQUE KEY `Titel_ja` (`Titel_ja`),
   UNIQUE KEY `Titel_nl` (`Titel_nl`),
   UNIQUE KEY `Titel_pl` (`Titel_pl`),
   UNIQUE KEY `Titel_pt` (`Titel_pt`),
   UNIQUE KEY `Titel_ru` (`Titel_ru`),
   UNIQUE KEY `Titel_sv` (`Titel_sv`),
   UNIQUE KEY `Titel_fi` (`Titel_fi`),
   UNIQUE KEY `Titel_no` (`Titel_no`),
   UNIQUE KEY `Titel_eo` (`Titel_eo`),
   UNIQUE KEY `Titel_sk` (`Titel_sk`),
   UNIQUE KEY `Titel_da` (`Titel_da`),
   UNIQUE KEY `Titel_cs` (`Titel_cs`),
   KEY `Pop` (`pop`),
   KEY `type` (`type`),
   KEY `Character` (`psize`),
   KEY `lang` (`lang`),
   KEY `lon` (`lon`),
   KEY `lat` (`lat`),
   KEY `image` (`image`),
   KEY `Country` (`Country`),
   KEY `Subregion` (`Subregion`),
   KEY `Titel` (`Titel`),
   KEY `imagejpg` (`imagejpg`),
   KEY `name` (`name`),
   KEY `dim` (`dim`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE `archive` (
  `ar_namespace` int(11) NOT NULL default '0',
  `ar_title` varchar(255) character set latin1 collate latin1_bin NOT NULL default '',
  `ar_text` mediumblob NOT NULL,
  `ar_comment` tinyblob NOT NULL,
  `ar_user` int(5) unsigned NOT NULL default '0',
  `ar_user_text` varchar(255) character set latin1 collate latin1_bin NOT NULL default '',
  `ar_timestamp` varchar(14) character set latin1 collate latin1_bin NOT NULL default '',
  `ar_minor_edit` tinyint(1) NOT NULL default '0',
  `ar_flags` tinyblob NOT NULL,
  `ar_rev_id` int(8) unsigned default NULL,
  `ar_text_id` int(8) unsigned default NULL,
  `ar_deleted` tinyint(1) unsigned NOT NULL default '0',
  `ar_len` int(8) unsigned default NULL,
  `ar_page_id` int(10) unsigned default NULL,
  KEY `name_title_timestamp` (`ar_namespace`,`ar_title`,`ar_timestamp`),
  KEY `usertext_timestamp` (`ar_user_text`,`ar_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

(These are from two separate dumps which are being imported at the same time.)
[26 Jan 2008 10:21] Sveta Smirnova
Yes, dump will be really useful. Would be great if you could provide it.

Thanks in advance.
[26 Jan 2008 13:24] River Tarnell
Another backtrace:

  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [8] mach_read_from_1(b = 0xfffffffffffffffd "<bad address 0xfffffffd>"), line 35 in "mach0data.ic"
  [9] rec_get_bit_field_1(rec = (nil), offs = 3U, mask = 7U, shift = 0), line 189 in "rem0rec.ic"
  [10] rec_get_status(rec = (nil)), line 398 in "rem0rec.ic"
  [11] rec_get_offsets_func(rec = (nil), index = 0x2fba17120, offsets = 0xfffffd7ffc3fb630, n_fields = 3U, heap = 0xfffffd7ffc3fb620, file = 0xd88820 "page0cur.c", line = 369U), line 340 in "rem0rec.c"
  [12] page_cur_search_with_match(page = 0x227d60000 "", index = 0x2fba17120, tuple = 0x2fb85d0e0, mode = 4U, iup_matched_fields = 0xfffffd7ffc3fbe40, iup_matched_bytes = 0xfffffd7ffc3fbe48, ilow_matched_fields = 0xfffffd7ffc3fbe50, ilow_matched_bytes = 0xfffffd7ffc3fbe58, cursor = 0xfffffd7ffc3fc780), line 369 in "page0cur.c"

Although the trace is different, the root cause is the same: the next record pointer in the page is 0.  (page_cur_search_with_match iterates over each record until it reaches the 0 record, which it passes to rec_get_offsets.  rec_get_bit_field_1 subtracts offs (3) from the record address (0), which gives -3, or 0xfffffffffffffffd.)

I'll upload the dumps I'm using to reproduce this today - as they contain private data I need to make sure it's okay to do so first.
[28 Jan 2008 13:47] Heikki Tuuri
buf0flu.c in 5.0:

"
        for (i = 0; i < trx_doublewrite->first_free; i++) {

                block = trx_doublewrite->buf_block_arr[i];
                ut_a(block->state == BUF_BLOCK_FILE_PAGE);

                if (mach_read_from_4(block->frame + FIL_PAGE_LSN + 4)
                            != mach_read_from_4(block->frame + UNIV_PAGE_SIZE
                                        - FIL_PAGE_END_LSN_OLD_CHKSUM + 4)) {
                            ut_print_timestamp(stderr);
                            fprintf(stderr,
"  InnoDB: ERROR: The page to be written seems corrupt!\n"
"InnoDB: The lsn fields do not match! Noticed in the buffer pool\n"
"InnoDB: before posting to the doublewrite buffer.\n");
                }

                if (block->check_index_page_at_flush
                                && !page_simple_validate(block->frame)) {

                        buf_page_print(block->frame);

                        ut_print_timestamp(stderr);
                        fprintf(stderr,
        "  InnoDB: Apparent corruption of an index page n:o %lu in space %lu\n"
        "InnoDB: to be written to data file. We intentionally crash server\n"
        "InnoDB: to prevent corrupt data from ending up in data\n"
        "InnoDB: files.\n",
                        (ulong) block->offset, (ulong) block->space);

                        ut_error;
                }
        }
"

page_simple_validate() complains about a nonsensical next record offset.

I have never seen this bug reported before.

If you can give us a table dump that deterministically causes this corruption, it would be very valuable.

Regards,

Heikki
[30 Jan 2008 20:51] River Tarnell
I have uploaded u_kolossos.34076.sql.gz to ftp.mysql.com (1,566,190,485 bytes, MD5 c043fe89d6f5468aeffe8c6096a2b913).  This is the database that seems to be causing the crash: during the import, in all cases pub_CSV_test_old5, or one of the tables close before/after that one, was being imported.  I can't see anything strange about that table other than the large number of indices.  I haven't uploaded the other dumps I was importing, since these don't appear to be related to the crash.

To reproduce the problem, I simply begin importing this database on the affected system (while at the same time doing other work, like importing other dumps).  After 40-60 minutes, the server crashes.

I have started importing the other dumps without this one, and so far the server has been running for 10 hours without problems.

I've failed to reproduce this problem on another system, but I've also spent 3 days trying to find a hardware problem on the affected system, with no luck.  (ZFS reported no disk errors even with SHA256 checksums, there were no ECC errors reported, and memtest and mpmemory were unable to find any defective memory, despite running for many hours.)
[31 Jan 2008 14:21] Heikki Tuuri
Assigning this bug (importing a 1.5 GB zipped table dump!) to Vasil.
--Heikki
[31 Jan 2008 14:24] Heikki Tuuri
River,

you failed to reproduce the crash in another Solaris computer? Did you use the same my.cnf there? The buffer pool size may affect this.

Regards,

Heikki
[31 Jan 2008 15:27] River Tarnell
/etc/my.cnf from clematis

Attachment: clematis.cnf (application/octet-stream, text), 3.08 KiB.

[31 Jan 2008 15:27] River Tarnell
/etc/my.cnf from yarrow

Attachment: yarrow.cnf (application/octet-stream, text), 4.58 KiB.

[31 Jan 2008 15:33] River Tarnell
I've attached both /etc/my.cnf's.  the affected system is yarrow; this is a dual dual-core Xeon system with 16GB RAM (Dell PowerEdge 1950).  the system i failed to reproduce the problem on (clematis) is a dual Opteron with 3GB RAM (Sun V20z); the buffer size is much smaller because of the reduced memory.

The other main difference is the storage; yarrow has 12 15k RPM direct-attached SCSI disks.  clematis has an iSCSI array connected via gigabit ethernet with 16 7,200rpm SATA disks.  On clematis, I ran the test on both the array, and its local storage (two 10k RPM SCSI disks in RAID-1).

Both systems are running Solaris 10 11/06 (update 3), with roughly the same patches installed, although the kernel patch is different:

SunOS yarrow 5.10 Generic_127112-06 i86pc i386 i86pc
SunOS clematis 5.10 Generic_120012-14 i86pc i386 i86pc

I tested clematis with 5.0.45-enterprise; most of the testing on yarrow was using 5.0.51 or 5.0-bk, but i did also reproduce the problem using 5.0.45.

yarrow's my.cnf has innodb_doublewrite disabled at the moment, but it was enabled on both systems while i was testing.
[29 May 2008 9:48] Vasil Dimov
I was unable to repeat this some time ago.
Loading this dump took several hours and did not result in a crash.

Any way to repeat this more deterministically? Can you try to isolate the cause in this 10GB dump? Probably cut the file in 2 5GB pieces and load each of them. It should crash with one of them. Then take than one and split it in 2 2.5GB files and so on.