Bug #21638 InnoDB: possible segfault in page_copy_rec_list_end_no_locks
Submitted: 14 Aug 2006 22:42 Modified: 18 Jun 2010 12:48
Reporter: Jeremy Rumpf Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.0.24 OS:Linux (Linux)
Assigned to: Marko Mäkelä CPU Architecture:Any
Tags: innodb corruption crashing

[14 Aug 2006 22:42] Jeremy Rumpf
Description:
I have a server that routinely segfaults in page_copy_red_list_end_no_locks().

Background:

I have two production servers, and 'A' and a 'B' (master slave). The 'A' machine was built from a dump of our old production server running mysql 3.5.23. The 'B' machine was then synced to the 'A' machine via mysqldump --master-data=1 --single-transaction.

The two servers are identical:

  DELL PowerEdge 2850
  2 x Intel Xeon 3Ghz
  2GB of Memory
  Database on three RAID 1 Mirror Sets

  CentOS 4.3

  Linux db10-a 2.6.9-34.0.2.ELsmp #1 SMP Fri Jul 7 19:52:49 CDT 2006 i686 i686 i386 GNU/Linux

  lrwxrwxrwx  1 root root 13 Jul 26 12:03 /lib/libc.so.6 -> libc-2.3.4.so
  -rwxr-xr-x  1 root root 1438668 Mar  7 23:08 /lib/libc-2.3.4.so
  -rw-r--r--  1 root root 2415476 Mar  7 21:23 /usr/lib/libc.a
  -rw-r--r--  1 root root 204 Mar  7 20:32 /usr/lib/libc.so

  mysql-5.0.24-debug (MySQL Community Edition - Debug (GPL))

Each server houses two databases with about 150GB of data (measured datafile size). Both databases are fully InnoDB. The primary server ('A') runs about 450 connections at peak and averages about 200Qps.

Both servers run precompiled mysql-debug-5.0.24-linux-i686-glibc23 downloaded from mysql.com

Issue:

Two days after the 'A' server went live, it crashed with this stack trace:

 0x817b8f2 handle_segfault + 368
 0xcdf888 (?)
 0x314e (?)
 0x82e0dcf btr_page_reorganize + 22
 0x82bb620 ibuf_insert_to_index_page + 675
 0x82bc8bb ibuf_merge_or_delete_for_page + 2012
 0x8310a18 buf_page_io_complete + 445
 0x8337009 fil_aio_wait + 511
 0x82a085a io_handler_thread + 28
 0xcd9371 (?)
 0x2e59be (?)

The server failed to start as InnoDB could not roll the logs on. After failing over to the 'B' server, I immediately took a dump of the 'B' machine to reload 'A' with mysqldump --master-server=1 --single-transaction.

The dump loaded into the 'A' machine just fine, but after configuring replication and letting it sync for awhile, it crashed again with the same stack trace.

Figuring it could be a bad transaction that was being replicated over, I took another dump of 'B' and attempted to reload 'A'. Again, the server crashed while the replication threads were syncing.

Now I think it may be hardware related (memory?), so I swapped out the 'A' server with an identical spare we had. Reloaded from the dump of 'B', again crashed while the replication threads were syncing.

So to debug the issue, I finally got a core dump with a backtrace that reads:

 #0  0x0051c7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
 #1  0x0030394b in pthread_kill () from /lib/tls/libpthread.so.0
 #2  0x08279d63 in write_core (sig=5584) at stacktrace.c:220
 #3  0x0817b9b5 in handle_segfault (sig=11) at mysqld.cc:2115
 #4  <signal handler called>
 #5  0x083464ba in page_copy_rec_list_end_no_locks (new_page=0x9525c000 "Lu\005\233", page=0x95260000 "Lu\005\233",
    rec=0x95260063 "infimum", index=0xb235e668, mtr=0x7e3beeb0) at  page0page.c:519
 #6  0x082e0bfd in btr_page_reorganize_low (recovery=0, page=0x9525c000 "Lu\005\233", index=0xb235e668, mtr=0x7e3beeb0)
    at ../include/page0page.ic:189
 #7  0x082e0dcf in btr_page_reorganize (page=0x9525c000 "Lu\005\233", index=0xb235e668, mtr=0x7e3beeb0) at btr0btr.c:920
 #8  0x082bb620 in ibuf_insert_to_index_page (entry=0xb2368aa8, page=0x9525c000 "Lu\005\233", index=0xb235e668,
    mtr=0x7e3beeb0) at ibuf0ibuf.c:2854
 #9  0x082bc8bb in ibuf_merge_or_delete_for_page (page=0x9525c000 "Lu\005\233", space=362, page_no=67259,
    update_ibuf_bitmap=1) at ibuf0ibuf.c:3220
 #10 0x08310a18 in buf_page_io_complete (block=0x814a5598) at buf0buf.c:1887
 #11 0x08337009 in fil_aio_wait (segment=2) at fil0fil.c:4155
 #12 0x082a085a in io_handler_thread (arg=0x0) at srv0start.c:446
 #13 0x00300371 in start_thread () from /lib/tls/libpthread.so.0
 #14 0x00cd29be in clone () from /lib/tls/libc.so.6

/etc/my.cnf:

[manager]
user=mysql
monitoring-interval = 2
port = 2273
bind-address = 127.0.0.1
socket=/tmp/mysqlmanager.sock
pid-file=/raid/base/mysql/mysqlmanager.pid
#default-mysqld-path = /usr/local/mysql/libexec/mysqld
default-mysqld-path = /usr/local/mysql/bin/mysqld

[mysql.server]
use-manager

[mysqld]
port                    = 3306
socket                  = /tmp/mysql.sock
back_log                = 20
max_connections         = 510
max_allowed_packet      = 16M
interactive_timeout     = 86400
wait_timeout            = 86400

#SLAVE server AUTO_INCREMENTS are odd [1,3,5,7]
server_id                = 10
auto_increment_increment = 2
auto_increment_offset    = 1

log_error               = /raid/base/mysql/db10-a.err
log_bin                 = /raid/base/mysql/binlog/binlog
#log-slave-updates
#replicate-same-server-id = 0
max_binlog_size         = 512M
max_binlog_cache_size   = 4090M
datadir                 = /raid/base/mysql
master-info-file        = /raid/base/mysql/relay/master.info
relay-log               = /raid/base/mysql/relay/relay
relay-log-index         = /raid/base/mysql/relay/relay.index
relay-log-info-file     = /raid/base/mysql/relay/relay.info
max_relay_log_size      = 512M
sync-binlog             = 1
log_slow_queries
long_query_time         = 10
tmpdir                  = /raid/base/mysql
default_table_type      = innodb
transaction_isolation   = REPEATABLE-READ
skip-bdb
skip-locking

# Global per connection buffers
binlog_cache_size       = 2M
sort_buffer_size        = 1M
join_buffer_size        = 1M

# Other Global buffers
table_cache             = 512
thread_cache            = 8
query_cache_size        = 64M
query_cache_limit       = 8M
query_alloc_block_size  = 32k
transaction_alloc_block_size = 32768
transaction_prealloc_size    = 32768
thread_stack            = 196k
max_heap_table_size     = 64M
tmp_table_size          = 16M

# MyISAM buffers
key_buffer_size         = 8M
read_buffer_size        = 256k
read_rnd_buffer_size    = 256k

# INNODB buffers
innodb_support_xa               = off
innodb_additional_mem_pool_size = 32M
innodb_buffer_pool_size         = 768M
innodb_file_io_threads          = 4
innodb_log_buffer_size          = 8M
innodb_flush_log_at_trx_commit  = 1
innodb_flush_method             = fdatasync
innodb_max_dirty_pages_pct      = 90
innodb_lock_wait_timeout        = 60

# INNODB files
innodb_file_per_table
innodb_autoextend_increment     = 200
innodb_data_home_dir            = /raid/base/mysql/innodb/datafiles
innodb_data_file_path           = 0000/sys0000.ibd:4096M;0001/sys0001.ibd:4096M
innodb_log_group_home_dir       = /raid/base/mysql/innodb/logfiles
innodb_log_file_size            = 512M
innodb_log_files_in_group       = 3
innodb_log_arch_dir             = /raid/base/mysql/innodb/logfiles
innodb_log_archive              = 0
innodb_open_files               = 1024

Any help or ideas to workaround the issue are welcome. I can provide a full core dump and any other information upon request. At this point I'm not sure if there's a magic SQL statement that will trigger it. So far it has died syncing different tables each time.

Thanks,
Jeremy

How to repeat:
For now, I just load a dump of our 'B' server and configure replication to sync up the 'A' server. The server will segfault with the above while rolling on the transactions from the 'B' server.
[15 Aug 2006 4:44] Jeremy Rumpf
gdb "backtrace full"

Attachment: core.5584.bt (application/octet-stream, text), 7.23 KiB.

[15 Aug 2006 18:33] Jeremy Rumpf
Error log entrie

Attachment: core.5584.err (application/octet-stream, text), 2.58 KiB.

[17 Aug 2006 22:36] Jeremy Rumpf
Today, 08-17-2006 the 'B' server crashed in identical fashion to the 'A' server that was problematic before.

The internal mysql stack trace for the crash resolved as:

0x817b8f2 handle_segfault + 368
0x6d4888 (?)
0x386d (?)
0x82e0dcf btr_page_reorganize + 22
0x82bb620 ibuf_insert_to_index_page + 675
0x82bc8bb ibuf_merge_or_delete_for_page + 2012
0x8310a18 buf_page_io_complete + 445
0x83164ad buf_read_page_low + 453
0x8316e5a buf_read_ibuf_merge_pages + 166
0x82b9929 ibuf_contract_ext + 1215
0x82b997e ibuf_contract_for_n_pages + 39
0x829fc18 srv_master_thread + 982
0x6ce371 (?)
0x4349be (?)

Unfortunately, I could not capture a core dump as the server wasn't configured with core-file. When the server was restarted after the crash, it did catch a signal 11 when trying to process the logs. From there I configured the server for a core dump and captured a dump of the segfaulting restart. NOTE: The core is from an InnoDB recovery restart after the crash, not the crash itself.

core gdb stack trace:

#0  0x0067b7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x0020f94b in pthread_kill () from /lib/tls/libpthread.so.0
#2  0x08279d63 in write_core (sig=13146) at stacktrace.c:220
#3  0x0817b9b5 in handle_segfault (sig=11) at mysqld.cc:2115
#4  <signal handler called>
#5  0x083464ba in page_copy_rec_list_end_no_locks (new_page=0x81ed0000 "�u", page=0x81ed4000 "�u",
    rec=0x81ed4063 "infimum", index=0xb2056168, mtr=0x7b0f38a0) at page0page.c:519
#6  0x082e0bfd in btr_page_reorganize_low (recovery=0, page=0x81ed0000 "�u", index=0xb2056168,
    mtr=0x7b0f38a0) at ../include/page0page.ic:189
#7  0x082e0dcf in btr_page_reorganize (page=0x81ed0000 "�u", index=0xb2056168, mtr=0x7b0f38a0)
    at btr0btr.c:920
#8  0x082bb620 in ibuf_insert_to_index_page (entry=0xb2056d00, page=0x81ed0000 "�u", index=0xb2056168,
    mtr=0x7b0f38a0) at ibuf0ibuf.c:2854
#9  0x082bc8bb in ibuf_merge_or_delete_for_page (page=0x81ed0000 "�u", space=985, page_no=68784,
    update_ibuf_bitmap=1) at ibuf0ibuf.c:3220
#10 0x08310a18 in buf_page_io_complete (block=0x80caec18) at buf0buf.c:1887
#11 0x083164ad in buf_read_page_low (err=0x7b0f3e28, sync=1, mode=3839924, space=985,
    tablespace_version=-9166254503735853056, offset=68784) at buf0rea.c:152
#12 0x08316e5a in buf_read_ibuf_merge_pages (sync=1, space_ids=0x7b0f4310, space_versions=0x7b0f42d0,
    page_nos=0x7b0f4330, n_stored=1) at buf0rea.c:630
#13 0x082b9929 in ibuf_contract_ext (n_pages=0x7b0f43f8, sync=1) at ibuf0ibuf.c:2165
#14 0x082b997e in ibuf_contract_for_n_pages (sync=1, n_pages=5) at ibuf0ibuf.c:2212
#15 0x0829fb43 in srv_master_thread (arg=0x0) at srv0srv.c:2285
#16 0x0020c371 in start_thread () from /lib/tls/libpthread.so.0
#17 0x003499be in clone () from /lib/tls/libc.so.6

Since my second production server just crashed, I'll be forced to move things back to MyISAM and reload both servers. My ability to debug the issue further is greatly diminished.
[17 Aug 2006 23:03] Jeremy Rumpf
gdb "backtrace full" from the Innodb recovery

Attachment: core.13146.bt (application/octet-stream, text), 7.94 KiB.

[17 Aug 2006 23:07] Jeremy Rumpf
mysql error log from initial 'B' server crash

Attachment: b_crash.err (application/octet-stream, text), 3.09 KiB.

[17 Aug 2006 23:09] Jeremy Rumpf
mysql error log for Innodb recovery core file

Attachment: core.13146.err (application/octet-stream, text), 4.31 KiB.

[23 Aug 2006 13:05] Marko Mäkelä
Jeremy,

Can you please resend the error log, including the page dumps? We really need them to track down the cause of the assertion failure.
[23 Aug 2006 15:33] Heikki Tuuri
Hi!

It is the ut_error below that fires because page_cur_rec_insert() does not have enough space available on the page. A page reorganize should never fail in lack of space!

InnoDB printed a hex dump of the old page and the new page. We need those hex dumps to find out what went wrong.

Regards,

Heikki

        for (;;) {
                rec_t*  cur1_rec = page_cur_get_rec(&cur1);
                if (cur1_rec == sup) {
                        break;
                }
                offsets = rec_get_offsets(cur1_rec, index, offsets,
                                        ULINT_UNDEFINED, &heap);
                if (UNIV_UNLIKELY(!page_cur_rec_insert(&cur2, cur1_rec, index,
                                                        offsets, mtr))) {
                        /* Track an assertion failure reported on the mailing
                        list on June 18th, 2003 */

                        buf_page_print(new_page);
                        buf_page_print(page);
                        ut_print_timestamp(stderr);

                        fprintf(stderr,
"InnoDB: rec offset %lu, cur1 offset %lu, cur2 offset %lu\n",
                              (ulong)(rec - page),
                              (ulong)(page_cur_get_rec(&cur1) - page),
                              (ulong)(page_cur_get_rec(&cur2) - new_page));
                        ut_error;
[23 Aug 2006 15:38] Heikki Tuuri
Jeremy,

"
InnoDB: Page may be an index page where index id is 0 3429
InnoDB: (index schedule_detail_idx1 of table ccms/schedule_detail)
"

please also print SHOW CREATE TABLE ccms.schedule_detail

Regards,

Heikki
[23 Aug 2006 17:38] Jeremy Rumpf
Heikki,

Create info follows:

CREATE TABLE `schedule_detail` (
  `ident` int(11) NOT NULL auto_increment,
  `alt_ident` char(32) default NULL,
  `sched_source_ident` int(11) NOT NULL default '0',
  `sched_type_ident` int(11) NOT NULL default '0',
  `employee_ident` int(11) NOT NULL default '0',
  `manager_ident` int(11) NOT NULL default '0',
  `program_ident` int(11) NOT NULL default '0',
  `location_ident` int(11) NOT NULL default '0',
  `gmt_time_in` bigint(21) NOT NULL default '0',
  `gmt_time_out` bigint(21) NOT NULL default '0',
  `status` char(1) NOT NULL default 'E',
  `updated` bigint(20) NOT NULL default '0',
  `updated_ident` int(11) NOT NULL default '0',
  `created` bigint(20) NOT NULL default '0',
  `created_ident` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ident`),
  KEY `schedule_detail_idx1` (`alt_ident`(17)),
  KEY `schedule_detail_idx2` (`sched_source_ident`),
  KEY `schedule_detail_idx3` (`sched_type_ident`),
  KEY `schedule_detail_idx4` (`employee_ident`),
  KEY `schedule_detail_idx5` (`manager_ident`),
  KEY `schedule_detail_idx6` (`program_ident`),
  KEY `schedule_detail_idx7` (`location_ident`),
  KEY `schedule_detail_idx8` (`status`),
  KEY `schedule_detail_idx9` (`gmt_time_in`,`gmt_time_out`),
  KEY `schedule_detail_idx10` (`employee_ident`,`gmt_time_in`,`gmt_time_out`),
  KEY `schedule_detail_idx11` (`manager_ident`,`gmt_time_in`,`gmt_time_out`),
  KEY `schedule_detail_idx12` (`program_ident`,`gmt_time_in`,`gmt_time_out`),
  KEY `schedule_detail_idx13` (`location_ident`,`gmt_time_in`,`gmt_time_out`)
) ENGINE=MyISAM AUTO_INCREMENT=3351584 DEFAULT CHARSET=latin1

The engine right now is MyISAM, as we've migrated back. But it was the same when we were on InnoDB. There are no foreign key constraints because we migrated from  mysql 3.5.23 on MyISAM directly to 5.0.24 on InnoDB. When converting we basically just use a sed script to search and replace ENGINE=MyISAM with ENGINE=InnoDB.

Thanks,
Jeremy
[23 Aug 2006 17:44] Jeremy Rumpf
full error log for A server core 5584 per Heikki

Attachment: core.5584.err.full (text/plain), 99.88 KiB.

[23 Aug 2006 19:01] Jeremy Rumpf
gzipped full error log for B server core 13146 per Heikki

Attachment: core.13146.err.full.gz (application/x-gzip, text), 63.05 KiB.

[23 Aug 2006 20:56] Jeremy Rumpf
Don't know if this information is valuable or not. From the core file core.5844 (A server crash), thread 6 was performing an insert from the relay logs on the schedule_detail table. The backtrace looks something like:

#18 0x081983d6 in mysql_parse (thd=0xa270068,
    inBuf=0x78521951 "INSERT INTO schedule_detail (\n
    sched_source_ident, sched_type_ident, employee_ident, manager_ident, program_ident, \n
    location_ident, gmt_time_in, gmt_time_out, status, updated, updated_ident, \n                       
    created, created_ident\n
     ) VALUES ( \n
     '104', \n
     '109', \n
     '84880', \n
     '80191', \n
     '178', \n
     '1', \n
     '1155225600', \n
     '1155227400', \n
     'E', \n
     '0', \n
     '0', \n
     '1155222693', \n
     '80191'\n
     )", length=170328232) at sql_parse.cc:5817
     lex = (LEX *) 0xa2700a8
     _db_func_ = 0x8478f5f "handle_slave_sql"
     _db_file_ = 0x8478850 "slave.cc"
     _db_level_ = 2
     _db_framep_ = (char **) 0x7898e2b8
#19 0x081f902f in Query_log_event::exec_event (this=0x785029b0, rli=0xa2200c4,
    query_arg=0x78521951 "INSERT INTO schedule_detail (\n
    sched_source_ident, sched_type_ident, employee_ident, manager_ident, program_ident, \n
    location_ident, gmt_time_in, gmt_time_out, status, updated, updated_ident, \n                       
    created, created_ident\n
    ) VALUES ( \n
    '104', \n
    '109', \n
    '84880', \n
    '80191', \n
    '178', \n
    '1', \n
    '1155225600', \n
    '1155227400', \n
    'E', \n
    '0', \n
    '0', \n
    '1155222693', \n
    '80191'\n
    )", q_len_arg=4294967292) at log_event.cc:1775
    new_db = 0xfffffffc <Address 0xfffffffc out of bounds>
    expected_error = 0
    actual_error = 2018519472
#20 0x081f8b26 in Query_log_event::exec_event (this=0xfffffffc, rli=0xa2200c4) at log_event.cc:1657
No locals.
#21 0x0826c49b in exec_relay_log_event (thd=0xa270068, rli=0xa2200c4) at slave.cc:3326
        type_code = 170328168
        exec_res = 170328168
        ev = (class Log_event *) 0x785029b0
#22 0x0826d8ba in handle_slave_sql (arg=0xa21f268) at slave.cc:3893
        thd = (class THD *) 0xa270068
        llbuff = "298\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
        llbuff1 = "29301478\000\000\000\000\000\000\000\000\000\000\000\000\000"
        rli = (RELAY_LOG_INFO *) 0xa2200c4
        errmsg = 0x0
        _db_func_ = 0x84a9674 "?func"
        _db_file_ = 0x84a967a "?file"
        _db_level_ = 1
---Type <return> to continue, or q <return> to quit---
        _db_framep_ = (char **) 0x0
        eta = 170000580
#23 0x00300371 in start_thread () from /lib/tls/libpthread.so.0
No symbol table info available.
#24 0x00cd29be in clone () from /lib/tls/libc.so.6

Thanks,
Jeremy
[24 Aug 2006 8:36] Marko Mäkelä
Jeremy,

I will investigate the hexdumps.  For what it is worth, I converted them back to binary with the following one-liner:

perl -e 'undef $/;$_=<>;print pack("H*",$_)'

The 20 bytes " be used when approp" in each page dump look suspicious, but they are at PAGE_BTR_SEG_LEAF and PAGE_BTR_SEG_TOP, which according to page0page.h are unused on other pages than the B-tree root.
[24 Aug 2006 9:27] Marko Mäkelä
Jeremy,

the crash occurs when InnoDB is merging data from the insert buffer to the index schedule_detail_idx1 of the table ccms.schedule_detail.  The insert buffer merge runs in a background thread.  That is why the crash is not related to any particular SQL statement.

It would be easier for us to repeat this bug if you could send an SQL dump of the contents of the table.

A work-around might be ENGINE=InnoDB ROW_FORMAT=REDUNDANT, but I am not yet sure if the crash is related to the compact page format introduced in MySQL 5.0.
[24 Aug 2006 10:01] Marko Mäkelä
I managed to load "page" (the second hexdump) from http://bugs.mysql.com/file.php?id=4192
to a gdb process where I created the table and set a breakpoint to page_cur_insert_rec_low().

I loaded the binary data to a buf_frame_alloc()ed block by calling memcpy() from gdb. This is how I converted the binary to little-endian 32-bit ints assumed by gdb:

(echo '{'; od -w16 -t x1 | sed -ne 's/ \(..\) \(..\) \(..\) \(..\)/0x\4\3\2\1,/g;s/^[0-9]*0x/0x/p'; echo '}';)

When I call page_validate() on the page, it reports corruption in page_simple_validate() as follows:

InnoDB: Dir slot does not point to right rec 99

The record at pos 99 is the predefined infimum record.  The page directory seems corrupted, because the slot points to 0x0100.

I think that we really need to have a complete test case in order to track the bug down. The corruption has probably happened much earlier. It may have been caused by faulty hardware or operating system drivers or a bug in InnoDB.

I'll attach the page_print() output for the page. From that output, we can see that the page directory slots 0 through 7 are corrupted.  The offsets should lie between 99 and 16383.
[24 Aug 2006 10:04] Marko Mäkelä
page_print(page,index) from core.5584.err.full

Attachment: core.5584.err.full-page.txt.gz (application/gzip, text), 61.32 KiB.

[24 Aug 2006 13:41] Marko Mäkelä
Sorry, please disregard my analysis made with gdb.  I must have made some mistake when copying the data. The start of the page directory in the hex dump looks okay (starting 8 bytes backwards from the end of the page, 2 bytes for each slot in big endian format). Still, my hypothesis is that the source page is somehow corrupted,  e.g., the records overlap.
[24 Aug 2006 15:54] Heikki Tuuri
Hi!

I have now analyzed the 'new page' hex dump from server A.

The below record is the last in the initial heap of 10 byte long records (where the char(17) field is SQL NULL).

01000e80000a                       header 6 bytes
80309cd2                           primary key int field 4 bytes

We can find the above record also from the old page, as the very last record in the heap!

00040e88002a                       header 6 bytes
2d39383939373431383333393520202020 char field 17 bytes
80305b1f                           primary key int field 4 bytes

We can find the above record from old page.

01001590001b                       header ? THIS IS A FRAGMENT OF A RECORD IN OLD PAGE HEAP
80305f19                           int field ?
3937343139                         garbage ?

00000e90002a                       header 6 bytes
2d39383939373431393236303420202020 char field 17 bytes
80305b19                           int field

01041720001b                       GARBAGE
80305b1b
3937343139

It looks like that at the page reorganize, InnoDB believes that records where the char(17) field is not SQL NULL are 0x2a bytes long, that is, 42 bytes! Though they should be only 6 + 17 + 4 = 27 bytes long. In the old page there are many records where the 'relative pointer' in the record header has a sensible value 0x1b = 27.

Maybe InnoDB does not have the correct index definition when it starts the page reorganization?

If this is done in an insert buffer merge operation on the page, what guarantees that InnoDB has the right definition for the index?

In the compact table format, InnoDB needs the index definition to be able to find the records and their fields on an index page.

Regards,

Heikki
[24 Aug 2006 15:57] Heikki Tuuri
Putting this to the 'Analyzing' state.
[24 Aug 2006 16:11] Heikki Tuuri
Ok, now I can guess where that 42 bytes comes from:

6 header
32 char field
4 int

That is, in reorganize, InnoDB has forgotten that the index was just on the first 17 bytes of the CHAR column! This is probably a bug in how InnoDB stores the index definition for column prefix indexes in the compact table format.
[24 Aug 2006 16:59] Heikki Tuuri
Yes, the bug is that the index entry dtuple that we originally want to insert to the index onloy has the COLUMN dtypes stored into it. Then only that type information goes to the ibuf entry that is created for that index dtuple.

Fix: maybe we could in dict_index_copy_types() adjust the len of the type if it is a column prefix index? Can this break code in some other place?

dict0mem.h:

/* Data structure for a field in an index */
struct dict_field_struct{
        dict_col_t*     col;            /* pointer to the table column */
        const char*     name;           /* name of the column */
        ulint           order;          /* flags for ordering this field:
                                        DICT_DESCEND, ... */
        ulint           prefix_len;     /* 0 or the length of the column
                                        prefix in bytes in a MySQL index of
                                        type, e.g., INDEX (textcol(25));
                                        must be smaller than
                                        DICT_MAX_INDEX_COL_LEN; NOTE that
                                        in the UTF-8 charset, MySQL sets this
                                        to 3 * the prefix len in UTF-8 chars */
        ulint           fixed_len;      /* 0 or the fixed length of the
                                        column if smaller than
                                        DICT_MAX_INDEX_COL_LEN */
        ulint           fixed_offs;     /* offset to the field, or
                                        ULINT_UNDEFINED if it is not fixed
                                        within the record (due to preceding
                                        variable-length fields) */
};

dict0dict.c:

/***********************************************************************
Copies types of fields contained in index to tuple. */

void
dict_index_copy_types(
/*==================*/
        dtuple_t*       tuple,          /* in: data tuple */
        dict_index_t*   index,          /* in: index */
        ulint           n_fields)       /* in: number of field types to copy */
{
        dtype_t*        dfield_type;
        dtype_t*        type;
        ulint           i;

        if (UNIV_UNLIKELY(index->type & DICT_UNIVERSAL)) {
                dtuple_set_types_binary(tuple, n_fields);

                return;
        }

        for (i = 0; i < n_fields; i++) {
                dfield_type = dfield_get_type(dtuple_get_nth_field(tuple, i));
                type = dict_col_get_type(dict_field_get_col(
                                dict_index_get_nth_field(index, i)));
                *dfield_type = *type;
        }
}

ibuf0ibuf.c:

        for (i = 0; i < n_fields; i++) {
                /* We add 4 below because we have the 4 extra fields at the
                start of an ibuf record */

                field = dtuple_get_nth_field(tuple, i + 4);
                entry_field = dtuple_get_nth_field(entry, i);
                dfield_copy(field, entry_field);

                dtype_new_store_for_order_and_null_size(
                                buf2 + i * DATA_NEW_ORDER_NULL_TYPE_BUF_SIZE,
                                dfield_get_type(entry_field));
        }
[24 Aug 2006 21:59] Heikki Tuuri
Note that in versions <= 4.1 there was the following inefficiency: for a fixed length n column, for an SQL NULL value of a field in a record, InnoDB always reserved n bytes, even if the field would have been in a column prefix index of length < n! That was because the dtype in dtuples did not take into account column prefixes.
[25 Aug 2006 15:47] Heikki Tuuri
Raising this to P1 and S1 because this is equivalent to database corruption. Fortunately, the bug is not likely to hit many users.
[5 Sep 2006 13:03] Marko Mäkelä
Patch for Bug #21638 in MySQL/InnoDB 5.0

Attachment: bug21638-5.0-4.patch (text/x-patch), 3.88 KiB.

[5 Sep 2006 13:08] Marko Mäkelä
I attached a patch to this bug. Unfortunately, it will probably miss the 5.0.25 release. Also, I have been unable to reproduce the situation (a B-tree page reorganize occurring during insert buffer merge of a secondary index containing a prefix index of a fixed-length CHAR column).

Jeremy, I would appreciate it if you could apply the patch and see if the bug occurs. Also, the bug should not occur on the unpatched MySQL 5.0.24 when you specify ROW_FORMAT=REDUNDANT at the end of the CREATE TABLE statement.
[21 Sep 2006 10:42] Timothy Smith
Patch queued to -maint team tree (expect to push to 5.0.26)
[3 Oct 2006 20:01] Chad MILLER
Available in 5.0.26.
[3 Oct 2006 20:15] Chad MILLER
Available in 5.1.12-beta.
[4 Oct 2006 19:33] Marko Mäkelä
Proposed changelog entry:

When records are merged from the insert buffer and the page needs to be reorganized, InnoDB used incorrect column length information when interpreting the records of the page. This caused server crash because of apparent corruption of secondary indexes in ROW_FORMAT=COMPACT that contain prefix indexes of fixed-length columns. Data files should not be corrupted, but the crash was likely to repeat every time the server was restarted.
[4 Oct 2006 20:20] Paul DuBois
Marko, thanks.

Noted in 5.0.26, 5.1.12 changelogs.
[24 May 2007 12:39] Marko Mäkelä
The fix of this bug seems to have caused Bug #28138. We must review the code more carefully and try to come up with a repeatable test case for this bug.
[25 May 2007 12:40] Marko Mäkelä
I re-analyzed the first two page dumps from core.13146.err.full.gz. Before reorganization, there are 824 user records. During reorganization, 728 records could be inserted into the new page until a buffer overflow occurred.

The first 459 records on the page are 10 bytes long, because when the column `alt_ident` is NULL, we need 1+5 bytes for the record header (1 for storing the "is null" flag), and 4 bytes for the primary key (the auto-increment column `ident`). The rest of the records on the reorganized page are 42 bytes long (1+5 bytes for header, and apparently 32+4 bytes for data).  The 32 comes from the length of the column `alt_ident`, although it should be 17, the length of the prefix index.

I did not resolve the record lengths on the original page yet, but I would assume that there are 459 NULL records (length 10) and 365 non-NULL records (length 27), total data size 459*10+365*27 = 14445 bytes.

The correct solution seems to be that we will have to treat prefix indexes of fixed-length columns as if they were variable-length columns. Alternatively, we would have to disable the insert buffer on this kind of indexes.

I am trying to generate a repeatable test case for this bug.
[28 May 2007 13:12] Marko Mäkelä
Unfortunately, we have been unable to create a repeatable test case for this bug. It looks like this bug can only occur in a prefix index of a fixed-length CHAR column (not UTF-8) that may be NULL.

When a record is inserted via the insert buffer, we need to record some index information, so that the insert buffer record can be merged. In the index information, the length of the prefix index column was stored wrong. It was the length of the original column, although it should have been the length of the prefix.

The fix of Bug #21638 introduced Bug #28138, because the insert buffer would also declare prefix indexes of variable-length columns as fixed-length ones.
[4 Jun 2007 10:43] Marko Mäkelä
The fix of this bug (Bug #21638) introduced Bug #28283, which affects the insert buffering of prefix indexes of variable-length columns.

Bug #28283 was fixed without affecting the storage and interpretation of fixed-length columns in the insert buffer. Both bugs cause corruption of prefix indexes. CHECK TABLE should note the corruption.
[5 May 2010 15:17] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 2:24] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug. Re-closing.
[28 May 2010 6:02] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:31] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 6:59] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 2:49] Paul DuBois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[17 Jun 2010 12:06] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:51] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:34] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)