Bug #34109 Seg fault in dict_accept with innodb and partitioned tables
Submitted: 28 Jan 2008 18:57 Modified: 7 Mar 2008 14:02
Reporter: S Kister Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:MySQL-community-debuginfo-5.1.22-0.rhel5 OS:Linux (CentOS release 5 (Final))
Assigned to: CPU Architecture:Any
Tags: innodb partition centos5 rhel5 5.1.22 replication

[28 Jan 2008 18:57] S Kister
Description:
I set up a new InnoDB database with a single database and partitioned table. Within a few minutes of turning on replication, the database starts crashing every few seconds. If I restart it with skip-innodb = 1, it stays up.

I wiped out the database and recreated it again. I started replication from the current position and ran into the same error.

Here is what was installed
MySQL-client-community-5.1.22-0.rhel5.i386.rpm
MySQL-community-debuginfo-5.1.22-0.rhel5.i386.rpm

Here is the info from the error log.

080128 10:37:52 mysqld_safe Number of processes running now: 0
080128 10:37:52 mysqld_safe mysqld restarted
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
080128 10:37:52  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
080128 10:37:52  InnoDB: Started; log sequence number 0 144769
080128 10:37:52 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=/var/lib/mysql/ace-relay-bin' to avoid this problem.
080128 10:37:52 [Note] Event Scheduler: Loaded 0 events
080128 10:37:52 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.22-rc-community-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edition (GPL)
080128 10:37:52 [Note] Slave SQL thread initialized, starting replication in log 'pair-bin.000005' at position 35073, relay log '/var/lib/mysql/ace-relay-bin.000002' position: 35217
080128 10:37:52 [Note] Slave I/O thread: connected to master 'rep@pair:3306',replication started in log 'pair-bin.000005' at position 2733880
080128 10:37:52 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=0
max_threads=50
threads_connected=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 117273 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x9fe21e0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x910ab6c8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x821601b
0x83c457b
0x83c45f2
0x842d20c
0x83a2730
0x83a1f9a
0x82c0f75
0x82c1001
0x82c167e
0x8325e99
0xcfb302
0xc5338e
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/refman/5.1/en/resolve-stack-dump.html
and follow instructions on how to resolve the stack trace.
Resolved stack trace is much more helpful in diagnosing the
problem, so please do resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil)  is invalid pointer
thd->thread_id=2
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

I then installed mysqld-debugging from
MySQL-community-debuginfo-5.1.22-0.rhel5.i386.rpm

and ran it in gdb per the instructions at http://dev.mysql.com/doc/refman/5.0/en/crashing.html
I'll attach the gdb information

How to repeat:
I set up a new InnoDB database with a single database and partitioned table.

CREATE TABLE ldc
 (lid BIGINT(12) NOT NULL
 ,did TINYINT(2) NOT NULL
 ,value VARCHAR(120) NOT NULL
 ,update_time INT(10) NOT NULL
 ,month TINYINT(2) NOT NULL DEFAULT 0
 ,score INT(3) DEFAULT 0
 ,sid INT(15)        
) ENGINE=INNODB
 PARTITION BY HASH (month)
 PARTITIONS 12;

ALTER TABLE ldc DROP PRIMARY KEY; 
ALTER TABLE ldc ADD 
  CONSTRAINT pk_ldc
  PRIMARY KEY (did, value, lid, month);

Within a few minutes of turning on replication, the database starts crashing every few seconds.

Suggested fix:
No segfault
[28 Jan 2008 19:00] S Kister
gdb stack trace

Attachment: gdb.34109 (application/octet-stream, text), 8.47 KiB.

[29 Jan 2008 9:07] Susanne Ebrecht
Many thanks for writing a bug report.
Do you use statement or row based replication?

What else did you do. Besides creating and altering the table?
[29 Jan 2008 20:28] S Kister
I used the default replication, which is MBR according to the documentation.

Here are the exact repro steps on the slave:

/etc/my.cnf
[mysqld]
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
innodb_file_per_table
transaction-isolation = READ-COMMITTED
server-id=9
slave-skip-errors=1062,1064
replicate-do-table=duplicate_1.listing_duplicate_content
#skip-innodb = 1

[mysql.server]
user=mysql
basedir=/usr

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

# yum -y remove mysql-server
# rm -rf /var/lib/mysql/
# rpm -Uvh MySQL-server-community-5.1.22-0.rhel5.i386.rpm 
# rpm -Uvh MySQL-client-community-5.1.22-0.rhel5.i386.rpm 
# mysqladmin -u root password 'root'
# mysql -uroot -proot
mysql> CREATE DATABASE duplicate_1 DEFAULT CHARACTER SET utf8;
mysql> change master to master_host='pair', master_user='rep', master_password='xxx', master_log_file='pair-bin.000006', master_log_pos=106;
Query OK, 0 rows affected (0.01 sec)

mysql> slave start;
Query OK, 0 rows affected (0.01 sec)

On the master, I did a shut down/restart to create a new replication file. I then did a create table and let the application insert/update/delete rowss. I'll attach the relay log. It only took 6 minutes to crash. The master replication logfile is 8Mb (1.6MB compressed), I can attach that as well if you need it
[29 Jan 2008 20:29] S Kister
ace-relay-bin.000001

Attachment: ace-relay-bin.000001 (application/octet-stream, text), 153 bytes.

[29 Jan 2008 20:29] S Kister
ace-relay-bin.000002.bz2

Attachment: ace-relay-bin.000002.bz2 (application/x-bzip2, text), 296.60 KiB.

[29 Jan 2008 20:35] S Kister
One other note, the primary database leaks a lot of memory. There is just the single table as described in the original bug description, and just single row inserts/update/deletes. I have tried different OS's, mysql 4, 5.0, 5.1 and even 6 and it still leaks memory. I haven't filed a bug on that yet, but it leaks almost 1GB per day and crashes the database every few days. I was hoping to see if the replica leaks to have a reproducible case with the relay logs before filing a bug.

I thought I would mention that in case it is related to this problem.
[31 Jan 2008 14:00] Susanne Ebrecht
What do you mean with leak, exactly?

Did you made the "CREATE DATABASE" on the slave? Usually you should do this on the master. Maybe you did it, and it just looks wrong here.
[1 Feb 2008 2:48] S Kister
Never mind on the memory leak comment. I reproduced the crash using a table that does not leak memory, so that is not the issue.

I did run the create database on the slave. The master already had that database created.
[14 Feb 2008 5:07] Valeriy Kravchuk
Please, try to repeat with a newer version, 5.1.23-rc, and inform about the results.
[28 Feb 2008 7:58] S Kister
It still crashes in 5.1.23

# mysqladmin --version
mysqladmin  Ver 8.42 Distrib 5.1.23-rc, for redhat-linux-gnu on i686

mysql.log:
080227 23:59:12 mysqld_safe Number of processes running now: 0
080227 23:59:12 mysqld_safe mysqld restarted
080227 23:59:12  InnoDB: Started; log sequence number 4 3108266933
080227 23:59:12 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=ace-relay-bin' to avoid this problem.
080227 23:59:12 [Note] Event Scheduler: Loaded 0 events
080227 23:59:12 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.1.23-rc-community'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edition (GPL)
080227 23:59:12 [Note] Slave SQL thread initialized, starting replication in log 'jack-bin.000022' at position 106, relay log './ace-relay-bin.000002' position: 250
080227 23:59:12 [Note] Slave I/O thread: connected to master 'rep@jack:3306',replication started in log 'jack-bin.000023' at position 23733
080227 23:59:12 - mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=0
max_threads=151
threads_connected=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337676 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x9cd3ca8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xb4acf528, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x8209375
0x83ae498
0x83aee73
0x840fd97
Stack trace seems successful - bottom reached
Please read http://dev.mysql.com/doc/refman/5.1/en/resolve-stack-dump.html
and follow instructions on how to resolve the stack trace.
Resolved stack trace is much more helpful in diagnosing the
problem, so please do resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil)  is invalid pointer
thd->thread_id=2
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
[7 Mar 2008 14:02] Susanne Ebrecht
Many thanks for writing a bug report.

Unfortunately I can't repeat this behaviour on Debian.
Maybe this is Red Hat specific.

Please feel free to open this bug again, when you know how we could reproduce this issue.