MySQL Bugs: #50713: mysqld got signal 11 , segfault

Bug #50713	mysqld got signal 11 , segfault
Submitted:	29 Jan 2010 6:42	Modified:	25 Aug 2010 16:56
Reporter:	Alec Matusis
Status:	No Feedback
Category:	MySQL Server: General	Severity:	S1 (Critical)
Version:	5.0.87, 5.0.90	OS:	Linux (2.6.31 Ubuntu 9.10-server)
Assigned to:		CPU Architecture:	Any
Tags:	segfault, segmentation fault

Description:
When setting up a new slave (the InnoDB .frm and .ibd files, ib_logfile0, ib_logfile1, etc. were copied from another slave) and starting it, mysqld crashes like this:

InnoDB: The InnoDB memory heap is disabled
InnoDB: use atomic builtins.
InnoDB: Log scan progressed past the checkpoint lsn 345 3140792695
100128 21:31:33  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 345 3146035200
InnoDB: Doing recovery: scanned up to log sequence number 345 3151278080
InnoDB: Doing recovery: scanned up to log sequence number 345 3156520960
InnoDB: Doing recovery: scanned up to log sequence number 345 3161763840
InnoDB: Doing recovery: scanned up to log sequence number 345 3167006720
InnoDB: Doing recovery: scanned up to log sequence number 345 3172249600
InnoDB: Doing recovery: scanned up to log sequence number 345 3177492480
InnoDB: Doing recovery: scanned up to log sequence number 345 3182735360
InnoDB: Doing recovery: scanned up to log sequence number 345 3187978240
InnoDB: Doing recovery: scanned up to log sequence number 345 3193221120
InnoDB: Doing recovery: scanned up to log sequence number 345 3198464000
InnoDB: Doing recovery: scanned up to log sequence number 345 3203706880
InnoDB: Doing recovery: scanned up to log sequence number 345 3203925571
100128 21:31:37  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 100128 21:31:38 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_connections=1024
threads_connected=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 8519680 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=(nil)
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x40000, backtrace may not be correct.
Bogus stack limit or frame pointer, fp=0x40000, stack_bottom=0x2afd969e0000, thread_stack=262144, aborting backtrace.
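
The memory estimate printed in the crash report can be checked by hand. The sketch below is only an illustration of that arithmetic: `key_buffer_size`, `read_buffer_size` and `max_connections` are taken from the report itself, while `sort_buffer_size` is not printed there and is assumed to be the 8M set in the my.cnf posted later in this thread. The function name is hypothetical.

```python
KIB = 1024

def max_memory_kib(key_buffer_size, read_buffer_size, sort_buffer_size, max_connections):
    """Evaluate the formula from the crash report and return the result in KiB."""
    total_bytes = key_buffer_size + (read_buffer_size + sort_buffer_size) * max_connections
    return total_bytes // KIB

estimate = max_memory_kib(
    key_buffer_size=0,                  # as printed in the crash report
    read_buffer_size=131072,            # as printed in the crash report
    sort_buffer_size=8 * 1024 * 1024,   # assumption: 8M from my.cnf
    max_connections=1024,               # as printed in the crash report
)
print(estimate)  # 8519680, matching the "8519680 K" in the report
```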

This slave is identical to 4 other slaves that have been successfully configured this exact way, EXCEPT that the other slaves were on Ubuntu 8.04-server (2.6.24), and this one is on 2.6.31.

Some additional info:

root@db6:/log6# free
             total       used       free     shared    buffers     cached
Mem:      33006912   12679028   20327884          0     264644   11361056
-/+ buffers/cache:    1053328   31953584
Swap:      4883752          0    4883752

root@db6:/log6# cat /proc/cpuinfo 
....
processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz

The my.cnf is identical to that of the other slaves that work.

strace:

.....

24023 futex(0x3770220, FUTEX_WAKE_PRIVATE, 1) = 0
24023 futex(0x377025c, FUTEX_WAIT_PRIVATE, 1603, NULL <unfinished ...>
24033 <... pread resumed> "\221t\1\30\0\0Gf\0\0\r\200\0\0\22\264\0\0\1T`} `E\277\0\0\0\0\0\0"..., 16384, 299466752) = 16384
24033 futex(0x377025c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x3770220, 1604) = 1
24033 sched_yield( <unfinished ...>
24023 <... futex resumed> )             = 0
24023 futex(0x3770220, FUTEX_WAKE_PRIVATE, 1) = 0
24023 futex(0x377025c, FUTEX_WAIT_PRIVATE, 1605, NULL <unfinished ...>
24032 <... pread resumed> "\226\337)u\0\0@\1\0\0\0\0\0\0\0\0\0\0\1Y\272*\244\353\0\0\0\0\0\0\0\0"..., 16384, 268451840) = 16384
24032 pread(20,  <unfinished ...>
24033 <... sched_yield resumed> )       = 0
24033 futex(0x1f2bd78c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
24034 <... pread resumed> ")\6Ai\0\0Gi\0\0003?\0\0<\362\0\0\1Yg8.\333E\277\0\0\0\0\0\0"..., 16384, 299515904) = 16384
24034 futex(0x377025c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x3770220, 1606) = 1
24034 sched_yield( <unfinished ...>
24023 <... futex resumed> )             = 0
24023 futex(0x3770220, FUTEX_WAKE_PRIVATE, 1) = 0
24023 futex(0x377025c, FUTEX_WAIT_PRIVATE, 1607, NULL <unfinished ...>
24034 <... sched_yield resumed> )       = 0
24034 futex(0x1f2bd78c, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
24029 <... pread resumed> ")-3\r\0\0Gt\0\0\6\306\0\0\31\245\0\0\1Yk\212\246\324E\277\0\0\0\0\0\0"..., 16384, 299696128) = 16384
24029 futex(0x377025c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x3770220, 1608) = 1
24029 sched_yield( <unfinished ...>
24023 <... futex resumed> )             = 0
24023 futex(0x3770220, FUTEX_WAKE_PRIVATE, 1) = 0
24023 futex(0x377025c, FUTEX_WAIT_PRIVATE, 1609, NULL <unfinished ...>
24028 <... pread resumed> "\263^\240M\0\0@\244\0\0#\356\0\0\1$\0\0\1Y\2715\354\357E\277\0\0\0\0\0\0"..., 16384, 271122432) = 16384
24028 futex(0x377025c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x3770220, 1610) = 1
24028 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
24028 write(2, "InnoDB: Log scan progressed past"..., 2634) = 2634
24028 exit_group(1)                     = ?

another attempt:
 
.....

24084 futex(0x318225c, FUTEX_WAIT_PRIVATE, 1677, NULL <unfinished ...>
24095 <... pread resumed> "\221t\1\30\0\0Gf\0\0\r\200\0\0\22\264\0\0\1T`} `E\277\0\0\0\0\0\0"..., 16384, 299466752) = 16384
24095 futex(0x318225c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x3182220, 1678) = 1
24095 sched_yield( <unfinished ...>
24091 <... pread resumed> "\226\337)u\0\0@\1\0\0\0\0\0\0\0\0\0\0\1Y\272*\244\353\0\0\0\0\0\0\0\0"..., 16384, 268451840) = 16384
24091 pread(20,  <unfinished ...>
24084 <... futex resumed> )             = 0
24084 futex(0x3182220, FUTEX_WAKE_PRIVATE, 1) = 0
24084 futex(0x318225c, FUTEX_WAIT_PRIVATE, 1679, NULL <unfinished ...>
24095 <... sched_yield resumed> )       = 0
24095 futex(0x1eccfc0c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
24090 <... pread resumed> ")\6Ai\0\0Gi\0\0003?\0\0<\362\0\0\1Yg8.\333E\277\0\0\0\0\0\0"..., 16384, 299515904) = 16384
24090 futex(0x318225c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x3182220, 1680) = 1
24090 sched_yield( <unfinished ...>
24084 <... futex resumed> )             = 0
24084 futex(0x3182220, FUTEX_WAKE_PRIVATE, 1) = 0
24084 futex(0x318225c, FUTEX_WAIT_PRIVATE, 1681, NULL <unfinished ...>
24090 <... sched_yield resumed> )       = 0
24090 futex(0x1eccfc0c, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
24093 <... pread resumed> "\263^\240M\0\0@\244\0\0#\356\0\0\1$\0\0\1Y\2715\354\357E\277\0\0\0\0\0\0"..., 16384, 271122432) = 16384
24093 futex(0x318225c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x3182220, 1682) = 1
24093 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
24093 write(2, "InnoDB: Log scan progressed past"..., 2634) = 2634
24093 exit_group(1) 
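
In both traces the last read completed before the SIGSEGV is a 16384-byte pread at offset 271122432. A rough sketch for turning such a read into an InnoDB page number follows. It assumes the default 16 KiB page size (consistent with the 16384-byte reads) and the standard InnoDB page header layout, in which bytes 0-3 hold the checksum and bytes 4-7 hold the page number, both big-endian. The helper names are hypothetical, and the trace does not show which file the read came from.

```python
import struct

PAGE_SIZE = 16384  # assumption: default InnoDB page size, matches the pread length

def page_number_from_offset(offset, page_size=PAGE_SIZE):
    """Return the page number for a page-aligned file offset."""
    page, remainder = divmod(offset, page_size)
    if remainder:
        raise ValueError("offset is not page-aligned")
    return page

def header_page_number(header):
    """Read the page number stored in bytes 4-7 of an InnoDB page header."""
    _checksum, page_no = struct.unpack(">II", header[:8])
    return page_no

# First 8 bytes of the two reads, copied from the strace output above.
last_read = b"\263^\240M\0\0@\244"    # pread at offset 271122432
earlier_read = b"\221t\1\30\0\0Gf"    # pread at offset 299466752

print(page_number_from_offset(271122432), header_page_number(last_read))
print(page_number_from_offset(299466752), header_page_number(earlier_read))
```

For both reads the page number computed from the offset agrees with the one stored in the header (16548 and 18278), so at least the page number field of these pages is consistent with where they sit in the file. This says nothing about the rest of the page contents.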

Without copying InnoDB tables from another slave, mysql starts normally. InnoDB tables can then be created from SQL statements.

How to repeat:
Copy InnoDB tables from a working slave, then start mysql.

Suggested fix:
Downgrading to Ubuntu 8.04 (2.6.24) fixes it.

I was mistaken about rolling back to 2.6.24 fixing this. This occurs even on 2.6.24 (Ubuntu 8.04). I tried 5.0.90, the bug still occurs. I copied this data from another slave, still the same bug.

Thank you for the report.

Do you have the --innodb-file-per-table option enabled? Please provide your configuration file.

--innodb-file-per-table is enabled 
/etc/my.cnf :

[client]
port            = 3306
socket          = /tmp/mysql.sock
[mysqld]
user            = mysql
port            = 3306
socket          = /tmp/mysql.sock
skip-locking
innodb_file_per_table
key_buffer = 6400M
table_cache = 2048
sort_buffer_size = 8M
myisam_sort_buffer_size = 8M
max_binlog_cache_size = 4G
max_join_size = 4294967295
max_seeks_for_key = 4294967295
max_write_lock_count = 4294967295
myisam_max_sort_file_size = 2G
datadir = /data6
log-error = /log6/mysqld.log
query_cache_type = 1
query_cache_size = 64M
log-slow-queries
long_query_time = 1
thread_cache_size = 32
max_connections = 1024
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_max_dirty_pages_pct = 25
innodb_support_xa = 0
innodb_thread_concurrency = 32
innodb_commit_concurrency = 64
innodb_concurrency_tickets = 8
log-bin = /log6/db6-log-bin
server-id = 10
relay-log = /log6/db6-relay-bin
skip-slave-start
log-slave-updates
replicate-do-db    = im
tmpdir = /data6/tmp
innodb_data_home_dir = /data6
innodb_log_group_home_dir = /log6
innodb_log_arch_dir = /arch6
innodb_buffer_pool_size = 19200M
innodb_log_file_size = 50M
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout

Thank you for the feedback.

> --innodb-file-per-table is enabled 

Have you copied the whole datadir or only the ib* files? Have you placed all *.ibd files into subdirectories named the same as in the original locations?

> Have you copied the whole datadir or only the ib* files?
> Have you placed all *.ibd files into subdirectories named the same as in the original locations?

Yes, the whole data directory, including all .ibd, .frm and .opt files, was copied and put in the exact same locations with rsync. The ib* files were also copied.

We cloned MANY slaves in the past using this exact copying process, and we saw the crash for the first time on this specific dataset.
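
One way to double-check such a copy is to compare checksums of the files on the source and the new slave. The sketch below is only an illustration and was not part of the original procedure: the function names are hypothetical, and the comparison is only meaningful if mysqld is stopped on both hosts while the digests are taken.

```python
import hashlib
from pathlib import Path

def file_digests(root, suffixes=(".ibd", ".frm", ".opt")):
    """Map each relevant file under root (relative path) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        # Table files by suffix, plus ib* files such as ibdata1 and ib_logfile0.
        if path.is_file() and (path.suffix in suffixes or path.name.startswith("ib")):
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[str(path.relative_to(root))] = h.hexdigest()
    return digests

def diff_digests(source, copy):
    """Return the relative paths that are missing on one side or differ in content."""
    return sorted(name for name in source.keys() | copy.keys()
                  if source.get(name) != copy.get(name))

# Example (run file_digests on each host, e.g. on /data6, then compare the results):
# mismatches = diff_digests(digests_from_source, digests_from_new_slave)
```

An empty list from `diff_digests` would indicate that the copied files are byte-identical to the source; any listed path would be a candidate for re-copying.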

Thank you for the feedback.

> We cloned MANY slaves in the past using this exact copying process, and we saw the crash for the first time on this specific dataset.

Have you checked the hardware holding this specific dataset for errors, specifically the hard disk and memory?

Alec, what kind of server do you have? Are you on the latest BIOS revision?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".