Bug #4552 mysql crash from load data replication
Submitted: 14 Jul 2004 21:49 Modified: 7 Aug 2004 14:11
Reporter: Dathan Pattishall Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:4.0.18 OS:Linux (Linux dbuser1.gbxsc.friendster.c)
Assigned to: Guilhem Bichot CPU Architecture:Any

[14 Jul 2004 21:49] Dathan Pattishall
Description:
LOAD DATA INFILE "/tmp/photos.txt" INTO TABLE photos;

Query OK, 14771608 rows affected (16 min 4.63 sec)

Records: 14771608  Deleted: 0  Skipped: 0  Warnings: 0

 

mysql> update users set pendingtestimonials=-1, photocount=-1, newmsgcount=-1;

Query OK, 7323178 rows affected (10 min 40.70 sec)

Rows matched: 9985237  Changed: 7323178  Warnings: 0

Done one the master and it killed all the slaves.

How to repeat:
Run innodb on the master
Run innodb on the slave

execute with no other load a load data for this table structure

CREATE TABLE `photos` (
  `id` int(10) NOT NULL auto_increment,
  `user` int(10) NOT NULL default '0',
  `random` int(5) default NULL,
  `moderation` tinyint(1) default NULL,
  `lastupd` timestamp(14) NOT NULL,
  `caption` varchar(255) default NULL,
  PRIMARY KEY  (`id`),
  KEY `user` (`user`),
  KEY `moderation` (`moderation`)
) TYPE=InnoDB

Here is the my.cnf

on the slaves

mysqld]
port        = 3306
socket      = /var/lib/mysql/mysql.sock
skip-external-locking
skip-name-resolve
key_buffer=128M
max_allowed_packet=16M
table_cache=512
tmp_table_size  = 64M
sort_buffer_size=10M
read_buffer_size=4M
read_rnd_buffer_size=4M
thread_cache=20
myisam_sort_buffer_size=32M
transaction-isolation = READ-UNCOMMITTED
query_cache_type = 1
query_cache_size = 64M

# Replication settings (must be unique across master and all slaves)
server-id   = %SERVERID%

# Table Filters (only replicate these tables - only used for slow-read slaves)
slave-skip-errors=1062
replicate-ignore-table=friendster.friends
replicate-ignore-table=friendster.potentialfriends
replicate-ignore-table=friendster.testimonials

# Create slow query log (WARNING: can consume a lot of disk on slow-read slaves)
long_query_time=10
max_connections=300
max_connect_errors=999999
wait_timeout=120

# Point the following paths to different dedicated disks
tmpdir          = /tmp/
relay-log       = /var/lib/mysql/dbslave-relay-bin
master-info-file    = /var/lib/mysql/master.info
relay-log-index     = /var/lib/mysql/dbslave-relay-bin.index
relay-log-info-file = /var/lib/mysql/relay-log.info

# Uncomment the following if you are using InnoDB tables
innodb_data_home_dir = /var/lib/mysql/
innodb_data_file_path = ibdata1:10240M:autoextend
innodb_log_group_home_dir = /var/lib/mysql/
innodb_log_arch_dir = /var/lib/mysql/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size=1700M
innodb_additional_mem_pool_size=20M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size=512M
innodb_log_buffer_size=8M
# deadlock detection to roll back transactions
innodb_lock_wait_timeout = 50
# How many system threads inside innodb
innodb_flush_log_at_trx_commit=2
default-table-type=innodb
innodb_fast_shutdown=1
innodb_thread_concurrency=32
#innodb_max_dirty_pages_pct=20
[15 Jul 2004 0:13] Guilhem Bichot
Dathan, my dear customer,
"it killed all the slaves" is not sufficient info for me to have an idea of what happened.
Please, could you copy-paste here the error message(s) that was possibly printed into the slave's error log or SHOW SLAVE STATUS?
Thanks!
Guilhem
[15 Jul 2004 0:17] Dathan Pattishall
This was the msg mysql left

040713  3:18:52  Slave I/O thread: connected to master 'dbslave@dbmstr1:3306',  replication started in log
 'db-bin.078' at position 515
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=134217728
read_buffer_size=4190208
max_used_connections=46
max_connections=300
threads_connected=11
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 4430669 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

this is a 64 bit sever with 8 GB of ram
[15 Jul 2004 0:20] Dathan Pattishall
Additionally to fix the problem

I deleted dbslave-bin.* and relay-log.info and incremented the master.info to past the load data contiguous blocks of data, to the next mysql statement.
[17 Jul 2004 23:50] Guilhem Bichot
Testing now, with an InnoDB table of the suggested structure, loading 14771608 random rows in it... (takes time)
[18 Jul 2004 0:50] Guilhem Bichot
Hi Dathan.

So I generated a file with 14771608 rows, and loaded it in an InnoDB table having the same structure as your 'photos', on the master. It took quite a while, and finally the _master_ crashed when it was almost done with the LOAD DATA; the file to load was 470 MB and when the master crashed, the slave's temporary file (for replication of LOAD DATA INFILE) was 435 MB. Master said this (it's long to read, but interesting): after a while of LOAD DATA it started to print:
040718  0:22:27  InnoDB: WARNING: over 4 / 5 of the buffer pool is occupied by
InnoDB: lock heaps or the adaptive hash index! Check that your
InnoDB: transactions do not set too many row locks.
InnoDB: Your buffer pool size is 16 MB. Maybe you should make
InnoDB: the buffer pool bigger?
InnoDB: Starting the InnoDB Monitor to print diagnostics, including
InnoDB: lock heap and hash index sizes.

=====================================
040718  0:22:41 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 16 seconds
----------
SEMAPHORES
----------
<I cut, there are many rows>
----------------------------
END OF INNODB MONITOR OUTPUT
============================
040718  0:37:38  InnoDB: ERROR: over 9 / 10 of the buffer pool is occupied by
InnoDB: lock heaps or the adaptive hash index! Check that your
InnoDB: transactions do not set too many row locks.
InnoDB: Your buffer pool size is 16 MB. Maybe you should make
InnoDB: the buffer pool bigger?
InnoDB: We intentionally generate a seg fault to print a stack trace
InnoDB: on Linux!
040718  0:37:38InnoDB: Assertion failure in thread 196621 in file buf0lru.c line 238
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. See section 6.1 of
InnoDB: http://www.innodb.com/ibman.php about forcing recovery.
InnoDB: Thread 81926 stopped in file os0file.c line 2904
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=1
max_connections=100
threads_connected=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x8a3e520
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x44009138, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x81124b8
0x4002d4ec
0x829dd5e
0x829947d
0x8299e6d
0x829ff5e
0x82a02af
0x82997d4
0x827fd25
0x8260e34
0x82612b8
0x82613e7
0x826155a
0x8261667
0x8262b4f
0x8194008
0x8156556
0x81a4048
0x81a377f
0x8123d5e
0x8125a54
0x81209f8
0x81203c1
0x811f993
0x40028600
0x40308a37
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://www.mysql.com/doc/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x8a84090 = LOAD DATA INFILE "/tmp/photos.txt" INTO TABLE photos
thd->thread_id=3
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Sure, my buffer pool is 16MB, whereas your slave has 1.7 GB.
But it shows that your slave's crash *could* have been caused by an assertion failure in InnoDB.
Please, are you sure the error log snippet you posted is complete? It looks unusually short (it does not show a stack trace, which is usually the case when SIGSEGV).
And which package do you use (standard, max, etc, built by us?) ?
Thanks for any precision you could bring.

Guilhem
[19 Jul 2004 20:23] Dathan Pattishall
This is a package built by mysql. This was all that I got out of the error log. What I can do is try to duplcate the problem and enable core. But, that is not likly to happen since I ran out of servers to test on.
[7 Aug 2004 14:11] Guilhem Bichot
Can't repeat, unfortunately. If this happens again, please mark it here. It may also be a good idea to upgrade to 4.0.20.