MySQL Bugs: #1805: Windows MySQL slave crashes and will not restart

Bug #1805: Windows MySQL slave crashes and will not restart
Submitted: 11 Nov 2003 7:54
Modified: 9 Apr 2008 6:38
Reporter: Jim Taylor
Status: Closed
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 4.0.16
OS: Windows (Windows 2000)
Assigned to:
CPU Architecture: Any

Description:
Windows 2000 slave, FreeBSD master; both running MySQL 4.0.16. I started replication as per the documentation (copy database, etc.) and it works fine for 2 hours to 2 days (I've done this twice). Then MySQL suddenly stops running on the Windows machine. There is no entry in the error log and no message; queries just return "MySQL server has gone away."

MySQL will not restart.  Windows displays "Program Error: mysqld-max-nt.e.exe has generated errors and will be closed by Windows.  You will need to restart the program."
 
MySQL error log contains no record of a problem.  Here's the last entry--it looks like the start-up was successful: 
     031111  9:31:47  InnoDB: Started
     MySql: ready for connections.
     Version: '4.0.16-max-nt'  socket: ''  port: 3306

Rebooting the computer and reinstalling MySQL do not help.

Removing the "server-id = 2" line from the my.cnf file allows MySQL on the slave to start (but it will no longer replicate without a server ID).

How to repeat:
Start replication and let it run for a while until it dies.

Hi!

There is too little information in the report to study the bug.

Can you retry it on Linux? If mysqld also crashes on Linux, please resolve the stack trace.

Regards,

Heikki

I tried upgrading to the latest "alpha" MySQL release and restarting replication from the beginning. Again, it ran OK for a while (about 15 hours and several thousand transactions) and then died.

Here's the "mysql.err" output for the initial failure and the first attempt to restart (I've changed the server name to "myserver.com" for security reasons):

MySql: ready for connections.
Version: '4.1.0-alpha-max-nt' socket: '' port: 3306
031111 17:30:17 Slave I/O thread: connected to master 'repuser@myserver.com:3306', replication started in log 'FIRST' at position 4
031111 19:43:28 Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
031111 19:43:28 Slave I/O thread: Failed reading log event, reconnecting to retry, log 'myserver-bin.011' position 19675799
MySql: ready for connections.
Version: '4.1.0-alpha-max-nt' socket: '' port: 3306
031112 8:19:04 MySql: Got signal 22. Aborting!

031112 8:19:26 Slave I/O thread: connected to master 'repuser@myserver.com:3306', replication started in log 'myserver-bin.011' at position 108331732
031112 8:19:26 While trying to obtain the list of slaves from the master 'myserver.com:3306', user 'repuser' got the following error: 'Lost connection to MySQL server during query'
031112 8:19:26 Slave I/O thread exiting, read up to log 'myserver-bin.011', position 108331732
MySql: ready for connections.
Version: '4.1.0-alpha-max-nt' socket: '' port: 3306
031112 8:26:45 MySql: Got signal 22. Aborting!

What does "Got signal 22" mean? I could not find any reference to it in the documentation....

Several times during the day, we load large batches of data (hundreds of thousands of records) to our server. The MySQL slave seems to die when one of these loads is in progress (though it successfully processes other loads of the same size). Is there some transaction buffer that could be overflowing or something?
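
For anyone comparing several of these error logs, here is a minimal Python sketch that pulls the "Got signal" lines and the binlog positions out of log text like the excerpt above. The function name `summarize_error_log` and the table `MSVC_SIGNALS` are hypothetical names made up for illustration. The table lists signal numbers as the Microsoft C runtime defines them in `<signal.h>`, where 22 is SIGABRT; that the Windows server binary reports these CRT numbers is an assumption, not something confirmed in this report.

```python
import re

# Signal numbers as defined by the Microsoft C runtime's <signal.h>.
# Assumption: the Windows server binary reports these CRT numbers.
MSVC_SIGNALS = {2: "SIGINT", 4: "SIGILL", 8: "SIGFPE", 11: "SIGSEGV",
                15: "SIGTERM", 21: "SIGBREAK", 22: "SIGABRT"}

SIGNAL_RE = re.compile(r"Got signal (\d+)")
# Matches "log 'X' at position N", "log 'X' position N" and "log 'X', position N".
POSITION_RE = re.compile(r"log '([^']+)',? (?:at )?position (\d+)")


def summarize_error_log(lines):
    """Return (signals, positions) found in an iterable of log lines."""
    signals, positions = [], []
    for line in lines:
        match = SIGNAL_RE.search(line)
        if match:
            number = int(match.group(1))
            signals.append((number, MSVC_SIGNALS.get(number, "unknown")))
        match = POSITION_RE.search(line)
        if match:
            positions.append((match.group(1), int(match.group(2))))
    return signals, positions


if __name__ == "__main__":
    with open("mysql.err") as handle:  # path is an assumption
        found_signals, found_positions = summarize_error_log(handle)
    print(found_signals)
    print(found_positions[-1] if found_positions else "no positions logged")
```

The last position reported is only the point the slave I/O thread had read up to; it says nothing about why the server aborted.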

I am configuring my XP box as the slave and my SuSE box as the master. I would like to see the configuration in your my.ini/my.cnf files; of course, you can change your private data.

One question: is your Windows box running a client application against the Linux master while the slave is running?

Below is my.cnf. It is the generic one from the MySQL install, except for the replication section and some of the "set-variable" lines above it (I set these to match the parameters on the master).

I've tried my.cnf with and without the SSL commands; it makes no difference in slave reliability.

Master is running FreeBSD, not Linux.  Windows machine with slave database does run client updates against master (mostly Microsoft Access queries via ODBC).  These work fine both while replication is working and after slave crashes. 

Some client queries are large (for example, "Load Data Local Infile" uploading 200,000 records).  Although they run from the slave machine, none of the client queries that update the master use any data from the slave database. And while the slave is running, the records inserted or updated on the master are replicated in the slave database correctly.  

Is there some transaction limit that might be reached that could cause slave to crash after a certain number of client transactions?

I can see how some buffer could fill and cause MySQL to crash; but I can't understand why it won't restart--and yet it doesn't log anything except for the mysterious "Got signal 22. Aborting!"

my.cnf:

# Example mysql config file.
# Copy this file to c:\my.cnf to set global options
# 
# One can use all long options that the program supports.
# Run the program with --help to get a list of available options

# This will be passed to all mysql clients
[client]
#password=my_password
port=3306
#socket=MySQL

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# The MySQL server
[mysqld]
#log
port=3306
#socket=MySQL
skip-locking
set-variable	= key_buffer=16M
set-variable	= max_allowed_packet=1M
set-variable	= table_cache=64
set-variable	= sort_buffer=512K
set-variable	= net_buffer_length=16K
set-variable	= myisam_sort_buffer_size=8M
set-variable    = read_buffer_size=1M
# REPLICATION SECTION START
#server-id = 2
master-host=myserver.net
master-user=myuser
master-password=mypassword
master-port=3306
master-ssl=1
master-ssl-ca =   '/var/db/mysql/cacert.pem'
master-ssl-cert = '/var/db/mysql/client-cert.pem'
master-ssl-key =  '/var/db/mysql/client-key.pem'
replicate-do-db=mydb
replicate-ignore-table=mydb.UselessTable
#REPLICATION SECTION END
# Uncomment the following if you want to log updates
#log-bin

# Uncomment the following rows if you move the MySQL distribution to another
# location
#basedir = d:/mysql/
#datadir = d:/mysql/data/

# Uncomment the following if you are NOT using BDB tables
skip-bdb

# Uncomment the following if you are using BDB tables
#set-variable	= bdb_cache_size=4M
#set-variable	= bdb_max_lock=10000

# Uncomment the following if you are using Innobase tables
#innodb_data_file_path = ibdata1:400M
#innodb_data_home_dir = c:\ibdata
#innodb_log_group_home_dir = c:\iblogs
#innodb_log_arch_dir = c:\iblogs
#set-variable = innodb_mirrored_log_groups=1
#set-variable = innodb_log_files_in_group=3
#set-variable = innodb_log_file_size=5M
#set-variable = innodb_log_buffer_size=8M
#innodb_flush_log_at_trx_commit=1
#innodb_log_archive=0
#set-variable = innodb_buffer_pool_size=16M
#set-variable = innodb_additional_mem_pool_size=2M
#set-variable = innodb_file_io_threads=4
#set-variable = innodb_lock_wait_timeout=50

[mysqldump]
quick
set-variable	= max_allowed_packet=16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
set-variable	= key_buffer=20M
set-variable	= sort_buffer=20M
set-variable	= read_buffer=2M
set-variable	= write_buffer=2M

[myisamchk]
set-variable	= key_buffer=20M
set-variable	= sort_buffer=20M
set-variable	= read_buffer=2M
set-variable	= write_buffer=2M

[mysqlhotcopy]
interactive-timeout
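
Since the settings above are meant to match the master, a small script can make that comparison less error-prone. This is a rough sketch only: `read_section` is a hypothetical helper, not part of MySQL. It assumes the old-style `set-variable = name=value` lines shown in this file and treats lines starting with `#` or `;` as comments.

```python
def read_section(text, section="mysqld"):
    """Collect the options of one [section] from my.cnf-style text."""
    options, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        if current != section:
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if name == "set-variable":
            # Old style: "set-variable = key_buffer=16M"
            name, _, value = value.partition("=")
            name, value = name.strip(), value.strip()
        # Flags without a value (e.g. skip-locking) are stored as True.
        options[name] = value if value else True
    return options


if __name__ == "__main__":
    with open(r"c:\my.cnf") as handle:  # path is an assumption
        slave = read_section(handle.read())
    print("server-id set:", "server-id" in slave)
    print("max_allowed_packet:", slave.get("max_allowed_packet"))
```

Running the same function on the master's file and comparing the two dictionaries shows any setting that differs. Whether any of these settings is related to the crash is not established in this report.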

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Many thanks for writing a bug report.

I tested MySQL 5.0 with FreeBSD 7 and Windows XP here, and it ran without problems.

This problem won't occur with newer versions of MySQL, Windows and FreeBSD. I will close this bug now.