Bug #2030 relay bin log on slave resets multiple times a second
Submitted: 6 Dec 2003 9:54 Modified: 13 Aug 2004 15:42
Reporter: Chris Waskowich Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.0.16 OS:Mac OS X (Mac OS X Server 10.3)
Assigned to: Guilhem Bichot

[6 Dec 2003 9:54] Chris Waskowich
I have a slave and master set-up.  The master is on a Mac OS X Server 10.1.5 box and the slave is 
on a Mac OS X Server 10.3.1 box.  The slave is running fine, except for the fact that the relay-bin 
increments several times a second and doesn't seem to ever get larger than about 32kB, but 
typically it hangs out at about 1024B or less.  Another symptom/cause is that it seems with each 
increment of the relay-bin, the tables get flushed.  Here is some real world data from the server.  It 
has been up for 14 hours and has performed 66182 flush tables and the relay-bin index is at 
66250.  When starting the server (14 hours ago) the relay-bin index was at 0.  Additionally, with 
each increment, the host.err file seems to get this block of data written to it:

Status information:

Current dir: /usr/local/mysql/var/
Running threads: 0  Stack size: 65536
Current locks:
lock: 81388c:

key_cache status:
blocks used:    124690
not flushed:         0
w_requests:          0
writes:              0
r_requests:          0
reads:               0

handler status:
read_key:            0
read_next:           0
read_rnd             0
read_first:          0
write:               0
delete               0
update:              0

Table status:
Opened tables:          0
Open tables:            0
Open files:             4
Open streams:           0

Alarm status:
Active alarms:   1
Max used alarms: 1
Next alarm time: 3600

Here is the my.cnf file for this slave:

port            = 3306
socket          = /tmp/mysql.sock

port            = 3306
socket          = /tmp/mysql.sock


key_buffer = 128M
max_allowed_packet = 1M
thread_stack = 64K
table_cache = 32
sort_buffer = 64K
net_buffer_length = 2K

server-id = 2

master-host = <removed>
master-user = <removed>
master-password = <removed>
master-port = 3306
master-connect-retry = 60

replicate-do-db = db1
replicate-do-db = db2
replicate-do-db = db3







Some more information:  The default install of MySQL on OS X Server 10.3 did the same thing - this 
was version 4.0.14; however, it seems as though Apple messed something up and it hoses primary 
keys; so I compiled my own.  Also, the Master server and the Slave server were compiled from the 
same tar file.  One more thing: when the slave server was running on a Mac OS X Server 10.1.5 
box, this did not happen with any version of MySQL.

How to repeat:
I've tried various items in the my.cnf file and it always performs the above actions.  If I could find a 
way to not have it repeat, I'd be ecstatic.

Suggested fix:
None that I'm aware of.
[6 Dec 2003 9:59] Chris Waskowich
Oh, and another thing, here is the configure command I used to compile mysql-4.0.16:

./configure --prefix=/usr/local/mysql --enable-thread-safe-client --enable-shared=yes --
enable-static=yes --without-debug --with-mysqld-user=mysql --with-charset=latin1 --with-
extra-charsets=none --without-docs --without-bench

I've also configured it without the '--enable-thread-safe-client' option and had the same results.
[6 Dec 2003 12:48] Alexander Keremidarski
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

Relay log behaviour is described at: http://www.mysql.com/doc/en/Replication_Implementation_Details.html
[6 Dec 2003 12:56] Chris Waskowich

Thanks for the response, however, I'm confused to as why there is a difference between the bin-log 
functionality when the only thing that changed was the OS.  The my.cnf file was the same when the 
slave was on 10.1.5 and mysql was the same version as well (with the same configure).  Maybe the 
bug is with the OS, but there is nothing I can find anywhere, even in the URL you reference that 
aids me in the problem I'm having.
[6 Dec 2003 13:04] Chris Waskowich

There is nothing in the documentation as to what all the information is that is getting written to 
the host.err file and why for each block of this information the slave server is flushing the tables.  
And, since there is nothing in the documentation, I have no clue what commands on how to stop 
this information from getting written to the host.err file.  It gets written at such a rate that the 
host.err file will be about 40M after about 10 hours.
[6 Dec 2003 13:12] Chris Waskowich
One more thing, if the URL you referenced was due to the lack of  the 'max_relay_log_size' and 
'max_binlog_size' in the my.cnf file, then I have a refute to that.

I did have these following lines in the my.cnf file:

max_relay_log_size = 128M
max_binlog_size = 128M

However, it fails to recognize them and still increments the relay-bin file after about 1024B of info.  
Additionally, if I turn on the log-bin option, then the bin-log does the same thing.  That is, it starts 
to increment regardless of what these options state.  Just so we are clear, I did indeed cycle the 
mysqld between changes of the my.cnf.

I know that mysqld is reading the my.cnf file because even after deleting the master.info file and 
restarting mysqld, it will conenct to the master.  The only my.cnf file on the file system is the one 
that has this information.
[6 Dec 2003 13:21] Chris Waskowich
Just for giggles, I tried enabling the 'log-bin' option again.  Well, the mysqld crashed.
[6 Dec 2003 14:18] Alexander Keremidarski
Please explain what exactly the problem is.

Reading your intital report I was under impression you are complaining that Relay log gets rotated and rotates too fast. As this only mean your Replication is working well and your Slave server catches up quickly with Master I closed bug report refering to manual.

If you experience problem with rotating Relay log you can simply shut down SQL  replication thread making sure I/O thread is running and then observe Relay log expansion and rotation.
At some places you refer to relay log while on others to binary log. Please clarify all of the above.
[6 Dec 2003 14:35] Chris Waskowich
Yea, sorry for not being clear enough.  First, the binary logs that I'm referring to are the 'host-
relay-bin.xxxx' logs.  If I enable 'log-bin' in the my.cnf, then the mysqld crashes (I can provide a 
trace if required) and then there are several 'host-bin.xxx' files.  However, the more I think about 
it, this is probably due to mysqld crashing and creating a new 'host-bin.xxx' for each time it is 

I have no problem with the relay-bin.xxxx files rotating, however, it is ignoring the 
'max_relay_log_size' option in the my.cnf file which I now have set to 128MB.  Per the 
documentation, the relay-bin files can get rotated when a flush logs command is being issued - 
I'm not issuing this command, nor is any other process on the slave server.  Can the master issue a 
'flush logs' command to the slave?

Additionally, when getting a 'status', the 'flush tables' is exceptionally high and seems to 
correspond with the index number of the relay-bin file.  I'm assuming the two are related, but I'm 
not sure which one is the cause/effect.

I just issued a 'STOP SLAVE SQL_THREAD' command and am watching the relay-bin log files.  It has 
been about 1 minute and there are about 200 of them.  The largest one is 1134 B.  Upon issuing a 
'START SLAVE SQL_THREAD' command, the relay-bin files start to get removed.
[7 Dec 2003 14:32] Guilhem Bichot
Hi Chris,
The reason why mysqld rotates the relay logs and flushes the tables, is that it receives (for some mysterious reason) a SIGHUP signal: sql/mysql.cc:
    case SIGHUP:
      if (!abort_loop)
	reload_acl_and_cache((THD*) 0,
			     (TABLE_LIST*) 0); // Flush logs
	mysql_print_status((THD*) 0);		// Send debug some info
Now, why does it receive so many SIGHUP? I have understood that the problem did not happen with some older version of Mac OS X. So it can be a bug in MySQL or a  crazy OS...
What's the exact command by which you start mysqld: mysqld_safe (with some options?), mysqld?
What could *maybe* help us is to have a debug trace: you would need to build mysql with debug support: just add --with-debug to your 'configure' options, and build, and install on the slave. Then add 
to the options of the slave's mysqld (in my.cnf).
For example, I have
Then, once a few crazy flushes have happened, shutdown the slave's mysqld, and upload a compressed version of /tmp/mysqld.trace here:
and let us know the filename by adding a comment in this bug report
(note other people that MySQL employees cannot download from this ftp site, so your data is safe there).
Maybe this will help us. Maybe not, as it may not tell us who is sending SIGHUP.
[7 Dec 2003 14:34] Guilhem Bichot
By the way, it would be very interesting to also have a debug file (like I explained formerly) for the moment where you are able to crash mysqld by starting it with log-bin.
So, 2 debug files... Thanks! :)
[7 Dec 2003 14:57] Guilhem Bichot
don't worry about 'log-bin makes mysqld crash': I am able to make mysqld --log-bin crash on my Linux when I send it a SIGHUP (and you have lots of SIGHUPs on your system, as we saw). So this one is a bug which I will fix.
Remains: why these SIGHUP on your slave?
If you can give us a debug trace with --debug...
[7 Dec 2003 15:03] Chris Waskowich

Thanks for the replies.  I'll try to get to it this evening, but I'm not sure I'll have time, as I have to 
rebuild mysql with the --with-debug option.  Not a huge deal... but I'm moving into a new place at 
the beginning of this week and my time is short.  I will get to it this week, though - as I really want 
to find a resolution or give you guys the info to get this fixed in future builds.
[9 Dec 2003 6:57] Guilhem Bichot
I have entered the bug "SIGHUP crashes mysqld --log-bin" as a separate bug, number 2045, which is fixed in the coming 4.0.17.
Remains the question why mysqld receives so many SIGHUP.
[22 Dec 2003 11:28] Paul Dubois
Some additional information pertaining to this problem.
I see much the same thing for MySQL 4.1.x after upgrading
from Mac OS X 10.2 to 10.3. I suspect a signal-handling
difference between Jaguar (10.2) and Panther (10.3).
I'm running Mac OS X non-Server, by the way, so it's
not specific to Mac OS X Server.

The problem does not depend on replication in any way,
though I can see why it would appear to be related. What's
actually happening is that SIGHUP signals are being processed
for *every query*.

Just the bare act of connecting to the server generates three
new binlog files. Issuing a simple SELECT 1 query generates
two more binlog files.  The general log shows this kind of

/var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with:
Tcp port: 40102  Unix socket: /var/mysql/40102/mysql.sock
Time                 Id Command    Argument
031222 13:25:53       2 Connect     paul@localhost on
/var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with:
Tcp port: 40102  Unix socket: /var/mysql/40102/mysql.sock
Time                 Id Command    Argument
/var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with:
Tcp port: 40102  Unix socket: /var/mysql/40102/mysql.sock
Time                 Id Command    Argument
031222 13:26:01       2 Query       SELECT 1
/var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with:
Tcp port: 40102  Unix socket: /var/mysql/40102/mysql.sock
Time                 Id Command    Argument
/var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with:
Tcp port: 40102  Unix socket: /var/mysql/40102/mysql.sock
Time                 Id Command    Argument
[22 Dec 2003 12:46] Paul Dubois
Some further investigation shows that the behavior I reported
in my previous comment occurs only when the connection
from mysql to the server uses SSL.  If I connect without using
SSL, then the logging behavior is normal.

I also tried SSL-enabled connections on 4.1.2 on Red Hat 7.
The odd logging behavior does not occur there.
[22 Dec 2003 15:27] Paul Dubois
More info:

The problems with SIGHUP causing binlogs to be generated
for each query with SSL connections appear to be pervasive
on Mac OS X 10.3. I can repeat the problem on 4.0.18,
4.1.2, and 5.0.0.  With non-SSL connections, there is no

On the theory that the problem may be with Apple's SSL
libraries, I attempted to build OpenSSL 0.9.6l so that I
could link MySQL to that.  But I could not get it to build.
[16 Jan 2004 5:42] Guilhem Bichot
We were able to verify it on two Mac OS X 10.3 machines.
When using SSL connections, there are 1 SIGHUP when connecting and 2 SIGHUP for each query.
When not using SSL connections, there is 1 SIGHUP when 'mysqladmin shutdown'.
Sending TERM apparently triggers HUP then TERM.
We don't know yet if this is a bug in MySQL or in something new introduced in Mac OS X 10.3.
We will continue investigation after we upgrade some of our internal machines to 10.3.
Until then, as you build from source, you can work around the problem with this patch (for MySQL 4.0) for sql/mysqld.cc:
*** 1762,1768 ****
    (void) sigaddset(&set,SIGQUIT);
    (void) sigaddset(&set,SIGTERM);
!   (void) sigaddset(&set,SIGHUP);
    (void) sigaddset(&set,SIGTSTP);

--- 1762,1768 ----
    (void) sigaddset(&set,SIGQUIT);
    (void) sigaddset(&set,SIGTERM);
! //  (void) sigaddset(&set,SIGHUP);
    (void) sigaddset(&set,SIGTSTP);

It will disable catching of SIGHUP by the signal handler.
[21 Jan 2004 23:21] Chris Waskowich

I'm not yet had time to do all the debug dump stuff.  However, I did try to compile mysql-4.0.17 
with the change to the code as noted.  After building and installing: when I run the slave server, it 
immediately shuts down.  It should be noted that I'm not using SSL.  Here is the configure 

./configure --prefix=/usr/local/mysql --enable-shared=yes --enable-static=yes --without-
debug --with-mysqld-user=mysql --with-charset=latin1 --with-extra-charsets=none --without-
docs --without-bench --without-openssl
[22 Jan 2004 0:04] Chris Waskowich
The plot thickens.

I compiled mysql-4.0.17 with this configure:

./configure --prefix=/usr/local/mysql --enable-shared=yes --enable-static=yes --with-debug 
--with-mysqld-user=mysql --with-charset=latin1 --with-extra-charsets=none --without-docs 
--without-bench --without-openssl

I also added this line to the [mysqld] section in the my.cnf file:


When I start mysql, this is what gets dumped to the prompt:

[kyle:local/src/mysql-4.0.17] root# /System/Library/StartupItems/MySQL/MySQL start
Starting MySQL
[kyle:local/src/mysql-4.0.17] root# Starting mysqld daemon with databases from /usr/local/
/usr/local/mysql/bin/mysqld_safe: line 1: 15611 Abort trap              nohup /usr/local/mysql/
libexec/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/var --user=mysql --pid-
file=/usr/local/mysql/var/kyle.pid --port=3306 --socket=/tmp/mysql.sock >>/usr/local/mysql/
var/kyle.err 2>&1
040122 03:00:00  mysqld ended

Here is what gets logged in the kyle.err file:

040122 03:00:00  mysqld started
Got error 22 when trying to lock mutex at log.cc, line 1689
040122 03:00:00  mysqld ended

I've uploaded the mysqld.trace file to the FTP server you noted previously.  The name of the file is 

Also, when I comment out the debug file in the my.cnf file and try to start mysql, I get the same 
error on the prompt and the same data gets written to the kyle.err file.  Obviously, though, there is 
not mysqld.trace file.
[22 Jan 2004 6:32] Guilhem Bichot
Thanks for your patience. I appreciate all this help to us.
Presently you are hitting a known bug which is already fixed in the  not-released-yet MySQL 4.0.18 :( (we attempted to use a function to print an error in the error log, which incidentally acquires a mutex, but the mutex was not initialized yet). The intention was good (fix #BUG 13, I suggest you take a look at it) but the fix was a bit incorrect, hence the crash.
But we need to move forward with debugging our original problem, so we need to disable this crashing you get; for this just comment this line in sql/mysqld.cc:
    sql_print_error("Warning: Setting lower_case_table_names=2 because file system for %s is case insensitive", mysql_real_data_home);
(the wording is a bit different in your sql/mysqld.cc which is older; it is probably saying "=1" but nevermind, just comment).
Then please tell me how your system goes after that. Normally it should not crash and it should not flush your logs all the time.
[23 Jan 2004 9:17] Chris Waskowich
Well, I commented out the sql_print_error line you noted (I'm still on 4.0.17). and them compiled 
with debug and added the debug line to the my.cnf file.  I did not, however, comment out the (void) 
sigaddset(&set,SIGHUP); line.  I wanted to run it without this fix, so I could get a trace.  However, 
when I run it, it still bails out, however, it seems to get a little further along.  I've uploaded the 
trace file to the FTP site.  The name of the file is 'chris-waskowich-20040123a-mysqld.trace.gz'
[23 Jan 2004 14:22] Chris Waskowich
I did another test.  This time I made the the two changes, i.e., commenting out the two lines:

(void) sigaddset(&set,SIGHUP);
sql_print_error("Warning: Setting lower_case_table_names=1 becasue file system %s is case 
insensitve", mysql_real_data_home);

It also bailed immediately after running.  I've uploaded the trace file to the FTP server and its name 
is 'chris-waskowich-20040123b-mysqld.trace.gz'.
[24 Jan 2004 4:18] Guilhem Bichot
Hi Chris,
Looking at your latest debug trace file, I see:
T@3    : | >connect_to_master
T@3    : | | >mc_mysql_connect
T@3    : | | | enter: host: database.ampcast.com  db: (Null)  user: repl  connect_time_out: 30  read_timeout:
T@3    : | | | info: Server name: 'database.ampcast.com'.  TCP sock: 3306
T@3    : | | | >thr_alarm
T@3    : | | | | enter: thread: T@3  sec: 30
T@3    : | | | | info: reschedule
T@2    : | info: Got signal: 3  abort_loop: 0
T@4    : >kill_server
T@4    : | >sql_print_error
T@4    : | | error: /usr/local/mysql/libexec/mysqld: Normal shutdown

connect_to_master() is called by the I/O slave thread to connect to the master (when replication is starting); it immediately triggers (but why??) a signal 3 (SIGQUIT!!) to be sent to mysqld; and logically the signal handler reacts by calling kill_server() to make mysqld terminate.
So it looks like connecting to the master has the consequence of SIGQUIT being sent. Before, when you had not commented the sigaddset(), you did not get SIGQUIT but SIGHUP instead. So I wonder if the signals really exist in this bug, or if it's just the MySQL signal handler doing crazy things. Or the signals are really sent but randomly...
The fact that this happens when opening a connection is consistent with tests I have run on other machines.
To sum up, this is becoming crazier and crazier.
We talked to our contact at Apple but he was not aware of signal problems in Mac OS X 10.3.
We have seen problems too with suspending/resuming MySQL on Mac OS X (Ctrl-Z).
I think I have asked you all possible tests and you did them, thanks a lot!!
Now we at MySQL have to debug all this thoroughly, it may take some time.
Until then, the only workaround I see for you now is to disable SIGQUIT catching too.
In the code below (mysqld.cc), comment the "case SIGQUIT"). The problem if you disable SIGTERM is that you won't be able to shutdown mysqld anymore (unless you use kill -SIGKILL but that would certainly corrupt your slave's data, so don't do that). We can just hope that SIGTERM is not sent everytime there is a connection. So please let me know if disabling SIGHUP and SIGQUIT leads to finally hide the problem or not.
    switch (sig) {
    case SIGTERM:
    case SIGQUIT:
    case SIGKILL:
      sql_print_error("Got signal %d to shutdown mysqld",sig);
      DBUG_PRINT("info",("Got signal: %d  abort_loop: %d",sig,abort_loop));
      if (!abort_loop)
	abort_loop=1;				// mark abort for threads
	pthread_t tmp;
	if (!(opt_specialflag & SPECIAL_NO_PRIOR))
	if (pthread_create(&tmp,&connection_attrib, kill_server_thread,
			   (void*) sig))
	  sql_print_error("Error: Can't create thread to kill server");
	kill_server((void*) sig);	// MIT THREAD has a alarm thread
[25 Jan 2004 19:17] Chris Waskowich
Modified sql/mysqld.cc source file for mysql-4.0.17 and Mac OS X 10.3.x

Attachment: mysqld.cc (, text), 178.75 KiB.

[25 Jan 2004 19:19] Chris Waskowich

Well, I made the three changes noted in your prior posts.  I've added the file to the 'Files' section 
here for public access and so you can do a diff to verify the changes I've made.

I do realize that this is just a band-aide on the skull-fracture, but at least I can utilize the slave on 
the intended server now.  I'll let you know if anything odd pops up.

Thanks for the support, its truly appreciated!
[3 Jun 2004 13:21] Chris Waskowich
Has there been any progress on providing an actual fix for this yet, or has the above been incorporated 
into the latest (4.0.20) until a fix is found?
[3 Jun 2004 15:11] Guilhem Bichot
Hi Chris,
No, nothing has been incorporated in 4.0.20. Because we had trouble with this bug, as it was not totally repeatable (could not repeat it on our main Mac OS X Panther build host, only on some colleagues workstations). Hence we could not find the real cause of the problem.
But I agree the band-aide is better than nothing. So I am going to disable SIGHUP and SIGQUIT catching on Mac OS X on 4.0.21. Could you please test this patch (it's just a bit different from yours: I am trying if just commenting the sigaddset()s is enough, without commenting the 'case SIGHUP') and see if it makes a workable server at your site?
===== configure.in 1.300 vs edited =====
*** /home/guilhem/tmp/bk_configure.in-1.300_XRPlrP	2004-05-12 01:22:58 +02:00
--- edited/configure.in	2004-06-03 15:05:50 +02:00
*** 1031,1037 ****
      if test "$ac_cv_prog_gcc" = "yes"
--- 1031,1037 ----
      if test "$ac_cv_prog_gcc" = "yes"
===== sql/mysqld.cc 1.495 vs edited =====
*** /home/guilhem/tmp/bk_mysqld.cc-1.495_kbCuMC	2004-05-19 15:51:02 +02:00
--- edited/sql/mysqld.cc	2004-06-03 14:43:37 +02:00
*** 1643,1651 ****
-   sigaddset(&set,SIGTERM);
    /* Fix signals if blocked by parents (can happen on Mac OS X) */
--- 1645,1655 ----
+ #endif
+   sigaddset(&set,SIGTERM);
    /* Fix signals if blocked by parents (can happen on Mac OS X) */
*** 1728,1738 ****
    (void) sigaddset(&set,THR_SERVER_ALARM);	// For alarms
    (void) sigaddset(&set,SIGQUIT);
-   (void) sigaddset(&set,SIGTERM);
    (void) sigaddset(&set,SIGHUP);
    (void) sigaddset(&set,SIGTSTP);
    /* Save pid to this process (or thread on Linux) */
--- 1732,1744 ----
    (void) sigaddset(&set,THR_SERVER_ALARM);	// For alarms
    (void) sigaddset(&set,SIGQUIT);
    (void) sigaddset(&set,SIGHUP);
+ #endif
+   (void) sigaddset(&set,SIGTERM);
    (void) sigaddset(&set,SIGTSTP);
    /* Save pid to this process (or thread on Linux) */

It's supposed to disable SIGHUP and SIGQUIT on Darwin7 kernels which are those of Mac OS X 10.3 if I am not mistaken.
After you have applied this patch, you will need to run "autoconf" in the source directory, so that it updates "configure" from "configure.in". If that does not work (if you don't have autoconf?) then better update "configure" manually with the same change as in "configure.in". Then ./configure; make; .
If yes, I will incorporate that in 4.0.21. Thank you for your patience!
[3 Jun 2004 15:12] Guilhem Bichot
Hey, beware of the browser wrapping patch lines here. The added "-DIGNOREetc" must be on the same line as the others existing "-D".
[13 Aug 2004 15:42] Guilhem Bichot
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at

Additional info:

As we need to get 4.0.21 out, and hopefully without this bug, I am including the patch in MySQL 4.0.21.
Please let me know, if you upgrade to 4.0.21, if problem still persists.
Thank you.
ChangeSet@1.1958, 2004-08-13 15:29:47+02:00, guilhem@mysql.com