Bug #2030 | relay bin log on slave resets multiple times a second | ||
---|---|---|---|
Submitted: | 6 Dec 2003 9:54 | Modified: | 13 Aug 2004 15:42 |
Reporter: | Chris Waskowich | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 4.0.16 | OS: | MacOS (Mac OS X Server 10.3) |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
[6 Dec 2003 9:54]
Chris Waskowich
[6 Dec 2003 9:59]
Chris Waskowich
Oh, and another thing, here is the configure command I used to compile mysql-4.0.16: ./configure --prefix=/usr/local/mysql --enable-thread-safe-client --enable-shared=yes -- enable-static=yes --without-debug --with-mysqld-user=mysql --with-charset=latin1 --with- extra-charsets=none --without-docs --without-bench I've also configured it without the '--enable-thread-safe-client' option and had the same results.
[6 Dec 2003 12:48]
Alexander Keremidarski
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php Additional info: Relay log behaviour is described at: http://www.mysql.com/doc/en/Replication_Implementation_Details.html
[6 Dec 2003 12:56]
Chris Waskowich
Alexander, Thanks for the response, however, I'm confused to as why there is a difference between the bin-log functionality when the only thing that changed was the OS. The my.cnf file was the same when the slave was on 10.1.5 and mysql was the same version as well (with the same configure). Maybe the bug is with the OS, but there is nothing I can find anywhere, even in the URL you reference that aids me in the problem I'm having.
[6 Dec 2003 13:04]
Chris Waskowich
Additionally, There is nothing in the documentation as to what all the information is that is getting written to the host.err file and why for each block of this information the slave server is flushing the tables. And, since there is nothing in the documentation, I have no clue what commands on how to stop this information from getting written to the host.err file. It gets written at such a rate that the host.err file will be about 40M after about 10 hours.
[6 Dec 2003 13:12]
Chris Waskowich
One more thing, if the URL you referenced was due to the lack of the 'max_relay_log_size' and 'max_binlog_size' in the my.cnf file, then I have a refute to that. I did have these following lines in the my.cnf file: max_relay_log_size = 128M max_binlog_size = 128M However, it fails to recognize them and still increments the relay-bin file after about 1024B of info. Additionally, if I turn on the log-bin option, then the bin-log does the same thing. That is, it starts to increment regardless of what these options state. Just so we are clear, I did indeed cycle the mysqld between changes of the my.cnf. I know that mysqld is reading the my.cnf file because even after deleting the master.info file and restarting mysqld, it will conenct to the master. The only my.cnf file on the file system is the one that has this information.
[6 Dec 2003 13:21]
Chris Waskowich
Just for giggles, I tried enabling the 'log-bin' option again. Well, the mysqld crashed.
[6 Dec 2003 14:18]
Alexander Keremidarski
Please explain what exactly the problem is. Reading your intital report I was under impression you are complaining that Relay log gets rotated and rotates too fast. As this only mean your Replication is working well and your Slave server catches up quickly with Master I closed bug report refering to manual. If you experience problem with rotating Relay log you can simply shut down SQL replication thread making sure I/O thread is running and then observe Relay log expansion and rotation. At some places you refer to relay log while on others to binary log. Please clarify all of the above.
[6 Dec 2003 14:35]
Chris Waskowich
Yea, sorry for not being clear enough. First, the binary logs that I'm referring to are the 'host- relay-bin.xxxx' logs. If I enable 'log-bin' in the my.cnf, then the mysqld crashes (I can provide a trace if required) and then there are several 'host-bin.xxx' files. However, the more I think about it, this is probably due to mysqld crashing and creating a new 'host-bin.xxx' for each time it is launched. I have no problem with the relay-bin.xxxx files rotating, however, it is ignoring the 'max_relay_log_size' option in the my.cnf file which I now have set to 128MB. Per the documentation, the relay-bin files can get rotated when a flush logs command is being issued - I'm not issuing this command, nor is any other process on the slave server. Can the master issue a 'flush logs' command to the slave? Additionally, when getting a 'status', the 'flush tables' is exceptionally high and seems to correspond with the index number of the relay-bin file. I'm assuming the two are related, but I'm not sure which one is the cause/effect. I just issued a 'STOP SLAVE SQL_THREAD' command and am watching the relay-bin log files. It has been about 1 minute and there are about 200 of them. The largest one is 1134 B. Upon issuing a 'START SLAVE SQL_THREAD' command, the relay-bin files start to get removed.
[7 Dec 2003 14:32]
Guilhem Bichot
Hi Chris, The reason why mysqld rotates the relay logs and flushes the tables, is that it receives (for some mysterious reason) a SIGHUP signal: sql/mysql.cc: case SIGHUP: if (!abort_loop) { reload_acl_and_cache((THD*) 0, (REFRESH_LOG | REFRESH_TABLES | REFRESH_FAST | REFRESH_STATUS | REFRESH_GRANT | REFRESH_THREADS | REFRESH_HOSTS), (TABLE_LIST*) 0); // Flush logs mysql_print_status((THD*) 0); // Send debug some info } break; Now, why does it receive so many SIGHUP? I have understood that the problem did not happen with some older version of Mac OS X. So it can be a bug in MySQL or a crazy OS... What's the exact command by which you start mysqld: mysqld_safe (with some options?), mysqld? What could *maybe* help us is to have a debug trace: you would need to build mysql with debug support: just add --with-debug to your 'configure' options, and build, and install on the slave. Then add debug=d:t:i:O,some_absolute_path_and_filename to the options of the slave's mysqld (in my.cnf). For example, I have debug=d:t:i:O,/tmp/mysqld.trace Then, once a few crazy flushes have happened, shutdown the slave's mysqld, and upload a compressed version of /tmp/mysqld.trace here: ftp://support.mysql.com/pub/mysql/secret and let us know the filename by adding a comment in this bug report (note other people that MySQL employees cannot download from this ftp site, so your data is safe there). Maybe this will help us. Maybe not, as it may not tell us who is sending SIGHUP.
[7 Dec 2003 14:34]
Guilhem Bichot
By the way, it would be very interesting to also have a debug file (like I explained formerly) for the moment where you are able to crash mysqld by starting it with log-bin. So, 2 debug files... Thanks! :)
[7 Dec 2003 14:57]
Guilhem Bichot
Chris, don't worry about 'log-bin makes mysqld crash': I am able to make mysqld --log-bin crash on my Linux when I send it a SIGHUP (and you have lots of SIGHUPs on your system, as we saw). So this one is a bug which I will fix. Remains: why these SIGHUP on your slave? If you can give us a debug trace with --debug...
[7 Dec 2003 15:03]
Chris Waskowich
Guilhem, Thanks for the replies. I'll try to get to it this evening, but I'm not sure I'll have time, as I have to rebuild mysql with the --with-debug option. Not a huge deal... but I'm moving into a new place at the beginning of this week and my time is short. I will get to it this week, though - as I really want to find a resolution or give you guys the info to get this fixed in future builds.
[9 Dec 2003 6:57]
Guilhem Bichot
I have entered the bug "SIGHUP crashes mysqld --log-bin" as a separate bug, number 2045, which is fixed in the coming 4.0.17. Remains the question why mysqld receives so many SIGHUP.
[22 Dec 2003 11:28]
Paul DuBois
Some additional information pertaining to this problem. I see much the same thing for MySQL 4.1.x after upgrading from Mac OS X 10.2 to 10.3. I suspect a signal-handling difference between Jaguar (10.2) and Panther (10.3). I'm running Mac OS X non-Server, by the way, so it's not specific to Mac OS X Server. The problem does not depend on replication in any way, though I can see why it would appear to be related. What's actually happening is that SIGHUP signals are being processed for *every query*. Just the bare act of connecting to the server generates three new binlog files. Issuing a simple SELECT 1 query generates two more binlog files. The general log shows this kind of thing: /var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with: Tcp port: 40102 Unix socket: /var/mysql/40102/mysql.sock Time Id Command Argument 031222 13:25:53 2 Connect paul@localhost on /var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with: Tcp port: 40102 Unix socket: /var/mysql/40102/mysql.sock Time Id Command Argument /var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with: Tcp port: 40102 Unix socket: /var/mysql/40102/mysql.sock Time Id Command Argument 031222 13:26:01 2 Query SELECT 1 /var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with: Tcp port: 40102 Unix socket: /var/mysql/40102/mysql.sock Time Id Command Argument /var/mysql/40102/libexec/mysqld, Version: 4.1.2-alpha-log, started with: Tcp port: 40102 Unix socket: /var/mysql/40102/mysql.sock Time Id Command Argument
[22 Dec 2003 12:46]
Paul DuBois
Some further investigation shows that the behavior I reported in my previous comment occurs only when the connection from mysql to the server uses SSL. If I connect without using SSL, then the logging behavior is normal. I also tried SSL-enabled connections on 4.1.2 on Red Hat 7. The odd logging behavior does not occur there.
[22 Dec 2003 15:27]
Paul DuBois
More info: The problems with SIGHUP causing binlogs to be generated for each query with SSL connections appear to be pervasive on Mac OS X 10.3. I can repeat the problem on 4.0.18, 4.1.2, and 5.0.0. With non-SSL connections, there is no problem. On the theory that the problem may be with Apple's SSL libraries, I attempted to build OpenSSL 0.9.6l so that I could link MySQL to that. But I could not get it to build.
[16 Jan 2004 5:42]
Guilhem Bichot
We were able to verify it on two Mac OS X 10.3 machines. When using SSL connections, there are 1 SIGHUP when connecting and 2 SIGHUP for each query. When not using SSL connections, there is 1 SIGHUP when 'mysqladmin shutdown'. Sending TERM apparently triggers HUP then TERM. We don't know yet if this is a bug in MySQL or in something new introduced in Mac OS X 10.3. We will continue investigation after we upgrade some of our internal machines to 10.3. Until then, as you build from source, you can work around the problem with this patch (for MySQL 4.0) for sql/mysqld.cc: *************** *** 1762,1768 **** (void) sigaddset(&set,SIGQUIT); (void) sigaddset(&set,SIGTERM); #if THR_CLIENT_ALARM != SIGHUP ! (void) sigaddset(&set,SIGHUP); #endif (void) sigaddset(&set,SIGTSTP); --- 1762,1768 ---- (void) sigaddset(&set,SIGQUIT); (void) sigaddset(&set,SIGTERM); #if THR_CLIENT_ALARM != SIGHUP ! // (void) sigaddset(&set,SIGHUP); #endif (void) sigaddset(&set,SIGTSTP); It will disable catching of SIGHUP by the signal handler.
[21 Jan 2004 23:21]
Chris Waskowich
Guilhem, I'm not yet had time to do all the debug dump stuff. However, I did try to compile mysql-4.0.17 with the change to the code as noted. After building and installing: when I run the slave server, it immediately shuts down. It should be noted that I'm not using SSL. Here is the configure command: ./configure --prefix=/usr/local/mysql --enable-shared=yes --enable-static=yes --without- debug --with-mysqld-user=mysql --with-charset=latin1 --with-extra-charsets=none --without- docs --without-bench --without-openssl
[22 Jan 2004 0:04]
Chris Waskowich
The plot thickens. I compiled mysql-4.0.17 with this configure: ./configure --prefix=/usr/local/mysql --enable-shared=yes --enable-static=yes --with-debug --with-mysqld-user=mysql --with-charset=latin1 --with-extra-charsets=none --without-docs --without-bench --without-openssl I also added this line to the [mysqld] section in the my.cnf file: debug=d:t:i:O,/tmp/mysqld.trace When I start mysql, this is what gets dumped to the prompt: [kyle:local/src/mysql-4.0.17] root# /System/Library/StartupItems/MySQL/MySQL start Starting MySQL [kyle:local/src/mysql-4.0.17] root# Starting mysqld daemon with databases from /usr/local/ mysql/var /usr/local/mysql/bin/mysqld_safe: line 1: 15611 Abort trap nohup /usr/local/mysql/ libexec/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/var --user=mysql --pid- file=/usr/local/mysql/var/kyle.pid --port=3306 --socket=/tmp/mysql.sock >>/usr/local/mysql/ var/kyle.err 2>&1 040122 03:00:00 mysqld ended Here is what gets logged in the kyle.err file: 040122 03:00:00 mysqld started Got error 22 when trying to lock mutex at log.cc, line 1689 040122 03:00:00 mysqld ended I've uploaded the mysqld.trace file to the FTP server you noted previously. The name of the file is 'chris-waskowich-mysqld.trace.gz' Also, when I comment out the debug file in the my.cnf file and try to start mysql, I get the same error on the prompt and the same data gets written to the kyle.err file. Obviously, though, there is not mysqld.trace file.
[22 Jan 2004 6:32]
Guilhem Bichot
Chris, Thanks for your patience. I appreciate all this help to us. Presently you are hitting a known bug which is already fixed in the not-released-yet MySQL 4.0.18 :( (we attempted to use a function to print an error in the error log, which incidentally acquires a mutex, but the mutex was not initialized yet). The intention was good (fix #BUG 13, I suggest you take a look at it) but the fix was a bit incorrect, hence the crash. But we need to move forward with debugging our original problem, so we need to disable this crashing you get; for this just comment this line in sql/mysqld.cc: sql_print_error("Warning: Setting lower_case_table_names=2 because file system for %s is case insensitive", mysql_real_data_home); (the wording is a bit different in your sql/mysqld.cc which is older; it is probably saying "=1" but nevermind, just comment). Then please tell me how your system goes after that. Normally it should not crash and it should not flush your logs all the time.
[23 Jan 2004 9:17]
Chris Waskowich
Well, I commented out the sql_print_error line you noted (I'm still on 4.0.17). and them compiled with debug and added the debug line to the my.cnf file. I did not, however, comment out the (void) sigaddset(&set,SIGHUP); line. I wanted to run it without this fix, so I could get a trace. However, when I run it, it still bails out, however, it seems to get a little further along. I've uploaded the trace file to the FTP site. The name of the file is 'chris-waskowich-20040123a-mysqld.trace.gz'
[23 Jan 2004 14:22]
Chris Waskowich
I did another test. This time I made the the two changes, i.e., commenting out the two lines: (void) sigaddset(&set,SIGHUP); AND sql_print_error("Warning: Setting lower_case_table_names=1 becasue file system %s is case insensitve", mysql_real_data_home); It also bailed immediately after running. I've uploaded the trace file to the FTP server and its name is 'chris-waskowich-20040123b-mysqld.trace.gz'.
[24 Jan 2004 4:18]
Guilhem Bichot
Hi Chris, Looking at your latest debug trace file, I see: T@3 : | >connect_to_master T@3 : | | >mc_mysql_connect T@3 : | | | enter: host: database.ampcast.com db: (Null) user: repl connect_time_out: 30 read_timeout: 3600 T@3 : | | | info: Server name: 'database.ampcast.com'. TCP sock: 3306 T@3 : | | | >thr_alarm T@3 : | | | | enter: thread: T@3 sec: 30 T@3 : | | | | info: reschedule T@2 : | info: Got signal: 3 abort_loop: 0 T@4 : >kill_server T@4 : | >sql_print_error T@4 : | | error: /usr/local/mysql/libexec/mysqld: Normal shutdown connect_to_master() is called by the I/O slave thread to connect to the master (when replication is starting); it immediately triggers (but why??) a signal 3 (SIGQUIT!!) to be sent to mysqld; and logically the signal handler reacts by calling kill_server() to make mysqld terminate. So it looks like connecting to the master has the consequence of SIGQUIT being sent. Before, when you had not commented the sigaddset(), you did not get SIGQUIT but SIGHUP instead. So I wonder if the signals really exist in this bug, or if it's just the MySQL signal handler doing crazy things. Or the signals are really sent but randomly... The fact that this happens when opening a connection is consistent with tests I have run on other machines. To sum up, this is becoming crazier and crazier. We talked to our contact at Apple but he was not aware of signal problems in Mac OS X 10.3. We have seen problems too with suspending/resuming MySQL on Mac OS X (Ctrl-Z). I think I have asked you all possible tests and you did them, thanks a lot!! Now we at MySQL have to debug all this thoroughly, it may take some time. Until then, the only workaround I see for you now is to disable SIGQUIT catching too. In the code below (mysqld.cc), comment the "case SIGQUIT"). The problem if you disable SIGTERM is that you won't be able to shutdown mysqld anymore (unless you use kill -SIGKILL but that would certainly corrupt your slave's data, so don't do that). We can just hope that SIGTERM is not sent everytime there is a connection. So please let me know if disabling SIGHUP and SIGQUIT leads to finally hide the problem or not. switch (sig) { case SIGTERM: case SIGQUIT: case SIGKILL: #ifdef EXTRA_DEBUG sql_print_error("Got signal %d to shutdown mysqld",sig); #endif DBUG_PRINT("info",("Got signal: %d abort_loop: %d",sig,abort_loop)); if (!abort_loop) { abort_loop=1; // mark abort for threads #ifdef USE_ONE_SIGNAL_HAND pthread_t tmp; if (!(opt_specialflag & SPECIAL_NO_PRIOR)) my_pthread_attr_setprio(&connection_attrib,INTERRUPT_PRIOR); if (pthread_create(&tmp,&connection_attrib, kill_server_thread, (void*) sig)) sql_print_error("Error: Can't create thread to kill server"); #else kill_server((void*) sig); // MIT THREAD has a alarm thread #endif } break;
[25 Jan 2004 19:17]
Chris Waskowich
Modified sql/mysqld.cc source file for mysql-4.0.17 and Mac OS X 10.3.x
Attachment: mysqld.cc (, text), 178.75 KiB.
[25 Jan 2004 19:19]
Chris Waskowich
Hurray! Well, I made the three changes noted in your prior posts. I've added the file to the 'Files' section here for public access and so you can do a diff to verify the changes I've made. I do realize that this is just a band-aide on the skull-fracture, but at least I can utilize the slave on the intended server now. I'll let you know if anything odd pops up. Thanks for the support, its truly appreciated!
[3 Jun 2004 13:21]
Chris Waskowich
Has there been any progress on providing an actual fix for this yet, or has the above been incorporated into the latest (4.0.20) until a fix is found?
[3 Jun 2004 15:11]
Guilhem Bichot
Hi Chris, No, nothing has been incorporated in 4.0.20. Because we had trouble with this bug, as it was not totally repeatable (could not repeat it on our main Mac OS X Panther build host, only on some colleagues workstations). Hence we could not find the real cause of the problem. But I agree the band-aide is better than nothing. So I am going to disable SIGHUP and SIGQUIT catching on Mac OS X on 4.0.21. Could you please test this patch (it's just a bit different from yours: I am trying if just commenting the sigaddset()s is enough, without commenting the 'case SIGHUP') and see if it makes a workable server at your site? ===== configure.in 1.300 vs edited ===== *** /home/guilhem/tmp/bk_configure.in-1.300_XRPlrP 2004-05-12 01:22:58 +02:00 --- edited/configure.in 2004-06-03 15:05:50 +02:00 *************** *** 1031,1037 **** *darwin7*) if test "$ac_cv_prog_gcc" = "yes" then ! FLAGS="-DHAVE_DARWIN_THREADS -D_P1003_1B_VISIBLE -DSIGNAL_WITH_VIO_CLOSE -DSIGNALS_DONT_BREAK_READ" CFLAGS="$CFLAGS $FLAGS" CXXFLAGS="$CXXFLAGS $FLAGS" MAX_C_OPTIMIZE="-O" --- 1031,1037 ---- *darwin7*) if test "$ac_cv_prog_gcc" = "yes" then ! FLAGS="-DHAVE_DARWIN_THREADS -D_P1003_1B_VISIBLE -DSIGNAL_WITH_VIO_CLOSE -DSIGNALS_DONT_BREAK_READ -DIGNORE_SIGHUP_SIGQUIT" CFLAGS="$CFLAGS $FLAGS" CXXFLAGS="$CXXFLAGS $FLAGS" MAX_C_OPTIMIZE="-O" ===== sql/mysqld.cc 1.495 vs edited ===== *** /home/guilhem/tmp/bk_mysqld.cc-1.495_kbCuMC 2004-05-19 15:51:02 +02:00 --- edited/sql/mysqld.cc 2004-06-03 14:43:37 +02:00 *************** *** 1643,1651 **** sigaddset(&set,SIGPIPE); #endif sigaddset(&set,SIGINT); sigaddset(&set,SIGQUIT); - sigaddset(&set,SIGTERM); sigaddset(&set,SIGHUP); /* Fix signals if blocked by parents (can happen on Mac OS X) */ sigemptyset(&sa.sa_mask); --- 1645,1655 ---- sigaddset(&set,SIGPIPE); #endif sigaddset(&set,SIGINT); + #ifndef IGNORE_SIGHUP_SIGQUIT sigaddset(&set,SIGQUIT); sigaddset(&set,SIGHUP); + #endif + sigaddset(&set,SIGTERM); /* Fix signals if blocked by parents (can happen on Mac OS X) */ sigemptyset(&sa.sa_mask); *************** *** 1728,1738 **** #ifdef USE_ONE_SIGNAL_HAND (void) sigaddset(&set,THR_SERVER_ALARM); // For alarms #endif (void) sigaddset(&set,SIGQUIT); - (void) sigaddset(&set,SIGTERM); #if THR_CLIENT_ALARM != SIGHUP (void) sigaddset(&set,SIGHUP); #endif (void) sigaddset(&set,SIGTSTP); /* Save pid to this process (or thread on Linux) */ --- 1732,1744 ---- #ifdef USE_ONE_SIGNAL_HAND (void) sigaddset(&set,THR_SERVER_ALARM); // For alarms #endif + #ifndef IGNORE_SIGHUP_SIGQUIT (void) sigaddset(&set,SIGQUIT); #if THR_CLIENT_ALARM != SIGHUP (void) sigaddset(&set,SIGHUP); #endif + #endif + (void) sigaddset(&set,SIGTERM); (void) sigaddset(&set,SIGTSTP); /* Save pid to this process (or thread on Linux) */ It's supposed to disable SIGHUP and SIGQUIT on Darwin7 kernels which are those of Mac OS X 10.3 if I am not mistaken. After you have applied this patch, you will need to run "autoconf" in the source directory, so that it updates "configure" from "configure.in". If that does not work (if you don't have autoconf?) then better update "configure" manually with the same change as in "configure.in". Then ./configure; make; . If yes, I will incorporate that in 4.0.21. Thank you for your patience!
[3 Jun 2004 15:12]
Guilhem Bichot
Hey, beware of the browser wrapping patch lines here. The added "-DIGNOREetc" must be on the same line as the others existing "-D". Thanks!
[13 Aug 2004 15:42]
Guilhem Bichot
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Hi, As we need to get 4.0.21 out, and hopefully without this bug, I am including the patch in MySQL 4.0.21. Please let me know, if you upgrade to 4.0.21, if problem still persists. Thank you. Guilhem ChangeSet@1.1958, 2004-08-13 15:29:47+02:00, guilhem@mysql.com