Bug #21781 | Replication slave io thread hangs | ||
---|---|---|---|
Submitted: | 22 Aug 2006 12:23 | Modified: | 15 Mar 2007 16:41 |
Reporter: | Andrew Tulloch | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 5.0.24 | OS: | FreeBSD (FreeBSD 6.1-p3, Linux, all) |
Assigned to: | Magnus Blåudd | CPU Architecture: | Any |
Tags: | bfsm_2007_02_15, openssl, SSL |
[22 Aug 2006 12:23]
Andrew Tulloch
[29 Aug 2006 20:52]
Sveta Smirnova
Thank you for the report. I cannot repeat the problem using current BK sources. Could you please provide your ktrace file?
[29 Sep 2006 22:21]
Justin Swanhart
I can reproduce on FreeBSD 4.10 and 4.11 using 4.0.26 and 4.1.18 (both of which are older version but these are in production)

easy test scenario:

install MySQL on machine 'A'
ensure log-bin is set in my.cnf
grant all on *.* to repl@'%' identified by 'repl' (for convenience)
install MySQL on machine 'B'
change master to master_host = 'MachineA' master_log_file = 'machinea-bin.000001' master_pass = 'repl';
on Machine B: stop slave; (gets stuck w/ status 'Killing Slave', see processlist below)

Once you try to kill the slave, anything else slave related (like show slave status) also hangs as demonstrated from this 'show full processlist' after I shut the slave down on Machine 'B'.

Any event that writes to the binary log on Machine 'A' will end the slave i/o thread, such as 'flush logs'. Killing the binlog dump process on the master will also stop the i/o thread on the slave.

------------------
mysql> show full processlist \G
*************************** 1. row ***************************
Id: 28
User: root
Host: localhost
db: NULL
Command: Query
Time: 254
State: Killing slave <----------- STUCK IN KILLING SLAVE
Info: stop slave
*************************** 2. row ***************************
Id: 31
User: system user
Host:
db: NULL
Command: Connect
Time: 263
State: Waiting for master to send event
Info: NULL
*************************** 3. row ***************************
Id: 32
User: system user
Host:
db: NULL
Command: Connect
Time: 263
State: Has read all relay log; waiting for the slave I/O thread to update it
Info: NULL
*************************** 4. row ***************************
Id: 33
User: root
Host: localhost
db: NULL
Command: Query
Time: 153
State: NULL
Info: show slave status <-- ALSO STUCK AFTER STOP SLAVE ISSUE
*************************** 5. row ***************************
Id: 34
User: root
Host: localhost
db: NULL
Command: Query
Time: 0
State: NULL
Info: show full processlist
5 rows in set (0.00 sec)
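For anyone who wants to spot this state automatically, here is a minimal Python sketch that parses the vertical processlist output shown above and flags client statements that have been running suspiciously long. The function names, the `SAMPLE` text (abridged from the listing above) and the 60-second threshold are illustrative assumptions, not part of MySQL or of any client library, and the parser does not handle multi-line `Info` values.

```python
import re

# Abridged from the processlist quoted above; the 60 s threshold below is an
# arbitrary assumption chosen for illustration.
SAMPLE = """\
*************************** 1. row ***************************
Id: 28
User: root
Host: localhost
db: NULL
Command: Query
Time: 254
State: Killing slave
Info: stop slave
*************************** 2. row ***************************
Id: 31
User: system user
Host:
db: NULL
Command: Connect
Time: 263
State: Waiting for master to send event
Info: NULL
*************************** 3. row ***************************
Id: 34
User: root
Host: localhost
db: NULL
Command: Query
Time: 0
State: NULL
Info: show full processlist
"""


def parse_processlist(text):
    """Split vertical SHOW FULL PROCESSLIST output into one dict per row."""
    rows = []
    # Every row is introduced by a separator such as "*** 1. row ***".
    for chunk in re.split(r"\*+ \d+\. row \*+", text)[1:]:
        row = {}
        for line in chunk.splitlines():
            key, sep, value = line.strip().partition(":")
            if sep:  # skip blank lines and trailers like "5 rows in set"
                row[key.strip()] = value.strip()
        rows.append(row)
    return rows


def stuck_statements(rows, min_seconds=60):
    """Return (Id, State, Info) for client queries running >= min_seconds."""
    stuck = []
    for row in rows:
        if row.get("Command") == "Query" and int(row.get("Time", "0")) >= min_seconds:
            stuck.append((row["Id"], row.get("State"), row.get("Info")))
    return stuck
```

Replication threads show up with `Command: Connect`, so filtering on `Command: Query` keeps only the client statements (here the hung `stop slave`).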
[29 Sep 2006 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[2 Oct 2006 8:05]
Andrew Tulloch
I attached the ktrace file as requested, but have seen no response since. I would've attached it sooner, but was on holiday for a week.
[3 Oct 2006 17:55]
Kris Karas
I was about to submit a bug for the Linux OS, but this appears to be the same issue. If I do: Mysql> start slave; Mysql> stop slave; The mysql client will hang indefinitely attempting to stop the slave. The only options at that point are (A) "killall -9 mysqld" or (B) log into the master machine and kill the slave's replication process. Additionally, rotating logs seems to break the connection between master and slave. A processlist on the slave shows it attempting to reconnect to the master, yet on the master, the slave process is still in existence (waiting for some data to send back). This is on a (Slackware 10.2) Linux platform running kernel 2.6.18, glibc 2.3.5, mysql 5.0.24a compiled against openssl 0.9.8d and with a slave user that connects via SSL. MySQL is compiled by hand with configure patched to correctly find the location of the openssl libs.
[7 Oct 2006 0:57]
Daniel Bakken
I see the same bug on Debian unstable, running MySQL 5.0.24a compiled from source with OpenSSL enabled. STOP SLAVE hangs unless: a) The master is connected to the slave and b) Either the master or the slave executes a SQL query such as INSERT or FLUSH LOGS Both a) and b) must occur in that order for STOP SLAVE to return. Can we have an ETA for a fix?
[9 Oct 2006 18:08]
Daniel Bakken
I have tested again on MySQL 5.0.24a compiled with OpenSSL support, Debian unstable. This bug does not occur if SSL is disabled. Connect to master via SSL and STOP SLAVE hangs. Connect to master without SSL and STOP SLAVE returns immediately. Conclusion: something is wrong with MySQL's implementation of replication using SSL.
[11 Oct 2006 9:34]
Sveta Smirnova
Thank you for the feedback and comments. Could you all please try using 5.0.26 version accessible from http://dev.mysql.com/downloads/mysql/5.0.html?
[20 Oct 2006 13:19]
[ name withheld ]
I've reproduced this bug with 5.0.26 on FreeBSD 5.4, with MySQL compiled from source with linuxthreads. Specifically, on the slave, "mysqladmin shutdown" or STOP SLAVE hangs until FLUSH LOGS is issued or the master is shut down.
[20 Oct 2006 16:58]
[ name withheld ]
When using native (KSE) threads on FreeBSD 5.4, STOP SLAVE works correctly.
[24 Oct 2006 19:04]
Daniel Bakken
Tested MySQL 5.0.26 compiled with openssl support. STOP SLAVE still hangs as before, unless an SQL statement such as FLUSH LOGS is executed on the master. Running Debian Unstable (Etch) with NPTL 2.3.6 on a 2.6.18 Linux kernel.
[2 Nov 2006 16:02]
Liam Gretton
I can confirm that the same problem exists with the following builds:

Red Hat EL 4, MySQL 5.0.24a
Solaris 9 (SPARC), MySQL 5.0.24a, 5.0.27
Solaris 10 (x86), MySQL 5.0.24a (64bit), 5.0.27 (64bit)

On my Solaris systems, I've tried MySQL linked to OpenSSL 0.9.8c and 0.9.8d.

If the user on the slave server used to perform the replication doesn't use SSL, then the slave server can be shut down without having to flush the logs on the master server.
[2 Nov 2006 20:05]
Sveta Smirnova
Liam, please, provide your configure options and name of compiler you use for Solaris 10 x86 builds.
[6 Nov 2006 9:32]
Liam Gretton
My options for building on Solaris 10 x86, using Sun Studio 11:

setenv CFLAGS -xarch=amd64
setenv CXXFLAGS -xarch=amd64
setenv LDFLAGS "-R/usr/local/openssl/lib -xarch=amd64"
./configure --prefix=/usr/local/mysql --enable-thread-safe-client --with-openssl=/usr/local/openssl

OpenSSL build is obviously also 64bit, built with the same compiler.

On Solaris 9 I only build in 32bits. Even using the same compiler, it's necessary to add a couple of other options to CFLAGS and CXXFLAGS:

setenv CFLAGS "-D_POSIX_C_SOURCE=199506L -D__EXTENSIONS__"
setenv CXXFLAGS "-D_POSIX_C_SOURCE=199506L -D__EXTENSIONS__"
[6 Nov 2006 14:46]
Markus Wernig
Hi all

I have two observations:

1) Tried with various versions on various platforms. Replication over SSL works well with 5.0.18, broken since 5.0.24.
2) I noticed that only the IO thread seems to hang: STOP SLAVE SQL_THREAD returns with no errors, while STOP SLAVE IO_THREAD locks up the slave server.

During testing I had one case (not reproducible right now, though) where the slave server locked up during normal operation, without the replication slave thread being explicitly stopped, and the master server locked up shortly afterwards.

hth
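To compare the two statements without locking up your own session, a small harness along these lines may help. It is only a sketch: `finishes_within` is a hypothetical helper name, and `action` is assumed to be a function you supply that sends one statement (for example STOP SLAVE IO_THREAD) through whatever client library you use; no MySQL connection code is shown here.

```python
import threading


def finishes_within(action, timeout_s):
    """Run action() in a daemon thread and report whether it returned in time.

    `action` is a caller-supplied callable (an assumption of this sketch),
    e.g. a function that executes a single SQL statement.
    """
    done = threading.Event()

    def worker():
        action()
        done.set()

    # A daemon thread does not keep the interpreter alive if action() hangs.
    threading.Thread(target=worker, daemon=True).start()
    # Event.wait returns True if set() was called before the timeout expired.
    return done.wait(timeout_s)
```

If the call for STOP SLAVE SQL_THREAD returns True and the call for STOP SLAVE IO_THREAD returns False after a generous timeout, that matches the behaviour described in point 2. Note that the hung statement keeps running on the server; the helper only stops waiting for it.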
[7 Nov 2006 8:32]
Markus Wernig
Could someone (Sveta?) please change the OS tag of this bug? It's really cross-platform, and I think it's quite a showstopper.
[10 Nov 2006 8:17]
Markus Wernig
Hi This bug is still in the status "Need Feedback". Is there anything we can do to get it accepted and worked on?
[12 Nov 2006 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[14 Nov 2006 8:54]
Liam Gretton
Lots of feedback has been provided. What other information could we provide that can help?
[14 Nov 2006 11:14]
Sveta Smirnova
Thank you, Liam, for the comments and configuration string. I was away from my computer last week and therefore didn't change the status of the report in time. The automatic "No feedback" message is generated when the status of the bug is "Need feedback" and the original reporter does not provide feedback.
[16 Nov 2006 15:54]
Liam Gretton
No problem Sveta, let us know if there's anything else we can do to help.
[22 Nov 2006 17:57]
Daniel Lafraia
I just had the same kind of problem using 5.0.27 and no SSL replication. For some reason the slave replication hangs with no further explanation.

It started with a lot of errors like this in the slave server:

061122 14:21:35 [ERROR] Got error 134 when reading table './dbname/table'
061122 14:21:36 [ERROR] Got error 134 when reading table './dbname/table'
061122 14:21:36 [ERROR] Got error 134 when reading table './dbname/table'
061122 14:21:37 [ERROR] Got error -1 when reading table './dbname/table'
061122 14:21:37 [ERROR] Got error -1 when reading table './dbname/table'

Then the error:

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.

key_buffer_size=524288000
read_buffer_size=1044480
max_used_connections=701
max_connections=700
threads_connected=682
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3376394 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

You seem to be running 32-bit Linux and have 682 concurrent connections. If you have not changed STACK_SIZE in LinuxThreads and built the binary yourself, LinuxThreads is quite likely to steal a part of the global heap for the thread stack. Please read http://www.mysql.com/doc/en/Linux.html

thd=0x9ac6ed98
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
Cannot determine thread, fp=0xbe61f088, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80a3877
0x82f21d8
0x82bb1af
0x815d5cb
0x80b4e70
0x80b3af4
0x80b3044
0x82ef98c
0x83192ca
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil) is invalid pointer
thd->thread_id=1154060
The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash.

Number of processes running now: 0
061122 14:16:19 mysqld restarted
061122 14:16:19 [Warning] Asked for 196608 thread stack, but got 126976
061122 14:16:19 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.27-standard' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Edition - Standard (GPL)
061122 14:16:19 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000130' at position 102123129, relay log './v6-relay-bin.000006' position: 102123266
values ( "1169895826", "1994", "3ade68b7g5d465ea3", "18971238", now() )', Error_code: 126

=========================================

After that, it crashed again...

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.

key_buffer_size=524288000
read_buffer_size=1044480
max_used_connections=608
max_connections=700
threads_connected=520
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 3376394 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

You seem to be running 32-bit Linux and have 520 concurrent connections. If you have not changed STACK_SIZE in LinuxThreads and built the binary yourself, LinuxThreads is quite likely to steal a part of the global heap for the thread stack. Please read http://www.mysql.com/doc/en/Linux.html

thd=0x9a9cbd38
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
Cannot determine thread, fp=0xbcc9d1f8, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x80a3877
0x82f21d8
0x8285f67
0x1
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0xa15ab50 = SELECT p.id FROM l,p,b WHERE b.id = p.bID AND p.id = l.pID AND MATCH(content) AGAINST ("search string here") LIMIT 100
thd->thread_id=380
The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash.

Number of processes running now: 21
mysqld process hanging, pid 8822 - killed
mysqld process hanging, pid 8557 - killed

I appreciate your attention.

Best Regards,
Daniel
http://www.webcit.com.br/
[22 Dec 2006 8:55]
Sveta Smirnova
test case
Attachment: rpl_bug21781.test (application/octet-stream, text), 579 bytes.
[22 Dec 2006 8:57]
Sveta Smirnova
The bug is not repeatable with the latest BK sources compiled using the BUILD/compile-ppc-debug-max script on an Intel Mac. I used the attached test case for testing.
[3 Jan 2007 12:59]
Markus Wernig
OK, this seems to be stalling. The problem does not appear when using the bundled yaSSL libs, or when using any source prior to 5.0.23 (5.0.22 and below is - contrary to my earlier posting - not affected) The SETUP.sh script called from compile-ppc-debug-max uses yaSSL, so it will not reproduce the error: [...] # SSL library to use. SSL_LIBRARY=--with-yassl [...] Please try to reproduce the bug with openssl. Various people here have done so, on various platforms, and they all report the same symptom. Also please note that most of us used two distinct machines for testing. As far as I understand from the few lines of the test case (Sveta, please provide the files included in the first two lines as well), it was run on one single machine. Please ask for any further information you might need to work on this. (I will add an attachment with a step-by-step description of how to reproduce - which does not differ from earlier posts, though) Hope this helps /markus
[3 Jan 2007 13:08]
Markus Wernig
Description of compilation and how to reproduce: http://xfer.ch/files/reproduce_bug21781.txt (couldn't add file to bug)
[15 Jan 2007 12:32]
Sveta Smirnova
Thank you for the feedback. Please upgrade to the current 5.0.33 server and try with our example certificates located in the source-dir/mysql-test/std_data directory. We have a bug report (Bug #25189) about unforgiving behaviour when certificates contain leading white-space characters. I want to check whether your case is related to that.
[31 Jan 2007 20:25]
Torrey Hoffman
I have the same (or a very similar) bug with MySQL 5.0.33 on Linux. I set up a replicating master/slave pair using the same compile and run time configuration which I have used successfully with MySQL 5.0.15. But the slave stops replicating after a few minutes, with nothing in the logs.

At that point, connections to MySQL which attempt to "STOP SLAVE" or "SHOW SLAVE STATUS" will also hang -- including the mysql command line client.

If I leave it "hung" like this, after about 7 minutes it spontaneously (?) unfreezes itself and this appears in the slave's mysql.err log:

070131 12:00:36 [Note] Slave I/O thread killed while waiting to reconnect after a failed read
070131 12:00:36 [Note] Slave I/O thread exiting, read up to log 'frodo.000002', position 98
070131 12:00:36 [Note] Error reading relay log event: slave SQL thread was killed

My mysql configuration file includes this:

# How many seconds to wait for master before deciding the connection is broken and retrying, default is 3600
slave-net-timeout=600
# Seconds to wait between reconnect retries. Default is 60.
master-connect-retry=10
[31 Jan 2007 20:36]
Torrey Hoffman
Having read through the comments on this bug, I'm concerned that it is still marked as "Need Feedback". Many people have provided feedback. This is a showstopper bug for us as well. What additional feedback is required? Master/slave replication not working seems like it ought to be a very high priority to have fixed as soon as possible! Can we get an ETA on a fix for this? Or at least a status update?
[31 Jan 2007 21:26]
Sveta Smirnova
Hi Torrey, the bug is in the "Need feedback" status because nobody at mysql.com has been able to repeat it yet. But because many people outside can repeat it, the bug remains open. I tried to repeat this bug on at least 5 different machines without success. To check my guess that the problem may be in certificate handling, I asked you to try the certificates we use for tests. That is why it is in the "Need feedback" status.
[1 Feb 2007 1:16]
Torrey Hoffman
Another update: I have reproduced this problem again using MySQL 5.0.33, and the SSL keys which came with the MySQL distribution package in the mysql-test/std_data directory. I set the system up as I always have before. A database snapshot was obtained on the master using mysqldump, copied to the slave machine, customized to include the MASTER_HOST, MASTER_USER, MASTER_PASSWORD, and MASTER_SSL=1 information, and then installed. I then started slave replication with 'START SLAVE'. At that point, replication appeared to be working -- for a few seconds at least. The output of 'SHOW SLAVE STATUS' included: Slave_IO_State: Waiting for master to send event Master_Host: frodo.lockdownnetworks.com Master_User: ha Master_Port: 3306 Connect_Retry: 15 Master_Log_File: frodo.000002 Read_Master_Log_Pos: 98 Relay_Log_File: sam-relay-bin.000002 Relay_Log_Pos: 231 Relay_Master_Log_File: frodo.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 98 Relay_Log_Space: 231 Until_Condition: None Master_SSL_Allowed: Yes Master_SSL_CA_File: Master_SSL_CA_Path: /etc/mysql Master_SSL_Cert: /etc/mysql/client-cert.pem Master_SSL_Cipher: Master_SSL_Key: /etc/mysql/client-key.pem Seconds_Behind_Master: 0 However, a few seconds later, replication stopped: there were no messages in the mysql.err log on either the master or the slave, but the output of 'SHOW SLAVE STATUS' changed as follows (showing the changed lines -- in particular, all the *_Log_Pos lines were the same): Slave_IO_State: Waiting to reconnect after a failed master event read Relay_Master_Log_File: frodo.000002 Slave_IO_Running: No Slave_SQL_Running: Yes Seconds_Behind_Master: NULL At this point I tried to 'SHOW SLAVE STATUS' again, and the command hung. With another connection, I tried 'STOP SLAVE' and that hung too. 'Ctrl-C' at the mysql prompt results in 'Query aborted by Ctrl+C' but it does not actually return to a prompt, it is still hung. 
A second 'Ctrl-C' returns me to the shell command prompt.

I followed the tip reported by Justin Swanhart and issued 'FLUSH LOGS' on the master. That un-froze the slave database.

I then discovered that if the slave database replication stops, with the "Slave_IO_State: Waiting to reconnect after a failed master event read" state, I can issue a 'FLUSH LOGS' on the master and it will correct the problem... for a few seconds at least, it will go back to "Slave_IO_State: Waiting for master to send event".

It gets more interesting as I continue to experiment... If I repeatedly issue an UPDATE command -- even if it doesn't change anything in the database -- the slave system will maintain the good "Slave_IO_State: Waiting for master to send event" state, and the output of 'SHOW SLAVE STATUS' will show the Read_Master_Log_Pos counter incrementing. But if I stop issuing that do-nothing UPDATE command, within a few seconds the slave will revert to "Slave_IO_State: Waiting to reconnect after a failed master event read".

This is completely reliable... database activity -- at least, any activity which might modify the database -- keeps the slave running, but if nothing happens on the slave, replication stops!

So... it seems like a "workaround" is to keep issuing do-nothing update commands or flush logs on the master machine every second!
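As a rough illustration of that stopgap, the loop could be sketched in Python as below. Everything named here is an assumption for illustration: `execute` stands for whatever function sends one statement to the master through your client library, the default statement targets a hypothetical `heartbeat` table, and the one-second interval simply mirrors the comment above. This is not a fix, only a way to keep events flowing on affected builds.

```python
import time


def keep_master_busy(execute,
                     statement="UPDATE heartbeat SET ts = NOW()",  # hypothetical table
                     interval_s=1.0,
                     rounds=None,
                     sleep=time.sleep):
    """Send `statement` to the master every `interval_s` seconds.

    `execute` is a caller-supplied callable (assumed, not a real MySQL API)
    that runs one SQL statement on the master. With rounds=None the loop
    runs until the process is stopped; otherwise it stops after `rounds`
    statements and returns how many were sent.
    """
    sent = 0
    while rounds is None or sent < rounds:
        execute(statement)
        sent += 1
        sleep(interval_s)
    return sent
```

Passing `rounds` and a fake `sleep` makes the loop easy to dry-run before pointing it at a real master. Later comments in this report indicate the hang no longer occurs in newer releases, so upgrading is the better option where possible.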
[1 Feb 2007 10:54]
Sveta Smirnova
Thank you all for the feedback. Please try to create core file as described at http://dev.mysql.com/doc/refman/5.0/en/using-gdb-on-mysqld.html and attach your configuration files for master and slave.
[1 Feb 2007 17:47]
Rick James
I have maintained a 4.0 master / multi-slave setup for 4 years. It has intra- and inter-colo hops. I don't think I have ever seen this problem. Of note is that my system is NOT using SSL. (FreeBSD, Many 4.0 Mysql versions, currently 4.0.26; often mixed master-slave versions.)
[1 Feb 2007 18:49]
Rick James
Suggestion: Downgrade to ports/linuxthreads-2.2.3_19 (fwd from Jay J.)
[2 Feb 2007 8:35]
Sveta Smirnova
Please also provide output of the command getconf GNU_LIBPTHREAD_VERSION
[2 Feb 2007 19:11]
Torrey Hoffman
My machines are based on Debian Sarge. I will attach the mysql configuration files. Sveta Smirnova asked for the output of "getconf GNU_LIBPTHREAD_VERSION", it is: NPTL 0.60 Here is a backtrace from GDB. This is on the slave system ("sam"), after it gets into the bad state with "Slave_IO_State: Waiting to reconnect after a failed master event read" and "Slave_IO_Running: No". The system has not crashed, if an event comes in from the master, it will unfreeze and carry on. So, I attached gdb to the running process. It is simply waiting in select(). root@sam:~# gdb /usr/sbin/mysqld 2881 GNU gdb 6.3-debian ... Attaching to program: /usr/sbin/mysqld, process 2881 (no debugging symbols found) `system-supplied DSO at 0xffffe000' has disappeared; keeping its symbols. Reading symbols from /lib/tls/librt.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/tls/librt.so.1 Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /lib/tls/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/tls/libdl.so.2 Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.7...(no debugging symbols found)...done. Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.7 Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.7...(no debugging symbols found)...done. Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.7 Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols found)...done. 
[Thread debugging using libthread_db enabled] [New Thread -1211911072 (LWP 2881)] [New Thread -1304106064 (LWP 2898)] [New Thread -1303974992 (LWP 2897)] [New Thread -1303843920 (LWP 2894)] [New Thread -1267905616 (LWP 2892)] [New Thread -1267774544 (LWP 2891)] [New Thread -1267643472 (LWP 2890)] [New Thread -1295455312 (LWP 2889)] [New Thread -1287066704 (LWP 2888)] [New Thread -1278678096 (LWP 2887)] [New Thread -1257116752 (LWP 2885)] [New Thread -1248728144 (LWP 2884)] [New Thread -1240339536 (LWP 2883)] [New Thread -1231950928 (LWP 2882)] Loaded symbols for /lib/tls/libpthread.so.0 Reading symbols from /lib/tls/libcrypt.so.1... (no debugging symbols found)...done. Loaded symbols for /lib/tls/libcrypt.so.1 Reading symbols from /lib/tls/libnsl.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/tls/libnsl.so.1 Reading symbols from /lib/tls/libm.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/tls/libm.so.6 Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/tls/libc.so.6 Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/tls/libnss_compat.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/tls/libnss_compat.so.2 Reading symbols from /lib/tls/libnss_nis.so.2... (no debugging symbols found)...done. Loaded symbols for /lib/tls/libnss_nis.so.2 Reading symbols from /lib/tls/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/tls/libnss_files.so.2 Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libgcc_s.so.1 0xb7d0da27 in select () from /lib/tls/libc.so.6 (gdb) bt #0 0xb7d0da27 in select () from /lib/tls/libc.so.6 #1 0x0816efbe in handle_connections_sockets () #2 0x0816ea78 in main ()
[2 Feb 2007 19:19]
Torrey Hoffman
I am not allowed to attach files to this bug. Sorry for the very long comment, but here is the slave configuration file. The master configuration file is identical, except that it has a different server_id, and has "frodo" wherever the slave has "sam".

# slave
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /var/tmp
language = /usr/share/mysql/english
default-character-set=utf8
init-connect=SET NAMES utf8
ssl-capath = /etc/mysql
ssl-cert = /etc/mysql/server-cert.pem
ssl-key = /etc/mysql/server-key.pem
master-ssl-capath = /etc/mysql
master-ssl-cert = /etc/mysql/client-cert.pem
master-ssl-key = /etc/mysql/client-key.pem
skip-external-locking
skip-bdb
log-bin = sam
log-bin-index = sam
log-error = sam
report-host = sam
relay-log = sam-relay-bin
server_id = 1040428527
master-user = secret
master-password = secret
slave-net-timeout=45
master-connect-retry=15
relay-log-purge=1
key_buffer = 16M
max_allowed_packet = 1M
thread_stack = 128K
set-variable = key_buffer=2M
set-variable = myisam_sort_buffer_size=8M
set-variable = join_buffer=1M
set-variable = record_buffer=1M
set-variable = sort_buffer=2M
set-variable = thread_cache_size=256
set-variable = max_connect_errors=4294967295
set-variable = max_connections=500
query_cache_limit = 1M
query_cache_size = 8M
query_cache_type = 1M
old_passwords = 1
log-slow-queries = /var/log/mysql/slow.log
long_query_time = 5
log-slow-admin-statements
log-warnings

[mysqldump]
quick
quote-names
max_allowed_packet = 1M

[isamchk]
key_buffer = 16M
[2 Feb 2007 20:15]
Markus Wernig
Hello

I do agree that Torrey's bug is similar, yet definitely different from the one we are all experiencing. It might even warrant its own thread.

As for the output of getconf GNU_LIBPTHREAD_VERSION: That variable does not exist on the Solaris 9 systems I reproduced the bug on. The only "thread" relevant sysvars are:

POSIX_THREAD_ATTR_STACKADDR: 1
POSIX_THREAD_ATTR_STACKSIZE: 1
POSIX_THREAD_PRIORITY_SCHEDULING: 1
POSIX_THREAD_PRIO_INHERIT: 1
POSIX_THREAD_PRIO_PROTECT: 1
POSIX_THREAD_PROCESS_SHARED: 1
POSIX_THREAD_SAFE_FUNCTIONS: 1
PTHREAD_DESTRUCTOR_ITERATIONS: undefined
PTHREAD_KEYS_MAX: undefined
PTHREAD_STACK_MIN: undefined
PTHREAD_THREADS_MAX: undefined
_POSIX_THREADS: 1
_XOPEN_REALTIME_THREADS: 1

mysqld is linked against /usr/lib/libthread.so.1, which appears to be the pthread library shipped with Solaris.

And: I've tried 5.0.32 with the keys provided in src/mysql-test/std_data, and - as expected - the behaviour doesn't change. The slave still hangs when issuing a "STOP SLAVE" command. And it hangs forever, or until a "flush logs" or similar is issued on the master.

As to Rick's post: The bug seems to appear in 5.0.23, so 4.0 versions will not be affected. And it only bites when using SSL encryption with openssl (not even yaSSL).
[2 Mar 2007 14:47]
Magnus Blåudd
Related to http://bugs.mysql.com/bug.php?id=25203
[2 Mar 2007 15:07]
Magnus Blåudd
Also related to http://bugs.mysql.com/bug.php?id=24148
[5 Mar 2007 10:36]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/21111 ChangeSet@1.2458, 2007-03-05 10:07:22+01:00, msvensson@pilot.blaudden +2 -0 Bug#21781 Replication slave io thread hangs - Add test case that shows how slave server hangs in "STOP SLAVE" when run on MySQL version 5.0.33 compiled with OpenSSL. Works fine with latest version of MySQL since that problem has been fixed by patch for bug#24148. The fix has been noted in the changelog for MySQL 5.0.36
[8 Mar 2007 22:11]
Timothy Smith
pushed to 5.0.38, 5.1.17
[11 Mar 2007 12:33]
Markus Wernig
I can confirm that the bug is no longer present in 5.0.37. Thanks!
[15 Mar 2007 16:41]
Paul DuBois
Noted in 5.0.36, 5.1.15 changelogs. SSL connections could hang at connection shutdown.