Bug #11354 | Crashing 5.0 replication slaves | ||
---|---|---|---|
Submitted: | 15 Jun 2005 14:38 | Modified: | 15 Sep 2005 1:08 |
Reporter: | Beat Vontobel (Silver Quality Contributor) (OCA) | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.0.11 Beta | OS: | Linux (SuSE Linux Professional (2.4.21)) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[15 Jun 2005 14:38]
Beat Vontobel
[16 Jun 2005 8:06]
Beat Vontobel
Just tried the brand new 5.0.7.-build this morning: Same effect, the slaves crashes as soon as it replicates the production server.
[21 Jul 2005 1:54]
MySQL Verification Team
Could you please provide your my.cnf files for master and slave for to setup an initial test on my side and if possible the database schema with some limited data which permits to run the queries which provoke the slave crash. Thanks in advance.
[21 Jul 2005 19:23]
Beat Vontobel
Thanks Miguel, I'll just attach you the complete schema.sql and the my.cnf for the master. On the slave I tried the same file as on the master as well as the default my-medium.cnf, with the same crashing result (just changed sever-id for the slave, of course). But these files won't probably be of any use to you, as long as I can not tell you which single queries crash the slave. And that's my actual problem! I ran hundreds of test queries on my testing environment and never got the slave to crash. But as soon as I hook it up to the production master server (4.1.x) under heavy load, the 5.0.x slave crashes after a few seconds to minutes. What I do: Set up a clean 5.0.x server, install our latest database backup (~9 GB). The 5.0.x server runs perfectly with this data. Then I set it up as a slave (CHANGE MASTER TO position of the backup) and start replication (at this time there is probably already some 500MB of binary log to catch up for it on the master). While catching up to the master, the slave crashes. My problem is: All the data that's left after the crash does not leave enough clues for me to determine the exact query in the ~500 MB binary log that made the slave crash (I hope there actually is one query - and it's not a combination of queries that are the reason). As you see the above crash report says "thd->query at 0x4ce3fa12 is invalid pointer", the info in the master.info file doesn't seem to be accurate after the crash (I still tried all the queries around the latest position and none of them crashed the server). How can I really find out, where exactly the slave crashed? How can I reproduce this in a small testing environment? I just need a hint from you and I'd love to provide you with a really neat and small test case as I did with my latest bug reports...
[21 Jul 2005 19:37]
MySQL Verification Team
Sorry, We need fully repeatable test case. In order to catch a query that produces a crash, start general log on the slave.
[17 Aug 2005 7:28]
Beat Vontobel
Listen, I really spent hours on this during the last weeks: I just can't find a small test case. Of course I activated binary logging on the slave with log-slave set. Then of course the last query written to the binary log of the slave should be the one before the crashing query, that's what i thought, at least... But actually the log stops everytime just in front of a different query. The only common thing amongst all these crashing queries that I can see is, that they're all of the type "REPLACE INTO ... SELECT ..." with some JOINS in the select part. So I executed these queries in a clean testing environment (same server version, same configuration - but no other data manipulating queries on master and slave and only the tables involved in the suspect queries plus the mysql system database) - and it just passes without crashing! I must be missing something that's different between the production and testing environment - but I just can't get it (must be on the master 4.1.x master as I use the exactly same slave for testing against the production and the testing master). Is it possible that the queries before and after the suspect queries somehow interact? But the bug is definitely repeatable: I checked out _every_ build since 5.0.6 (5.0.11 this morning). The 5.0.x slaves _always_ crash after working through some dozen MB of our binary log (generated by 4.1.x masters, always the latest build, right now 4.1.13). I'm really desperate because I can't get this to reproduce in a small testing environment convenient for upload. It's a real time eater as well: It takes an hour every time to set up a testing environment with all our production data (backup, copy and load data into testing master and slave). What can I do? I'm sure you want to fix all the crashing bugs before you go from beta to production with 5.0.x. And I'm really keen to switch as quickly as possible from 4.1.x to 5.0.x as soon as you declare it stable (and this bug is fixed)...
[11 Sep 2005 7:46]
Beat Vontobel
This issue seems to be solved in 5.0.12! I just hooked up a 5.0.12 build as slave against our 4.0.14 master and it is up and running flawless since about 36 hours while builds 5.0.11 and before still crash after a few seconds. Was this a duplicate for one of the other fixed crashing bugs or was the problem solved "by accident"?
[11 Sep 2005 9:49]
Valeriy Kravchuk
May be related to http://bugs.mysql.com/bug.php?id=12532, that is fixed in 5.0.12, as mentioned in http://dev.mysql.com/doc/mysql/en/news-5-0-12.html. Can we mark it as "not a bug", starting with version 5.0.12?
[11 Sep 2005 15:02]
Beat Vontobel
#12532 is probably a candidate for a duplicate to this one, but I was just contacted by one of your developpers (John David Duncan) directly to investigate this bug. He checks with someone of the replication team to see if this bug is really a solved duplicate. I'll wait for their answer and if we're sure that this one's really solved and shouldn't pop up again (and didn't just disappear in 5.0.12 by coincidence) I'll mark it as closed. Is this okay?
[14 Sep 2005 13:43]
Lars Thalmann
Hi Beat, Thank you very much for all your efforts in keeping us informed about this failure. I think that this indeed is a duplicate of BUG#12532 and suggest we close this report as "Duplicate".
[14 Sep 2005 20:50]
Beat Vontobel
Hi Lars, I think that's fine, even if BUG#12532 was actually a duplicate for this one ;-) and not the other way round. Thanks for all your efforts at MySQL! I'm really happy to finally have a Beta build to hook up to our production master and be able to really start to check out all the new features. Beat