Bug #29232 | Replication not usable because of corrupted slave relay logs | ||
---|---|---|---|
Submitted: | 20 Jun 2007 8:49 | Modified: | 27 Mar 2008 17:29 |
Reporter: | Stef Cant | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | Server version: 5.0.42 Gentoo Linux | OS: | Linux (Gentoo Linux) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | corrupt relay, replication |
[20 Jun 2007 8:49]
Stef Cant
[20 Jun 2007 8:50]
Stef Cant
slave status after failure
Attachment: show_slave_status.txt (text/plain), 2.29 KiB.
[20 Jun 2007 8:51]
Stef Cant
part of the master bin log file
Attachment: mysqlbinlog_master_firefoot-bin.000010.txt (text/plain), 25.09 KiB.
[20 Jun 2007 8:51]
Stef Cant
part of the slave relay log file
Attachment: mysqlbinlog_slave_relay-strider.000022.txt (text/plain), 6.48 KiB.
[20 Jun 2007 8:51]
Stef Cant
my thoughts about what happened
Attachment: investigation_faulty_query.txt (text/plain), 2.85 KiB.
[20 Jun 2007 9:35]
Andrei Elkin
Hi Stef. There was Bug #27583 which might relate and it appeared to be fixed only in 5.1 version. Before we merge that fix you can check the patch I putting here. *** /tmp/geta22959 2007-06-20 12:30:49.000000000 +0300 --- log_event.cc 2007-06-20 12:30:45.000000000 +0300 *************** int Rotate_log_event::exec_event(struct *** 3397,3402 **** --- 3397,3404 ---- memcpy(rli->group_master_log_name, new_log_ident, ident_len+1); rli->notify_group_master_log_name_update(); rli->group_master_log_pos= pos; + strmake(rli->group_relay_log_name, rli->event_relay_log_name, + sizeof(rli->group_relay_log_name) - 1); rli->group_relay_log_pos= rli->event_relay_log_pos; DBUG_PRINT("info", ("group_master_log_name: '%s' group_master_log_pos:\ %lu",
[20 Jun 2007 10:36]
Sveta Smirnova
Stef, please apply patch as Andrei Elkin suggested, try with it and inform us if patch is solved (or doesn't) the problem. Thanks in advance.
[20 Jun 2007 10:39]
Stef Cant
Ok, I have patched the master and the 3 slaves. I'll update this ticket when replication fails again, or in a few days when everything turns out to be ok.
[20 Jun 2007 10:47]
Sveta Smirnova
Stef, thank you for the feedback. Will wait results of your tests and set status of the report to "Need feedback".
[21 Jun 2007 8:13]
Stef Cant
Bad news, 2 slaves failed again, and the cause is almost identical. I'll add some files concerning this new failure, but you will see the scenario is almost a perfect copy of the first set of files.
[21 Jun 2007 8:14]
Stef Cant
output of 'show slave status'
Attachment: 2_show_slave_status.txt (text/plain), 2.29 KiB.
[21 Jun 2007 8:14]
Stef Cant
part of mysqlbinlog (master )
Attachment: 2_mysqlbinlog_master_firefoot-bin.000008.txt (text/plain), 48.79 KiB.
[21 Jun 2007 8:14]
Stef Cant
part of mysqlbinlog (slave)
Attachment: 2_mysqlbinlog_slave_relay-strider.000018.txt (text/plain), 22.68 KiB.
[21 Jun 2007 8:15]
Stef Cant
3 relevant queries
Attachment: 2_investigation_faulty_query.txt (text/plain), 2.46 KiB.
[28 Jun 2007 13:31]
Stef Cant
It's been a week now. Each day at least one slave failed because of a bad query. So that patch didn't help at all ... any other ideas or code/patches I can try?
[29 Jun 2007 13:46]
Stef Cant
Still no answers ... I would like to mention one more thing: the patch you suggested concerns an incoming rotation event from the master, but I'm not sure there was such an event, because there is a gap of at least 90 seconds between the failing query and the slave rotation.
[29 Jun 2007 13:53]
Sveta Smirnova
Stef, thank you for the feedback. We currently thinking about ways to reproduce this bug and which additional information can be needed if any. This is why we don't answer.
[19 Sep 2007 8:38]
jerry nieuviarts
Hi, We avec strictly the same problem. Here is our architecture : 2 masters with cross replication 18 slaves connected to of one of the two masters. Approximatively once a day, the replication crash on one of the 18 slave with an error like : 070919 2:39:57 [ERROR] Slave: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ó' at line 1' on query. Default database: 'monitoring'. Query: 'insert low_priority into hotdownload(sere,datemaj) values ('ftpclubic37.clubic.com','firmware_ts02_3517.exe','1213241',1190160423)ó', Error_code: 1064 We can see that the query is corrupted, the good one was : insert low_priority into hotdownload(serveur,fichier, taille,datemaj) values ('ftpclubic37.clubic.com','firmware_ts02_3517.exe','1213241',1190160423) Some characters are missing and others are added ("ó") We use the latest available production release ie : 5.0.45-community We are available to test thinks as soon as it is very difficule for us to work like that. Few weeks ago, we used 4.1.22 relase for month with no problem of this kind. Thank you for your help.
[16 Dec 2007 4:34]
Dave Juntgen
Has there been any progress on this? Or is it because no one can reproduce the problem?
[16 Jan 2008 8:18]
MySQL Verification Team
looks related to bug #26489
[27 Feb 2008 17:29]
Sveta Smirnova
Thank you for the report. Your case looks like similar to bug #26489 which has been fixed in version 5.0.56. Please upgrade to this version (or wait next Community release), try with it and say inform us if problem still exists.
[28 Mar 2008 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[20 May 2008 4:27]
James Day
If you get a replication error beginning with "You have an error in your SQL syntax;" please try getting a version that has the fix for bug #26489 in it, 5.0.56, 5.1.24, and 6.0.5 and later. We think that this will fix almost all problems of this type. If you do not have that fix yet, you can do this as a temporary workaround each time it happens: SHOW SLAVE STATUS STOP SLAVE; CHANGE MASTER TO master_log_file='insert the value from Relay_Master_Log_File', master_log_pos=insert the value from Exec_master_log_pos; START SLAVE; If you have a version with the fix for bug #26489 in it and still regularly see this problem please open a new bug report. It's expected that you'll see it occasionally due to corruption in transit but this shouldn't be many times a day or even many times a week. James Day, Support Engineer, MySQL/Sun