Bug #21923 relay log gets corrupted when communication with master is lost in a long read
Submitted: 30 Aug 2006 14:07 Modified: 31 Oct 2007 13:20
Reporter: Pau Aliagas Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:mysql-server-5.0.22-1.FC5.1 OS:Linux (Fedora Core 5)
Assigned to: CPU Architecture:Any
Tags: communication, corrupt, interrupted, relay, replica, socket

[30 Aug 2006 14:07] Pau Aliagas
Description:
First reported on the Fedora Bugzilla:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=204595

Description of problem:

The scenario is:
-we have a remote master and a replica
-communication between both is lost temporarily (1-2 minutes)
-the replica is in the middle of a long read AFAICT: for example reading an
insert of a blob

When this happens, the replica writes garbage to its relay log and, when it
tries to execute it, it finds that the sql in that position is just corrupted.

I'd say that if it can recover the socket connection, it does not write garbage,
but when it loses it, it writes the unfinished position in the log, actually
creating the problem.

How to repeat:
I'd say that every time that the connection is lost and the socket connection
times out.

Steps to Reproduce:
1. Start master and replica
2. Create a long INSERT that takes time to cross the network (better to have a
slow enough connection betwen them to make it easy to catch it)
3. Cut communicationfor 2 minutes
  
Actual results:

Garbage in the replica relay. In example inserting a 62576 bytes long blob:

060829 21:36:50 [Note] Slave I/O thread: connected to master
'satchmo@satchmo.smsarena.com:3306',  replication started in log
'satchmo-bin.000032' at position 808604296
060830  8:08:57 [ERROR] Error reading packet from server: Lost connection to
MySQL server during query ( server_errno=2013)
060830  8:08:57 [Note] Slave I/O thread: Failed reading log event, reconnecting
to retry, log 'satchmo-bin.000033' position 563828083
060830  8:08:57 [ERROR] Slave: Error 'You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the right
syntax to use near 'EMISeruanSuci', tamany='62576', id_tipus_contingut='599',
id_content_type='13', ' at line 1' on query. Default database: 'smsarena'.
Query: 'INSERT INTO contingut SET codi_contingut='EMIGerimis', tamany='47558',
id_tipus_contingut='590', id_content_type='1', contingut='#!AMR
<|<F4><82><B4>nX\0\0ao<BA>\0^S03<C0>\0\0\0<FE>~ESC<D4>\0\0\0\0=<FF><FB><90>*   
P<B0>T#$*n<F2><DD>c<FA><A5><E6>^?
<81><C9>$<A4>nP1^_<F6>*<B7>4<97>0<&l_D^Y<8F><98><E1><FE>}<C9><D7>&ã£*<F7><EE><E7>c<F0>/.Þ(<FA>llp<
7<A0>QY<80><C0>
^R<CC>~<B6><8F>it$P<8D>1Þ©v<A2>Þª%\"^W<85>.<DF>p<^\<88>\'^X^Xt<94><BE>^A<83>ZB<AF>n<E5>*^X<CA>Q?<E0>&<D3>:<B6><84><<D5>Z5^X^Y
^P<BE>^D<DD>|9M<86>\\D<C5>;{<B3>v<A0><AF><AA><E4><E7>t{>=0<^^m<9A>Z<B8><C0>^?>^A<87><9A>B_<FD><DB><FE>
<A1>v<FA><DE>!G<91>^W<BE><E9>\\<95><E0>#<80><<D8>?<90>\"<B3>^C9^^Q<81>5w<DC>Lr^Q!WH.<99>-<CE>m><U+07EF>^^1<E1><90><&m<9A>P^X<F0>
<FE>^S*^Z<A6><9F><C1><98><F3><F6>4@<F1><A7><A6>^\<C9><EA>^^^YE<BC><97><A0><
>*^P^Y<88><96>>     <80>^G^W\"<B6>z<D6>2<C3>j<97>^B<80>&<A4><F6>_^N<C1>}I<F0><
q<9A>H<B2><D9> ^^^W<83><B8>`<DD><C8><F9>^Dm^Y^T<9C><F8><BD><C7>^Cq^US<C2>6
<C7><F0><^\<C8>^Q@^Y<80>D~^U(<C6>g^Q^Y^X<80><8B>^O&<A4><92>><A8><AB><AA>{P$
à <^Z>!y<BC>F
~^W)<DA>(<B1><FB>^M^N<A0><FF><D1><C8>^D<D4>^S<90>1<C0>MqZ<A1><F0><<D8>É<91>ï¼~^O<87><C6>I+<E1>)N><A3>^\<91>^B&<E5>
<80>sV<CC>^E<8A>(<E0><<D8>ÉM<E5>D^Q<FE>^Wm<BB><E7><B1>O<97>M<F8><94><F0><EB>ESCU78+
  Y^?<99><81> <<DE><U+0243>3
<B6>b^W^,<9C>^EH<99><A4><A2><8E><86><87><C0><BE><CF>e.^K2<8E><AE><DF>q#<B0><D?<90>\'<ED>L4<9E>^OTH<E4>Ê£<8C>^SCi<A8>
ѼG<AE>!XL<96>P<AC><8C><80><<D8>>$9^]&^P<FE>^_/<FB>|9.<C0>w<BB><F1>l<D4>Ze^\mESCm<D2>.<C1>;<F0>
060830  8:08:57 [ERROR] Error running query, slave SQL thread aborted. Fix the
problem, and restart the slave SQL thread with "SLAVE START". We stopped at log
'satchmo-bin.000033' position 563779430

Expected results:

I'd expect that if the position (in mysql relay file terms) is not fully read,
nothing would be written to the relay log.

Suggested fix:
Do no write the log position to disk until we have it 100% complete, even if it is long (like a blob insert could be) or if the timeout takes long.

The problem is that an erroneous entry is written, if not there would be no problem, it would continue once the communication is recoverd.
[30 Aug 2006 14:33] Valeriy Kravchuk
Thank you for a problem report. Please, send the results of SHOW CREATE TABLE for that contingut table mentioned.
[30 Aug 2006 14:40] Pau Aliagas
As requested:

CREATE TABLE `contingut` (
  `id_contingut` int(11) unsigned NOT NULL auto_increment,
  `id_tipus_contingut` int(11) unsigned NOT NULL default '0',
  `codi_contingut` varchar(20) NOT NULL default '',
  `contingut` longblob,
  `tamany` int(11) unsigned NOT NULL default '0',
  `id_content_type` int(11) unsigned NOT NULL default '0',
  `data_alta` datetime NOT NULL default '0000-00-00 00:00:00',
  `data_darrer_us` datetime NOT NULL default '0000-00-00 00:00:00',
  `data_insert` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id_contingut`),
  UNIQUE KEY `id_tipus_contingut_codi_contingut` (`id_tipus_contingut`,`codi_contingut`),
  KEY `codi_contingut` (`codi_contingut`),
  KEY `id_tipus_contingut` (`id_tipus_contingut`),
  KEY `id_content_type` (`id_content_type`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=10000000 AVG_ROW_LENGTH=20000
[31 Aug 2006 12:07] Valeriy Kravchuk
Is it possible for you to upload original binary log and corrupted relay log for this case?
[31 Aug 2006 14:43] Pau Aliagas
We create 1Gb binary logs :( Do you want the previous and folowing positions or the whole file?

About the relay, I did not keep it and it was automatically deleted, I'll post it the next time it crashes.

Thanks for your help
Pau
[4 Sep 2006 17:22] Pau Aliagas
I've uploaded a file to the ftp: master_logs_and_relay_err.tar.bz2

It contains a few master positions that never reached the relay which I had to execute by hand.

I also contains one relay with errors (one of the previous). I can have more as the last time it crashed like 10 times in a row.

If you need further data,  please ask me.
Thanks
Pau
[28 Nov 2006 14:29] Valeriy Kravchuk
Please, try to repeat with a newer version, 5.0.27, and, in case of similar problem, check all your hardware and send my.cnf from both master and slave.
[29 Dec 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[31 Oct 2007 10:10] James Day
Shane Bester from the Support team may have reproduced this one. More later as the tests continue.
[31 Oct 2007 13:20] MySQL Verification Team
Marking as a duplicate of Bug #26489