MySQL Bugs: #30223: Replication failing with error 1236

Bug #30223	Replication failing with error 1236
Submitted:	3 Aug 2007 8:29	Modified:	7 Nov 2015 7:28
Reporter:	Gary Wilson	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.0.41-community-nt	OS:	Windows (2003 Server)
Assigned to:	Andrei Elkin	CPU Architecture:	Any
Tags:	1236, replication

Description:
The slave loses its position in the master binlog, in fact it goes ahead of where the master is.  Replication runs fine for a while and then the slave just seems to randomly jump miles ahead of the master and it becomes impossible to replicate data.

070725 12:32:27 [Note] Slave SQL thread initialized, starting replication in
log 'mysql-bin.000041' at position 61787343, relay log
'.\cvp-db-slave1-relay-bin.000005' position: 98
070725 12:32:27 [Note] Slave I/O thread: connected to master
'replicator@192.168.1.204:3306',  replication started in log 'mysql-bin.000041'
at position 61787343
070725 12:32:27 [ERROR] Error reading packet from server: Client requested
master to start replication from impossible position ( server_errno=1236)
070725 12:32:27 [ERROR] Got fatal error 1236: 'Client requested master to start
replication from impossible position' from master when reading data from binary
log
070725 12:32:27 [Note] Slave I/O thread exiting, read up to log
'mysql-bin.000041', position 61787343

But the master was in position.....

mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000041 | 28743668 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

The latest is:

070731 16:37:41 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=C:\programs\mysql\mysql5\Data\cvp-db-slave1-relay-bin' to avoid this problem.
070731 16:37:41 [Note] C:\programs\mysql\mysql5\bin\mysqld-nt: ready for connections.
Version: '5.0.41-community-nt'  socket: ''  port: 3306  MySQL Community Edition (GPL)
070731 16:37:48 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000041' at position 7468483, relay log 'C:\programs\mysql\mysql5\Data\cvp-db-slave1-relay-bin.000002' position: 7468620
070731 16:37:48 [Note] Slave I/O thread: connected to master 'replicator@192.168.1.204:3306',  replication started in log 'mysql-bin.000041' at position 41298646
070801  1:05:40 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
070801  1:05:40 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
070801  1:05:40 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000041', position 66452281
070802 15:32:13 [Note] Error reading relay log event: slave SQL thread was killed
070802 15:32:16 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000041' at position 66452281, relay log 'C:\programs\mysql\mysql5\Data\cvp-db-slave1-relay-bin.000006' position: 25153870
070802 15:32:16 [Note] Slave I/O thread: connected to master 'replicator@192.168.1.204:3306',  replication started in log 'mysql-bin.000041' at position 66452281
070802 15:32:16 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
070802 15:32:17 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log
070802 15:32:17 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000041', position 66452281

How to repeat:
Rebuild replication from scratch, restart replication and wait for X amount of time - slave goes ahead????

Both the slave and master are Win2003 server and both are running 5.0.41-community-nt

Thank you for the report.

If look attention at the timestamps from your error log is possible to notice slave never run normally, but started from impossible position and immediately stoped:

070725 *12:32:27* [Note] Slave I/O thread: connected to master
'replicator@192.168.1.204:3306',  replication started in log 'mysql-bin.000041'
at position 61787343
070725 *12:32:27* [ERROR] Error reading packet from server: Client requested
master to start replication from impossible position ( server_errno=1236)

Please describe step by step how you set up replication "from scratch".

Replication starts fine and ran in this instance from 070731 16:37:48 until 070801  1:05:40 (as shown below)

070731 16:37:48 [Note] Slave I/O thread: connected to master
'replicator@192.168.1.204:3306',  replication started in log 'mysql-bin.000041' at
position 41298646
070801  1:05:40 [ERROR] Error reading packet from server: Client requested master to start
replication from impossible position ( server_errno=1236)
070801  1:05:40 [ERROR] Got fatal error 1236: 'Client requested master to start
replication from impossible position' from master when reading data from binary log
070801  1:05:40 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000041',
position 66452281

##################

As for how replication is setup, here is a step by step overview of the process.

On MASTER open my.ini and add

log-bin=mysql-bin
server-id=1

Open MySQL Administrator and on user root Add Host User can connect from and add IP address of SLAVE (192.168.1.207)

Create user replicator on MASTER and Add Host User can connect from and add IP address of SLAVE (192.168.1.207) and localhost

mysql -u root -p
GRANT REPLICATION SLAVE ON *.*
TO 'replicator'@'localhost' IDENTIFIED BY 'password';

ON MASTER
mysql> GRANT REPLICATION SLAVE ON *.*
    -> TO 'replicator'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT REPLICATION SLAVE ON *.*
    -> TO 'replicator'@'192.168.1.207' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT SUPER ON *.*
    -> TO 'replicator'@'localhost' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT SUPER ON *.*
    -> TO 'replicator'@'192.168.1.207' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

mysql> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000041 |       98 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

Stop MySQL MASTER

On SLAVE stop MySQL and add to my.ini

log-bin=slave1-mysql-bin
server-id=2
skip-slave-start

ENSURE THAT THE innodb_log_file_size=190M is set to the SAME size in both the master and slave my.ini!!!

On the SLAVE move/delete all files in the mysql\data dir.  Now copy ALL the data from the directory and subdirectories (\mysql\data dir) to a temp directory on the slave. MAKE SURE TO INCLUDE THE ibdata1 FILE!!!!! Do not include any ib_logfile* or mysql-bin.###### files.

Start MySQL on the MASTER

Start MySQL on SLAVE

ON SLAVE:
CHANGE MASTER TO
MASTER_HOST='192.168.1.204',
MASTER_USER='replicator',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin.000041',
MASTER_LOG_POS=98;
START SLAVE;

ON MASTER:
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000041 |     6578 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

ON SLAVE:
mysql> show slave status;
<snip>
|                | 192.168.1.204 | replicator  |        3306 |            60 | m
ysql-bin.000041 |                6578 | cvp-db-slave1-relay-bin.000001 |
<snip>
1 row in set (0.00 sec)

On SLAVE edit my.ini and comment (#):
#skip-slave-start
Save file.

Replication ...... DONE.

Is there any update on this?

This got more serious as we now require replication due to load.

Is this a bug????

I can confirm that this bug does exist; and I have experienced this issue under the a personal set of machines under the configuration as described by Gary Wilson.

From the error state page; error 1236 is described as:
Error: 1236 SQLSTATE: HY000 (ER_MASTER_FATAL_ERROR_READING_BINLOG)
Message: Got fatal error %d: '%s' from master when reading data from binary log

Can anybody give some indication as to the best places to look?

Thank you for the feedback. Could you please point me where our Manual
recommend the below step reported:

"On the SLAVE move/delete all files in the mysql\data dir.  Now copy ALL the data from the
directory and subdirectories (\mysql\data dir) to a temp directory on the slave. MAKE SURE
TO INCLUDE THE ibdata1 FILE!!!!! Do not include any ib_logfile* or mysql-bin.######
files."

Thanks in advance.

Hi Miguel

Thanks for getting back to this.  In answer to your question, here (http://dev.mysql.com/doc/refman/5.0/en/replication-howto-rawdata.html) I apologise if my decription of the procedure was not 100% clear.

However, in addition to this we have solve the one issue but we are still experiencing the issue of the Duplicate entry when we first start replication if I do not add:

slave-skip-errors=1062

To the config file as the replication process always fails on start with 'Duplicate Entry' error, however if I start it skipping error 1062 and the stop the slave, comment out the slave-skip-errors and start it again, then it works fine thereafter.  So only on the initial start does that occur, thereafter it works fine.  So this bug is resolved on our side and this new condition is either a procedural or configuration issue on my part or a different bug.

Changed Severity level

Please, try to repeat with a newer version, 5.0.51a, and inform about the results.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Gerry, Dean,

we still need to know if you have this issue when using newer MySQL version. The newest version at the moment is 5.0.51a.

Hi

Apologies for the lack of feedback.  We are still on 5.0.41 at the moment and we are migrating onto Linux so this is not as critical and my workaround for the 
 'Duplicate Entry' error is still working.  So for the moment this is the best we can do as we do not have time to test this on 5.0.51 and the old mantra of "if it ain't broken, don't try to fix it... when you don't have time" applies here at the moment.

Sorry I cannot confirm that 5.0.51 will solve our Duplicate Entry problem at the moment.

Dean,

what about you? Please, could you test if you still have this issues by using MySQL 5.0.51a.

I too am unable to commit time to this issue;

Thanks,
Dean.

NOTE:  I am using 5.0.51a and seeing this issue

Inside of my /etc/my.cnf file I have the following line
slave-skip-errors=1062 

Yet at times I get the following

       Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 1062
                 Last_Error: Error 'Duplicate entry '178424' for key 1' on query. Default database: ............. "insert statement here"
               Skip_Counter: 0
        Exec_Master_Log_Pos: 353126939
            Relay_Log_Space: 8247464033
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL
1 row in set (0.03 sec)

I have a couple of systems setup this way and am only seeing it on one.  The only difference between the systems is I added the following in my.cnf

read_buffer_size = 8192K
read_rnd_buffer_size = 8192K

I understand that these are larger than normal.  I will back these sizes down and see if the issue still happens

Note:

Did not fix it.. I bumped the following down

read_buffer_size = 2048K
read_rnd_buffer_size = 2048K

Please disregard my prior notes.. It was user error.. I had 2 lines in my my.cnf file that had skip slave errors.. 1062 was the first line the second line was a different error # that I didnt notice was there

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Problem still exists with mysql debian (6/squeeze) package 5.1.49-3 (x86).

Replication is not fail-safe regarding master *and* slave failures (crashes).

The reported error occurs when the master crashes. Other errors (duplicate
inserts or invalid bin logs) occur when the slave crashes.

Slave gets ahead also with community edition 5.5.11 (i686) on debian 6.0.1a when crashing the master. It does not matter whether one uses async or semi-sync replication.

Very annoying.

Is this by design or just badly tested code?

Regarding the criticality of this behavior: is it really non-critical? I assume that I have to re-setup the entire replication (copying/syncing the entire mysql db from master to slave while the master server is off-line), so for mission crticial stuff that is an absolute "no go".

BTW: the duplicate INSERTs when the slave crashes indicate that you end up with duplicated rows on the slave where there are no UNIQUE constraints preventing that duplication.

That alone constitutes severity S1 (IMHO) because it destroys your data consistency and may render your whole replicated database useless.

Mark,

For better resilience against master and slave crashes causing replication to fail, you should use innodb only, and configure the following options:

sync_relay_log_info,
sync_master_info,
sync_frm,
sync_binlog,
innodb_support_xa,
innodb_flush_log_at_trx_commit

What do you have those set to?

i forgot to mention sync_relay_log above ^^

Thanks for pointing out the ^^ Shane... I like that look up: ^^

Sorry for the post but ... I could not help it :) Hey its a good bug man... ;)

Hi Mark Stier

I am the original bug reporter. Was going to ask... Did you try the workaround from [6 Oct 2007 19:08] Gary Wilson, I cannot remember what the first problem was, but that was fixed or worked around as well, see above.

However, I can confirm if your slave is a Windows machine and it dies and you replace it with a Linux machine with a different version of MySQL, it will probably go away as well. However that will introduce you to the interesting world of case insensitive replication (one of the oldest Linux/Windows wars) and get you involved with the nasty end of another stick.

Be warned, ALL. Especially when you have to rely on developers to honour a rule set too.... I need say no more.... we know where that roads goes.

However, you are clearly not affected by that particular nasty stick. However seeing as you never stated otherwise, I imagine that both master and slave are running the same image and all that goes with it. Try change your slave to i386. Your should be fine (belief from experience, but not necessarily true under all conditions.. stop) telegram to mysql, stop. is it a bug? stop. @saulwilliams "what up nigga?"

Anyway, good to see the old fellow has still got legs in him yet :) Not gone away yet.

Looking over that was kinda of like remembering an old story you once heard in a pub a long time ago when you did not manage 162720 production CPU hours and life was simple as simple as 1236.. look how far we have come.

Hi Deano... skype this weekend sometime, Sab wants to say Hi to :)

Sorry Mark, just saying hello to an old friend, actually 2 old friends.

Anyway, honestly Mark I can attest that goes away with.... once sec.... searching back....

It existed with a Windows 64-bit master and a Windows 32 bit slave. It went away for good with a Windows 64-bit master and Linux 64-bit .. so maybe x86_64 master and i386 slave will make it go away and you will not have to update the bug LOADS :)

Good luck.. hell I am going to make sure that I plan this into future DB architecture plans .... "feature.arrrgh.avoid.bug30223.sh or .pp" :) seems like a long runner :)

Hello Shane,

I'll try these.

However, I don't see how they should be of any help:

sync_relay_log_info, -- I don't use relaying
sync_master_info, -- how should that improve the problem?
sync_frm, -- DDL was not part of the problem
sync_binlog, -- already activated
innodb_support_xa, -- activated by default
innodb_flush_log_at_trx_commit -- set to 1

Tested the case also with InnoDB. Not better.

The central problem seems to be that innodb_flush_log_at_trx_commit=1 is not sufficient. The master seems to send the binary log to the slave before it has flushed it to its own disk... thereby the slave ends up with more complete data than the master's binary log after a master failure.

Bug #60937 was marked as a duplicate of this one.

Another note about "crash-proof" slaves, 5.6 has a better chance of surviving.
http://mysqlmusings.blogspot.com/2011/04/crash-safe-replication.html

About the "sync" options, this assumes the OS, disk controllers, disks aren't lying about syncing... Many do. It should be tested to be sure they really are syncing when told to.

ok, I did some tests with 5.5.11 (i686) on Debian 6.0.1a.

Results:

1.) disable relay-specific sync stuff, or your replication slave will sync one or two rows a second max on regular hardware...

2.) enable binlog syncing and master info sync. Then replication of InnoDB tables is ok. Caveat: the slave does not recognize a master crash/reset. You need to do "stop slave; start slave;" manually. Is there an option to automatically recognize a lost connection to the master?

2.1.) When using myisam tables, the client usually ends up with an additional event (the last one that wasn't  committed by the master), ie. when doing mass inserts, the slave usually has more row than the master...

this report is too old.  crash-safe replication has been a reality for quite a while on 5.6 and 5.7.   I already mentioned the *sync* options.

"Crash-safe replication"
http://dev.mysql.com/doc/refman/5.7/en/slave-logs.html