Bug #30223 | Replication failing with error 1236 | ||
---|---|---|---|
Submitted: | 3 Aug 2007 8:29 | Modified: | 7 Nov 2015 7:28 |
Reporter: | Gary Wilson | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.0.41-community-nt | OS: | Windows (2003 Server) |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
Tags: | 1236, replication |
[3 Aug 2007 8:29]
Gary Wilson
[3 Aug 2007 8:31]
Gary Wilson
Both the slave and master are Win2003 server and both are running 5.0.41-community-nt
[3 Aug 2007 8:59]
Sveta Smirnova
Thank you for the report. If look attention at the timestamps from your error log is possible to notice slave never run normally, but started from impossible position and immediately stoped: 070725 *12:32:27* [Note] Slave I/O thread: connected to master 'replicator@192.168.1.204:3306', replication started in log 'mysql-bin.000041' at position 61787343 070725 *12:32:27* [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236) Please describe step by step how you set up replication "from scratch".
[3 Aug 2007 10:23]
Gary Wilson
Replication starts fine and ran in this instance from 070731 16:37:48 until 070801 1:05:40 (as shown below) 070731 16:37:48 [Note] Slave I/O thread: connected to master 'replicator@192.168.1.204:3306', replication started in log 'mysql-bin.000041' at position 41298646 070801 1:05:40 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236) 070801 1:05:40 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log 070801 1:05:40 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000041', position 66452281 ################## As for how replication is setup, here is a step by step overview of the process. On MASTER open my.ini and add log-bin=mysql-bin server-id=1 Open MySQL Administrator and on user root Add Host User can connect from and add IP address of SLAVE (192.168.1.207) Create user replicator on MASTER and Add Host User can connect from and add IP address of SLAVE (192.168.1.207) and localhost mysql -u root -p GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'localhost' IDENTIFIED BY 'password'; ON MASTER mysql> GRANT REPLICATION SLAVE ON *.* -> TO 'replicator'@'localhost' IDENTIFIED BY 'password'; Query OK, 0 rows affected (0.00 sec) mysql> FLUSH PRIVILEGES; Query OK, 0 rows affected (0.00 sec) mysql> GRANT REPLICATION SLAVE ON *.* -> TO 'replicator'@'192.168.1.207' IDENTIFIED BY 'password'; Query OK, 0 rows affected (0.00 sec) mysql> FLUSH PRIVILEGES; Query OK, 0 rows affected (0.00 sec) mysql> GRANT SUPER ON *.* -> TO 'replicator'@'localhost' IDENTIFIED BY 'password'; Query OK, 0 rows affected (0.00 sec) mysql> FLUSH PRIVILEGES; Query OK, 0 rows affected (0.00 sec) mysql> GRANT SUPER ON *.* -> TO 'replicator'@'192.168.1.207' IDENTIFIED BY 'password'; Query OK, 0 rows affected (0.00 sec) mysql> FLUSH PRIVILEGES; Query OK, 0 rows affected (0.00 sec) mysql> FLUSH TABLES WITH READ LOCK; Query OK, 0 rows affected (0.00 sec) mysql> SHOW MASTER STATUS; +------------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +------------------+----------+--------------+------------------+ | mysql-bin.000041 | 98 | | | +------------------+----------+--------------+------------------+ 1 row in set (0.00 sec) Stop MySQL MASTER On SLAVE stop MySQL and add to my.ini log-bin=slave1-mysql-bin server-id=2 skip-slave-start ENSURE THAT THE innodb_log_file_size=190M is set to the SAME size in both the master and slave my.ini!!! On the SLAVE move/delete all files in the mysql\data dir. Now copy ALL the data from the directory and subdirectories (\mysql\data dir) to a temp directory on the slave. MAKE SURE TO INCLUDE THE ibdata1 FILE!!!!! Do not include any ib_logfile* or mysql-bin.###### files. Start MySQL on the MASTER Start MySQL on SLAVE ON SLAVE: CHANGE MASTER TO MASTER_HOST='192.168.1.204', MASTER_USER='replicator', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin.000041', MASTER_LOG_POS=98; START SLAVE; ON MASTER: mysql> SHOW MASTER STATUS; +------------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +------------------+----------+--------------+------------------+ | mysql-bin.000041 | 6578 | | | +------------------+----------+--------------+------------------+ 1 row in set (0.00 sec) ON SLAVE: mysql> show slave status; <snip> | | 192.168.1.204 | replicator | 3306 | 60 | m ysql-bin.000041 | 6578 | cvp-db-slave1-relay-bin.000001 | <snip> 1 row in set (0.00 sec) On SLAVE edit my.ini and comment (#): #skip-slave-start Save file. Replication ...... DONE.
[15 Aug 2007 13:29]
Gary Wilson
Is there any update on this?
[20 Aug 2007 11:56]
Gary Wilson
This got more serious as we now require replication due to load.
[4 Sep 2007 12:44]
Gary Wilson
Is this a bug????
[7 Sep 2007 14:35]
Dean Taylor
I can confirm that this bug does exist; and I have experienced this issue under the a personal set of machines under the configuration as described by Gary Wilson. From the error state page; error 1236 is described as: Error: 1236 SQLSTATE: HY000 (ER_MASTER_FATAL_ERROR_READING_BINLOG) Message: Got fatal error %d: '%s' from master when reading data from binary log Can anybody give some indication as to the best places to look?
[6 Oct 2007 16:27]
MySQL Verification Team
Thank you for the feedback. Could you please point me where our Manual recommend the below step reported: "On the SLAVE move/delete all files in the mysql\data dir. Now copy ALL the data from the directory and subdirectories (\mysql\data dir) to a temp directory on the slave. MAKE SURE TO INCLUDE THE ibdata1 FILE!!!!! Do not include any ib_logfile* or mysql-bin.###### files." Thanks in advance.
[6 Oct 2007 17:08]
Gary Wilson
Hi Miguel Thanks for getting back to this. In answer to your question, here (http://dev.mysql.com/doc/refman/5.0/en/replication-howto-rawdata.html) I apologise if my decription of the procedure was not 100% clear. However, in addition to this we have solve the one issue but we are still experiencing the issue of the Duplicate entry when we first start replication if I do not add: slave-skip-errors=1062 To the config file as the replication process always fails on start with 'Duplicate Entry' error, however if I start it skipping error 1062 and the stop the slave, comment out the slave-skip-errors and start it again, then it works fine thereafter. So only on the initial start does that occur, thereafter it works fine. So this bug is resolved on our side and this new condition is either a procedural or configuration issue on my part or a different bug.
[6 Oct 2007 17:08]
Gary Wilson
Changed Severity level
[30 Jan 2008 11:46]
Valeriy Kravchuk
Please, try to repeat with a newer version, 5.0.51a, and inform about the results.
[1 Mar 2008 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[12 Mar 2008 19:51]
Susanne Ebrecht
Gerry, Dean, we still need to know if you have this issue when using newer MySQL version. The newest version at the moment is 5.0.51a.
[12 Mar 2008 20:01]
Gary Wilson
Hi Apologies for the lack of feedback. We are still on 5.0.41 at the moment and we are migrating onto Linux so this is not as critical and my workaround for the 'Duplicate Entry' error is still working. So for the moment this is the best we can do as we do not have time to test this on 5.0.51 and the old mantra of "if it ain't broken, don't try to fix it... when you don't have time" applies here at the moment. Sorry I cannot confirm that 5.0.51 will solve our Duplicate Entry problem at the moment.
[19 Mar 2008 16:28]
Susanne Ebrecht
Dean, what about you? Please, could you test if you still have this issues by using MySQL 5.0.51a.
[29 Mar 2008 14:30]
Dean Taylor
I too am unable to commit time to this issue; Thanks, Dean.
[10 Apr 2008 14:08]
Derek Meyer
NOTE: I am using 5.0.51a and seeing this issue Inside of my /etc/my.cnf file I have the following line slave-skip-errors=1062 Yet at times I get the following Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1062 Last_Error: Error 'Duplicate entry '178424' for key 1' on query. Default database: ............. "insert statement here" Skip_Counter: 0 Exec_Master_Log_Pos: 353126939 Relay_Log_Space: 8247464033 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL 1 row in set (0.03 sec) I have a couple of systems setup this way and am only seeing it on one. The only difference between the systems is I added the following in my.cnf read_buffer_size = 8192K read_rnd_buffer_size = 8192K I understand that these are larger than normal. I will back these sizes down and see if the issue still happens
[10 Apr 2008 16:07]
Derek Meyer
Note: Did not fix it.. I bumped the following down read_buffer_size = 2048K read_rnd_buffer_size = 2048K
[10 Apr 2008 22:29]
Derek Meyer
Please disregard my prior notes.. It was user error.. I had 2 lines in my my.cnf file that had skip slave errors.. 1062 was the first line the second line was a different error # that I didnt notice was there
[19 Apr 2008 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[20 Apr 2011 18:41]
Mark Stier
Problem still exists with mysql debian (6/squeeze) package 5.1.49-3 (x86). Replication is not fail-safe regarding master *and* slave failures (crashes). The reported error occurs when the master crashes. Other errors (duplicate inserts or invalid bin logs) occur when the slave crashes. Slave gets ahead also with community edition 5.5.11 (i686) on debian 6.0.1a when crashing the master. It does not matter whether one uses async or semi-sync replication. Very annoying. Is this by design or just badly tested code?
[20 Apr 2011 18:43]
Mark Stier
Regarding the criticality of this behavior: is it really non-critical? I assume that I have to re-setup the entire replication (copying/syncing the entire mysql db from master to slave while the master server is off-line), so for mission crticial stuff that is an absolute "no go".
[20 Apr 2011 18:57]
Mark Stier
BTW: the duplicate INSERTs when the slave crashes indicate that you end up with duplicated rows on the slave where there are no UNIQUE constraints preventing that duplication. That alone constitutes severity S1 (IMHO) because it destroys your data consistency and may render your whole replicated database useless.
[20 Apr 2011 19:12]
MySQL Verification Team
Mark, For better resilience against master and slave crashes causing replication to fail, you should use innodb only, and configure the following options: sync_relay_log_info, sync_master_info, sync_frm, sync_binlog, innodb_support_xa, innodb_flush_log_at_trx_commit What do you have those set to?
[20 Apr 2011 19:44]
MySQL Verification Team
i forgot to mention sync_relay_log above ^^
[20 Apr 2011 20:03]
Gary Wilson
Thanks for pointing out the ^^ Shane... I like that look up: ^^ Sorry for the post but ... I could not help it :) Hey its a good bug man... ;) Hi Mark Stier I am the original bug reporter. Was going to ask... Did you try the workaround from [6 Oct 2007 19:08] Gary Wilson, I cannot remember what the first problem was, but that was fixed or worked around as well, see above. However, I can confirm if your slave is a Windows machine and it dies and you replace it with a Linux machine with a different version of MySQL, it will probably go away as well. However that will introduce you to the interesting world of case insensitive replication (one of the oldest Linux/Windows wars) and get you involved with the nasty end of another stick. Be warned, ALL. Especially when you have to rely on developers to honour a rule set too.... I need say no more.... we know where that roads goes. However, you are clearly not affected by that particular nasty stick. However seeing as you never stated otherwise, I imagine that both master and slave are running the same image and all that goes with it. Try change your slave to i386. Your should be fine (belief from experience, but not necessarily true under all conditions.. stop) telegram to mysql, stop. is it a bug? stop. @saulwilliams "what up nigga?" Anyway, good to see the old fellow has still got legs in him yet :) Not gone away yet. Looking over that was kinda of like remembering an old story you once heard in a pub a long time ago when you did not manage 162720 production CPU hours and life was simple as simple as 1236.. look how far we have come. Hi Deano... skype this weekend sometime, Sab wants to say Hi to :) Sorry Mark, just saying hello to an old friend, actually 2 old friends. Anyway, honestly Mark I can attest that goes away with.... once sec.... searching back.... It existed with a Windows 64-bit master and a Windows 32 bit slave. It went away for good with a Windows 64-bit master and Linux 64-bit .. so maybe x86_64 master and i386 slave will make it go away and you will not have to update the bug LOADS :) Good luck.. hell I am going to make sure that I plan this into future DB architecture plans .... "feature.arrrgh.avoid.bug30223.sh or .pp" :) seems like a long runner :)
[20 Apr 2011 22:51]
Mark Stier
Hello Shane, I'll try these. However, I don't see how they should be of any help: sync_relay_log_info, -- I don't use relaying sync_master_info, -- how should that improve the problem? sync_frm, -- DDL was not part of the problem sync_binlog, -- already activated innodb_support_xa, -- activated by default innodb_flush_log_at_trx_commit -- set to 1 Tested the case also with InnoDB. Not better. The central problem seems to be that innodb_flush_log_at_trx_commit=1 is not sufficient. The master seems to send the binary log to the slave before it has flushed it to its own disk... thereby the slave ends up with more complete data than the master's binary log after a master failure.
[21 Apr 2011 3:18]
Valeriy Kravchuk
Bug #60937 was marked as a duplicate of this one.
[21 Apr 2011 4:23]
MySQL Verification Team
Another note about "crash-proof" slaves, 5.6 has a better chance of surviving. http://mysqlmusings.blogspot.com/2011/04/crash-safe-replication.html About the "sync" options, this assumes the OS, disk controllers, disks aren't lying about syncing... Many do. It should be tested to be sure they really are syncing when told to.
[22 Apr 2011 11:01]
Mark Stier
ok, I did some tests with 5.5.11 (i686) on Debian 6.0.1a. Results: 1.) disable relay-specific sync stuff, or your replication slave will sync one or two rows a second max on regular hardware... 2.) enable binlog syncing and master info sync. Then replication of InnoDB tables is ok. Caveat: the slave does not recognize a master crash/reset. You need to do "stop slave; start slave;" manually. Is there an option to automatically recognize a lost connection to the master? 2.1.) When using myisam tables, the client usually ends up with an additional event (the last one that wasn't committed by the master), ie. when doing mass inserts, the slave usually has more row than the master...
[7 Nov 2015 7:28]
MySQL Verification Team
this report is too old. crash-safe replication has been a reality for quite a while on 5.6 and 5.7. I already mentioned the *sync* options. "Crash-safe replication" http://dev.mysql.com/doc/refman/5.7/en/slave-logs.html