MySQL Bugs: #30828: Problem Replicating Data after a LOAD DATA statement, error reading log file

Bug #30828	Problem Replicating Data after a LOAD DATA statement, error reading log file
Submitted:	5 Sep 2007 12:02	Modified:	4 Oct 2007 15:16
Reporter:	Dinko Ostric
Status:	Not a Bug
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.0.41-community-nt	OS:	Windows (WINDOWS 2003 Server, SP 1)
Assigned to:		CPU Architecture:	Any
Tags:	error 1236, error reading log, LOAD DATA, replication, windows 2003

Description:
We have statement-based replication set up with 1 master and 1 slave; all servers use only MyISAM tables (InnoDB is turned off with skip-innodb).
Replication runs fine until a "LOAD DATA INFILE" command is executed. After the "LOAD DATA INFILE" command, replication breaks on the slave server with the following log entry:

Version: '5.0.41-community-nt' socket: '' port: 3306 MySQL Community Edition (GPL)
070905 10:55:32 [ERROR] Error reading packet from server: error reading log entry ( server_errno=1236)
070905 10:55:32 [ERROR] Got fatal error 1236: 'error reading log entry' from master when reading data from binary log
070905 10:55:32 [Note] Slave I/O thread exiting, read up to log 'ULFS01_LOGbv2.000001', position 10663779

The error happens only when the master server runs Windows 2003; on a master-slave system where both machines run Windows XP SP2, replication works fine.

If we open the master binary log (usually *.000001), it is filled with data statements, and the position mentioned in the error log is just before the DATA (the same data that is in the file) of the "LOAD DATA INFILE" statement.
It seems that Windows 2003 Server (and ONLY Windows 2003; on Windows XP SP2, LOAD DATA INFILE works fine) builds a corrupted log when writing a "LOAD DATA INFILE" statement to the binary log.
The slaves read the log data up to the mentioned position, 10663779. After that, the slave(s) report the error: "Got fatal error 1236: 'error reading log entry' from master when reading data from binary log"...
The only way to resolve this is to stop both master and slave(s) and rebuild the replication.
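
The inspection described above can also be done from the MySQL client rather than by opening the log file directly. The following is only a sketch: it reuses the log name and position from the error message, and it assumes that position is a valid event boundary. If the event at that position is damaged, SHOW BINLOG EVENTS may itself return an error.

```sql
-- On the slave: shows the I/O thread state, the last error,
-- and the master log file and position read so far.
SHOW SLAVE STATUS;

-- On the master: shows the binary log currently being written.
SHOW MASTER STATUS;

-- On the master: list a few events starting at the position where
-- the slave stopped (log name and position taken from the error log).
SHOW BINLOG EVENTS IN 'ULFS01_LOGbv2.000001' FROM 10663779 LIMIT 5;
```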

Creating a new database (with a new, empty table) and testing replication with a LOAD DATA command on the master server produces the same result: replication breaks on the slave server(s) with the same error.
The error has nothing to do with the slave OS; we tested different combinations and concluded that the problem is in the creation of the binary log on the master.
Replication with LOAD DATA statements works fine on Windows XP SP2 systems, where both the master and the slave run Windows XP SP2.
We have had this problem for a year now, and upgrading from previous versions to newer ones did not solve it.

Note: We found that it does not matter whether the database is part of the replication setup. We created a new database on the master server (Windows 2003) that was not being replicated to the slave, and after a LOAD DATA INFILE into a table in it (on the master), replication stopped again on the slave server with the same error. It is clear that after every LOAD DATA INFILE the slaves cannot read the log anymore and report error 1236.

How to repeat:
Set up master-slave replication with a "Windows 2003 Server SP 1" system as the master. The slave OS is not important (we tested with XP SP2 and Windows 2003 Standard).
Create a new table with a custom structure on the master server.
Perform a LOAD DATA INFILE statement like:
LOAD DATA INFILE 'c:\\datafile.txt' INTO TABLE `table_name` (column_name);
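
For illustration, the steps above might look like the following when run from the MySQL client. This is a sketch only: the database name `repl_test`, the table name `load_test`, and the column definition are hypothetical; only the file path comes from the report.

```sql
-- On the master. All names here are hypothetical placeholders.
CREATE DATABASE IF NOT EXISTS repl_test;
USE repl_test;

-- MyISAM, since the reporter's servers run with skip-innodb.
CREATE TABLE load_test (
  column_name VARCHAR(255)
) ENGINE=MyISAM;

-- The statement that triggers the problem in the report.
LOAD DATA INFILE 'c:\\datafile.txt' INTO TABLE load_test (column_name);

-- On the slave afterwards: check whether the I/O thread is still running.
SHOW SLAVE STATUS;
```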

Suggested fix:
The data in the binary log on the master is unreadable by slaves; fix the binary log creation on Windows 2003.

Thank you for the problem report. Please check whether this is a duplicate of bug #30435. Specifically, compare the values of read_buffer_size on the different machines and the sizes of the files loaded.

The problem described in this bug was in the settings read_buffer_size, read_rnd_buffer_size and sort_buffer_size. It seems that the error happens when the buffer sizes are set too large (in the MB or GB range). After setting the buffers to lower values (128 / 512 / 512 KB), replication started to work perfectly, and it is still replicating after several weeks.
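
As a sketch, the buffer values can be checked and lowered on the master as follows. The mapping of 128 / 512 / 512 KB to the three variables in the order they are listed is an assumption; the comment does not state it explicitly.

```sql
-- Inspect the current global values (reported in bytes).
SHOW GLOBAL VARIABLES LIKE 'read_buffer_size';
SHOW GLOBAL VARIABLES LIKE 'read_rnd_buffer_size';
SHOW GLOBAL VARIABLES LIKE 'sort_buffer_size';

-- Lower the values. The assignment of sizes to variables is assumed.
SET GLOBAL read_buffer_size     = 131072;  -- 128 KB
SET GLOBAL read_rnd_buffer_size = 524288;  -- 512 KB
SET GLOBAL sort_buffer_size     = 524288;  -- 512 KB
```

SET GLOBAL requires the SUPER privilege, affects only connections opened afterwards, and does not survive a server restart; for a permanent change the same values would also need to go into the [mysqld] section of the server's option file.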

Problem solved.

This bug is similar to Bug #8215, except that the error message is different and the problem is harder to solve (only the LOAD DATA INFILE statement breaks replication).

So, this looks more like a misconfiguration issue, not a bug.

Hi,

I am not sure I agree, because everything replicates fine until the LOAD DATA statement is performed. Also, if the server (the master server!) is misconfigured, it should report something about that in its error log. The only error reported is on the SLAVE server: "Error reading packet from server: error reading log entry (server_errno=1236)...".

It took us time to locate the problem, and it also took us time to find the solution. If the buffer parameters on the MASTER are huge and the master server can't handle those parameters, it should warn about that; instead, only the slave server throws an error and stops replicating. The master server continues to write the log file, logging all queries without throwing any errors, but the slaves can't read it from the moment of the LOAD DATA statement.

If the server builds the log without warning about any error and the slave can't read that log, I am not sure that the only problem is misconfiguration.