Bug #24262 Replication error
Submitted: 13 Nov 2006 14:05 Modified: 25 Dec 2006 8:03
Reporter: Neil Gailliez Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0.26 OS:Linux (Red Hat ES 4 & Centos 4.2)
Assigned to: CPU Architecture:Any

[13 Nov 2006 14:05] Neil Gailliez
Description:
Hi,

I am experiencing a recurring problem in replication, the problem manifests itself as a syntax error in a statement in the slave relay log, the following shows the error log entry from the last time this occurred:

Slave: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right synt
ax to use near 'º^B' at line 1' on query. Default database: 'baylis'. Query: 'INSERT INTO CDS_GRAB_LCSTOREDPALLETS_NOW (batchlet_number,pallet_type,part_palle
t,previous_location,product_code,prod_pall_num,receipt_date,receipt_number,receipt_temp,receipt_time,supplier,time_place,who) VALUES ('','2007-11-02','','','
96','1','PY','2006-10-23','WO','CASES','','4E108C','20061107080001','RM','129','PALLETS','118407','GKN','N','','PY 7006292820','150600510711\0','2006-10-23',
'2494','','17:01:00','','17:01:26','CRON')º^B', Error_code: 1064

The value of Exec_Master_Log_Pos, at the point at which the SQL_Thread terminated due to this error was 157919392.

This is very strange as this statement does actually appear to have been executed correctly, if I try to re-execute it (without the trailing º^B - which incidently does not appear in this statement in the binary log of the server that generated it) then a duplicate entry error will be returned. 

As well as this the entry in the master's binary log referred to by the Exec_Master_Log_Pos (157919392) is actually the statement directly after the one that is reported to be causing an error on the slave, the statement reporting the error on the slave was actually at 157918700 in the master log. When the statement at position 157919392 was manually extracted from the master's binary log and manually executed on the salve there were no problems with it.

To verify that there are no problems with the slave server we have run a 24 hour system/disk/io check on it with all tests revealing that there no problems in that area. The slave that experiences this problem is on a different network to the master and communicates with it over an internet connection, the master does have other slave servers that are on the same network as it and which have never experienced this problem. The problem itself only seems to occur during periods of very heavy updates against the CDS_GRAB_LCSTOREDPALLETS_NOW table, this table has the following structure:

+----------------------+-----------+------+-----+-------------------+-------+
| Field                | Type      | Null | Key | Default           | Extra |
+----------------------+-----------+------+-----+-------------------+-------+
| company_code         | int(2)    | NO   |     | 0                 |       |
| location             | char(6)   | NO   |     |                   |       |
| pallet_number        | char(15)  | NO   | PRI |                   |       |
| prod_pall_num        | char(12)  | NO   |     |                   |       |
| product_code         | char(50)  | NO   | MUL |                   |       |
| bbe_date             | date      | NO   |     | 0000-00-00        |       |
| cases_stored         | int(7)    | NO   |     | 0                 |       |
| cases_in             | int(7)    | NO   |     | 0                 |       |
| cases_out            | int(7)    | NO   |     | 0                 |       |
| part_pallet          | char(1)   | NO   |     |                   |       |
| receipt_date         | date      | NO   |     | 0000-00-00        |       |
| receipt_time         | time      | NO   |     | 00:00:00          |       |
| hold_code            | char(4)   | NO   | MUL |                   |       |
| supplier             | char(10)  | NO   |     |                   |       |
| receipt_number       | char(6)   | NO   |     |                   |       |
| batch_number         | char(15)  | NO   |     |                   |       |
| time_place           | time      | NO   |     | 00:00:00          |       |
| date_place           | date      | NO   |     | 0000-00-00        |       |
| operator             | char(3)   | NO   |     |                   |       |
| pallet_type          | char(5)   | NO   |     |                   |       |
| operative            | char(5)   | NO   |     |                   |       |
| handling_unit        | char(8)   | NO   |     |                   |       |
| pallet_configuration | char(8)   | NO   |     |                   |       |
| receipt_temp         | int(3)    | NO   |     | 0                 |       |
| previous_location    | char(6)   | NO   |     |                   |       |
| depot_code           | char(2)   | NO   | PRI |                   |       |
| customer_code        | char(2)   | NO   | PRI |                   |       |
| who                  | char(50)  | NO   |     |                   |       |
| mod_time             | timestamp | NO   | PRI | CURRENT_TIMESTAMP |       |
+----------------------+-----------+------+-----+-------------------+-------+

How to repeat:
I am currently working on reproducing this problem by creating a copy of the database on two servers that communicate over an Asynchronous DSL line, setting up replication using the same version of mysql with the same configuration and then sending as many updates as possible to the CDS_GRAB_LCSTOREDPALLETS_NOW table over a extended period of time while the communication line between the servers is under heavy load.
[25 Nov 2006 8:03] Valeriy Kravchuk
Thank you for a problem report. Do you have any success in your attempts to repeat the behaviour described? If that º^B was only in slave's log, it can be some kind of hardware (network) failure... Is your slave using the same MySQL version?
[26 Dec 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".