MySQL Bugs: #7836: REPLACE INTO dies with Duplicate Key Error

Bug #7836	REPLACE INTO dies with Duplicate Key Error
Submitted:	12 Jan 2005 14:24	Modified:	13 Feb 2005 20:15
Reporter:	Rich Smith	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	4.0.22	OS:	Linux (Redhat 9.0)
Assigned to:		CPU Architecture:	Any

Description:
We have our boxes replicating for backup purposes.  Replication has now failed on two boxes with similar errors.  ON the one box, the table it was failing on was not that important, so I was able to mess with the primary key to keep it running.  On this box, its not working.

My understanding of REPLACE INTO is that its NOT SUPPOSED to be possible for a duplicate key error.

How to repeat:
Snippet from logfile on slave:

ERROR: 1062  Duplicate entry '1025-916248' for key 1
050112  9:06:42 Slave: Error 'Duplicate entry '1025-916248' for key 1' on query. Default database: 'forums-devshed'. Query: 'REPLACE INTO postindex
                                (wordid, postid, score, intitle)
                                VALUES
                                 (45839, 916248, 2, 2), (336, 916248, 3, 2), (493, 916248, 2, 2), (773, 916248, 3, 2), (1402, 91624
8, 2, 2), (2413, 916248, 3, 2), (2016, 916248, 1, 0), (9881, 916248, 1, 0), (41233, 916248, 1, 0), (781991, 916248, 1, 0), (1868, 9
16248, 1, 0), (781992, 916248, 1, 0), (15586, 916248, 1, 0), (781993, 916248, 1, 0), (781994, 916248, 1, 0), (2169, 916248, 1, 0), 
(20073, 916248, 1, 0), (415, 916248, 1, 0), (58963, 916248, 1, 0), (157, 916248, 2, 0), (18150, 916248, 1, 0), (95, 916248, 1, 0), 
(781995, 916248, 1, 0), (781996, 916248, 1, 0), (40, 916248, 2, 0), (334, 916248, 1, 0), (33, 916248, 1, 0), (409, 916248, 1, 0), (
782, 916248, 3, 0), (159, 916248, 1, 0), (411, 916248, 2, 0), (847, 916248, 1, 0), (1683, 916248, 1, 0), (2326, 916248, 1, 0), (226
9, 916248, 1, 0), (175809, 916248, 1, 0), (1268, 916248, 2, 0), (1270, 916248, 1
050112  9:06:42 Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START"
. We stopped at log 'devshed-d5-bin.009' position 18988394

The primary key is wordid, postid.  There are no full text indexes (saw bug #6784).  No other indexes exist.

Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.mysql.com/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to 'Open'.

Thank you for your interest in MySQL.

Additional info:

Hi! 
Actualy replication commands can produce 'dublicate key error', for example if master and slave have  different default character set. But obviously not in this case (numeric data).
Is it possible to create repeatable test case for this error?
If you have any ideas how to reproduce it please let us know.

I wish I could tell you a way to reproduce.  I can tell you that we are currently running 6 db servers, 3 masters and 3 slaves.  This is the second time this has happened, on a different box and different table.

I can tell you that its just not working now on our servers.  Ive tried to do a total re-initialization of the database (re-create slave from scratch) and it dies a few hours later.  Ive optimized and repaired all tables in the DB as well.

I would be more than happy to give you access to the box in question (it is a backup server), or implement any kind of debugging or tracing (or even test code).

I cant, however, tell you how to reproduce on your servers.  It ran for 3 months with zero errors on this server before dying.

Ok, after some serious digging and tinkering, I figured out what the (simple) solution was.  Seems the indexes and/or table was JUST messed up enough to be erroring out on an replace into call, when the particular key did not exist.  It was erroring out on an entry that did NOT exist in the replicated DB.  I ran a repair on that table, and replication picked up again.

This did not even occur to me, that the replicated tables can become corrupted when the masters are still running just fine.  

In anycase, issue resolved.  So sorry to have wasted your braincells on this...