Bug #4787 Duplicate entry occuring on slave when it should not.
Submitted: 28 Jul 2004 14:38 Modified: 5 Aug 2004 15:08
Reporter: Henri-Maxime Ducoulombier Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.0.13 OS:Windows (Windows 2K)
Assigned to: CPU Architecture:Any

[28 Jul 2004 14:38] Henri-Maxime Ducoulombier
Description:
I encounter a duplicate entry on key 1 when replication is running, and this is happening because slaves 'jump' one or two auto_increment values at a time, when they should not.

How to repeat:
I have a table, named ACCOUNTS, with a structure like this :

CREATE TABLE `ACCOUNTS` (
  `REFA` int(11) NOT NULL auto_increment,
  `DATEEX datetime default NULL,
  `REFB int(10) default NULL,
  `REFC` int(10) default NULL,
  `REFP` int(10) default NULL,
  `PROD` varchar(100) NOT NULL default '',
  `DATEO` datetime default NULL,
  `CODEO` varchar(4) default NULL,
  `REFO` varchar(12) default NULL,
  `INSERTDATE` timestamp(14) NOT NULL,
  PRIMARY KEY  (`REFA`),
  KEY `REFC` (`REFC`),
  KEY `CODEO` (`CODEO`),
  KEY `REFO` (`REFO`),
  KEY `DATEO` (`DATEO`)
) TYPE=MyISAM; 

The master table has more than 235000 records and it is working perfectly fine. I recently altered the table to destroy the primary key and rebuilt it using an ALTER TABLE with an ORDER BY option to order the records using DATEO. So my auto_increment key has no problem and there is no 'hole' or 'jump' between 2 keys.

There is only one way to insert datas in this table, and it's done every 15 minutes by a schedule VBS. This VBS is made on purpose to synchronise different tables, and sort of merge them in ACCOUNTS. This script works like this, and uses manually created temporary tables:

--------

1> I drop, if it exists, my temporary table, tTEMP
2> I create a temporary table, named tTEMP, with one field, indexed, name REFO. The table type is HEAP.
3> I insert into this table all the REFO from table ACCOUNT with CODEO in ('ZORD', 'XORD')
4> I drop, if exists, another temporary table, tTEMP2
5> I create tTEMP2 with a field name REFF, indexed. Table type is HEAP.
6> I fill tTEMP2 with and (INSERT INTO ... SELECT) that takes datas from another table, 'S_F' :

INSERT INTO tTEMP2 (REFF) SELECT DISTINCTROW S_F.REFFACTURE FROM S_F LEFT JOIN tTEMP AS A_C ON S_F.REFF = A_C.REFOPE WHERE S_F.DATEV >= DATE_SUB(CurDate(), INTERVAL 1 MONTH) AND S_F.REFB <> 5 AND S_F.DATEA Is NOT NULL AND (A_C.REFO IS NULL)

It's done like this not to insert into tTEMP2 values that are already in ACCOUNTS.

7> At this point, I have in tTEMP2 the REFF of things I want inserted in my ACCOUNTS table, and I do my final insert :

INSERT INTO ACCOUNTS (....) SELECT (.....) FROM ('a lot of inner joins of several tables datas are coming from').

------

As said above, it's working great on the master server.

Now I have set up 2 replication slaves, on 2 other servers.
- I stopped the master.
- I copied the datas from the master to the slaves
- I restarted the master
- I started the slaves with all the replication options and parameters I want, and it's working fine, until something awful happens.

Something like 2 hours after replication started and was working great, there is, at one time, a problem.
As I said, my VBS script is running every 15 minutes and thus worked about 7 or 8 times before the crash, inserting datas without any problem. But the last one created a 'jump' in the primary key (REFA) on both slaves, at the very same time, with the very same key.

See, on the master, it's like that in the datas :
REFA      REFB        REFC         .....
237088   ABC         123
237087   DEF         456
237086   GHI          789
237085   KLM         012
237084   NOP         345

Where on slaves it's like this :
REFA      REFB        REFC         .....
237089   ABC         123
237088   DEF         456
237086   GHI          789
237084   KLM         012
237083   NOP         345

And then, in the error log on both slaves, I have an error saying :
040715 4:17:46 Slave I/O thread: Failed reading log event, reconnecting to retry, log 'dataserver-bin.044' position 196161
ERROR: 1062 Duplicate entry '237088' for key 1

Strange because it shouldn't happen, and because it's not even the last record in the slave tables...

Please note that no inserts are done in the table but with the VBS, and that SELECT only are made. It's happening only with this table, and at random time after the slave is restarted. Slaves are only used for replication/backup machines, nothing else.

I'm using mySQL 4.0.13 on windows 2K server. Slaves and masters are same version.
[28 Jul 2004 16:52] Guilhem Bichot
Bonjour Henri-Maxime,
4.0.13 is too old. I have fixed some "INSERT...SELECT + autoincrement + replication" bugs since. So please upgrade your master and slave to 4.0.20, clear all data from the slave,  restart replication from fresh master's data (follow the procedure for initial replication setup explained in our manual), and it should work better.
If it does not, then we will need a short testcase which reproduces the problem on your machines, then you can re-open this bug report.
If upgrade fixes the problem, you can also mark it here by clicking on "adding a comment".
Good luck.
[2 Aug 2004 11:48] Henri-Maxime Ducoulombier
Thank you for your reply !

We are going to upgrade our server and slaves to 4.0.20 and hopefully the replication will run fine after that. In case it would not, I'll reopen the bug with an updated version number.
[5 Aug 2004 10:07] Henri-Maxime Ducoulombier
Upgrading to 4.0.20 really seems to have fixed the problem.

Thank you
[5 Aug 2004 15:08] Guilhem Bichot
Bonjour,
I am glad that it worked. In general, it's a good idea to always stay close to the latest MySQL stable version on master and slave.
Bonne chance.
Guilhem (from Bordeaux, France).