Bug #22782 Replication fails
Submitted: 28 Sep 2006 15:19 Modified: 28 Dec 2006 11:40
Reporter: Guglielmo Ferri Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.0.24a OS:Linux (Linux 2.6.16)
Assigned to: CPU Architecture:Any

[28 Sep 2006 15:19] Guglielmo Ferri
Description:
Since few mounth I migrate my databases on Mysql 5.

Since then I cant replicate them on a slave machine.

Infact, I set up a master-slave environment and after few time (less then one our) I get similar error on slave.err:

060925 11:20:03 [ERROR] Slave: Error 'Duplicate entry '869' for key 1'
on query.
 Default database: 'ebusiness4'. Query: '...'
060925 11:20:03 [ERROR] Error running query, slave SQL thread aborted.
Fix the p
roblem, and restart the slave SQL thread with "SLAVE START". We stopped at log '
mysql-bin.000003' position 147467664

Indeed, I follow the following step:

1) dump the single db on master with mysqldump and --master-data setted
2) copy the dump on slave
3) install the mysql db (with mysql_db_install utility)
4) configure mysql as slave. start database server and stop slave.
5) restore the database dumped at 1)
6) start slave

After a few time happens error like that above.

How to repeat:
1) dump the single db on master with mysqldump and --master-data setted
2) copy the dump on slave
3) install the mysql db (with mysql_db_install utility)
4) configure mysql as slave. start database server and stop slave.
5) restore the database dumped at 1)
6) start slave

Suggested fix:
I think that the proble is that the auto increment either on master and slave are not sycronized.
[28 Sep 2006 17:56] Valeriy Kravchuk
Thank you for a problem report. What version is running as master? 5.0.24a? Please, send the exact query that gives you these duplicate entry messages.
[29 Sep 2006 8:17] Guglielmo Ferri
Yes. Either master and slave run the same version of MySQL.

I have restored many time replication between my two MySQL server, but after few hour (and sometimes after less than an hour) replication fails. It fails always for problems with auto_increment field. In practice, if a row was inserted on the master with a value for the auto_increment primary key field (that is FK in an other table), that row was replicated with another id on the slave. So as soon as on the master is executed a query that references that row, the same query replicated on slave will fail.

Here an example of query failing:

060928  9:21:45 [ERROR] Slave I/O thread: error reconnecting to master 'replicator@emma.sttspa.intranet:3306': Error: 'Lost connection to MySQL server during
query'  errno: 2013  retry-time: 60  retries: 86400
060928  9:22:45 [Note] Slave: connected to master 'replicator@emma.sttspa.intranet:3306',replication resumed in log 'mysql-bin.000007' at position 14158878
060928  9:48:57 [ERROR] Slave: Error 'Cannot add or update a child row: a foreign key constraint fails (`ebusiness4/hd_pop_link`, CONSTRAINT hd_pop_link_ibfk_
2 FOREIGN KEY (porta_sorgente) REFERENCES hd_porte_pop (id) ON DELETE SET NULL)' on query. Default database: 'ebusiness4'. Query: 'INSERT INTO hd_pop_link
                        ( sito_pop, sito_id, tipo, carrier_id, tipo_linea, tipo_circuito, velocita_download, mcr_download, tipo_backup, link_di_backbone , por
ta_sorgente  , apparato_destinazione   , td             , ip_wan   , ip_loopback     , router_filiale_marca   , router_filiale_modello   , vrf_id     , veloci
ta_upload   , mcr_upload   , pcr             , numero_telefonico                 , stato   , data_attivazione   , login  )
                VALUES
                        ( 29138, 29134, 1, 13, 'PP', 'PTR', 10240000, 256000, 2, '0' , 7560  , '01'    , 'PR000021'             , 8794   , 8795     , 'Cisco'
  , '2610'   , 40     , 10240000   , 256000   , 10240000             , '0363 907608'                 , '02'   , '2006-06-14'   , 'crighett'  )', Error_code: 1
452
060928  9:48:57 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log
 'mysql-bin.000008' position 3422478
060928 14:50:01 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000008' at position 3422478, relay log './marta-relay-bin.000013'
position: 3422615
060928 14:50:01 [ERROR] Slave: Error 'Cannot add or update a child row: a foreign key constraint fails (`ebusiness4/hd_pop_link`, CONSTRAINT hd_pop_link_ibfk_
2 FOREIGN KEY (porta_sorgente) REFERENCES hd_porte_pop (id) ON DELETE SET NULL)' on query. Default database: 'ebusiness4'. Query: 'INSERT INTO hd_pop_link
                        ( sito_pop, sito_id, tipo, carrier_id, tipo_linea, tipo_circuito, velocita_download, mcr_download, tipo_backup, link_di_backbone , por
ta_sorgente  , apparato_destinazione   , td             , ip_wan   , ip_loopback     , router_filiale_marca   , router_filiale_modello   , vrf_id     , veloci
ta_upload   , mcr_upload   , pcr             , numero_telefonico                 , stato   , data_attivazione   , login  )
                VALUES
                        ( 29138, 29131, 1, 13, 'PP', 'PTR', 10240000, 256000, 2, '0' , 7561  , '01'    , 'PR000022'             , 8796   , 8797     , 'Cisco'
  , '2610'   , 40     , 10240000   , 256000   , 10240000             , '0363 914954'                 , '02'   , '2006-06-14'   , 'crighett'  )', Error_code: 1
452
060928 14:50:01 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log
 'mysql-bin.000008' position 3471673

BR,

rocsca
[4 Oct 2006 8:48] Guglielmo Ferri
Hello Valeriy Kravchuk,

Have you any news for the problem that I have posted some day ago?

TIA,

rocsca
[11 Oct 2006 10:04] Brian Robinson
hi 

I am also experiencing this problem

We have a single master single slave setup.
Running on Suse Linux 10 with mysql 5.0.24.

We recently had a couple of crashes of the master. I then found in the slave error log and also on the show slave status command the error at the bottom 

"Key 1" I believe is referring to the primary key. This is an auto increment column. (and is actually the only unique key on the table so it must be that!!)

My point is that this can "never" happen in our situation.

We NEVER in the actual application code issue an Insert into this table and supply the auto increment value (and therefore override the value from the auto increment functionality)

1) If replication works by sending over the actual statement - then the autoincrement value comes from the slave itself - so how can this ever create a duplicate key????

2) If replication is sending over the data then the autoincrement values comes from the server and so again can never create a duplicate key. I am assuming (as stated in the documentation) that only completed TX's are put in the binary log.

The only suggestion I have is that this is perhaps coming from the database recovery after a crash of the master as it rolled forward and back on restart - is that possible?.

The problem for me is that we are now unsecure about the state of our slave. And we cannot really check all rows - we have some tables (including the one in the query below) with more than 100 million rows.

===========================================================================

1062 | Error 'Duplicate entry '620839953' for key 1' on query.
bo'. Query: 'INSERT INTO FileInfo(PathId,ChildrenPathId,RemoteClientId,ParentFile,RootFile,Fil
,IsFolder, FileStoreId,TnStatus) VALUES(52088394,52088505,1160473061703,620838031,620838031,'M
ULL,'F')' |            0 |                 735 |      2624789935 | None            |
[23 Oct 2006 12:50] Valeriy Kravchuk
Please, send the results of:

SHOW CREATE TABLE hd_pop_link\G

Please, try to repeat with a newer version, 5.0.26, on both master and slave, and inform about the results.
[3 Nov 2006 8:34] Guglielmo Ferri
Sorry for the delay.. Here what i've requested me..

mysql> SHOW CREATE TABLE hd_pop_link\G
*************************** 1. row ***************************
       Table: hd_pop_link
Create Table: CREATE TABLE `hd_pop_link` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `sito_pop` int(11) NOT NULL default '0',
  `porta_sorgente` int(11) unsigned default NULL,
  `sito_id` int(11) NOT NULL default '0',
  `apparato_destinazione` varchar(16) default NULL,
  `porta_destinazione` varchar(32) default NULL,
  `descrizione` varchar(255) default NULL,
  `td` varchar(32) default NULL,
  `nua` varchar(32) default NULL,
  `tgu_prefisso` varchar(8) default NULL,
  `tgu_numero` varchar(16) default NULL,
  `diramata` char(2) default NULL,
  `ip_dialer` int(11) unsigned default NULL,
  `ip_lan` int(11) unsigned default NULL,
  `ip_wan` int(11) unsigned default NULL,
  `ip_loopback` int(11) unsigned default NULL,
  `tipo` int(3) NOT NULL default '0',
  `link_di_backbone` enum('0','1') NOT NULL default '0',
  `router_filiale_marca` varchar(32) default NULL,
  `router_filiale_modello` varchar(32) default NULL,
  `vrf_id` int(11) unsigned default NULL,
  `carrier_id` int(11) NOT NULL default '0',
  `dlci` varchar(16) default NULL,
  `tipo_linea` varchar(8) NOT NULL default '',
  `tipo_circuito` varchar(8) NOT NULL default '',
  `velocita_download` int(11) NOT NULL default '0',
  `velocita_upload` int(11) default NULL,
  `mcr_download` int(11) NOT NULL default '0',
  `mcr_upload` int(11) default NULL,
  `pcr` int(11) default NULL,
  `vp_raccolta` int(11) default NULL,
  `vc_raccolta` int(11) default NULL,
  `vp_periferia` int(11) default NULL,
  `vc_periferia` int(11) default NULL,
  `tipo_backup` int(3) NOT NULL default '0',
  `note_backup` text,
  `numero_telefonico` varchar(32) default NULL,
  `ordine_siebel` varchar(32) default NULL,
  `codice_lotto` varchar(32) default NULL,
  `canali` int(11) default NULL,
  `sdlc_address` varchar(16) default NULL,
  `sdlc_idblock` varchar(8) default NULL,
  `sdlc_idnum` varchar(8) default NULL,
  `note` mediumtext,
  `stato` char(2) default NULL,
  `data_attivazione` date default NULL,
  `login` varchar(16) default NULL,
  `datalog` timestamp NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `pop_link_FKIndex2` (`sito_id`),
  KEY `pop_link_FKIndex3` (`carrier_id`),
  KEY `pop_link_FKIndex5` (`porta_sorgente`),
  KEY `hd_pop_link_FKIndex4` (`vrf_id`),
  KEY `td` (`td`),
  KEY `nua` (`nua`),
  KEY `tgu_prefisso` (`tgu_prefisso`),
  KEY `tgu_numero` (`tgu_numero`),
  KEY `sito_pop` (`sito_pop`),
  KEY `hd_pop_link_ibfk_6` (`ip_lan`),
  KEY `hd_pop_link_ibfk_7` (`ip_wan`),
  KEY `hd_pop_link_ibfk_8` (`ip_loopback`),
  KEY `hd_pop_link_ibfk_9` (`ip_dialer`),
  KEY `numero_telefonico` (`numero_telefonico`),
  CONSTRAINT `hd_pop_link_ibfk_10` FOREIGN KEY (`sito_pop`) REFERENCES `siti` (`                                                                              sito_id`),
  CONSTRAINT `hd_pop_link_ibfk_2` FOREIGN KEY (`porta_sorgente`) REFERENCES `hd_                                                                              porte_pop` (`id`) ON DELETE SET NULL,
  CONSTRAINT `hd_pop_link_ibfk_3` FOREIGN KEY (`sito_id`) REFERENCES `siti` (`si                                                                              to_id`),
  CONSTRAINT `hd_pop_link_ibfk_4` FOREIGN KEY (`carrier_id`) REFERENCES `fornito                                                                              ri` (`fornitore_id`),
  CONSTRAINT `hd_pop_link_ibfk_5` FOREIGN KEY (`vrf_id`) REFERENCES `hd_vrf` (`i                                                                              d`),
  CONSTRAINT `hd_pop_link_ibfk_6` FOREIGN KEY (`ip_lan`) REFERENCES `hd_ip` (`id                                                                              `) ON DELETE SET NULL,
  CONSTRAINT `hd_pop_link_ibfk_7` FOREIGN KEY (`ip_wan`) REFERENCES `hd_ip` (`id                                                                              `) ON DELETE SET NULL,
  CONSTRAINT `hd_pop_link_ibfk_8` FOREIGN KEY (`ip_loopback`) REFERENCES `hd_ip`                                                                               (`id`) ON DELETE SET NULL,
  CONSTRAINT `hd_pop_link_ibfk_9` FOREIGN KEY (`ip_dialer`) REFERENCES `hd_ip` (                                                                              `id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
[3 Nov 2006 13:56] Valeriy Kravchuk
Please, describe how you setuped your replication. Read the manual (http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html):

"- When you restart the MySQL server, InnoDB may reuse an old value that was generated for an AUTO_INCREMENT column but never stored (that is, a value that was generated during an old transaction that was rolled back).

- When an AUTO_INCREMENT column runs out of values, InnoDB wraps a BIGINT to -9223372036854775808 and BIGINT UNSIGNED to 1. However, BIGINT values have 64 bits, so do note that if you were to insert one million rows per second, it would still take nearly three hundred thousand years before BIGINT reached its upper bound. With all other integer type columns, a duplicate-key error results. This is similar to how MyISAM works, because it is mostly general MySQL behavior and not about any storage engine in particular."

Is it possible that something from the above apllies to your case?
[3 Nov 2006 17:57] Guglielmo Ferri
I have tried to replicate dbs after update MySQL 24a->27, but replication fail again..

061103 18:52:54 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './marta-relay-bin.000001' position: 4
061103 18:52:54 [Note] Slave I/O thread: connected to master 'replicator@emma.sttspa.intranet:3306',  replication started in log 'FIRST' at position 4
061103 18:52:57 [ERROR] Slave: Error 'Duplicate entry '4192697' for key 1' on query. Default database: 'ama_records'. Query: 'INSERT INTO call_answered_2006_09
                        ( call_rec_seq_num, cli, customer_connect_date_time, duration, duration_customer, CPN_out   , carrier_connect_date_time_out           , trunk_facility_id_out   , call_type               , profile_in     , id_customer_in       , id_carrier_out   , national_area_in     , wnp_in   , wnp_out   , wnp_customer   , descr_rate_customer   , rate_customer         , wnp_carrier_out   , descr_rate_carrier_out   , rate_carrier_out   , credit_customer           , debit_carrier_out       , is_on_peak_carrier_out      )
                VALUES
                        ( 4192697, '0458262826', '2006-09-27 17:03:07', 105, 107, '336857166'   , '2006-09-27 17:02:57'           , '0201-0045'   , 006               , 3     , 228       , 'TLC'   , '045'     , '045'   , '336'   , '045'   , 'Radiomobile TIM GSM'   , 0.145000         , '336'   , 'Radiomobile TIM GSM'   , 0.123196   , 0.258583333333           , 0.215593       , 1
061103 18:52:57 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000007' position 1181

after only 3 sec..

rocsca
[28 Nov 2006 11:40] Valeriy Kravchuk
Please, send the results of SHOW CREATE TABLE and SHOW TABLE STATUS for that call_answered_2006_09 table.
[29 Dec 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".