Bug #30911  LOAD DATA LOCAL INFILE creates inconsistent replication state
Submitted: 7 Sep 2007 20:42    Modified: 17 May 2008 18:50
Reporter: Alex Bame
Status: Can't repeat
Category: MySQL Server: Replication    Severity: S1 (Critical)
Version: 5.0.45    OS: Linux (Fedora Core 5)
Assigned to:    CPU Architecture: Any
Tags: LOAD DATA, replication

[7 Sep 2007 20:42] Alex Bame
Description:
Replication was initialized from an SQL dump created by mysqldump with the --master-data=1 option.  I then executed "start slave", and after some time of processing I got the following:

mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: vezina.dc1.tumri.net
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: binlog_vezina-p1_g1.000003
        Read_Master_Log_Pos: 648808522
             Relay_Log_File: logs-reporting-relay-bin.000003
              Relay_Log_Pos: 614472541
      Relay_Master_Log_File: binlog_vezina-p1_g1.000002
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB:
        Replicate_Ignore_DB: mysql,information_schema
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table: %.%
Replicate_Wild_Ignore_Table:
                 Last_Errno: 1062
                 Last_Error: Error 'Duplicate entry '86865' for key 1' on query. Default database: 'tumri_publisher_new'. Query: 'Insert into keys_publisher_merchant (publisher_id, merchant_id, retailer_id)  Select distinct publisher_id,merchant_id,retailer_id from tmp_update_publisher_products CM where not exists ( Select publisher_id,merchant_id,retailer_id from keys_publisher_merchant KM  where CM.publisher_id=KM.publisher_id and CM.merchant_id=KM.merchant_id and CM.retailer_id=KM.retailer_id)   order by publisher_id,merchant_id,retailer_id'
               Skip_Counter: 0
        Exec_Master_Log_Pos: 614472394
            Relay_Log_Space: 1722662668
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert: 
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
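
For context, the replication setup described above amounts to roughly the following sequence. This is a minimal sketch assuming the standard mysqldump --master-data=1 workflow; the connection parameters are taken from the status output above, while the log file name and position are placeholders, not the values actually used:

-- Emitted near the top of the dump by mysqldump --master-data=1
-- (coordinates here are placeholders):
CHANGE MASTER TO MASTER_LOG_FILE='binlog_vezina-p1_g1.000002', MASTER_LOG_POS=4;

-- Issued on the slave after the dump has been loaded:
CHANGE MASTER TO
  MASTER_HOST='vezina.dc1.tumri.net',
  MASTER_USER='repl',
  MASTER_PORT=3306;
START SLAVE;
SHOW SLAVE STATUS \G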

Here is a section of the binlog from the master that includes the statement that caused the error, along with a few statements directly before it that populated the relevant data:

# at 605432239
#070905  0:55:15 server id 1  end_log_pos 119   Query   thread_id=1085  exec_time=0     error_code=0
SET TIMESTAMP=1188978915;
Truncate table tmp_update_publisher_daily;
# at 605432358
#070905  0:55:15 server id 1  end_log_pos 605432385     Xid = 200264
COMMIT;
# at 605432385
#070905  0:55:15 server id 1  end_log_pos 122   Query   thread_id=1087  exec_time=0     error_code=0
SET TIMESTAMP=1188978915;
Truncate table tmp_update_publisher_products;
# at 605432507
#070905  0:55:15 server id 1  end_log_pos 605432534     Xid = 200271
COMMIT;
# at 605432534
#070905  0:55:34 server id 1  end_log_pos 1048587
#Begin_load_query: file_id: 100  block_len: 1048564
# at 606481121
#070905  0:55:34 server id 1  end_log_pos 2097174
#Append_block: file_id: 100  block_len: 1048564
# at 607529708
#070905  0:55:34 server id 1  end_log_pos 3145761
#Append_block: file_id: 100  block_len: 1048564
# at 608578295
#070905  0:55:34 server id 1  end_log_pos 4194348
#Append_block: file_id: 100  block_len: 1048564
# at 609626882
#070905  0:55:34 server id 1  end_log_pos 5242935
#Append_block: file_id: 100  block_len: 1048564
# at 610675469
#070905  0:55:34 server id 1  end_log_pos 6291522
#Append_block: file_id: 100  block_len: 1048564
# at 611724056
#070905  0:55:34 server id 1  end_log_pos 7340109
#Append_block: file_id: 100  block_len: 1048564
# at 612772643
#070905  0:55:34 server id 1  end_log_pos 8388696
#Append_block: file_id: 100  block_len: 1048564
# at 613821230
#070905  0:55:34 server id 1  end_log_pos 9039608
#Append_block: file_id: 100  block_len: 650889
# at 614472142
#070905  0:55:34 server id 1  end_log_pos 9039833       Execute_load_query      thread_id=1087  exec_time=1     error_code=0
SET TIMESTAMP=1188978934;
load data LOCAL INFILE '/tmp/SQL_LOAD_MB-64-6' REPLACE INTO table tmp_update_publisher_products lines terminated by '\n';
# file_id: 100 
# at 614472367
#070905  0:55:34 server id 1  end_log_pos 614472394     Xid = 200318
COMMIT;
# at 614472394
#070905  0:55:35 server id 1  end_log_pos 28    Intvar
SET INSERT_ID=86865;
# at 614472422
#070905  0:55:35 server id 1  end_log_pos 525   Query   thread_id=1087  exec_time=2     error_code=0
SET TIMESTAMP=1188978935;
Insert into keys_publisher_merchant (publisher_id, merchant_id, retailer_id)  Select distinct publisher_id,merchant_id,retailer_id from tmp_update_publisher_products CM where not exists ( Select publisher_id,merchant_id,retailer_id from keys_publisher_merchant KM  where CM.publisher_id=KM.publisher_id and CM.merchant_id=KM.merchant_id and CM.retailer_id=KM.retailer_id)   order by publisher_id,merchant_id,retailer_id;
# at 614472919
#070905  0:55:35 server id 1  end_log_pos 614472946     Xid = 200319
COMMIT;

There are many similar sections in the logs; this is how the developers populate these tables.  The table is InnoDB, has ~100,000 rows, and the SQL_LOAD_MB files range in size from a few KB to a few MB.
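
For readability, the recurring import job visible in the binlog excerpt amounts to roughly the following sequence. The statements are copied from the log above with only whitespace reformatted; the '/tmp/SQL_LOAD_MB-64-6' path is a server-generated temporary file name, so the original input file is not visible here:

Truncate table tmp_update_publisher_daily;
Truncate table tmp_update_publisher_products;

load data LOCAL INFILE '/tmp/SQL_LOAD_MB-64-6' REPLACE
  INTO table tmp_update_publisher_products
  lines terminated by '\n';

Insert into keys_publisher_merchant (publisher_id, merchant_id, retailer_id)
  Select distinct publisher_id, merchant_id, retailer_id
  from tmp_update_publisher_products CM
  where not exists (
    Select publisher_id, merchant_id, retailer_id
    from keys_publisher_merchant KM
    where CM.publisher_id = KM.publisher_id
      and CM.merchant_id = KM.merchant_id
      and CM.retailer_id = KM.retailer_id)
  order by publisher_id, merchant_id, retailer_id;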

How to repeat:
Unfortunately I do not have a simple test case.
[8 Sep 2007 0:59] Valeriy Kravchuk
Thank you for the problem report. Please try to use a newer version, 5.0.45, and inform us about the results.
[25 Sep 2007 18:50] Alex Bame
I am seeing the same results after upgrading the slave to 5.0.45 and starting with a fresh snapshot.  Upgrading the master would be problematic since we are not able to fully test the new version due to replication breaking.
[26 Sep 2007 11:49] Sveta Smirnova
Thank you for the report.

Please provide the output of SHOW CREATE TABLE tmp_update_publisher_products and SHOW CREATE TABLE tumri_publisher_new on both the master and the slave. Also, please provide the configuration files for both the master and the slave.
[26 Sep 2007 22:46] Alex Bame
show create table on the master

Attachment: master.show-create-table.txt (text/plain), 1.72 KiB.

[26 Sep 2007 22:46] Alex Bame
show create table on the slave

Attachment: slave.show-create-table.txt (text/plain), 1.74 KiB.

[26 Sep 2007 22:46] Alex Bame
master config

Attachment: master.cnf (application/octet-stream, text), 18.03 KiB.

[26 Sep 2007 22:46] Alex Bame
slave config

Attachment: slave.cnf (application/octet-stream, text), 18.29 KiB.

[26 Sep 2007 22:49] Alex Bame
Attached the requested information.  Please note that the current replication state is different from the SHOW SLAVE STATUS output I originally posted, as we re-attempted initiating replication after upgrading the slave.  The current SHOW SLAVE STATUS is:

mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: vezina.dc1.tumri.net
                Master_User: repl
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: binlog_vezina-p1_g1.000016
        Read_Master_Log_Pos: 471068830
             Relay_Log_File: logs-reporting-relay-bin.000005
              Relay_Log_Pos: 23822631
      Relay_Master_Log_File: binlog_vezina-p1_g1.000015
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB: 
        Replicate_Ignore_DB: mysql,information_schema,mysql,information_schema
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: %.%,%.%
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 1062
                 Last_Error: Error 'Duplicate entry '91249' for key 1' on query. Default database: 'tumri_publisher_new'. Query: 'Insert into keys_publisher_merchant (publisher_id, merchant_id, retailer_id)  Select distinct publisher_id,merchant_id,retailer_id from tmp_update_publisher_products CM where not exists ( Select publisher_id,merchant_id,retailer_id from keys_publisher_merchant KM  where CM.publisher_id=KM.publisher_id and CM.merchant_id=KM.merchant_id and CM.retailer_id=KM.retailer_id)   order by publisher_id,merchant_id,retailer_id'
               Skip_Counter: 0
        Exec_Master_Log_Pos: 23840427
            Relay_Log_Space: 1546183439
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)
[17 May 2008 16:24] Sveta Smirnova
Thank you for the feedback.

Is your slave read only?

I ask because the error "Error 'Duplicate entry '91249' for key 1' on query." shows that there is a duplicate entry for the PRIMARY KEY publisher_merchant_key_id. This can happen because the binary log contains INSERT_ID, which is used for AUTO_INCREMENT keys to ensure that the data on the master and the slave stays consistent. So if the slave already contains a record with publisher_merchant_key_id=91249, this error is expected.
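
For illustration, a minimal sketch of the mechanism described above, using a hypothetical simplified table (the real definition of keys_publisher_merchant is in the attached SHOW CREATE TABLE output):

-- Hypothetical simplified table with an AUTO_INCREMENT primary key:
CREATE TABLE keys_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  v  INT
) ENGINE=InnoDB;

-- What the slave SQL thread effectively replays from the binlog
-- (compare the Intvar event "SET INSERT_ID=..." shown earlier):
SET INSERT_ID = 91249;                 -- forces the next auto-generated id
INSERT INTO keys_demo (v) VALUES (1);  -- this row gets id 91249

-- If the slave already holds a row with id 91249 (for example from a
-- write that bypassed replication), the replayed INSERT fails with:
-- ERROR 1062 (23000): Duplicate entry '91249' for key 1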
[17 May 2008 17:16] Alex Bame
There were no writes to the slave; it is a cold standby.

This bug was a duplicate of #15126. We ran the data import jobs in parallel with a copy of the master running 5.0.45 and replicating to a 5.0.45 slave. The error occurred only on entries with utf8 characters, and only when replicating with the old MySQL version as the master.
[17 May 2008 18:28] Sveta Smirnova
Thank you for the feedback.

Does this mean the bug is not repeatable with a newer master?
[17 May 2008 18:42] Alex Bame
>Does this mean bug is not repeatable with newer master?

Yes.
[17 May 2008 18:50] Sveta Smirnova
Thank you for the feedback.

Report closed as "Can't repeat" according to the last comment.