Bug #49513 Weird 'Skipped' records with an auto_increment primary key and LOAD DATA INFILE
Submitted: 7 Dec 2009 16:15 Modified: 8 Jan 2010 6:43
Reporter: Are you mortal Then prepare to die. Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: DDL Severity:S2 (Serious)
Version:5.0.45-log Source distribution OS:Any
Assigned to: CPU Architecture:Any

[7 Dec 2009 16:15] Are you mortal Then prepare to die.
Description:
I'm trying to load data into mysql with a query that looks like this:

DROP   TABLE IF EXISTS pot_wgs_v2_vs_tom_wgs_v1_blast;
CREATE TABLE           pot_wgs_v2_vs_tom_wgs_v1_blast(
 QUERY         CHAR(13)                        NOT NULL,
 HIT           CHAR(20)                        NOT NULL,
 HSP_IDX       MEDIUMINT       UNSIGNED        NOT NULL AUTO_INCREMENT,
 IDENT         FLOAT           UNSIGNED        NOT NULL,
...
 PRIMARY KEY (QUERY, HIT, HSP_IDX)
);

LOAD DATA LOCAL INFILE
".../tom_vs_pot_01.blt"
INTO TABLE
  pot_wgs_v2_vs_tom_wgs_v1_blast(
    QUERY, HIT, IDENT, ...
);

This *should* simply increment the value of HSP_IDX for every unique QUERY-HIT pair, and that's what I've always observed it doing in similar situations the past.  i.e. see:

http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html

However, what I'm seeing is this:

Query OK, 1089124 rows affected (29.65 sec)
Records: 1121580  Deleted: 0  Skipped: 32456  Warnings: 0

I have no idea why records are being skipped. The maximum value of HSP_IDX  is 6996 (well below the limit of MEDIUMINT).

Now it gets very weird... If I change the type of HSP_IDX from MEDIUMINT to SMALLINT, INT or BIGINT, things work fine! I.e. I see all rows added as expected, with no warnings.

Please let me know what additional information you require to track this down. I'll send my data file for you to reproduce this if you like. Hrm... I just made a small version of my input data that does show the bug on my system. It's attached.

Cheers,
Dan.

How to repeat:

It only fails on my exact dataset! I tried to recreate the problem with this code:

perl -e '
  @x=qw(one two thr fou fiv six sev eig nin ten);
  @y=qw(fooo barr bazz quxx quux corg grau garp wald fred plug xyzy thud);
  for(1..1_500_000){
    print join("\t", $x[rand(@x)], $y[rand(@y)]), "\n"
  }

' > flibble

$ head flibble
two     barr
nin     bazz
thr     garp
nin     wald
...

DROP   TABLE IF EXISTS flibblex;
CREATE TABLE           flibblex(
  QUERY         CHAR(3)                        NOT NULL,
  HIT           CHAR(4)                        NOT NULL,

  HSP_IDX       MEDIUMINT       UNSIGNED       NOT NULL AUTO_INCREMENT,
  #
  PRIMARY KEY (QUERY, HIT, HSP_IDX)
);

LOAD DATA LOCAL INFILE
  ".../flibble"
INTO TABLE flibblex
  (QUERY, HIT)
;

Query OK, 1500000 rows affected (20.88 sec)
Records: 1500000  Deleted: 0  Skipped: 0  Warnings: 0

^^^ i.e. no fail! 

Here is code using (a subset of) my dataset (attached):

DROP   TABLE IF EXISTS tom_wgs_v1_vs_pot_wgs_v2_blast_test;
CREATE TABLE           tom_wgs_v1_vs_pot_wgs_v2_blast_test(
  QUERY         CHAR(13)                        NOT NULL,
  HIT           CHAR(20)                        NOT NULL,
  HSP_IDX       MEDIUMINT       UNSIGNED        NOT NULL AUTO_INCREMENT,
  #
  PRIMARY KEY (QUERY, HIT, HSP_IDX),
  INDEX SCAFF_ID_IDX (HIT)
);

LOAD DATA LOCAL INFILE 
  ".../tom_vs_pot_01.500.test.blt"
INTO TABLE
  tom_wgs_v1_vs_pot_wgs_v2_blast_test(
    QUERY, HIT
);
[7 Dec 2009 16:16] Are you mortal Then prepare to die.
The dataset used to demonstrate the bug in question. Use in combination with the suggested code.

Attachment: tom_vs_pot_01.500.test.blt.gz (application/x-gzip, text), 169 bytes.

[7 Dec 2009 16:17] Are you mortal Then prepare to die.
DROP   TABLE IF EXISTS tom_wgs_v1_vs_pot_wgs_v2_blast_test;
CREATE TABLE           tom_wgs_v1_vs_pot_wgs_v2_blast_test(
  QUERY         CHAR(13)                        NOT NULL,
  HIT           CHAR(20)                        NOT NULL,
  HSP_IDX       MEDIUMINT       UNSIGNED        NOT NULL AUTO_INCREMENT,
  #
  PRIMARY KEY (QUERY, HIT, HSP_IDX),
  INDEX SCAFF_ID_IDX (HIT)
);

LOAD DATA LOCAL INFILE 
  "/homes/dbolser/BiO/Research/Mapping/Blast/Potato_WGS_Scaff_Vs_Tomato_WGS_Scaff/Results/tom_vs_pot_01.500.test.blt"
INTO TABLE
  tom_wgs_v1_vs_pot_wgs_v2_blast_test(
    QUERY, HIT
);

Query OK, 498 rows affected (0.01 sec)
Records: 500  Deleted: 0  Skipped: 2  Warnings: 0
[7 Dec 2009 16:24] Are you mortal Then prepare to die.
Ever growing weirdness...

This dataset (below) skips 55 rows when its a MEDIUMINT, 53 rows when its a SMALLINT, ZERO rows when its a TINYINT and ZERO rows when its a BIGINT... sheesh!

DROP   TABLE IF EXISTS tom_wgs_v1_vs_pot_wgs_v2_blast_test;
CREATE TABLE           tom_wgs_v1_vs_pot_wgs_v2_blast_test(
  QUERY         CHAR(13)                        NOT NULL,
  HIT           CHAR(20)                        NOT NULL,
  HSP_IDX       TINYINT       UNSIGNED        NOT NULL AUTO_INCREMENT,
  #
  PRIMARY KEY (QUERY, HIT, HSP_IDX),
  INDEX SCAFF_ID_IDX (HIT)
);
##
LOAD DATA LOCAL INFILE 
  ".../tom_vs_pot_01.500.test.2.blt"
INTO TABLE
  tom_wgs_v1_vs_pot_wgs_v2_blast_test(
    QUERY, HIT
);

Query OK, 139 rows affected (0.00 sec)
Records: 139  Deleted: 0  Skipped: 0  Warnings: 0

DROP   TABLE IF EXISTS tom_wgs_v1_vs_pot_wgs_v2_blast_test;
CREATE TABLE           tom_wgs_v1_vs_pot_wgs_v2_blast_test(
  QUERY         CHAR(13)                        NOT NULL,
  HIT           CHAR(20)                        NOT NULL,
  HSP_IDX       SMALLINT       UNSIGNED        NOT NULL AUTO_INCREMENT,
  #
  PRIMARY KEY (QUERY, HIT, HSP_IDX),
  INDEX SCAFF_ID_IDX (HIT)
);
##
LOAD DATA LOCAL INFILE 
  ".../tom_vs_pot_01.500.test.2.blt"
INTO TABLE
  tom_wgs_v1_vs_pot_wgs_v2_blast_test(
    QUERY, HIT
);

Query OK, 86 rows affected (0.01 sec)
Records: 139  Deleted: 0  Skipped: 53  Warnings: 0

scaffold00002	PGSC0002DMS000002716
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001706
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000001491
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
scaffold00002	PGSC0002DMS000002199
[8 Dec 2009 6:43] Sveta Smirnova
Thank you for the report.

But version 5.0.45 is very old and I can not repeat described behavior with current version.

Please try current version 5.0.88 and if problem still exists indicate if you had rows in this table before calling LOAD DATA statement.
[9 Jan 2010 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".