Bug #93141 Duplicate Records despite index
Submitted: 9 Nov 2018 13:17
Modified: 20 Nov 2018 14:42
Reporter: Daniel Jänecke
Status: Can't repeat
Impact on me: None
Category: MySQL Server: DML
Severity: S3 (Non-critical)
Version: 5.7.22-22-log
OS: Linux
CPU Architecture: Any

[9 Nov 2018 13:17] Daniel Jänecke
Description:
I encountered some strange behavior with an existing table and existing data.

We have this table, which at the moment holds about 2 million records:

> SHOW CREATE TABLE my_source;
CREATE TABLE `my_source` (
  `column_1` mediumint(8) unsigned NOT NULL,
  `column_2` int(10) unsigned NOT NULL,
  PRIMARY KEY (`column_1`,`column_2`),
  UNIQUE KEY `column_1` (`column_1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german2_ci

Part of the data was to be moved into another table:
> SHOW CREATE TABLE my_destination;
CREATE TABLE `my_destination` (
  `id` mediumint(8) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

First try failed:
> INSERT INTO my_destination SELECT column_1 FROM my_source;
ERROR 1062 (23000): Duplicate entry '248579' for key 'PRIMARY'

Second try:
> INSERT IGNORE INTO my_destination SELECT column_1 FROM my_source;
Query OK, 2142190 rows affected, 3 warnings (9.15 sec)
Records: 2142193  Duplicates: 3  Warnings: 3

Warning (Code 1062): Duplicate entry '248579' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '523902' for key 'PRIMARY'
Warning (Code 1062): Duplicate entry '541921' for key 'PRIMARY'

Looking up these keys without using an index shows that there are in fact duplicate keys:
> SELECT column_1 FROM my_source USE INDEX () WHERE column_1 IN (248579,523902,541921);
+-----------+
| column_1  |
+-----------+
|    248579 |
|    248579 |
|    523902 |
|    523902 |
|    541921 |
|    541921 |
+-----------+
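
For a direct count of all affected keys, a full scan that bypasses the broken unique index can be used. A sketch (note that index hints do not bind GROUP BY resolution, so it is worth confirming via EXPLAIN that no index is chosen):

> SELECT column_1, COUNT(*) AS cnt
  FROM my_source USE INDEX ()   -- empty hint list: do not use any index for row retrieval
  GROUP BY column_1
  HAVING cnt > 1;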

Of course it is (now) impossible to create duplicates:
> INSERT INTO my_source VALUES(248579, 42);
ERROR 1062 (23000): Duplicate entry '248579' for key 'column_1'

And since indexes cannot be disabled for InnoDB, I have no idea how this data could ever have been created.
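
For what it is worth, CHECK TABLE compares the entry counts of the clustered and secondary indexes for InnoDB, so it may well report the inconsistency; whether it flags this specific corruption I cannot say:

> CHECK TABLE my_source;   -- should complain if the unique index has fewer entries than the table has rows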

How to repeat:
I cannot reproduce the creation of this data.

Suggested fix:
A table shall not contain data which, according to its indexes, must not exist.
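
Should anyone else be stuck with such a table, a possible cleanup (untested against the affected table) is to rebuild a deduplicated copy and swap it in. Which duplicate row survives is application-specific; MAX(column_2) below is only a placeholder choice:

> CREATE TABLE my_source_fixed LIKE my_source;
> INSERT INTO my_source_fixed
  SELECT column_1, MAX(column_2)   -- keep exactly one row per unique key
  FROM my_source USE INDEX ()
  GROUP BY column_1;
> RENAME TABLE my_source TO my_source_broken,
               my_source_fixed TO my_source;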
[12 Nov 2018 11:55] MySQL Verification Team
Thank you for the bug report. To process this bug report we need a repeatable test case; please comment here when you are able to provide one. Thanks in advance.
[13 Nov 2018 14:14] Daniel Jänecke
I cannot reproduce this; I have no idea how it could happen. All I can say is that the data exists, even though the unique index clearly says it must not.

All I can offer is either a dump or the InnoDB files of the affected table.

By the way, I started searching through all of our (non-production) databases and found further cases in four tables, on server versions 5.1, 5.5 and 5.7.
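
Such a sweep can be generated mechanically from information_schema; roughly like this (a sketch, not necessarily the exact check I used), emitting one duplicate check per unique index:

> SELECT CONCAT(
    'SELECT COUNT(*) FROM `', table_schema, '`.`', table_name,
    '` USE INDEX () GROUP BY ',
    GROUP_CONCAT(CONCAT('`', column_name, '`') ORDER BY seq_in_index),
    ' HAVING COUNT(*) > 1;') AS check_stmt
  FROM information_schema.statistics
  WHERE non_unique = 0            -- unique indexes only, including PRIMARY
    AND table_schema NOT IN ('mysql', 'information_schema',
                             'performance_schema', 'sys')
  GROUP BY table_schema, table_name, index_name;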
[20 Nov 2018 7:21] Daniel Jänecke
Just to get this right: unless I find out how it was possible to get duplicate data into those tables, you will not process this issue?
[20 Nov 2018 9:36] MySQL Verification Team
Thank you for the feedback.
I tried test cases from older but similar/related bugs (Bug #69979, Bug #76927, Bug #73170, etc.) to reproduce this issue on the latest GA release, but I am not seeing any problems now; most likely that is why our colleague Miguel asked you to provide a reproducible test case before we proceed further.
The issue you observed is unusual, and it is hard to tell how the invalid data got in in the first place, but a reproducible test case, or the exact steps that triggered the issue, would be very helpful.

regards,
Umesh