MySQL Bugs: #69979: columns with unique key gets intermittent duplicate values!

Bug #69979	columns with unique key gets intermittent duplicate values!
Submitted:	10 Aug 2013 9:59	Modified:	3 Dec 2013 9:58
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S1 (Critical)
Version:	5.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
During a concurrent workload, a column sees duplicate values in the data, 
although there is a unique index on that column.

Testcase prints out results of this, which should never happen
because a is a unique non-null column.

select count(*) cnt,a from t1 group by a having cnt>1;

+------+------+
| @cnt | @a   |
+------+------+
|    2 |    2 |
+------+------+
1 row in set (1 min 16.80 sec)

How to repeat:
drop table if exists t1;
create table t1(a tinyint not null,b tinyint not null,primary key(b),unique key(a))engine=innodb;

drop procedure if exists p1;
delimiter $
create procedure p1()
begin
  declare continue handler for 1062 begin end;
  declare continue handler for 1213 begin end;
  set transaction isolation level repeatable read;
  repeat
    if rand() > 0.5 then start transaction; end if;
    if rand() > 0.5 then replace into t1(a,b) values(floor(3*rand()),floor(3*rand())); end if;
    select count(*) cnt,a into @cnt,@a from t1 group by a having cnt>1 limit 1;
    if @cnt is not null then
     select @cnt,@a;
     select * from t1 where a=@a;
    end if;
    set @cnt:=null,@a:=null;
    if rand() > 0.5 then commit; end if;
  until 1=2 end repeat;

end $

delimiter ;

call p1();  # run this in 2 or more threads

At least 5.0, 5.1, 5.5, 5.6 appear to be affected. Am I missing something here?

if you do a 'select * from t1' in the if block, you'll clearly see:
+---+---+
| a | b |
+---+---+
| 1 | 0 |
| 1 | 1 |
+---+---+
2 rows in set (7 min 5.06 sec)

+---+---+
| a | b |
+---+---+
| 2 | 1 |
| 2 | 2 |
+---+---+
2 rows in set (7 min 5.08 sec)

+---+---+
| a | b |
+---+---+
| 2 | 0 |
| 2 | 2 |
+---+---+
2 rows in set (7 min 5.08 sec)

+---+---+
| a | b |
+---+---+
| 1 | 0 |
| 1 | 1 |
+---+---+
2 rows in set (7 min 5.09 sec)

See "Note" in http://dev.mysql.com/doc/refman/5.5/en/innodb-consistent-read.html

"If a transaction does update or delete rows committed by a different transaction, those changes do become visible to the current transaction."

this happens in read committed mode also...

the read committed case crashes and prints out errors to error log. i filed that bug here:
Bug 17302896 - ASSERTION: REC_GET_DELETED_FLAG(BTR_CUR_GET_REC(BTR_CUR), DICT_TABLE_IS_COMP(NOD

With the same test case? Do you set the isolation level with tx_isolation or set transaction isolation level?

I ask because that the "set transaction isolation level" in the test will only affect the next transaction, yet the loop might start multiple transactions.

tx_isolation=read-committed.  I reported a new internal bug for the index corruption + assertions for the read-committed case. let innodb devs sort that out internally first, then come back here :)  it appears that 'replace' is not a truly atomic statement even in autocommit=1 mode?

The internal bug is a race condition on rollback, which has become easier to repeat with the dedicated purge thread in 5.5 and 5.6:

Bug#17302896 DOUBLE PURGE ON ROLLBACK OF UPDATING A DELETE-MARKED RECORD

tested even with fix for 17302896, p1() is printing out duplicates here!

Attachment: bug69979.sql (application/octet-stream, text), 1.24 KiB.

in above attachment, the fact that p1's select * from t1 is outside the transaction doesn't matter.  putting it inside the trx just means it will always show you the read view exactly.

This bug is not related to
  Bug 17292267 - INCONSISTENCY ON REPEATABLE-READ ISOLATION LEVEL

That Bug was closed because InnoDB is working as designed.  InnoDB implements a WRITE COMMITTED version of REPEATABLE READ where changes committed after the RR transaction started are added to the view of that transaction if they are within the predicate of an UPDATE.

But this bug is a failure in REPLACE to delete all previous conflicts.  This bug happens only occasionally and seems to be very timing dependent.

I modified the testcase a little so that it shows a little more information; 

============================================
drop table if exists t1;
create table t1(a tinyint not null, b tinyint not null, who int, rep_count int, trx_count int, trx_started int, primary key(b), unique key(a)) engine=innodb;

drop procedure if exists p1;
delimiter $
create procedure p1(me int)
begin
  declare continue handler for 1062 begin end;
  declare continue handler for 1213 begin end;
  set transaction isolation level repeatable read;
  set @trx_count:=0;
  set @trx_count:=0;
  set @replace_count:=0;
  set @cnt_a:=null,@cnt_b:=null,@a:=null;
  repeat
    if rand() > 0.5 then
      if @trx_started = 0 then
        set @trx_count:=(@trx_count + 1);
      end if;
      set @trx_started:=@replace_count;
      start transaction;
    end if;
    if rand() > 0.5 then
      set @replace_count:=(@replace_count + 1);
      replace into t1(a,b,who,rep_count,trx_count,trx_started)
         values(floor(3*rand()), floor(3*rand()),
         me, @replace_count, @trx_count, @trx_started);
    end if;
    select count(*) cnt_a,a into
      @cnt_a,@a from t1 group by a having cnt_a > 1 limit 1;
    select count(*) cnt_b,b into
      @cnt_b,@b from t1 group by a having cnt_b > 1 limit 1;
    if (@cnt_a is not null) || (@cnt_b is not null) then
     select * from t1;
    end if;
    set @cnt_a:=null,@cnt_b:=null,@a:=null;
    if rand() > 0.5 then
      set @trx_started:=0;
      commit;
    end if;
  until 1=2 end repeat;
end $
delimiter ;

call p1(77);

No conflicts occur until you start p1(33); in another session. (any number for the p1(me) parameter will do).

If the procedure is changed to start and stop the transaction all the time the bug does not occur.  It will also not occur if transactions are not used.

The output to this testcase shows that;
1) The duplicate records are always a duplicate in field a, which is not the primary key.  It is the secondary key.  If I switch the primary key to a, with the secondary key as b, the conflict is still always on column a.
2) The previous version of the record can be inserted by either session, even the most recent transaction of the current session.
3) If I set innodb_change_buffering=none  the problem still occurs.

Here are some example failures from the testcase above with two sessions;

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 2 | 0 |   33 |      1313 |       874 |        1311 |
| 2 | 1 |   33 |      1311 |       873 |           0 |
+---+---+------+-----------+-----------+-------------+
2 rows in set (31.15 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 1 | 0 |   33 |      1474 |       973 |        1473 |
| 2 | 1 |   33 |      1473 |       970 |        1472 |
| 1 | 2 |   33 |      1471 |       969 |        1469 |
+---+---+------+-----------+-----------+-------------+
3 rows in set (35.02 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 0 | 0 |   33 |      1484 |       977 |        1481 |
| 2 | 1 |   33 |      1483 |       977 |        1481 |
| 0 | 2 |   77 |      3571 |      2362 |        3570 |
+---+---+------+-----------+-----------+-------------+
3 rows in set (35.19 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 1 | 0 |   77 |      3779 |      2521 |        3778 |
| 1 | 1 |   33 |      1698 |      1106 |        1697 |
+---+---+------+-----------+-----------+-------------+
2 rows in set (40.04 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 1 | 0 |   33 |      1733 |      1130 |        1732 |
| 1 | 2 |   33 |      1730 |      1128 |        1729 |
+---+---+------+-----------+-----------+-------------+

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 0 | 0 |   77 |      3735 |      2487 |        3734 |
| 1 | 1 |   77 |      3734 |      2486 |        3733 |
| 0 | 2 |   77 |      3733 |      2485 |        3732 |
+---+---+------+-----------+-----------+-------------+
3 rows in set (57.86 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 0 | 0 |   77 |      3810 |      2542 |        3809 |
| 0 | 1 |   77 |      3809 |      2541 |           0 |
+---+---+------+-----------+-----------+-------------+
2 rows in set (59.63 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 2 | 0 |   77 |      3831 |      2558 |        3829 |
| 2 | 2 |   77 |      3829 |      2557 |        3828 |
+---+---+------+-----------+-----------+-------------+
2 rows in set (1 min 0.09 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 2 | 0 |   77 |      3930 |      2626 |        3928 |
| 2 | 2 |   33 |      1851 |      1198 |           0 |
+---+---+------+-----------+-----------+-------------+
2 rows in set (1 min 2.46 sec)

+---+---+------+-----------+-----------+-------------+
| a | b | who  | rep_count | trx_count | trx_started |
+---+---+------+-----------+-----------+-------------+
| 2 | 0 |   77 |      3979 |      2670 |        3978 |
| 2 | 2 |   77 |      3978 |      2669 |        3977 |
+---+---+------+-----------+-----------+-------------+
2 rows in set (1 min 3.71 sec)

fwiw, i'm seeing sec index dups even without using REPLACE or INSERT .. ON DUPLICATE KEY ...   

Just using insert, delete, update causes another readonly repeatable read transaction to show duplicates.  Similarly, check table marks the table as corrupt.

I was suspecting that this could be caused by row_undo_mod_remove_clust_low() leaving an orphan delete-marked record in the unique secondary index. When I disabled the calls, my test ran for 48 minutes before hitting the locking assertion failure, again with an orphan record in the secondary index.

Without disabling the calls, the test would crash in 7 minutes.

This bug is caused by a race condition between insert and purge on tables that contain secondary indexes.

A prerequisite seems to be that the same PRIMARY KEY and UNIQUE KEY values get reused (the row is deleted, and later inserted with the same values for the keys).

The exact details of the race are unknown to me, but both this bug and a locking assertion failure will be gone when we fake a DB_DUPLICATE_KEY error in row_ins_clust_index_entry_by_modify().

The problem seems to occur when an insert encounters a purgeable delete-marked clustered index record.

My suggested fix is to make the insert thread purge the committed delete-marked record. (Alternatively, the insert thread should wait for the purge to do its job, or fake an error.)

NOTE: we will need row_ins_clust_index_entry_by_modify() or TRX_UNDO_UPD_DEL_REC for the case where the same transaction is INSERTing and DELETEing the same row back and forth.

This bug (along with Bug#66819) could explain Bug#61104 and Bug#58571.

[17 Oct 2013 20:05] Kevin Lewis

I repeated this and will discuss it inline;

mysql> create table t1 (f1 int primary key, f2 int, unique key(f2))
engine=innodb;
Query OK, 0 rows affected (21.80 sec)

mysql>  show create table t1;
+-------+-----------------------------------+
| Table | Create Table                      |
+-------+-----------------------------------+
| t1    | CREATE TABLE `t1` (
  `f1` int(11) NOT NULL,
  `f2` int(11) DEFAULT NULL,
  PRIMARY KEY (`f1`),
  UNIQUE KEY `f2` (`f2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-----------------------------------+
1 row in set (0.00 sec)

mysql> insert into t1 values (10, 100), (20, 200);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1;
+----+------+
| f1 | f2   |
+----+------+
| 10 |  100 |
| 20 |  200 |
+----+------+
2 rows in set (0.00 sec)

          ### On client 2 ###
          mysql> update t1 set f2 = 300 where f1 = 10;
          Query OK, 1 row affected (0.01 sec)
          Rows matched: 1  Changed: 1  Warnings: 0

          mysql> select * from t1;
          +----+------+
          | f1 | f2   |
          +----+------+
          | 20 |  200 |
          | 10 |  300 |
          +----+------+
          2 rows in set (0.00 sec)

mysql> update t1 set f2 = 100 where f1 = 20;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from t1;
+----+------+
| f1 | f2   |
+----+------+
| 10 |  100 |
| 20 |  100 |
+----+------+
2 rows in set (0.00 sec)

#####
Here is where the problem shows up.  In fact, I have previously seen this
phenomenon, but forgot about it.
The InnoDB version of a Repeatable Read transaction will continue to see the
unchanged version of a record that existed when it started until it commits.
You can see below that when this transaction commits, the client will see the
newest version of the f1=10 record.
#####

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from t1;
+----+------+
| f1 | f2   |
+----+------+
| 20 |  100 |
| 10 |  300 |
+----+------+
2 rows in set (0.00 sec)

#####
I tied this with a non-unique key on f2 and also with no key on f2.  In each
case it does the same thing.  The original f1 record with f2=100 remains in
the view after it has been replaced or updated with a f1=10,f2=300 record or
even if it has been deleted.  That is part of the 'snapshot' or Consistent
Read idea behind Repeatable read.  And there is truly nothing preventing the
second update where the a new record shows up with f2=100 when the old
version of the f1=10 record is still in the Repeatable Read view.

I think it would be wrong to stop the RR transaction from changing record
f1=20 to f2=100 because there does not exist any other committed or pending
record that also has f2=100.  There is only an old version of a record that
existed when the transaction started.

It might be possible to notice this other old record in the view but I think
it would be wrong to allow it to prevent the UPDATE from happening.  That
record existed at the time the transaction started, but it does not exist any
more.  The practical need for throughput trumps any need to keep the
impression of serializability from this Repeatable read transaction.

Repeatable Read Isolation actually does allow for phantom records, which are
usually described as new records inserted into the field of view during the
transaction so that one SELECT may see 10 records and the same SELECT
statement later in the same transaction sees 11 records. InnoDB does not
allow these phantoms until an update is done to those 10 records.  If there
are 11 committed records at the time of the UPDATE, InnoDB will then bring
the phantom record into its view.

This anomaly of showing 2 records with the same unique key value during a
transaction where records have been modified is kind of similar to phantom
records.

Things are happening by other clients during a RR transaction and some
interaction is expected.  This is not Serializable Isolation.

I recommend that we close this bug as working as designed.  As an excersize,
Anna might want to test this simple SQL script on Postgress and Microsoft SQL
Server which also provide Repeatable Read Isolation.
#####

[12 Nov 2013 22:50] Annamalai Gurusami:

Kevin,  I tested on PostgreSQL 9.1.10 and they also have the exact save
behaviour as ours.

Initial Setup:

mydb=# create table t1 (f1 int primary key, f2 int unique);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey" for
table "t1"
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "t1_f2_key" for
table "t1"
CREATE TABLE
mydb=# insert into t1 values (10, 100), (20, 200);
INSERT 0 2
mydb=# select * from t1;
 f1 | f2
----+-----
 10 | 100
 20 | 200
(2 rows)

Client 1:

mydb=# start transaction;
START TRANSACTION
mydb=# set transaction isolation level repeatable read;
SET
mydb=# select * from t1;
 f1 | f2
----+-----
 10 | 100
 20 | 200
(2 rows)

Client 2:

mydb=# update t1 set f2 = 300 where f1 = 10;
UPDATE 1
mydb=#

Client 1:

mydb=# update t1 set f2 = 100 where f1 = 20;
UPDATE 1
mydb=# select * from t1;
 f1 | f2
----+-----
 10 | 100
 20 | 100
(2 rows)

mydb=#

So as recommended, I'll close this as not a bug.

FWIW, there were 2 different issues combined in this bug report, repeatable with variations of the same test case.

My comments about assertion failures in this bug are referring to a race condition that is being tracked by
Bug#17320977 FAILING ASSERTION: LOCK_REC_HAS_TO_WAIT_IN_QUEUE(LOCK)

The other issue is the operation of InnoDB multiversioning, which was what was closed as "not a bug".

I reported a similar looking problem, although a little bit different, here:
https://bugs.launchpad.net/percona-server/+bug/1322559

It wasn't tested on Oracle MySQL so far, which is why it's there and not here, but I thought I'd mention it.

Is this a duplicate of bug#73170 ?

Posted by developer:
 
No.  This bug is not related to Bug #73170.  This bug has been closed as not a bug.  The other bug bug#73170 was a real bug.

this bug is mentioned in https://bugs.mysql.com/bug.php?id=93449 , glad something came of this.

I agree that this is not a bug, but I don't agree that is a problem like phantoms read in InnoDB.

The root reason for this issue is that the unique check row_ins_scan_sec_index_for_duplicate() function in this SQL "update t1 set f2 = 100 where f1 = 20;" in InnoDB compare the newest data in InnoDB, it don't read the old version data according to the mvcc rule. Then it pass the unique check function. Then the update success. Then in the next select we saw duplicate values.