Bug #29151 | Falcon: running sysbench 0.4.8 leads to duplicate key errors | ||
---|---|---|---|
Submitted: | 15 Jun 2007 22:10 | Modified: | 15 May 2009 14:14 |
Reporter: | Hakan Küçükyılmaz | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.4 | OS: | Any |
Assigned to: | Hakan Küçükyılmaz | CPU Architecture: | Any |
Tags: | F_ISOLATION, sysbench |
[15 Jun 2007 22:10]
Hakan Küçükyılmaz
[18 Jun 2007 18:16]
MySQL Verification Team
Thank you for the bug report. [miguel@light sysbench-0.4.8]$ sysbench --test=oltp --mysql-engine-trx=yes --mysql-host=localhost --mysql-user=root --max-requests=0 --max-time=60 --num-threads=2 --oltp-table-size=100000 --mysql-table-engine=Falcon run sysbench v0.4.8: multi-threaded system evaluation benchmark No DB drivers specified, using mysql WARNING: Preparing of "BEGIN" is unsupported, using emulation (last message repeated 1 times) Running the test with following options: Number of threads: 2 Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL server during query FATAL: database error, exiting... ALERT: failed to execute mysql_stmt_execute(): Err2013 Lost connection to MySQL server during query FATAL: database error, exiting... Done.
[29 Jun 2007 10:50]
Hakan Küçükyılmaz
Can't repeat anymore. I tried it 1, 2, and 5 threads. hakan@lu0011:~$ sysbench --test=oltp --mysql-engine-trx=yes --mysql-host=localhost --mysql-user=root --max-requests=0 --max-time=60 --num-threads=5 --oltp-table-size=100000 --mysql-table-engine=Falcon run sysbench v0.4.8: multi-threaded system evaluation benchmark No DB drivers specified, using mysql WARNING: Preparing of "BEGIN" is unsupported, using emulation (last message repeated 4 times) Running the test with following options: Number of threads: 5 Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! Time limit exceeded, exiting... (last message repeated 4 times) Done. OLTP test statistics: queries performed: read: 84392 write: 30140 other: 12056 total: 126588 transactions: 6028 (100.29 per sec.) deadlocks: 0 (0.00 per sec.) read/write requests: 114532 (1905.49 per sec.) other operations: 12056 (200.58 per sec.) Test execution summary: total time: 60.1062s total number of events: 6028 total time taken by event execution: 300.4313 per-request statistics: min: 0.0090s avg: 0.0498s max: 2.7534s approx. 95 percentile: 0.0647s Threads fairness: events (avg/stddev): 1205.6000/6.80 execution time (avg/stddev): 60.0863/0.00 Error log says: Index deletion failed for record 105464.1 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105471.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105476.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105530.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105537.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105551.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105573.1 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105581.1 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105602.1 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105604.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105613.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105634.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105635.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105695.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105724.0 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105725.1 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105743.2 of SBTEST.SBTEST.SBTEST$1 Index deletion failed for record 105858.0 of SBTEST.SBTEST.SBTEST$1
[31 Jul 2007 9:15]
Hakan Küçükyılmaz
Running sysbench-0.4.8 with Falcon leads to duplicate key error. Change set is 1.2662, 2007-07-26 14:30:05+02:00, hakank@au0012.local How to repeat: mysqladmin -uroot create sbtest sysbench \ --test=oltp \ --mysql-user=root \ --oltp-table-size=100000 \ --mysql-table-engine=Falcon \ --mysql-engine-trx=yes \ prepare sysbench \ --test=oltp \ --mysql-engine-trx=yes \ --mysql-host=localhost \ --mysql-user=root \ --max-requests=0 \ --max-time=200 \ --num-threads=2 \ --oltp-table-size=100000 \ --oltp-read-only=off \ --mysql-table-engine=Falcon \ --mysql-engine-trx=yes \ run sysbench v0.4.8: multi-threaded system evaluation benchmark No DB drivers specified, using mysql WARNING: Preparing of "BEGIN" is unsupported, using emulation (last message repeated 1 times) Running the test with following options: Number of threads: 2 Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! ALERT: failed to execute mysql_stmt_execute(): Err1582 Duplicate entry '49957' for key 'PRIMARY' FATAL: database error, exiting... Done.
[1 Aug 2007 10:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/31947 ChangeSet@1.2663, 2007-08-01 12:18:19+02:00, hakank@lu0011.wdf.sap.corp +3 -0 Added test case for currently failing Bug#29151.
[1 Aug 2007 10:54]
Hakan Küçükyılmaz
Added simplified test case.
[1 Aug 2007 11:02]
Hakan Küçükyılmaz
Looks like an isolation level problem in REPEATABLE-READ. Following sceanrio fails with REPEATABLE-READ and succeeds with READ-COMMITTED. Please note that InnoDB does not fail with REPEATABLE-READ. We have one table with one record CREATE TABLE t1 ( a int, b int, c char(50), PRIMARY KEY (a) ); INSERT INTO t1 VALUES(49990, 0, 'c1'); On connection 1 we do BEGIN; UPDATE t1 SET b=b+1 WHERE a = 49990; DELETE FROM t1 WHERE a = 49990; INSERT INTO t1 VALUES(49990, 0, 'c1'); On connection 2 we see BEGIN; # This one blocks until other transaction COMMITs. UPDATE t1 SET b=b+1 WHERE a = 49990; Now connection 1 commits COMMIT; After that connection 2 deletes and inserts and fails DELETE FROM t1 WHERE a = 49990; # This gives an error with Falcon, but it should not. # However, with isolation level READ-COMMITTED it works. INSERT INTO t1 VALUES(49990, 0, 'c2'); COMMIT; Test case falcon_bug_29151.test exists. Best regards, Hakan
[1 Aug 2007 13:56]
Kevin Lewis
Thank you Hakan, for consolidating this down to a simple testcase. That really helps a lot. This behavior is due to the the 'InnoDB compatibility mode' that we now use for Repeatable Read. Under classic repeatable read, the new record did not exist when transaction 2 started so it would return an error. But now we wait on the update transaction to finish. The only thing is, the record we are waiting on is replaced during the transaction. The query, or record selection, was already accomplished. I can look into what it will take to redo the record selection part of the update after the waitForTransaction returns.
[6 Aug 2007 20:16]
Kevin Lewis
I do not think that this bug should be P1. It has existed ever since the 'InnoDB compatibility mode' was added to the Repeatable Read isolation. What happens is that transaction 1 deletes record number 0 and creates record number 1 with the same unique value. It is a primary key and transaction 1 is allowed to do that. Transaction 2 under classic Repeatable Read, would not see the new record. Since Falcon used 'fuzzy' indexes, the search for that unique value always returns two records after T1 commits. The first record is deleted and the second is newer than T2. Using classic Repeatable Read, the pre-deleted version of record 0 is visible and the newly inserted record 1 is not. But under 'InnoDB compatibility mode', the second record becomes visible after we waited on it for the update statement. When T2 wakes up after T1 commits, the update statement that caused the wait will fail because it was waiting on the deleted record 0. But a second identical update statement after t1 commits allows T2 to update the younger committed record 1. Once T2 has updated record 1, it becomes visible within T2. Now T2 sees both versions of that primary key; mysql> SELECT * FROM t1; +-------+------+------+ | a | b | c | +-------+------+------+ | 49990 | 0 | c1 | | 49990 | 1 | c2 | +-------+------+------+ 2 rows in set (0.08 sec) The delete statement will fail because one of the two records that are visible for primary key 49990 is already deleted. So the delete and insert portion of T2 fail. If the second update statement is not issued, T1 only sees record 0. mysql> SELECT * FROM t1; +-------+------+------+ | a | b | c | +-------+------+------+ | 49990 | 0 | c1 | +-------+------+------+ 1 row in set (0.00 sec) T1 is not able to delete it because it is already deleted by T2. So the delete and insert portion of T2 fail. This InnoDB Compatibility Mode is a can of worms when using fuzzy indexes. I will see if there is a way to use only one version of a unique index without causing additional idiosyncrasies. In the mean time, I am changing it to P2.
[1 Nov 2007 18:19]
Kevin Lewis
After a lot of effort to understand the difference between how Falcon does repeatable read isolation and how InnoDB does it, I think I finally have it working well enough. I pushed changes to the team tree that implements the InnoDB Compatibility Mode more fully than it was. This code push contained; 1) A new parameter called INNODB_COMPATIBILITY which allows the repeatable read isolation level to be able to make changes to newer committed records than what existed when the transaction started. This feature is ON by default, currently. But Jim Starkey has expressed a desire to turn it OFF by default. When OFF, repeatable read isolation is truly a CONSISTENT READ isolation, where newer records are not visible, nor changed. 2)The original testcase, falcon_bug_29151.test, now passes. In addition, I added three new ones; falcon_bug_29151_A; Checks that in CONSISTENT_READ mode (INNODB_COMPATIBILITY = OFF), INSERT, UPDATE and DELETE each wait for a transaction which would cause a failure if committed. Then once that transaction is COMMITTED, Falcon reports the error. falcon_bug_29151_B; Checks that in CONSISTENT_READ mode (INNODB_COMPATIBILITY = OFF), INSERT, UPDATE and DELETE each wait for a transaction which would cause a failure if committed. Then once that transaction is ROLLED BACK, Falcon can continue and the chane succeeds. falcon_bug_29151_C; Checks that in WRITE_COMMITTED mode (INNODB_COMPATIBILITY = ON), INSERT, UPDATE and DELETE each wait for a transaction which would affect which record is changed if committed. Then once that transaction is COMMITTED, Falcon can continue and the change succeeds on the newly committed record. 3) Changes were made to the behavior of the INSERT operation when there is a potential blocking or block relieving transaction pending. Before, it would not wait on this transaction. Now it does wait, hoping for a better chance for success. This behavior difference caused changes in falcon_bug_22847.test and falcon_bug_22111.test. 4) When an index path is being used to traverse a group of records, and a wait occurs, and the isolation level is READ_COMMITTED or WRITE_COMMITTED, Falcon will now re-scan the index before continuing the search. This is needed to catch a newly inserted record that may match the filter. 5) If INNODB_COMPATIBILITY = ON, the internal function Transaction::visible() now returns either CommittedVisible or CommittedInvisible for the same record depending on whether the purpose is for a WRITE or a READ. This trick allowed Falcon to show a repeatable read snapshot of the database, while also being able to change newly committed records.
[6 Nov 2007 4:08]
Kevin Lewis
Subsequent to the previous patch, I had to make two changes; 1) I had to undo change number 4 described in the previous comment. This looping in order to do a re-scan of an index, was accomplished by thowing an internal SQLException inside Falcon. This exception was caught and an error returned up the call stack until the scanIndex could be redone. But Jim pointed out that an exception is an expensive way to return an error code and it should only be done when you really have a problem. Also, the Falcon debug log was filling up with these loops. After taking it out, I found that the concurrent transactions from sysbench still appear to serialize, which is the behavior that is expected with the InnoDB version of Repeatable Read. The concurrent transactions work on records with the same unique key; BEGIN; UPDATE; DELETE; INSERT; COMMIT; Now, the UPDATE does not change any records. But it does not fail. The DELETE will delete the new record, and the INSERT will also work. So the sysbench test should be able to work without error. And falcon_bug_29151.test works as written. 2) The parameter "FALCON_INNODB_COMPATIBILITY", which defaulted to ON, is now renamed to its opposite meaning; "FALCON_CONSISTENT_READ", which defaults to ON. This means that Falcon's default behavior is the opposite of what it was last week. It defaults is consistent read, in which newer records cannot be read or written within a repeatable read transaction. To get a behavior similar to InnoDB which Falcon calls WRITE_COMMITTED, FALCON_CONSISTENT_READ must be set to OFF. NOTE; FALCON_CONSISTENT_READ = OFF does not result in the exact behavior of InnoDB. One difference is described in paragraph 1 above. Another difference is that when SELECTing records within a REPEATABLE_READ transaction that has modified a newer record, both records will be seen. This is true for Falcon even if both the old and new records have the same unique key.
[30 Nov 2007 2:56]
Hakan Küçükyılmaz
falcon_bug_29151 passes now. Also sysbench runs now without duplicate key error. I tried 2 and 10 threads with sysbench: hakan@lu0011:~$ sysbench --test=oltp --mysql-engine-trx=yes --mysql-host=localhost --mysql-user=root --max-requests=0 --max-time=60 --num-threads=10 --oltp-table-size=100000 --mysql-table-engine=Falcon run sysbench v0.4.8: multi-threaded system evaluation benchmark No DB drivers specified, using mysql WARNING: Preparing of "BEGIN" is unsupported, using emulation (last message repeated 9 times) Running the test with following options: Number of threads: 10 Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! Time limit exceeded, exiting... (last message repeated 9 times) Done. OLTP test statistics: queries performed: read: 82866 write: 28893 other: 11606 total: 123365 transactions: 5687 (94.63 per sec.) deadlocks: 232 (3.86 per sec.) read/write requests: 111759 (1859.72 per sec.) other operations: 11606 (193.13 per sec.) Test execution summary: total time: 60.0944s total number of events: 5687 total time taken by event execution: 600.5354 per-request statistics: min: 0.0182s avg: 0.1056s max: 1.1847s approx. 95 percentile: 0.1784s Threads fairness: events (avg/stddev): 568.7000/10.76 execution time (avg/stddev): 60.0535/0.03
[30 Nov 2007 20:42]
Bugs System
Pushed into 6.0.4-alpha
[3 Dec 2007 14:24]
MC Brown
A note has been added to the 6.0.4 changelog: Insert information into the same table from multiple threads could cause duplicate key errors. This was related to the changes made to allow compatibility with the InnoDB repeatable-read isolation level. The option, falcon_innodb_compatibility, has been renamed to falcon_consistent_read, but with the opposite effect. The default is for this option to be on. When set to off, the behaviour of Falcon is similar to that in InnoDB.
[5 Dec 2007 16:11]
Omer Barnir
Ran the sysbench oltp test on the system qa machine. The test passes with 5 concurrent users but the error is observed with 10 concurrent users: ALERT: failed to execute mysql_stmt_execute(): Err1062 Duplicate entry '4999' for key 'PRIMARY' FATAL: database error, exiting... ALERT: failed to execute mysql_stmt_execute(): Err1062 Duplicate entry '4965' for key 'PRIMARY' FATAL: database error, exiting... Setting bug back to 'verify'
[5 Dec 2007 17:23]
Kevin Lewis
Omer, This test is not designed to run with SET GLOBAL FALCON_CONSISTENT_READ=ON; Or I should say that FALCON_CONSISTENT_READ is not designed to keep these transaction from getting duplicate key errors. The nature of this transaction is that transactions that they are able to update records that are committed on a record after they get started. Please re-verify that this occurs when you use; SET GLOBAL FALCON_CONSISTENT_READ=OFF;
[6 Dec 2007 16:13]
Omer Barnir
Kevin, I did that already when working on this with Hakan (running with the value set to OFF) and the behavior was the same.
[6 Dec 2007 17:24]
Kevin Lewis
Omer, Hakan told me on IRC that setting falcon_consistent_read by SET GLOBAL FALCON_CONSISTENT_READ=OFF; does not really work. It does not affect the current connection. I am pursuing that as a separate issue. See Bug#33041. So the only reliable way to do it, until this is fixed, it to start the engine with --falcon-consistent-read=off. Please try that.
[6 Dec 2007 22:19]
Omer Barnir
Kevin, This is what I am doing. I also verified with a different connection that the value is indeed of. To clarfy, I am starting the server with Logging: ./mysql-test-run.pl --start-dirty --vardir=/data0/qa_test_data/6004_64bit_rhel5/sysb_var --mysqld=--falcon-consistent-read=off and then starting sysbench
[7 Dec 2007 12:45]
Hakan Küçükyılmaz
Verified Omer's findings. To reproduce the duplicate key error it was important to find out that the initial data size in the test table makes the difference. When we start with 100 rows, then we consistently get the duplicate key error: sysbench \ --test=oltp \ --mysql-user=root \ --oltp-table-size=100000 \ --mysql-table-engine=Falcon \ --mysql-engine-trx=yes \ prepare Run sysbench with 20 connections, like sysbench \ --test=oltp \ --mysql-engine-trx=yes \ --mysql-host=localhost \ --mysql-user=root \ --max-requests=0 \ --max-time=60 \ --num-threads=2 \ --oltp-read-only=off \ --mysql-table-engine=Falcon \ --mysql-engine-trx=yes \ run Observation which leads to the assumption that the initial table size matters: with subsequent runs of sysbench without reloading the table, the probability of getting the duplicate error goes towards zero. After five or six runs we cannot reproduce the duplicate key error anymore. The number of rows are around 2000 - 3000 at that time. Raised to S1 as no workaround available.
[7 Dec 2007 20:34]
Hakan Küçükyılmaz
Sequence which leads to duplicate key error with five connections
Attachment: sysbench_five_connections_sequence.txt (text/plain), 129.77 KiB.
[8 Dec 2007 12:46]
Kevin Lewis
The setting for falcon_consistent_read at startup as set by --falcon-consistent-read=OFF was not being applied when Falcon intitalized. So only SET GLOBAL FALCON_CONSISTENT_READ=OFF was being applied. And that only to new connections.
[10 Dec 2007 15:59]
Hakan Küçükyılmaz
Still failing with duplicate key error. I used latest mysql-6.0-falcon-team tree with falcon_consistent_read = off (set in my.cnf). I checked with "show variables like 'falcon%'" that falcon_consistent_read is set to off. [16:47] root@(none)>show variables like 'falcon%'; +----------------------------------+-----------------+ | Variable_name | Value | +----------------------------------+-----------------+ | falcon_checkpoint_schedule | 7 * * * * * | | falcon_consistent_read | OFF | sysbench --test=oltp --mysql-host=127.0.0.1 --mysql-user=root --oltp-table-size=10 --mysql-table-engine=Falcon --mysql-engine-trx=yes prepare hakan@lu0011:~/work/mysql/mysql-6.0-falcon-team$ sysbench --test=oltp --mysql-engine-trx=yes --mysql-host=localhost --mysql-user=root --max-requests=0 --max-time=600 --num-threads=20 --mysql-table-engine=Falcon run sysbench v0.4.8: multi-threaded system evaluation benchmark No DB drivers specified, using mysql WARNING: Preparing of "BEGIN" is unsupported, using emulation (last message repeated 19 times) Running the test with following options: Number of threads: 20 Doing OLTP test. Running mixed OLTP test Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases) Using "BEGIN" for starting transactions Using auto_inc on the id column Threads started! ALERT: failed to execute mysql_stmt_execute(): Err1062 Duplicate entry '4983' for key 'PRIMARY' FATAL: database error, exiting... Done.
[10 Dec 2007 17:50]
Omer Barnir
Running the test with 'falcon_lock_timeout=50' easily exposes the duplicate key issue even with a large table (100,000 rows)
[14 Dec 2007 4:55]
Kevin Lewis
The current Duplicate key error occurs on an Insert. Transaction A had waited on a pending transaction B which committed. During the commit of transaction B, but before making the commit durable in the serial log, the waiting Transaction A is awakened. Then Transaction B takes some time to finish the commit, which involves writing to the serial log. This delay may be because of a group commit, but I am not sure. While Transaction B is finishing the commit, Transaction A does the UPDATE, and DELETE of the new record inserted by Transaction B. But the following insert of transaction A fails with a Duplicate Key error. At this time, Transaction B::state == Committed, so I am not sure yet why the duplicate key error occurs, but Transaction A still sees the older record. Sometime after Transaction A gets the error Transaction B finally returns from the commit. The triage values are; D3 - Medium failure because it is a timming dependent dysfunction W2 - The workaround is to try the insert again within the transaction. It will work a second time, but this workaround is unacceptable in a batch process. I3 - Affects Substantial Minority of Users R4 - High risk since the fix may involve the order of events in a commit. E3 - Medium effort. Probably a small code change, but a major effort to find the exact cause of the conflict.
[15 Dec 2007 5:16]
Kevin Lewis
Ann Harrison wrote; I'm going to quibble with the triage values. D3 seems OK. W2 is too high. Even - especially - in a batch environment, transactions should be written so deadlocks and other errors are detected and managed. I3 is also high. Only test systems are designed to bash on single key values over and over. Real application developers know that such tricks lead to tears before bedtime. Given the amount of time already invested, I'd say E4. The analysis of the bug is interesting and suggests that we might do well to wait until a transaction is write-complete before waking others that are waiting for it. The problem seems to be that the delete is not visible to the code that checks the index while the transaction's changes are in the serial log. There could be some performance impact to waiting for write-complete, and there's probably a more complete solution, but waiting longer might be worth trying in the short run.
[23 Dec 2007 17:13]
Philip Stoev
Hakan, do you recall if, on your original test, sysbench itself was giving out errors such as: ALERT: failed to execute mysql_stmt_execute(): Err1062 Duplicate entry '4978' for key 'PRIMARY' FATAL: database error, exiting... I am running sysbench against 6.0.4, however the error_log does not show any duplicate error messages from Falcon, because in the meantime the Falcon error has been commented out in the code. This leaves only the Sysbench error to judge from, and I would like to know if you got the same sysbench error.
[6 Jan 2008 13:32]
Hakan Küçükyılmaz
Philip, yes, I see the duplicate key error.
[12 Mar 2008 23:02]
Bugs System
Pushed into 6.0.4-alpha
[16 Mar 2009 19:23]
Kevin Lewis
Hakan, can you check to see if this problem still occurs? I see errors in this test in xref, but I do not see any Duplicate key errors. It that does not happen anymore, then we should put this bug to rest and open a new bug for the timeouts or wait states, or whatever is happening intermittently. You might have to make a big-test version.
[21 Apr 2009 13:20]
Hakan Küçükyılmaz
falcon_bug_29151 passes now fine in many runs and the mentioned sysbench 0.4.8 test also passes without duplicate key error. However, there is a new duplicate key error with sysbench v0.5, see Bug#44381 for details.
[15 May 2009 14:14]
MC Brown
Already document in the 6.0.4 changelog.