Bug #3659 | >1 concurrent update causes deadlock | ||
---|---|---|---|
Submitted: | 5 May 2004 10:34 | Modified: | 19 Jul 2004 13:39 |
Reporter: | Johan Andersson | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | mysql-4.1.2-alpha bk | OS: | FermiLinux LTS 3.01 |
Assigned to: | Magnus Blåudd | CPU Architecture: | Any |
[5 May 2004 10:34]
Johan Andersson
[7 May 2004 16:01]
Magnus Blåudd
update T38_INTRADAY_1MIN set MT_CRS_CLO=20.93, MT_CRS_HAUT=(GREATEST(MT_CRS_HAUT,20.93)), MT_CRS_BAS=(LEAST(MT_CRS_BAS,20.93)), MT_VOL=(MT_VOL+1472.0) where CD_REFERENTIEL='XPARFR0000127771' and DA_INT='2004-05-04 12:53:00' With the current table schema, this query will start a full table scan of T38_INTRADAY_1MIN, when the record is found an update of that record will be sent to NDB thus setting a write lock on the record. That record lock will then be held until the rest of the records in the table has been scanned, and the the transaction can be committed. During this time(which can be long) it seems like the update transaction has timed out. And thus error 266 is returned. The other UPDATE also starts a scan which will not be able to pass the exclusive lock on the record that the first UPDATE has aquired. And subsequently that scan has timed out with error 296. So this is not really a deadlock problem, but a timeout problem. Solution: 1. Add an ordered index on CD_REFERENTIELL or DA_INT. KEY(CD_REFERENTIELL, DA_INT). This will allow a ordered index scan instead of the full table scan and thus it will take less time, thus allowing the two queries to finish without timeouts. 2. Use latest version. :) In the long run: 1. Add test cases for this where a long running full table scan is used to update only one or a few records in a big table. It should work to do it this way, even if it's slow. 2. Maybe the NdbConnection::refresh function has to be called from handler once in a while, but it would be better if this is done internally by NdbScanOperation if necessary.
[7 May 2004 22:56]
Mathias Herberts
Problem occurred even when both clients were not updating the same records. Problem did not occur with only one client, update was done in no time. As soon as the second client is interacting with the DB (even on differents rows since the test was also done with an additional ID column whose value was different for each client) the UPDATE leads to a 266 or 296 error being thrown.
[3 Jun 2004 17:37]
Mikael Ronström
WL 1822 Flush Locks when being locked in scan to avoid unnecessary deadlocks and WL 1823 Flexible time-out to avoid aborting both deadlocked transactions added for part of this problem Fix for scan time-out applied and pushed already
[8 Jul 2004 14:02]
Mikael Ronström
Fixed by WL 1822, 1823 and two bug fixes
[9 Jul 2004 13:20]
Johan Andersson
It should not be closed (yet). It has been very much improved. Deadlocks are a lot more infrequent, but they can still happen. I have two java applications connected to a mysql server connected to two ndb nodes. Martin has the Java program (Updater.java). The two applications are not updating the same data. The data is differentiated with an application id. Exception3: Deadlock found when trying to get lock; Try restarting transaction, message from server: "Lock wait timeout exceeded; try restarting transaction" stmt: com.mysql.jdbc.PreparedStatement@a470b8: UPDATE johan SET d='kalleankakajsaanka01' WHERE a=17692 AND aid =2 AND tid=1
[9 Jul 2004 13:44]
Johan Andersson
Hmm, I might have been to "submitter"-happy. I willl investigate this further. It is a very rare bug, and not easily reproducable.