Bug #60516 | UPDATE within a function fails and causes infinite loop | |
---|---|---|---
Submitted: | 17 Mar 2011 15:26 | Modified: | 24 Nov 2011 11:33
Reporter: | John Crowley | Email Updates: |
Status: | Closed | Impact on me: |
Category: | MySQL Server: DML | Severity: | S1 (Critical)
Version: | 5.5.10 | OS: | Linux (Fedora 14)
Assigned to: | | CPU Architecture: | Any
[17 Mar 2011 15:26]
John Crowley
[17 Mar 2011 15:31]
John Crowley
getUID function
Attachment: mysql.getuid.txt (text/plain), 1.71 KiB.
[17 Mar 2011 17:10]
Valeriy Kravchuk
Please check whether the problem still happens with a newer version, 5.5.10. If it does, please send the results of:

show create table lclUID\G

and the exact version of the JDBC driver plus the connection string used.
[17 Mar 2011 19:28]
John Crowley
OK -- have installed and am now running under 5.5.10. Downloaded and installed version 5.1.15 of the JDBC connector. Have restarted the application -- no error to date, but it usually takes a while before the situation occurs. Will report whether it does or does not occur in a couple of hours. Here is the create table output:

CREATE TABLE `lclUID` (
  `uidStrKey` varchar(32) NOT NULL DEFAULT '' COMMENT 'Unique KEY for this set of UIDs',
  `uidUidId` bigint(20) unsigned NOT NULL DEFAULT '1' COMMENT 'Next available UID within this KEY',
  PRIMARY KEY (`uidStrKey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC COMMENT='Provide independent pools of unique UIDs for given KEYs'
[17 Mar 2011 21:08]
Sveta Smirnova
Thank you for the update. We will wait for the results of your testing.
[18 Mar 2011 12:11]
John Crowley
So far, the problem has not recurred under 5.5.10. Have marked this as Closed.
[18 Apr 2011 21:29]
John Crowley
Same problem occurred again today. The time lapse may be misleading because I have not been working on that particular DB for a while. All previous information re the function and problem still holds. Have the last 100 lines of the General log and will attach, although it really doesn't show much. NOTE: After the problems started I added the "INSERT into spcLog ..." statements to attempt to capture some debug information. No rows were actually posted to this table.
[18 Apr 2011 21:30]
John Crowley
Last 100 lines of general log (full log was 1.6G of these lines)
Attachment: tail.SAVE (application/octet-stream, text), 11.96 KiB.
[21 Apr 2011 22:09]
John Crowley
Problem still showing up. One call on getUID(..) goes into this loop, and then all other calls stack up and eventually time out (from the Java code). Something locked internally?? Attempted a fix by putting a 50 millisecond sleep at the end of the loop:

select sleep(0.050000) into xxx;

but still have the problem. Problem is CRITICAL -- once one process hits the loop, eventually all other clients wind up aborting. Problem is RANDOM -- cannot figure out a test context to actually cause this failure.
[25 Apr 2011 18:20]
Sveta Smirnova
Thank you for the feedback. Which transaction isolation level do you use and which transaction isolation level did you use with 5.0?
[25 Apr 2011 18:22]
John Crowley
Whatever the default is -- don't believe that our code ever changes it, but will double-check.
[25 Apr 2011 19:00]
Sveta Smirnova
Thank you for the feedback. It looks like the row was modified after uidUidId was read the first time. Does it help if you add COMMIT right before END LOOP?
[25 Apr 2011 20:43]
John Crowley
Understand your suggestion, but have some concerns:
1) This has been working for 4 years under all previous versions of MySQL, so what changed?
2) AUTOCOMMIT is ON for that database, so a COMMIT statement should not be necessary.
3) The point of the guarded logic is to handle concurrency without requiring a BEGIN TRANSACTION .... COMMIT sequence.
4) Even if doing (3) would work, I feel that we would be covering up a deeper problem that will just show up someplace else in the future.
Would really love to understand exactly what is happening here. I've put in an attempted workaround of counting the number of times around the loop, bailing out with an error code if the count is exceeded, tearing down the connection, opening a new connection, and trying again. Have not yet been able to test this code, but will this week. But again, even if it works I still feel that we are ignoring the basic problem. Thanks, John
[26 Apr 2011 8:30]
Sveta Smirnova
John, thank you for the feedback. My guess is that we may have changed how we handle transactions inside stored routines, so you keep seeing the same dataset from the first select even if you run it a second time; you then cannot update the row because no row with that value exists in the table anymore, thus the loop. Adding COMMIT just before END LOOP can easily confirm or reject my guess.
[26 Apr 2011 11:33]
John Crowley
We may not be understanding each other ... The whole point of the loop is to do optimistic concurrency control. If you strip out the logic to create a new key, output log records, etc., it comes down to this:

LOOP
  SELECT uidUidId INTO biVal FROM lclUID WHERE uidStrKey = strKey;
  UPDATE lclUID SET uidUidId = (uidUidId + iQty)
    WHERE uidStrKey = strKey     <--- guarded WHERE to make sure no one else
      AND uidUidID = biVal;      <--- has updated the same row concurrently
  SET rowCount = ROW_COUNT();
  IF rowCount=1 THEN RETURN biVal; END IF;
END LOOP;

Note that uidStrKey is UNIQUE, so the idea is:
a) SELECT the row for 'strKey' to get the current value of uidUidID
b) Do a guarded UPDATE testing that 'uidUidID' was not changed by another user
c) If the UPDATE succeeds (rowCount=1), then exit. Note that this UPDATE should have AUTOCOMMITed at this point.
d) If NO rows are updated (rowCount=0), then another user did a concurrent update, so loop back to (a) to SELECT the new value of uidUidID and try again.

I'll try a COMMIT, but still don't understand why it should be necessary (and it wasn't needed in previous MySQL versions). Does an UPDATE within a procedure/function have different COMMIT logic than an UPDATE issued directly? Can you point me at any discussion of this in the manual? Thanks
[26 Apr 2011 12:21]
John Crowley
Have a theory about what might be causing this problem .... A cached row which is not correctly invalidated could lead to exactly the behavior actually observed (referring to the code in the previous post):
1) Thread 1 does the SELECT to read the current value of uidUidID (assume value=38)
2) This row is cached for this thread.
3) Thread 2 also executes and succeeds, updating uidUidID to 39 in the DB
4) BUT the cached row for thread 1 is not invalidated
5) Thread 1 now issues the UPDATE, which updates zero rows since uidUidID in the DB now has the value 39
6) Thread 1 now loops back to step (1) to read the new value of uidUidID
7) BUT the SELECT in step (1) reads the value from the cached row, and returns 38 again.
This loop will never end. .... as I say, just a theory.
[26 Apr 2011 13:12]
Sveta Smirnova
Thank you for the feedback.

> Have a theory about what might be causing this problem ....

This is what I want to check with COMMIT right before END LOOP. Please also send us the output of SHOW VARIABLES LIKE '%cache%'; I want to check whether you have the query cache enabled as well.
[26 Apr 2011 13:41]
John Crowley
Will not compile with the COMMIT:

Error Code: 1422 Explicit or implicit commit is not allowed in stored function or trigger.

BTW: My workaround was triggered and appears to be successful. If we go around the LOOP > 16 times, it returns -99 to the caller. The calling Java code then tears down the connection, establishes a new connection, and tries again. The next try was successful. But this is still a very poor solution.
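For illustration, a minimal sketch of the bail-out guard described above, written as a standalone stored function; the name getUIDBounded and the exact structure are assumptions for this sketch, not the code in the uploaded attachment:

DELIMITER $$
-- Sketch only: bounded version of the optimistic-update loop; getUIDBounded is a hypothetical name.
CREATE FUNCTION getUIDBounded(strKey VARCHAR(32), iQty BIGINT UNSIGNED)
RETURNS BIGINT
MODIFIES SQL DATA
BEGIN
  DECLARE biVal BIGINT;
  DECLARE loopCount INT DEFAULT 0;
  getLoop: LOOP
    SELECT uidUidId INTO biVal FROM lclUID WHERE uidStrKey = strKey;
    UPDATE lclUID
       SET uidUidId = uidUidId + iQty
     WHERE uidStrKey = strKey
       AND uidUidId = biVal;          -- guarded update, as in the original loop
    IF ROW_COUNT() = 1 THEN
      RETURN biVal;                   -- success: hand back the reserved value
    END IF;
    SET loopCount = loopCount + 1;
    IF loopCount > 16 THEN
      RETURN -99;                     -- give up; the Java caller reconnects and retries
    END IF;
  END LOOP getLoop;
END$$
DELIMITER ;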
[26 Apr 2011 17:53]
Sveta Smirnova
Thank you for the feedback. I cannot repeat the described behavior: the SP reads the inserted value in my case. Please send us the output of show variables like '%cache%'; and log biVal right after the select, so we know which value it reads in each iteration. Something like here:

DECLARE iteration INT;
SET iteration=1;
...
LOOP
  SELECT uidUidId INTO biVal FROM lclUID WHERE uidStrKey = strKey;
  SELECT CONCAT("Iteration: ", iteration, ", biVal: ", biVal) INTO logMsg;
  INSERT INTO spcLog(logMessage) VALUES(logMsg);
  ...
  SET iteration=iteration+1;
...
[26 Apr 2011 21:21]
John Crowley
Understand that you can't reproduce it -- the problem only shows up on rare occasions. I've never been able to trigger it by entering command-line queries.

Re logging: The original version of the function did attempt to insert a log message every time around the LOOP, but these were never posted (don't know why). So I think we're stuck on that. But what I can do is save the value of uidUidID the FIRST time the select is called, then compare it to future calls to see if it has in fact changed, and finally return some negative value to the Java code that I can log.

BTW: It would be really helpful if there was some sort of LOG or TRACE statement in MySQL that posted a message directly to the Error log or General log. Is there any such thing??? If not, can I make that a suggestion for a future version?

Here is the result of SHOW VARIABLES LIKE '%cache%':

+------------------------------+------------+
| Variable_name                | Value      |
+------------------------------+------------+
| binlog_cache_size            | 32768      |
| have_query_cache             | YES        |
| key_cache_age_threshold      | 300        |
| key_cache_block_size         | 1024       |
| key_cache_division_limit     | 100        |
| max_binlog_cache_size        | 4294963200 |
| query_cache_limit            | 1048576    |
| query_cache_min_res_unit     | 4096       |
| query_cache_size             | 0          |
| query_cache_type             | ON         |
| query_cache_wlock_invalidate | OFF        |
| table_definition_cache       | 256        |
| table_open_cache             | 64         |
| thread_cache_size            | 0          |
+------------------------------+------------+

Best, John
[26 Apr 2011 23:31]
Sveta Smirnova
Thank you for the feedback.

> But what I can do is save the value of uidUidID the FIRST time the select is called, then compare it to future calls to see if it has in fact changed, and finally return some negative value to the Java code that I can log.

Yes, this would be helpful, as we would at least know whether we read the same value or not.

> It would be really helpful if there was some sort of LOG or TRACE statement in MySQL that posted a message directly to the Error log or General log. Is there any such thing??? If not, can I make that a suggestion for a future version?

I think currently the only way to do it is to write a UDF or plugin which logs messages into the error log or to a separate log. If you want this feature to be added to the server, feel free to file a feature request.
[26 Apr 2011 23:42]
John Crowley
getUID function w debug statements
Attachment: getUID.txt (text/plain), 3.30 KiB.
[26 Apr 2011 23:43]
John Crowley
Java code log messages after getUID() fails
Attachment: getUID.fail.txt (text/plain), 10.80 KiB.
[27 Apr 2011 0:06]
John Crowley
OK, more info. Instrumented the getUID() function so that the FIRST time through the loop it saves the value of 'biVal' (the value read from the DB). Then it tries the SELECT .... UPDATE logic 16 times. If all fail, it returns a negative value which consists of the original value of 'biVal' (* 1B) and the CURRENT value of 'biVal' (in the low-order portion). (This version of getUID() uploaded.)

On the Java code side, if we get a negative value back it logs a message showing the original and current values of 'biVal'. Then it will also try 16 times to tear down the connection, open a new connection, and retry the operation. The log messages are in the getuid.fail.txt uploaded file. Here's the relevant portion of the first 3 lines:

'GetUidTest' retry count = 0 ORIGINAL biVal = 6 CURRENT = 6
'GetUidTest' retry count = 1 ORIGINAL biVal = 7 CURRENT = 7
'GetUidTest' retry count = 2 ORIGINAL biVal = 8 CURRENT = 8

The RETRY COUNT indicates that this is the 0th (first), 2nd, and 3rd retry attempt by the Java code (for this particular application). Then it succeeded (otherwise there would be retry counts up to 16).

Key point: The ORIGINAL value read from the DB matches the CURRENT value we've (supposedly) just re-read from the database. However, if that were true then the UPDATE should have succeeded. ... and remember that we've done the SELECT .... UPDATE sequence 16 times before we give up and RETURN.

I'm still leaning toward a row in the cache that did not get invalidated when some other thread updated the DB table. So in this case, the first time in we read a 6 from the DB, then thread 2 updates the DB to 7, so our UPDATE fails, but when we re-issue the SELECT we get the cached row which still has the value 6, so the UPDATE fails, etc. When we give up and return to the Java code, a new connection is established and we try again from scratch. This time we read a 7, but then the same thing happens -- some other thread updates the DB to 8, we re-read the cached value, etc. Note that after three tries it does succeed -- this thread managed to issue both the SELECT and the UPDATE before any other thread updated the DB.

Does this help? (BTW: Please sanity check the code in this version of getUID() to make sure that I didn't do something stupid.) Thanks, John
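For illustration, a sketch of the encoding and decoding implied by the description above, reading "* 1B" as a 1,000,000,000 multiplier; the variable names firstVal and biVal are assumptions, not taken from the attached function:

-- Inside the function, the debug return value described above would have this shape
-- (firstVal = value read on the first iteration, biVal = latest value read):
--   RETURN -((firstVal * 1000000000) + biVal);
-- Decoding such a value on the SQL side, using the 6/6 pair from the first log line:
SET @ret := -((6 * 1000000000) + 6);
SELECT (-@ret) DIV 1000000000 AS original_biVal,
       (-@ret) MOD 1000000000 AS current_biVal;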
[28 Apr 2011 9:10]
Øystein Grøvlen
Note that even if autocommit is on for the session that is calling a function, all statements within the function will execute within a single transaction. Hence, I do not think your "optimistic concurrency control" will work with InnoDB. Look at this example where I execute the statements from your function with autocommit off:
============================================================
First, in Connection 1:

mysql> select * from b;
+------+-------+
| id   | value |
+------+-------+
|    1 |     7 |
+------+-------+
1 row in set (0.00 sec)

Then, switch to Connection 2:

mysql> select * from b;
+------+-------+
| id   | value |
+------+-------+
|    1 |     7 |
+------+-------+
1 row in set (0.00 sec)

mysql> update b set value='8' where id=1 and value='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select row_count();
+-------------+
| row_count() |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

Then back to Connection 1:

mysql> update b set value='8' where id=1 and value='7';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 0  Changed: 0  Warnings: 0

mysql> select row_count();
+-------------+
| row_count() |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)

mysql> select * from b;
+------+-------+
| id   | value |
+------+-------+
|    1 |     7 |
+------+-------+
1 row in set (0.00 sec)
=================================================================
And Connection 1 will, due to the multiversion concurrency control used by InnoDB, continue to see the same value and not succeed in doing any updates until the function completes and the statement calling the function has completed.

You say that your problem does not appear on 5.0. Did you use InnoDB?
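The same single-snapshot behavior could be observed inside a routine with a hypothetical demo function along these lines (snapshotDemo is an invented name; it reuses the lclUID table from this report; this is a sketch, not part of the original discussion):

DELIMITER $$
-- Sketch only: under REPEATABLE READ, both reads inside one function call come
-- from the same snapshot, even if another session commits a change in between.
CREATE FUNCTION snapshotDemo(strKey VARCHAR(32))
RETURNS VARCHAR(100)
READS SQL DATA
BEGIN
  DECLARE v1, v2 BIGINT;
  DECLARE dummy INT;
  SELECT uidUidId INTO v1 FROM lclUID WHERE uidStrKey = strKey;
  SELECT SLEEP(10) INTO dummy;   -- update and commit the row from another session during this pause
  SELECT uidUidId INTO v2 FROM lclUID WHERE uidStrKey = strKey;
  RETURN CONCAT('first read: ', v1, ', second read: ', v2);  -- both report the same value
END$$
DELIMITER ;

-- SELECT snapshotDemo('someKey');   -- 'someKey' is a placeholder key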
[28 Apr 2011 14:01]
John Crowley
Wow, I think you've surrounded the problem, and the issue is the isolation level. Just checked and it is set to REPEATABLE-READ, which would explain the SELECT behavior. Will re-test after changing this to READ-COMMITTED and report results.

Checked on the 5.0 version of the database, and it also uses InnoDB and has a default of REPEATABLE-READ -- but it has been working without problems for 4 years, so something changed.

BTW: Doesn't this also imply that if you use connection pooling, you need to make sure that you don't have a REPEATABLE-READ isolation level, or that connection would always see whatever values happened to exist the first time a SELECT was issued?

Thanks for your help. John
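For reference, a minimal sketch of checking and switching the isolation level on 5.5 (illustrative only; the actual configuration used in this report is not shown here):

-- Check the effective isolation levels (5.5 variable name)
SELECT @@global.tx_isolation, @@session.tx_isolation;
-- Switch to READ COMMITTED for new connections, or only for the current session
SET GLOBAL  TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;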
[28 Apr 2011 20:31]
John Crowley
getUID as a PROCEDURE instead of FUNCTION
Attachment: getUIDProc.txt (text/plain), 3.37 KiB.
[28 Apr 2011 20:40]
John Crowley
Set the default isolation level to READ-COMMITTED (globally for the DB). getUID(..) as a FUNCTION failed as before. Rewrote it as getUIDProc(...) with an explicit COMMIT if the UPDATE returns a row count of zero, but this also FAILs just as before. (This one really puzzled me! Code for this version uploaded.) Finally, went back to the FUNCTION version and added a FOR UPDATE to the initial SELECT, and this appears to work. Still think that there is something wrong somewhere, and hate to set an explicit lock every time for a situation which should occur very rarely, but also need to move on to other tasks. Feel free to Close this unless you want to dig into it further. Thanks for the help, John
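For illustration, a sketch of the FOR UPDATE variant described above; only the locking read on the initial SELECT is taken from the comment, while the function name getUIDForUpdate and the surrounding structure are assumptions:

DELIMITER $$
-- Sketch only: the guarded loop with a locking read on the initial SELECT.
CREATE FUNCTION getUIDForUpdate(strKey VARCHAR(32), iQty BIGINT UNSIGNED)
RETURNS BIGINT
MODIFIES SQL DATA
BEGIN
  DECLARE biVal BIGINT;
  getLoop: LOOP
    -- FOR UPDATE blocks until any concurrent writer commits and reads the latest
    -- committed value instead of the transaction snapshot, so the guarded UPDATE matches.
    SELECT uidUidId INTO biVal
      FROM lclUID
     WHERE uidStrKey = strKey
       FOR UPDATE;
    UPDATE lclUID
       SET uidUidId = uidUidId + iQty
     WHERE uidStrKey = strKey
       AND uidUidId = biVal;
    IF ROW_COUNT() = 1 THEN
      RETURN biVal;
    END IF;
  END LOOP getLoop;
END$$
DELIMITER ;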
[24 Nov 2011 11:33]
MySQL Verification Team
Thank you for the feedback. Closing, since Sveta wasn't able to repeat the issue.