Bug #60516 UPDATE within a function fails and causes infinite loop
Submitted: 17 Mar 2011 15:26
Modified: 24 Nov 2011 11:33
Reporter: John Crowley
Status: Closed
Category: MySQL Server: DML
Severity: S1 (Critical)
Version: 5.5.10
OS: Linux (Fedora 14)
CPU Architecture: Any

[17 Mar 2011 15:26] John Crowley
Description:
Have a table (lclUID) with 2 columns:  uidStrKey varchar(32) not null, uidUidID bigint not null.   uidStrKey is defined as the PK.

Used as a 'next number' basis to issue a sequential number for a given key -- e.g. key = "customer"

Have a function (attached), getUID(key, qty), which will assign and return the next number to the caller (qty just allows the caller to request multiple sequential unique numbers).  This function handles potential concurrent calls by doing a SELECT to read the current 'next number' value for the given key, and then doing a guarded UPDATE of the next-number value with a WHERE clause to ensure that no other process has updated the same key in the interim:

  UPDATE lclUID SET uidUidId = (uidUidId + iQty)
   WHERE uidStrKey = strKey
     AND uidUidID = biVal;        (biVal is the original value)

If the ROW_COUNT() == 1, then the update succeeded and RETURN.  Otherwise loop around to re-issue the SELECT to get the new next-number value and retry.
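
The logic described above is optimistic concurrency control via compare-and-swap. As an editor's illustration (not part of the original report), here is a minimal Python sketch of the same loop against an in-memory stand-in for lclUID; all names are hypothetical, and the point is only that the retry loop terminates as long as each SELECT sees the latest committed value:

```python
import threading

class UidTable:
    """In-memory stand-in for the lclUID table (names are illustrative).

    The guarded UPDATE is modeled as an atomic compare-and-swap: the new
    value is written only if the row still holds the value the caller
    read, mirroring `... AND uidUidID = biVal`."""

    def __init__(self):
        self._rows = {}                 # uidStrKey -> uidUidId
        self._lock = threading.Lock()

    def select(self, key):
        with self._lock:
            return self._rows.setdefault(key, 1)

    def guarded_update(self, key, expected, qty):
        with self._lock:
            if self._rows.get(key) == expected:
                self._rows[key] = expected + qty
                return 1                # ROW_COUNT() == 1: update landed
            return 0                    # another caller got there first

def get_uid(table, key, qty):
    """Optimistic loop: re-read and retry until the guarded update lands."""
    while True:
        bi_val = table.select(key)
        if table.guarded_update(key, bi_val, qty) == 1:
            return bi_val               # caller owns bi_val .. bi_val+qty-1
```

Under concurrent callers this hands out non-overlapping ranges, provided the read step always reflects the latest committed row -- the assumption that turns out to be at the heart of this report.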

This has worked without problem for 3+ years on version 5.0.  Now it SOMETIMES goes into an infinite loop and never returns.  Changes in environment: a) using version 5.5.8,  b) binary logging is ON,  c) this database is a slave in a replication setup (but this table is NOT replicated).  Access to the function is via a Java application through MySQL connector version 5.0.7

Using SQLyog, listing processes, and doing Refresh, you can see the function executing through the loop.  No update is ever successful (the value in the table does not change).  Also, the function as attached has some INSERTs to a log table -- these also never show up (but I have to kill the process in order to get out of the loop, so they may not have been committed).

How to repeat:
Unfortunately, the function usually works OK.  Have not been able to force a failure scenario.

Set up the database as above.  
Turn on the log_bin_trust_function_creators flag.
Strip out the INSERTs to the spcLog table (I would have done this, but wanted to send you the code exactly as compiled).
Compile the getUID function (attached).
Start making calls (from separate processes if possible).

Note:  Have not been able to cause the problem by issuing 'select getUID('customer', 3)' from a command line or query window.  It seems to show up when accessed through the Java connector -- but I would think that the loop behavior should be internal to MySQL.  Also, there are many more invocations through the Java app than through my manual requests, and the timing will obviously be different.
[17 Mar 2011 15:31] John Crowley
getUID function

Attachment: mysql.getuid.txt (text/plain), 1.71 KiB.

[17 Mar 2011 17:10] Valeriy Kravchuk
Please check whether the problem still happens with a newer version, 5.5.10. If it does, please send the results of:

show create table lclUID\G

and the exact version of the JDBC driver plus the connection string used.
[17 Mar 2011 19:28] John Crowley
OK -- have installed and now running under 5.5.10

Downloaded and installed version 5.1.15 of the JDBC connector.

Have restarted the application -- no error to date, but it usually takes a while before the situation occurs.  Will report whether or not it recurs in a couple of hours.

Here is the create table output:

CREATE TABLE `lclUID` (
  `uidStrKey` varchar(32) NOT NULL DEFAULT '' COMMENT 'Unique KEY for this set of UIDs',
  `uidUidId` bigint(20) unsigned NOT NULL DEFAULT '1' COMMENT 'Next available UID within this KEY',
  PRIMARY KEY (`uidStrKey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC COMMENT='Provide independent pools of unique UIDs for given KEYs'
[17 Mar 2011 21:08] Sveta Smirnova
Thank you for the update.

We will wait for the results of your testing.
[18 Mar 2011 12:11] John Crowley
So far, the problem has not recurred under 5.5.10

Have marked as Closed.
[18 Apr 2011 21:29] John Crowley
Same problem occurred again today.  The time lapse may be misleading because I have not been working on that particular DB for a while.

All previous information re the function and problem still holds.

Have the last 100 lines of the General log & will attach -- although it really doesn't show much.

NOTE:  After problems started I added the "INSERT into spcLog ..." statements to attempt to capture some debug information.  No rows were actually posted to this table.
[18 Apr 2011 21:30] John Crowley
Last 100 lines of general log (full log was 1.6G of these lines)

Attachment: tail.SAVE (application/octet-stream, text), 11.96 KiB.

[21 Apr 2011 22:09] John Crowley
Problem still showing up.  One call on getUID(..) goes into this loop, and then all other calls stack up and eventually time out (from the Java code).  Something locked internally??

Attempted a fix by putting a 50 millisecond sleep at the end of the loop:

select sleep(0.050000) into xxx;

but still have the problem.

Problem is CRITICAL -- once one process hits the loop, eventually all other clients wind up aborting.  Problem is RANDOM -- cannot figure out a test context to actually cause this failure.
[25 Apr 2011 18:20] Sveta Smirnova
Thank you for the feedback.

Which transaction isolation level do you use and which transaction isolation level did you use with 5.0?
[25 Apr 2011 18:22] John Crowley
Whatever the default is -- don't believe that our code ever changes it, but will double-check.
[25 Apr 2011 19:00] Sveta Smirnova
Thank you for the feedback.

Looks like the row was modified after uidUidId was read the first time. Will it help if you add a COMMIT right before END LOOP?
[25 Apr 2011 20:43] John Crowley
Understand your suggestion, but have some concerns:

1) This has been working for 4 years under all previous versions of MySQL, so what changed?

2) AUTOCOMMIT is ON for that database, so a COMMIT statement should not be necessary.

3) The point of the guarded logic is to handle concurrency without requiring a BEGIN TRANSACTION .... COMMIT sequence.

4) Even if doing (3) would work, I feel that we would be covering up a deeper problem that will just show up someplace else in the future.  Would really love to understand exactly what is happening here.

I've put in an attempted workaround of counting the number of times around the loop, bailing out with an error code if the count is exceeded, tearing down the connection, opening a new connection and trying again.  Have not yet been able to test this code, but will this week.  But again, even if it works I still feel that we are ignoring the basic problem.

Thanks,
John
[26 Apr 2011 8:30] Sveta Smirnova
John,

thank you for the feedback. I suspect we changed how transactions are handled inside stored procedures, so you keep seeing the same dataset from the first SELECT even when you run it a second time; you then cannot update the row, since no row with that value exists in the table anymore, hence the loop. Adding a COMMIT just before END LOOP can easily confirm or reject my guess.
[26 Apr 2011 11:33] John Crowley
We may not be understanding each other ...

The whole point of the loop is to do optimistic concurrency control.  If you strip out the logic to create a new key, output log records, etc, it comes down to this:

  LOOP
    SELECT uidUidId INTO biVal FROM lclUID WHERE uidStrKey = strKey;
    UPDATE lclUID SET uidUidId = (uidUidId + iQty)
     WHERE uidStrKey = strKey     <--- guarded WHERE to make sure no one else
       AND uidUidID = biVal;      <--- has updated the same row concurrently
    SET rowCount = ROW_COUNT();
    IF rowCount=1 THEN
      RETURN biVal;
    END IF;
  END LOOP ;

Note that uidStrKey is UNIQUE, so the idea is:
a) SELECT the row for 'strKey' to get the current value of uidUidID
b) Do a guarded UPDATE testing that 'uidUidID' was not changed by another user
c) If the UPDATE succeeds (rowCount=1), then exit.  Note that this UPDATE should have AUTOCOMMITed at this point.
d) If NO rows are updated (rowCount=0), then another user did a concurrent update so loop back to (a) to SELECT the new value of uidUidID and try again.

I'll try a COMMIT, but still don't understand why it should be necessary (and it wasn't needed in previous MySQL versions).

Does an UPDATE within a procedure/function have different COMMIT logic than an UPDATE issued directly?  Can you point me at any discussion of this in the manual?

Thanks
[26 Apr 2011 12:21] John Crowley
Have a theory about what might be causing this problem ....

A cached row which is not correctly invalidated could lead to exactly the behavior actually observed (referring to the code in the previous post):

1) Thread 1 does the SELECT to read the current value of uidUidID (assume value=38)
2) This row is cached for this thread.
3) Thread 2 also executes and succeeds, updating uidUidID to 39 in the DB
4) BUT the cached row for thread 1 is not invalidated
5) Thread 1 now issues the UPDATE, which updates zero rows since uidUidID in the DB now has the value 39
6) Thread 1 now loops back to step (1) to read the new value of uidUidID
7) BUT the SELECT in step (1) reads the value from the cached row, and returns 38 again.

This loop will never end.

.... as I say, just a theory.
[26 Apr 2011 13:12] Sveta Smirnova
Thank you for the feedback.

> Have a theory about what might be causing this problem ....
....

This is what I want to check with COMMIT right before END LOOP.

Please also send us the output of SHOW VARIABLES LIKE '%cache%': I want to check whether you have the query cache enabled as well.
[26 Apr 2011 13:41] John Crowley
Will not compile with the COMMIT

Error Code : 1422
Explicit or implicit commit is not allowed in stored function or trigger.

BTW:  My workaround was triggered and appears to be successful.  If we go around the LOOP > 16 times it returns -99 to the caller.  The calling Java code then tears down the connection, establishes a new connection, and tries again.  The next try was successful.

But this is still a very poor solution.
[26 Apr 2011 17:53] Sveta Smirnova
Thank you for the feedback.

I cannot repeat the described behavior: the stored function reads the inserted value in my case. Please send us the output of SHOW VARIABLES LIKE '%cache%'; and log biVal right after the SELECT, so we know which value it reads in each iteration.

Something like here:

DECLARE iteration INT;
SET iteration=1;
...
 LOOP
    SELECT uidUidId INTO biVal FROM lclUID WHERE uidStrKey = strKey;
    SELECT CONCAT("Iteration: ", iteration, ", biVal: ", biVal) INTO logMsg;
    INSERT INTO spcLog(logMessage) VALUES(logMsg);
    ...
    SET iteration=iteration+1;
 ...
[26 Apr 2011 21:21] John Crowley
Understand that you can't reproduce it -- the problem only shows up on rare occasions.  I've never been able to trigger it by entering command-line queries.

Re logging:  The original version of the function did attempt to insert a log message every time around LOOP, but these were never posted (don't know why).  So I think we're stuck on that.  But what I can do is save the value of uidUidID the FIRST time the select is called, then compare it to future calls to see if it has in fact changed, and finally return some negative value to the Java code that I can log.

BTW:  It would be really helpful if there was some sort of LOG or TRACE statement in MySQL that posted a message directly to the Error log or General log.  Is there any such thing???  If not, can I make that a suggestion for a future version?

Here is the result of SHOW VARIABLES LIKE '%cache%' 

+------------------------------+------------+
| Variable_name                | Value      |
+------------------------------+------------+
| binlog_cache_size            | 32768      |
| have_query_cache             | YES        |
| key_cache_age_threshold      | 300        |
| key_cache_block_size         | 1024       |
| key_cache_division_limit     | 100        |
| max_binlog_cache_size        | 4294963200 |
| query_cache_limit            | 1048576    |
| query_cache_min_res_unit     | 4096       |
| query_cache_size             | 0          |
| query_cache_type             | ON         |
| query_cache_wlock_invalidate | OFF        |
| table_definition_cache       | 256        |
| table_open_cache             | 64         |
| thread_cache_size            | 0          |
+------------------------------+------------+

Best,
John
[26 Apr 2011 23:31] Sveta Smirnova
Thank you for the feedback.

> But what I can do is save the value of uidUidID the FIRST time the select is called, then compare it to future calls to see if it has in fact changed, and finally return some negative value to the Java code that I can log.

Yes, this would be helpful, as we would at least know whether we read the same value or not.

> It would be really helpful if there was some sort of LOG or TRACE statement in MySQL that posted a message directly to the Error log or General log.  Is there any such thing???  If not, can I make that a suggestion for a future version?

I think currently the only way to do it is to write a UDF or plugin that logs messages into the error log or into a separate log. If you want this feature added to the server, feel free to file a feature request.
[26 Apr 2011 23:42] John Crowley
getUID function w debug statements

Attachment: getUID.txt (text/plain), 3.30 KiB.

[26 Apr 2011 23:43] John Crowley
Java code log messages after getUID() fails

Attachment: getUID.fail.txt (text/plain), 10.80 KiB.

[27 Apr 2011 0:06] John Crowley
OK, more info.

Instrumented the getUID() function so that the FIRST time through the loop it saves the value of 'biVal' (the value read from the DB).  Then it tries to do the SELECT .... UPDATE logic 16 times.  If all fail, it returns a negative value which consists of the original value of 'biVal' (* 1B) and the CURRENT value of 'biVal' (in the low-order portion).  (This version of getUID() uploaded.)

On the Java code side, if we get a negative value back it logs a message showing the original and current values of 'biVal'.  Then it will also try 16 times to tear down the connection, open a new connection, retry the operation.  The log messages are in the getuid.fail.txt uploaded file.

Here's the relevant portion of the first 3 lines:

'GetUidTest' retry count = 0 ORIGINAL biVal = 6 CURRENT = 6
'GetUidTest' retry count = 1 ORIGINAL biVal = 7 CURRENT = 7 
'GetUidTest' retry count = 2 ORIGINAL biVal = 8 CURRENT = 8

The RETRY COUNT indicates that these are the first (count 0), second, and third retry attempts by the Java code (for this particular application).  Then it succeeded (otherwise there would be retry counts up to 16).

Key point:  The Original value read from the DB matches the CURRENT value we've (supposedly) just re-read from the database.  However, if that were true then the UPDATE should have succeeded.  ...  and remember that we've done the SELECT .... UPDATE sequence 16 times before we give up and RETURN.

I'm still leaning toward a row in the cache that did not get invalidated when some other thread updated the DB table.  So in this case, the first time in we read a 6 from the DB, then thread 2 updates the DB to 7, so our UPDATE fails, but when we re-issue the SELECT we get the cached row which still has the value 6, so the UPDATE fails, etc.

When we give up and return to the Java code, a new connection is established and we try again from scratch.  This time reading a 7, but then the same thing happens -- some other thread updates the DB to 8, we re-read the cached value, etc.

Note that after three tries it does succeed -- this thread managed to issue both the SELECT and UPDATE before any other thread updated the DB.

Does this help?  (BTW:  Please sanity check the code in this version of getUID() to make sure that I didn't do something stupid.)

Thanks,
John
[28 Apr 2011 9:10] Øystein Grøvlen
Note that even if autocommit is on for the session that is calling a function, all statements within the function will execute within a single transaction.

Hence, I do not think your "optimistic concurrency control" will work with Innodb.  Look at this example where I execute the statements from your function with autocommit off:

============================================================
First, in Connection 1:

mysql> select * from b;
+------+-------+
| id   | value |
+------+-------+
|    1 | 7     |
+------+-------+
1 row in set (0.00 sec)

Then, switch to Connection 2:

mysql> select * from b;
+------+-------+
| id   | value |
+------+-------+
|    1 | 7     |
+------+-------+
1 row in set (0.00 sec)

mysql> update b set value='8' where id=1 and value='7';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select row_count();
+-------------+
| row_count() |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

Then back to connection 1:
mysql> update b set value='8' where id=1 and value='7';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 0  Changed: 0  Warnings: 0

mysql> select row_count();
+-------------+
| row_count() |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)

mysql> select * from b;
+------+-------+
| id   | value |
+------+-------+
|    1 | 7     |
+------+-------+
1 row in set (0.00 sec)

=================================================================

And connection 1 will, due to the multiversion concurrency control used by InnoDB, continue to see the same value and not succeed to do any updates until the function completes and the statement calling the function has completed.
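
Øystein's point can be reproduced outside MySQL with a crude model (an editor's sketch, not from the report): if a session pins the first value it reads -- a stand-in for the consistent snapshot InnoDB gives a REPEATABLE READ transaction -- while the guarded UPDATE is checked against the current committed row, the retry loop can never make progress. All names here are illustrative:

```python
class Row:
    """Single committed value, standing in for the lclUID row."""
    def __init__(self, value):
        self.committed = value

class SnapshotSession:
    """Crude model of one transaction under REPEATABLE READ: the first
    SELECT establishes a snapshot, and every later SELECT in the same
    transaction repeats it (real InnoDB keeps row versions; this is
    just enough to show the failure mode)."""

    def __init__(self, row):
        self.row = row
        self.snapshot = None

    def select(self):
        if self.snapshot is None:
            self.snapshot = self.row.committed
        return self.snapshot           # later reads repeat the snapshot

    def guarded_update(self, expected, new):
        # The UPDATE's WHERE clause is evaluated against the current
        # committed row, not against the snapshot.
        if self.row.committed == expected:
            self.row.committed = new
            return 1
        return 0
```

Once a concurrent session commits a change between this session's first SELECT and its UPDATE, every retry re-reads the stale snapshot, the WHERE clause never matches, and the loop spins exactly as reported.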

You say that your problem does not appear on 5.0.  Did you use InnoDB?
[28 Apr 2011 14:01] John Crowley
Wow, I think you've surrounded the problem, and the issue is the isolation level.  Just checked and it is set to REPEATABLE-READ, which would explain the SELECT behavior.  Will re-test after changing this to READ-COMMITTED and report results.

Checked on the 5.0 version of the database, and it also uses InnoDB and has a default of REPEATABLE-READ -- but has been working without problems for 4 years, so something changed.

BTW:  Doesn't this also imply that if you use connection pooling you need to make sure that you don't have a REPEATABLE-READ isolation level or that connection would always see whatever values happened to exist the first time a SELECT was issued?

Thanks for your help.

John
[28 Apr 2011 20:31] John Crowley
getUID as a PROCEDURE instead of FUNCTION

Attachment: getUIDProc.txt (text/plain), 3.37 KiB.

[28 Apr 2011 20:40] John Crowley
Set default isolation level to READ-COMMITTED (globally for the DB).

getUID(..) as a FUNCTION failed as before.  Rewrote it as getUIDProc(...) with an explicit COMMIT if the UPDATE returns a row count of zero, but this also FAILS just as before.  (This one really puzzled me!  Code for this version uploaded.)

Finally, went back to the FUNCTION version and added a FOR UPDATE to the initial SELECT and this appears to work.
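
For contrast, the behavior of the FOR UPDATE fix can be sketched the same way (editor's illustration, names hypothetical): a locking read takes the row lock before reading, which both forces a current read and serializes the read-modify-write, so the guarded update can no longer lose the race:

```python
import threading

class LockingUidTable:
    """Stand-in for lclUID where the allocator takes the row lock first,
    mirroring SELECT ... FOR UPDATE (a per-key mutex plays the part of
    the InnoDB row lock; all names are illustrative)."""

    def __init__(self):
        self._rows = {}
        self._row_locks = {}
        self._meta = threading.Lock()

    def _row_lock(self, key):
        with self._meta:
            return self._row_locks.setdefault(key, threading.Lock())

    def get_uid(self, key, qty):
        # SELECT ... FOR UPDATE: acquire the row lock, then read a
        # current (not snapshot) value; no retry loop is needed because
        # read-modify-write sequences are serialized on the lock.
        with self._row_lock(key):
            val = self._rows.setdefault(key, 1)
            self._rows[key] = val + qty
            return val
```

The trade-off John notes below is real: every caller now blocks on the row lock even though conflicts are rare, whereas the optimistic version pays only on an actual conflict.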

Still think that there is something wrong somewhere, and hate to set an explicit lock every time for a situation which should occur very rarely, but also need to move on to other tasks.

Feel free to Close this unless you want to dig into it further.

Thanks for the help,
John
[24 Nov 2011 11:33] MySQL Verification Team
Thank you for the feedback. Closing since Sveta wasn't able to repeat.