MySQL Bugs: #56051: Got temporary error 899 'Rowid already allocated' from NDBCLUSTER during INSERT

Bug #56051	Got temporary error 899 'Rowid already allocated' from NDBCLUSTER during INSERT
Submitted:	17 Aug 2010 15:53	Modified:	9 May 2011 7:43
Reporter:	Richard McCluskey	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-7.1	OS:	Linux (Centos 5.4)
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	mysql-5.1.44 ndb-7.1.3

Description:
We have 6 front-end servers that are INSERTing data records, via Tomcat/Java, into the Cluster. Under heavy load our logs fill up with the error below: Got temporary error 899 'Rowid already allocated' from NDBCLUSTER. These are not UPDATEs, they are INSERTs. This is causing us no end of problems. Do you have a suggestion?

Unfortunately, every one of these transactions is lost money for us, as they are records of displays for which we are paid. :(

adtracker.log:java.sql.SQLException: Got temporary error 899 'Rowid already allocated' from NDBCLUSTER
adtracker.log:Caused by: javax.jdo.JDODataStoreException: Insert of object "com.go2.adtracker.db.model.ImpressionsEvent@32598774" using statement "INSERT INTO `IMPRESSIONS` (`RESP_CODE`,`RESP_FORM_FACTOR`,`RESP_CONTENT_SCOPE`,`SVR_IP`,`USR_ETHNICITY`,`CONTENT_ID`,`USR_LOC`,`REQ_REF`,`REC_TIME`,`REQ_TS`,`REQ_LAT`,`REQ_LON`,`REQ_SWIDTH`,`RESP_CONTENT_SRC`,`USR_UA`,`IMPRESSION_ID`,`REQ_CID`,`USR_GENDER`,`REQ_SID`,`REC_DATE`,`REQ_IP`,`RESP_CALL_TO_ACTION`,`USR_YOB`,`RESP_CREATIVE_ID`,`USR_TARGET`,`REQ_FMT`,`REQ_PARTNER_ID`,`REQ_VER`) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" failed : Got temporary error 899 'Rowid already allocated' from NDBCLUSTER

How to repeat:
under load

I uploaded the tarball bug-data-56051.tar.bz2 to /pub/mysql/upload/ on your FTP site. It contains the data from the ndb_error_reporter run.

Temporary error against database - the transaction has to be redone from the application layer. The same can happen if e.g. a single node goes down, so the application redo is needed anyway.

If you are retrying in the application code and get this in every retry, then it is a bug.

Sorry for the short description in the last comment.

Databases (especially distributed databases) will sometimes give transient errors (see e.g. java.sql.SQLTransientException), and the client application has to be written in such a way that the complete transaction is redone in case one gets one of these exceptions. This is not MySQL Cluster specific.

The error message you get can occur because of an optimization to allocate row ids for quick bulk inserts. A client application that did redo on transient errors would redo the insert, and we believe that the inserts would succeed.

So, the question we have is: does your application redo (client side) transactions on transient exceptions?
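To illustrate what "redo on transient exceptions" could look like on the client side, here is a minimal Java sketch. The class and method names (`TransientRetry`, `withRetry`, `SqlWork`) are made up for this example, and the attempt count and backoff are arbitrary. It also assumes that the "Got temporary error ... from NDBCLUSTER" message arrives with MySQL vendor error code 1297 (ER_GET_TEMPORARY_ERRMSG); whether your driver additionally maps it to `java.sql.SQLTransientException` should be verified against your Connector/J version.

```java
import java.sql.SQLException;
import java.sql.SQLTransientException;

// Hypothetical helper, not part of any MySQL or JDO API.
class TransientRetry {
    // Assumption: the server reports "Got temporary error ..." as error 1297.
    static final int ER_GET_TEMPORARY_ERRMSG = 1297;

    // One complete transaction: begin, statements, commit (rollback on failure).
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    // Treat the exception as retryable if the driver flags it as transient
    // or if it carries the assumed vendor error code.
    static boolean isTransient(SQLException e) {
        return e instanceof SQLTransientException
                || e.getErrorCode() == ER_GET_TEMPORARY_ERRMSG;
    }

    // Re-run the whole unit of work on transient errors, up to maxAttempts
    // times, sleeping a little longer after each failed attempt.
    static <T> T withRetry(SqlWork<T> work, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isTransient(e)) {
                    throw e; // permanent error: do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last; // every attempt failed with a transient error
    }
}
```

The `work` passed in should cover the complete transaction (including the rollback of a failed attempt), not just the single INSERT, so that each retry starts from a clean state. If every retry still fails with the same error, that points to a bug rather than a transient condition.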

Same issue...
In need of a resolution? NDB --initial is a costly solution.

Pushed to 7.0.25 and 7.1.14.

Documented bugfix in the NDB-7.0.25 and 7.1.14 changelogs, as follows:

        Under heavy loads with many concurrent inserts, temporary
        failures in transactions could be misreported as being due to
        NDB Error 899 'Rowid already allocated'. Such failures are now
        reported correctly as being temporary errors with transactions,
        which should be retried by the application. In addition, NDB
        Error 899 has been re-classified as an internal error, rather
        than as a temporary transaction error.

Also updated error codes tables in API docs.

Closed.

I'm having the same problem with mysql-5.1.56 and ndb-7.1.19. When 2 nodes of the cluster are operational, the application randomly receives the 899 error message.

The 2 nodes in the configuration are identical; they are running the exact same version of the MySQL Cluster GA release (mysql-cluster-gpl-7.1.9-linux-x86_64-glibc23.tar.gz).

I forgot to mention the cluster is running with no load at all.

I don't know the upgrade history of your installation...
but if I were you... I would try a backup/initial start/restore
to see if the problem persists...

maybe you already have...
/Jonas

The problems started after a rolling upgrade from an older version to 7.1.9. LCP compression, which most people say is a possible cause, is off by default. One thing that is bugging me is whether this is actually not an error in the cluster but in the connector.

This issue still persists. We just got the same error messages in MySQL Cluster (5.5.19-ndb-7.2.4-enterprise-commercial-advanced-log) after a cluster crash and restart. Changing auto_increment counters on tables didn't help.