Bug #66866 Still getting 899 "rowid already allocated" errors with cluster
Submitted: 19 Sep 2012 8:25    Modified: 28 Jun 2016 16:20
Reporter: Hartmut Holzgraefe
Status: Can't repeat    Impact on me: None
Category: MySQL Cluster: Cluster (NDB) storage engine    Severity: S3 (Non-critical)
Version: 7.2.x    OS: Linux
Assigned to: MySQL Verification Team    CPU Architecture: Any

[19 Sep 2012 8:25] Hartmut Holzgraefe
Description:
Looks as if the fix for bug #56051 did not solve all occurrences of error 899 "rowid already allocated" with cluster.

The error can still be hit when dealing with highly fragmented tables that get both a lot of INSERTs and DELETEs over time.

Recreating the table on a regular basis to defragment it, e.g. with

  ALTER TABLE table_name ENGINE=ndb;

can be used as a workaround.

The problem *may* be related to node restarts ...

How to repeat:
Do a lot of INSERTs and DELETEs on a busy table, maybe add node restarts to the mix ...
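
A minimal sketch of the kind of workload meant here (table and column names are made up for illustration; the INSERT/DELETE pair would be run from many connections over a long period):

  CREATE TABLE t1 (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(255)
  ) ENGINE=ndbcluster;

  -- keep inserting a steady stream of rows ...
  INSERT INTO t1 (payload) VALUES (REPEAT('x', 200));

  -- ... and keep deleting a share of them again so the table stays fragmented
  DELETE FROM t1 WHERE id % 3 = 0;

Restarting individual data nodes while the load is running (e.g. 'node_id RESTART' in the ndb_mgm client) should cover the "node restarts" part.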
[4 Nov 2015 5:08] Stefan Auweiler
Hi,

The problem still exists in 7.3.7; we resolved it using Hartmut's workaround.

We are operating a 10-node cluster on Solaris 10.

Steps that led to the problem:
- shut down the cluster from the MGM client
- started half of the boxes, using --nowait-nodes=3,5,7,9,11
- then started the other nodes with 'all start'

We had to do this because nodes 3,5,7,9,11 had been offline for about 3 days on purpose, and I did not want to risk these nodes starting faster than the other nodes... no idea what would have happened with the data :-/
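
A rough command sketch of that restart sequence with the standard cluster tools (the connect string, the choice of ndbmtd over ndbd, and the use of --nostart before 'all start' are assumptions; the --nowait-nodes list and the 'all start' step are taken from the report above):

  # shut down the whole cluster from the management client
  ndb_mgm -e "SHUTDOWN"

  # on the first half of the data nodes: start without waiting for 3,5,7,9,11
  ndbmtd --ndb-connectstring=mgm_host --nowait-nodes=3,5,7,9,11

  # on nodes 3,5,7,9,11: connect in "not started" state, then start them together
  ndbmtd --nostart --ndb-connectstring=mgm_host
  ndb_mgm -e "ALL START"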

Our application does retry the transaction in such cases, but that did not help at all (64 mysqld servers with 3 connections).

The only thing that worked was Hartmut's workaround.

thanks!
Stefan
[4 Nov 2015 5:24] Stefan Auweiler
one more comment:

It is stated somewhere that this might happen because of the optimized row id fetching.

While analyzing this, we temporarily set up a mysqld using
  ndb-autoincrement-prefetch-sz=1
and ran 100 inserts; many of them failed with error 899.
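
For reference, a sketch of how that setting can be applied (the my.cnf option name is the one given above; the underscore spelling is the corresponding server system variable, assuming it is changed globally before running the test):

  # in my.cnf of the test mysqld
  [mysqld]
  ndb-autoincrement-prefetch-sz=1

  -- or at runtime from a SQL client
  SET GLOBAL ndb_autoincrement_prefetch_sz = 1;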
Cluster seemed to handle the row id proposals, as something like the following rows ended up inserted on that particular mysqld (the numbers are made up for display; the actual values were in the hundreds of millions):

1
2
3
error
5
6
7
error
9
error
11
10252
10253
10254
10255
10256

All these rows were inserted by my client, and I assumed that in the meantime 10 other nodes had each been given 1024 row ids to handle locally.

My first assumption about this whole trouble was that cluster might hand out overlapping row id bundles to the SQL nodes, so we tried to figure out which SQL node had inserted 4, 8 and 10, to see whether they all came from one particular SQL node...

But: there were no rows with id 4, 8, or 10 at all...
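
For completeness, the check boiled down to looking the skipped ids up again (t1 and id stand in for the real table and primary key column):

  -- were the ids our mysqld skipped ever inserted by some other SQL node?
  SELECT id FROM t1 WHERE id IN (4, 8, 10);
  -- in our case this came back empty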

So we think error 899 is not really the correct error for whatever strange behaviour is actually going on.

Hope this helps.

Stefan
[28 Jun 2016 16:20] MySQL Verification Team
I can reproduce this on 7.2.12 (it needs a really fragmented table and some restarting nodes), but I can't reproduce it on 7.2.23 or on 7.4.10.

kind regards
Bogdan Kecman