MySQL Bugs: #85706: Rowid already allocated under heavy load

Bug #85706	Rowid already allocated under heavy load
Submitted:	30 Mar 2017 11:19	Modified:	11 May 2017 5:34
Reporter:	Сергей Кукуев
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.7.17 ndb-7.5.5	OS:	Oracle Linux (6.8 (Santiago) kernel 4.1.12-37.4.1.el6uek.x86_64)
Assigned to:		CPU Architecture:	Any
Tags:	899, Error 899, heavy load, Rowid already allocated

Description:
We have a cluster of 16 nodes with 48 CPUs each.
It holds ~1000 billion rows in total across 6 tables.

Each data node is co-located with a business logic application.

When running approx. 64000 NDB requests per second per business logic application, we encountered the 'Rowid already allocated' error.

At the same time, we have never seen this error
on another production instance: a cluster of 16 nodes x 32 CPUs running mysql-5.6.28 ndb-7.4.10 on OEL 6.3 (Santiago), kernel 2.6.39-200.24.1.el6uek.x86_64.

How to repeat:
Run a heavy load on mysql-5.7.17 ndb-7.5.5.

mysql-bug-data-85706.tar.gz uploaded to //support/incoming

Additionally, we ran tests with fewer data nodes.
On clusters with 1 or 2 data nodes the error was not reproduced.
On 4 and 8 nodes it was reproduced.

It was also reproduced with less data: approx. 30 million rows in total across 6 tables.

Test which reproduces the error

Attachment: main.cpp (text/plain), 15.52 KiB.

DB schema for the test

Attachment: DB.sql (application/octet-stream, text), 1016 bytes.

We reproduced this error within a small test.

Run it with 80 threads (command-line parameter).

The test files and the test database are attached in the previous comment.

No initial population of the tables is needed before running.
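
For readers without access to the attachment, the general shape of such a multi-threaded load test can be sketched as below. This is not the attached main.cpp: the names run_load, Operation and LoadResult are hypothetical, and the NDB API transaction is replaced by a caller-supplied callback that returns an empty string on success or an error message on failure. The sketch only shows how a thread count taken from the command line can drive concurrent workers and how errors such as 'Rowid already allocated' can be tallied per message.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one NDB API transaction.
// Returns an empty string on success, otherwise the error message.
using Operation = std::function<std::string(int thread_id, long iteration)>;

// Aggregated outcome of a load run.
struct LoadResult {
    long succeeded = 0;
    std::map<std::string, long> errors;  // error message -> occurrences
};

// Runs `op` from `num_threads` threads, `iterations_per_thread` times each,
// and returns the merged success and error counts.
LoadResult run_load(int num_threads, long iterations_per_thread,
                    const Operation& op) {
    LoadResult total;
    std::mutex merge_mutex;
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Count locally first so the hot loop takes no lock.
            LoadResult local;
            for (long i = 0; i < iterations_per_thread; ++i) {
                const std::string err = op(t, i);
                if (err.empty()) {
                    ++local.succeeded;
                } else {
                    ++local.errors[err];
                }
            }
            // Merge this worker's counts into the shared result once.
            std::lock_guard<std::mutex> lock(merge_mutex);
            total.succeeded += local.succeeded;
            for (const auto& kv : local.errors) {
                total.errors[kv.first] += kv.second;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return total;
}
```

In a real reproduction the callback would execute an NDB transaction and return the text of the reported error. A call such as run_load(80, n, op) would then correspond to the 80-thread run described above, with n chosen by the tester.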

Reproduced
OEL	MySQL cluster gpl	libndbclient
7.2	7.5.5	7.4.12
7.2	7.5.5	7.5.5
7.2	7.4.14	7.4.12
7.2	7.4.10	7.4.12
6.8	7.5.5	7.5.5

Hi Сергей,
I tried your test case, and on a "normal, small" cluster I was not able to reproduce this. I will retry on a larger system, but before that I need to know: since you wrote that you reproduced this error within a small test, do you mean this small test case run on the big "16 nodes x 48 CPU" cluster, or did you manage to reproduce it on a smaller cluster as well (one on which you said you could not reproduce the problem)?
best regards
Bogdan

Hi, Bogdan!

We reproduced this bug on 8 x 32-CPU and on 4 x 32-CPU clusters.

On 4 nodes it takes a bit more time than on 8 nodes.

Hi,

And on 16-CPU nodes? Can you reproduce it there or not?

thanks
Bogdan

We don't have any 16-CPU nodes, so we didn't test that configuration.

Hi,

Sorry, my mistake. So you reproduced it on both 48-CPU and 32-CPU boxes.

This bug has been fixed a number of times already, but it looks like it reappears under some circumstances. Our dev team is on it; I'm waiting to see whether they need you to provide any information from your system, as we ourselves have had huge problems reproducing this.

all best
Bogdan

Hi,

I'm running the test now on 192 cores and I'm reproducing it :(

mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64

config.ini:
[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ cat config.ini
[ndbd default]
NoOfReplicas= 2
DataDir= /export/home/mysql/mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/clusterdata
DataMemory = 1024M
IndexMemory = 256M
MaxNoOfConcurrentOperations=500000
MaxNoOfExecutionThreads=32
NoOfFragmentLogParts=32

[ndb_mgmd]
Hostname= localhost
DataDir= /export/home/mysql/mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/clusterdata

[ndbd]
HostName= localhost

[ndbd]
HostName= localhost

[ndbd]
HostName= localhost

[ndbd]
HostName= localhost

[mysqld]
[mysqld]
[mysqld]
[mysqld]
[mysqld]

[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ bin/ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @127.0.0.1  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 0, *)
id=3    @127.0.0.1  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 0)
id=4    @127.0.0.1  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 1)
id=5    @127.0.0.1  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (mysql-5.6.36 ndb-7.4.15)

[mysqld(API)]   5 node(s)
id=6    @127.0.0.1  (mysql-5.6.36 ndb-7.4.15)
id=7    @127.0.0.1  (mysql-5.6.36 ndb-7.4.15)
id=8 (not connected, accepting connect from any host)
id=9 (not connected, accepting connect from any host)
id=10 (not connected, accepting connect from any host)

[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ cat mysql.cnf 
[mysqld]
log-bin
binlog-format = ROW
gtid-mode = ON
enforce-gtid-consistency = ON
log-slave-updates = ON
master-info-repository = TABLE
relay-log-info-repository = TABLE
binlog-checksum = NONE
datadir=/export/home/mysql/mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/mysqldata/
basedir=/export/home/mysql/mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/
socket=/export/home/mysql/mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/mysqldata/mysql.sock
ndbcluster
skip-networking

[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ 

Running your test reproduces the problem:
g++ -o testcase testcase.c -I mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/ -L mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/lib/ -I ./mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/storage/ndb/ndbapi -I./mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/storage/ndb/ -I ./mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/storage/ -lmysqlclient -lndbclient -std=gnu++11

[mysql@supra10 ~]$ LD_LIBRARY_PATH=mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/lib/ ./testcase localhost BS 1000000 80
execute:222: line: 144: Rowid already allocated
execute:222: line: 144: Rowid already allocated
execute:222: line: 144: Rowid already allocated

Setting the bug to verified. Thanks for the test case!

all best
Bogdan