Bug #85706 Rowid already allocated under heavy load
Submitted: 30 Mar 2017 11:19 Modified: 11 May 2017 5:34
Reporter: Сергей Кукуев
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine    Severity: S1 (Critical)
Version: mysql-5.7.17 ndb-7.5.5    OS: Oracle Linux (6.8 (Santiago) kernel 4.1.12-37.4.1.el6uek.x86_64)
CPU Architecture: Any
Tags: 899, Error 899, heavy load, Rowid already allocated

[30 Mar 2017 11:19] Сергей Кукуев
We have a 16-node cluster with 48 CPUs per node.
It holds ~1,000 billion rows in total across 6 tables.

Each data node is co-located with a business logic application.

When running approx. 64,000 NDB requests per second per business logic application, we encountered the 'Rowid already allocated' error.

At the same time, we have never seen this error
on another production instance: a 16 nodes x 32 CPU cluster running mysql-5.6.28 ndb-7.4.10 on OEL 6.3 (Santiago), kernel 2.6.39-200.24.1.el6uek.x86_64.

How to repeat:
Run a heavy load against mysql-5.7.17 ndb-7.5.5.
[30 Mar 2017 11:24] Сергей Кукуев
mysql-bug-data-85706.tar.gz uploaded to //support/incoming
[30 Mar 2017 13:22] Сергей Кукуев
Additionally, we ran tests with fewer data nodes.
On clusters with 1 or 2 data nodes the error did not reproduce.
On 4 and 8 nodes it did.

It also reproduced with less data: approx. 30 million rows in total across 6 tables.
[31 Mar 2017 15:30] Сергей Кукуев
Test which reproduces the error

Attachment: main.cpp (text/plain), 15.52 KiB.

[31 Mar 2017 15:33] Сергей Кукуев
DB schema for the test

Attachment: DB.sql (application/octet-stream, text), 1016 bytes.

[31 Mar 2017 15:36] Сергей Кукуев
We reproduced this error with a small test.

Run it with 80 threads (a command-line parameter).

The test and the test database schema are attached in the previous comments.

No initial table population is needed before the run.
[5 Apr 2017 11:43] Michael Prokopiv
OEL    MySQL Cluster GPL    libndbclient
7.2    7.5.5                7.4.12
7.2    7.5.5                7.5.5
7.2    7.4.14               7.4.12
7.2    7.4.10               7.4.12
6.8    7.5.5                7.5.5
[7 Apr 2017 3:42] MySQL Verification Team
Hi Сергей,
I tried your test case on a "normal, small" cluster and was not able to reproduce this. I will retry on a larger system, but first I need to know: since you wrote that you reproduced this error with a small test, do you mean this small test case run on the big "16 nodes x 48 CPU" cluster, or did you also manage to reproduce it on a smaller cluster (one on which you said you could not reproduce the problem)?
best regards
[7 Apr 2017 7:29] Сергей Кукуев
Hi, Bogdan!

We reproduced this bug on 8 x 32-CPU and 4 x 32-CPU clusters.

On 4 nodes it takes a bit longer to reproduce than on 8 nodes.
[7 Apr 2017 7:36] MySQL Verification Team

And on 16-CPU nodes? Can you reproduce it there or not?

[7 Apr 2017 7:39] Сергей Кукуев
We don't have 16-CPU nodes, so we didn't test that configuration.
[7 Apr 2017 7:47] MySQL Verification Team

Sorry, my mistake: so you reproduced it on both the 48-CPU and the 32-CPU boxes.

This bug has already been fixed a number of times, but it looks like it reappears under some circumstances. Our dev team is on it. I'm waiting to see whether they need any information from your system, as we ourselves have huge problems reproducing this.

all best
[11 May 2017 5:34] MySQL Verification Team

I'm running the test now on 192 cores and I'm reproducing it :(


[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ cat config.ini
[ndbd default]
NoOfReplicas= 2
DataDir= /export/home/mysql/mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/clusterdata
DataMemory = 1024M
IndexMemory = 256M

Hostname= localhost
DataDir= /export/home/mysql/mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/clusterdata

HostName= localhost

HostName= localhost

HostName= localhost

HostName= localhost


[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ bin/ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
[ndbd(NDB)]     4 node(s)
id=2    @  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 0, *)
id=3    @  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 0)
id=4    @  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 1)
id=5    @  (mysql-5.6.36 ndb-7.4.15, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @  (mysql-5.6.36 ndb-7.4.15)

[mysqld(API)]   5 node(s)
id=6    @  (mysql-5.6.36 ndb-7.4.15)
id=7    @  (mysql-5.6.36 ndb-7.4.15)
id=8 (not connected, accepting connect from any host)
id=9 (not connected, accepting connect from any host)
id=10 (not connected, accepting connect from any host)

[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ cat mysql.cnf 
binlog-format = ROW
gtid-mode = ON
enforce-gtid-consistency = ON
log-slave-updates = ON
master-info-repository = TABLE
relay-log-info-repository = TABLE
binlog-checksum = NONE

[mysql@supra10 mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64]$ 

Running your test reproduces the problem:
g++ -o testcase testcase.c -I mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/ -L mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/lib/ -I ./mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/storage/ndb/ndbapi -I./mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/storage/ndb/ -I ./mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/include/storage/ -lmysqlclient -lndbclient -std=gnu++11

[mysql@supra10 ~]$ LD_LIBRARY_PATH=mysql-cluster-gpl-7.4.15-linux-glibc2.5-x86_64/lib/ ./testcase localhost BS 1000000 80
execute:222: line: 144: Rowid already allocated
execute:222: line: 144: Rowid already allocated
execute:222: line: 144: Rowid already allocated

Setting the bug to verified. Thanks for the test case!

all best