Bug #45462 "Waiting for ndbcluster global schema lock" on an empty cluster
Submitted: 11 Jun 2009 21:29
Modified: 19 Jan 2016 18:19
Reporter: David Ashman
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1-telco-6.3
OS: Linux (RHEL5)
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: mysql-5.1.32 ndb-6.3.23

[11 Jun 2009 21:29] David Ashman
Description:
After starting up a fresh cluster with --initial, I am unable to create a database on any API node. When attempting to create one, the thread hangs. Processlist shows a state of 'Waiting for ndbcluster global schema lock' with the command being 'show tables'.

If I shut down the cluster, I can create the empty database on the API nodes successfully. After restarting the cluster, I am again unable to create additional databases or even run a 'use database' command.

A forum thread was started here: http://forums.mysql.com/read.php?25,266418,266418#msg-266418

The cluster setup is:
23 servers running 2 data nodes each, for 46 data nodes (to better use multi-core hardware until we upgrade to multi-threaded processes)
24 servers running 4 api nodes each with mysql_multi, for 96 api nodes (again, for multi-core performance reasons)
2 management servers

How to repeat:
Use the following config.ini, start the cluster, attempt to create a database on an API node.

config.ini
-------------------------------------------------------

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=5400M
IndexMemory=700M
LockPagesInMainMemory=1
MemReportFrequency=300
MaxNoOfConcurrentOperations=51200
MaxNoOfConcurrentTransactions=20480
DataDir=/ndb/mysql-cluster
RealTimeScheduler=1
NoOfFragmentLogFiles=300
RedoBuffer=32M
FragmentLogFileSize=32M
Odirect=0
CompressedBackup=1
CompressedLCP=1

[MYSQLD DEFAULT]
BatchByteSize=1M
BatchSize=992

[NDB_MGMD DEFAULT]

[TCP DEFAULT]
SendBufferMemory=2M
ReceiveBufferMemory=2M

# Section for the cluster management nodes
[NDB_MGMD]
HostName=ndb-mgmt01

[NDB_MGMD]
HostName=ndb-mgmt02

# Section for the storage nodes
# First set NDB storage nodes
[NDBD]
Id=3
HostName=ndb01
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=4
HostName=ndb02
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=5
HostName=ndb03
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=6
HostName=ndb04
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=7
HostName=ndb05
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=8
HostName=ndb06
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=9
HostName=ndb07
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=10
HostName=ndb08
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=11
HostName=ndb09
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=12
HostName=ndb10
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=13
HostName=ndb11
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=14
HostName=ndb12
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=15
HostName=ndb13
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=16
HostName=ndb14
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=17
HostName=ndb15
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=18
HostName=ndb16
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=19
HostName=ndb17
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=20
HostName=ndb18
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=21
HostName=ndb19
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=22
HostName=ndb20
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=23
HostName=ndb21
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=24
HostName=ndb22
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=25
HostName=ndb23
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

# Second set of NDB storage nodes
[NDBD]
Id=26
HostName=ndb01
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=27
HostName=ndb02
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=28
HostName=ndb03
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=29
HostName=ndb04
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=30
HostName=ndb05
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=31
HostName=ndb06
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=32
HostName=ndb07
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=33
HostName=ndb08
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=34
HostName=ndb09
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=35
HostName=ndb10
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=36
HostName=ndb11
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=37
HostName=ndb12
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=38
HostName=ndb13
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=39
HostName=ndb14
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=40
HostName=ndb15
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=41
HostName=ndb16
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=42
HostName=ndb17
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=43
HostName=ndb18
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=44
HostName=ndb19
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=45
HostName=ndb20
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=46
HostName=ndb21
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=47
HostName=ndb22
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=48
HostName=ndb23
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

# MySQL API Nodes
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

# MySQL API Nodes
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

-------------------------------------------------------
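The [NDBD] sections in the config.ini above follow a regular pattern: node Ids 3 to 48, hosts ndb01 to ndb23 listed twice, with CPU pair 2/3 for the first set and 4/5 for the second. A minimal Python sketch that regenerates those sections and counts section headers is shown below. The function names `ndbd_sections` and `count_sections` are hypothetical helpers, not part of any MySQL Cluster tooling, and the counter only looks at section headers, not at the parameters inside them.

```python
import re
from collections import Counter


def ndbd_sections(hosts=23, first_id=3, cpu_pairs=((2, 3), (4, 5))):
    """Build [NDBD] sections: one data node per host for each CPU pair.

    Defaults mirror the pattern in the posted config.ini (assumption:
    hosts are named ndb01..ndbNN and Ids are assigned consecutively).
    """
    sections = []
    node_id = first_id
    for maint_cpu, exec_cpu in cpu_pairs:
        for host in range(1, hosts + 1):
            sections.append(
                "[NDBD]\n"
                f"Id={node_id}\n"
                f"HostName=ndb{host:02d}\n"
                f"LockMaintThreadsToCPU={maint_cpu}\n"
                f"LockExecuteThreadToCPU={exec_cpu}\n"
            )
            node_id += 1
    return "\n".join(sections)


def count_sections(config_text):
    """Count section headers such as [NDBD] or [MYSQLD].

    Headers ending in DEFAULT (e.g. [NDBD DEFAULT]) are skipped because
    they define defaults rather than nodes.
    """
    names = re.findall(r"^\[([^\]]+)\][ \t]*$", config_text, flags=re.MULTILINE)
    return Counter(
        name.strip().upper()
        for name in names
        if not name.strip().upper().endswith("DEFAULT")
    )


if __name__ == "__main__":
    text = ndbd_sections()
    print(count_sections(text)["NDBD"])
```

Running `count_sections` over a complete config.ini is one way to compare the number of [MYSQLD] slots against the number of API nodes the cluster is expected to host.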
[15 Jun 2009 13:22] Jørgen Austvik
Thanks for the report!

Could you please send us the logs of the mysqld that is hanging?
[15 Jun 2009 13:22] Jørgen Austvik
...and the cluster log and the data node logs.
[15 Jun 2009 18:19] David Ashman
The data node logs may not be useful, since I wiped the file system on the data nodes at one point.

Attachment: 45462-LogFiles.zip (text), 83.76 KiB.

[16 Jun 2009 1:44] David Ashman
We managed to get the cluster up and running by reducing the number of nodes. With 44 data nodes (22x2, 1 server fewer than the initial setup) and 84 API nodes (21x4, 3 servers fewer than the initial setup) the cluster works fine. If I add a 22nd API node server (84 to 88 API nodes), the global schema lock occurs. Shutting down that server frees up the lock. I used three different servers as the 22nd API server and the same problem occurs each time, so I still believe it's a bug somewhere and not hardware related. Possibly it is related to having a large number of nodes?
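The node counts reported so far can be tabulated with a short Python sketch. It only restates the arithmetic from this thread (servers times processes per server); including the 2 management servers in the total is an assumption, `cluster_totals` is a hypothetical helper, and the sketch does not identify where the actual limit lies.

```python
MGMT_NODES = 2  # management servers from the original report (assumed to count toward the total)


def cluster_totals(data_hosts, api_hosts, data_per_host=2, api_per_host=4):
    """Return data, API and overall node counts for a given number of servers."""
    data = data_hosts * data_per_host
    api = api_hosts * api_per_host
    return {"data": data, "api": api, "total": data + api + MGMT_NODES}


# Scenarios as described in the report; the outcome labels come from the thread.
scenarios = {
    "initial setup (hangs)": cluster_totals(23, 24),
    "reduced setup (works)": cluster_totals(22, 21),
    "reduced setup + 22nd API server (hangs)": cluster_totals(22, 22),
}

if __name__ == "__main__":
    for name, totals in scenarios.items():
        print(f"{name}: {totals['data']} data, {totals['api']} API, {totals['total']} total")
```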
[17 Sep 2009 11:49] Gustaf Thorslund
David,

Could you please also provide your my.cnf for the SQL nodes?

/Gustaf
[17 Sep 2009 16:33] David Ashman
I know I've added max_connections, table_open_cache, and skip-name-resolve since we had the issue, but I'm pretty sure the rest is exactly what was there before:

[mysqld]
socket=/var/lib/mysql/mysql.sock
datadir=/var/lib/mysql
federated
ndbcluster
ndb-connectstring=ndb-mgmt01,ndb-mgmt02
ndb_force_send=1
ndb_optimized_node_selection=3
ndb_use_exact_count=0
engine_condition_pushdown=1
ndb_index_stat_enable=1
max_connections=1000
slow_query_log=1
skip-name-resolve
table_open_cache=1024

[mysql_cluster]
ndb-connectstring=ndb-mgmt01,ndb-mgmt02
datadir=/usr/share

[mysqld_safe]
log-error=/var/lib/mysql/mysqld.err
pid-file=/var/run/mysqld/mysqld.pid
language=/usr/share/mysql/english
[19 Jan 2016 13:58] MySQL Verification Team
Hi,

Thanks for the report. 
I'm setting this to closed as this can't be reproduced on any of the modern releases of the cluster.

kind regards
Bogdan Kecman
[19 Jan 2016 18:19] David Ashman
Were you able to test with 46-48 data nodes? We were never able to reproduce the problem with 44 or fewer nodes, but every attempt with 46 or 48 nodes had the same issue. We tried multiple versions in the 6.3 line and an early 7.0 version as well.

We no longer have the hardware to test again on our end with that many nodes, so I'm not sure if the issue persists in later versions or if it was fixed at some point. If it's fixed, I'm happy, just wanted to make sure the test case included the node count since that's the only time we ran into it.