Bug #45462 "Waiting for ndbcluster global schema lock" on an empty cluster
Submitted: 11 Jun 23:29 Modified: 17 Sep 18:33
Reporter: David Ashman
Status: Open
Category: Server: Cluster Severity: S2 (Serious)
Version: mysql-5.1-telco-6.3 OS: Linux (RHEL5)
Assigned to: Gustaf Thorslund Target Version:
Tags: mysql-5.1.32 ndb-6.3.23
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[11 Jun 23:29] David Ashman
Description:
After starting up a fresh cluster with --initial, I am unable to create a database on any
API node. When attempting to create one, the thread hangs. Processlist shows a state of
'Waiting for ndbcluster global schema lock' with the command being 'show tables'.
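
For anyone trying to spot the same hang on their own API nodes, here is a rough sketch of how the stuck threads could be picked out of the processlist. It assumes the output of SHOW PROCESSLIST was captured as tab-separated text with a header row (for example from the mysql client in batch mode); the function name and the sample rows are made up for illustration and are not taken from the affected servers.

```python
import csv
import io

# The thread state quoted in this report.
LOCK_STATE = "Waiting for ndbcluster global schema lock"


def stuck_threads(processlist_tsv):
    """Return the processlist rows whose State is the global schema lock wait.

    processlist_tsv: tab-separated SHOW PROCESSLIST output with a header row.
    """
    reader = csv.DictReader(
        io.StringIO(processlist_tsv), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [row for row in reader if (row.get("State") or "").strip() == LOCK_STATE]


# Hypothetical sample rows, only to show the expected input shape.
SAMPLE = (
    "Id\tUser\tHost\tdb\tCommand\tTime\tState\tInfo\n"
    "1\tsystem user\t\t\tDaemon\t0\tWaiting for event from ndbcluster\tNULL\n"
    "7\troot\tlocalhost\tNULL\tQuery\t312\t"
    "Waiting for ndbcluster global schema lock\tshow tables\n"
    "9\troot\tlocalhost\tNULL\tQuery\t0\tNULL\tshow processlist\n"
)

for row in stuck_threads(SAMPLE):
    print(row["Id"], row["Time"], row["Info"])
```

Running this against each API node in turn would show which of them have threads parked on the lock and for how long.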

If I shut down the cluster, I can create the empty database on the API nodes successfully.
After restarting the cluster, I am again unable to create additional databases or even run
a 'use database' command.

A forum thread was started here:
http://forums.mysql.com/read.php?25,266418,266418#msg-266418

The cluster setup is:
23 servers running 2 data nodes each, for 46 data nodes (to better use multi-core
hardware until we upgrade to multi-threaded processes)
24 servers running 4 API nodes each with mysql_multi, for 96 API nodes (again, for
multi-core performance reasons)
2 management servers

How to repeat:
Use the following config.ini, start the cluster, attempt to create a database on an API
node.

config.ini
-------------------------------------------------------

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=5400M
IndexMemory=700M
LockPagesInMainMemory=1
MemReportFrequency=300
MaxNoOfConcurrentOperations=51200
MaxNoOfConcurrentTransactions=20480
DataDir=/ndb/mysql-cluster
RealTimeScheduler=1
NoOfFragmentLogFiles=300
RedoBuffer=32M
FragmentLogFileSize=32M
Odirect=0
CompressedBackup=1
CompressedLCP=1

[MYSQLD DEFAULT]
BatchByteSize=1M
BatchSize=992

[NDB_MGMD DEFAULT]

[TCP DEFAULT]
SendBufferMemory=2M
ReceiveBufferMemory=2M

# Section for the cluster management nodes
[NDB_MGMD]
HostName=ndb-mgmt01

[NDB_MGMD]
HostName=ndb-mgmt02

# Section for the storage nodes
# First set NDB storage nodes
[NDBD]
Id=3
HostName=ndb01
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=4
HostName=ndb02
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=5
HostName=ndb03
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=6
HostName=ndb04
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=7
HostName=ndb05
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=8
HostName=ndb06
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=9
HostName=ndb07
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=10
HostName=ndb08
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=11
HostName=ndb09
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=12
HostName=ndb10
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=13
HostName=ndb11
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=14
HostName=ndb12
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=15
HostName=ndb13
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=16
HostName=ndb14
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=17
HostName=ndb15
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=18
HostName=ndb16
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=19
HostName=ndb17
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=20
HostName=ndb18
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=21
HostName=ndb19
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=22
HostName=ndb20
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=23
HostName=ndb21
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=24
HostName=ndb22
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

[NDBD]
Id=25
HostName=ndb23
LockMaintThreadsToCPU=2
LockExecuteThreadToCPU=3

# Second set of NDB storage nodes
[NDBD]
Id=26
HostName=ndb01
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=27
HostName=ndb02
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=28
HostName=ndb03
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=29
HostName=ndb04
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=30
HostName=ndb05
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=31
HostName=ndb06
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=32
HostName=ndb07
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=33
HostName=ndb08
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=34
HostName=ndb09
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=35
HostName=ndb10
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=36
HostName=ndb11
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=37
HostName=ndb12
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=38
HostName=ndb13
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=39
HostName=ndb14
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=40
HostName=ndb15
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=41
HostName=ndb16
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=42
HostName=ndb17
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=43
HostName=ndb18
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=44
HostName=ndb19
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=45
HostName=ndb20
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=46
HostName=ndb21
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=47
HostName=ndb22
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

[NDBD]
Id=48
HostName=ndb23
LockMaintThreadsToCPU=4
LockExecuteThreadToCPU=5

# MySQL API Nodes
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

# MySQL API Nodes
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

-------------------------------------------------------
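
With a config.ini this long it is easy to lose track of how many node slots are actually defined. The following is a minimal sketch that counts the per-node sections; it is a plain line scan rather than Python's configparser, because the file repeats section names such as [NDBD] and [MYSQLD]. The function name and the short sample are illustrative assumptions, not part of the cluster tooling.

```python
import re
from collections import Counter

# Matches per-node section headers such as [NDBD] or [MYSQLD].
# Headers containing a space, e.g. [NDBD DEFAULT], are deliberately not matched.
SECTION_RE = re.compile(r"^\[([A-Za-z_]+)\]$")


def count_node_sections(config_text):
    """Count how many times each per-node section appears in a config.ini."""
    counts = Counter()
    for line in config_text.splitlines():
        match = SECTION_RE.match(line.strip())
        if match:
            counts[match.group(1).upper()] += 1
    return counts


# Shortened, made-up sample in the same layout as the file above.
SAMPLE = """\
[NDBD DEFAULT]
NoOfReplicas=2

[NDB_MGMD]
HostName=ndb-mgmt01

[NDBD]
Id=3
HostName=ndb01

[NDBD]
Id=4
HostName=ndb02

# MySQL API Nodes
[MYSQLD]
[MYSQLD]
"""

counts = count_node_sections(SAMPLE)
for name in ("NDB_MGMD", "NDBD", "MYSQLD"):
    print(name, counts[name])
```

Fed the full file contents instead of the sample, the resulting counts can be compared with the numbers given in the description (46 data nodes, 96 API nodes, 2 management servers).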
[15 Jun 15:22] Jørgen Austvik
Thanks for the report!

Could you please send us the logs of the mysqld that is hanging?
[15 Jun 15:22] Jørgen Austvik
...and the cluster log and the data node logs.
[15 Jun 20:19] David Ashman
The data node logs may not be useful; I wiped the file system on the data nodes at one point.

Attachment: 45462-LogFiles.zip (text), 83.76 KiB.

[16 Jun 3:44] David Ashman
We managed to get the cluster up and running by reducing the number of nodes. With 44 data
nodes (22x2, one server fewer than the initial setup) and 84 API nodes (21x4, three servers
fewer than the initial setup) the cluster works fine. If I add a 22nd API node server (going
from 84 to 88 API nodes), the global schema lock wait occurs again. Shutting down that server
frees up the lock. I tried three different servers as the 22nd and the same problem occurs
each time, so I still believe it's a bug somewhere and not hardware related, possibly
related to having a large number of API nodes?
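
To make the node arithmetic for the three layouts easier to compare, here is a small sketch that tallies them. The server counts come from this report; the class and field names are made up for illustration, and it assumes the two management servers stayed in place for every attempt.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    label: str
    data_servers: int
    api_servers: int
    ndbd_per_server: int = 2    # from the report: 2 data nodes per server
    mysqld_per_server: int = 4  # from the report: 4 API nodes per server
    mgmt_servers: int = 2       # assumption: unchanged in every attempt

    @property
    def data_nodes(self) -> int:
        return self.data_servers * self.ndbd_per_server

    @property
    def api_nodes(self) -> int:
        return self.api_servers * self.mysqld_per_server

    @property
    def total_nodes(self) -> int:
        return self.data_nodes + self.api_nodes + self.mgmt_servers


LAYOUTS = [
    Layout("initial (hangs)", data_servers=23, api_servers=24),
    Layout("reduced (works)", data_servers=22, api_servers=21),
    Layout("reduced + 1 API server (hangs)", data_servers=22, api_servers=22),
]

for layout in LAYOUTS:
    print(
        f"{layout.label:<32} data={layout.data_nodes:>3} "
        f"api={layout.api_nodes:>3} total={layout.total_nodes:>4}"
    )
```

Under those assumptions the working layout has 130 nodes in total and the smallest failing one has 134, so the hang appears somewhere between 84 and 88 API nodes.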
[17 Sep 13:49] Gustaf Thorslund
David,

Could you please also provide your my.cnf for the SQL nodes?

/Gustaf
[17 Sep 18:33] David Ashman
I know I've added max_connections, table_open_cache, and skip-name-resolve since we had
the issue, but I'm pretty sure the rest is exactly what was there before:

[mysqld]
socket=/var/lib/mysql/mysql.sock
datadir=/var/lib/mysql
federated
ndbcluster
ndb-connectstring=ndb-mgmt01,ndb-mgmt02
ndb_force_send=1
ndb_optimized_node_selection=3
ndb_use_exact_count=0
engine_condition_pushdown=1
ndb_index_stat_enable=1
max_connections=1000
slow_query_log=1
skip-name-resolve
table_open_cache=1024

[mysql_cluster]
ndb-connectstring=ndb-mgmt01,ndb-mgmt02
datadir=/usr/share

[mysqld_safe]
log-error=/var/lib/mysql/mysqld.err
pid-file=/var/run/mysqld/mysqld.pid
language=/usr/share/mysql/english
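
In case it helps when comparing SQL nodes, below is a rough sketch that pulls the NDB-related options out of a my.cnf like the one above, using Python's standard configparser. The helper name and the trimmed sample are assumptions for illustration; allow_no_value is needed because options such as ndbcluster and federated carry no value.

```python
import configparser


def ndb_options(my_cnf_text):
    """Return {section: {option: value}} for options whose name starts with 'ndb'."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # bare flags such as "ndbcluster"
        interpolation=None,    # treat '%' literally
        strict=False,          # tolerate repeated options or sections
    )
    parser.read_string(my_cnf_text)
    found = {}
    for section in parser.sections():
        opts = {
            key: value
            for key, value in parser.items(section)
            if key.replace("-", "_").startswith("ndb")
        }
        if opts:
            found[section] = opts
    return found


# Trimmed copy of the my.cnf above, for illustration only.
SAMPLE = """\
[mysqld]
federated
ndbcluster
ndb-connectstring=ndb-mgmt01,ndb-mgmt02
ndb_force_send=1
max_connections=1000

[mysql_cluster]
ndb-connectstring=ndb-mgmt01,ndb-mgmt02

[mysqld_safe]
log-error=/var/lib/mysql/mysqld.err
"""

options = ndb_options(SAMPLE)
for section, opts in options.items():
    print(section, opts)

# Both sections should point at the same management servers.
same = (
    options["mysqld"]["ndb-connectstring"]
    == options["mysql_cluster"]["ndb-connectstring"]
)
print("connect strings match:", same)
```

Running the same helper over the my.cnf of each API node server would make it easy to diff the NDB settings between the servers that work and the one that triggers the lock.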