Bug #16870 Cluster: MySQLD Process allowed to connect to wrong ID
Submitted: 28 Jan 2006 17:30
Modified: 22 May 2006 21:58
Reporter: Jonathan Miller
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 4.1 ->
OS: Linux (Linux 32 Bit OS)
Assigned to:
CPU Architecture: Any

[28 Jan 2006 17:30] Jonathan Miller
Description:
While starting up TPC-B testing, I received:

ERROR 1017 (HY000): Can't find file: 'account' (errno: 2)

Then, looking inside the mysql client:
mysql> use TPCB
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+----------------+
| Tables_in_TPCB |
+----------------+
| account        |
| branch         |
| history        |
| teller         |
| trans          |
+----------------+
5 rows in set (0.00 sec)

mysql> select * from account;
ERROR 1017 (HY000): Can't find file: 'account' (errno: 2)
mysql> select * from teller;
ERROR 1017 (HY000): Can't find file: 'teller' (errno: 2)
mysql> exit

The mysqld error log showed:

Configuration error: Could not connect to socket : Could not alloc node id at ndb08 port 14000: Connection done from wrong host ip 10.100.1.94.
060128 18:14:48 [Note] /home/ndbdev/jmiller/builds/libexec/mysqld: ready for connections.
Version: '5.1.6-alpha-log'  socket: '/tmp/mysql2.sock'  port: 3307  Source distribution

The config.ini:

[DB DEFAULT]
NoOfReplicas: 2
IndexMemory: 500M
DataMemory: 1300M
BackupMemory: 64M
MaxNoOfLocalOperations: 300000
MaxNoOfTables: 200
StopOnError: 1
MaxNoOfConcurrentScans: 100
DataDir: /space/run
DiskPageBufferMemory: 500M
#DiskPageBufferMemory: 4M
MaxNoOfConcurrentOperations: 300000

[MGM DEFAULT]
PortNumber: 14000
DataDir: /space/run

[TCP DEFAULT]
SendBufferMemory: 10485760

[ndb_mgmd]
Id: 1
HostName: ndb08
ArbitrationRank: 1

[ndbd]
Id: 2
HostName: ndb13
FileSystemPath: /space/node1

[ndbd]
Id: 3
HostName: ndb14
FileSystemPath: /space/node1

[api]
Id: 4
HostName: ndb13

[api]
Id: 5
HostName: ndb14

[mysqld]
Id: 6
HostName: ndb08

[mysqld]
Id: 7
HostName: ndb09

[mysqld]
Id: 8
HostName: ndb09

[mysqld]
Id: 9
HostName: ndb10

[mysqld]
Id: 10
HostName: ndb10

[mysqld]
Id: 11
HostName: ndb11

[mysqld]
Id: 12
HostName: ndb11

[mysqld]
Id: 13
HostName: ndb12

[mysqld]
Id: 14
HostName: ndb12

[mysqld]
Id: 15
HostName: ndb15
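
To make the reserved slots in this config.ini easier to inspect, here is a minimal Python sketch that maps each node Id to its section and HostName. The function name parse_node_slots is hypothetical, and the parser only handles the "Key: Value" form used in this report. The standard configparser module is avoided on purpose because the file repeats section names such as [mysqld].

```python
def parse_node_slots(config_text):
    """Map node Id -> (section name, HostName) from config.ini-style text."""
    sections = []  # (section_name, {key: value}) in file order, duplicates kept
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out parameters
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip(), {}))
        elif ":" in line and sections:
            key, value = line.split(":", 1)
            sections[-1][1][key.strip()] = value.strip()
    # Only per-node sections carry an Id; the DEFAULT sections are dropped here.
    return {
        int(params["Id"]): (name, params.get("HostName"))
        for name, params in sections
        if "Id" in params
    }


# Short excerpt of the config above, used only as sample input.
EXCERPT = """\
[MGM DEFAULT]
PortNumber: 14000

[mysqld]
Id: 7
HostName: ndb09

[mysqld]
Id: 15
HostName: ndb15
"""

slots = parse_node_slots(EXCERPT)
print(slots)
```

With the full file, this gives one entry per node, so it is easy to see that Id 7 and Id 8 are both reserved for ndb09 and only Id 15 is reserved for ndb15.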

However, SHOW in the management client showed ndb15 connected as ID 7, which the config.ini reserves for ndb09:

[mysqld(API)]   12 node(s)
id=4 (not connected, accepting connect from ndb13)
id=5 (not connected, accepting connect from ndb14)
id=6    @XXX.100.1.93  (Version: 5.1.6)
id=7    @XXX.100.1.162  (Version: 5.1.6)
id=8    @XXX.100.1.94  (Version: 5.1.6)
id=9    @XXX.100.1.95  (Version: 5.1.6)
id=10   @XXX.100.1.95  (Version: 5.1.6)
id=11   @XXX.100.1.96  (Version: 5.1.6)
id=12   @XXX.100.1.96  (Version: 5.1.6)
id=13   @XXX.100.1.97  (Version: 5.1.6)
id=14   @XXX.100.1.97  (Version: 5.1.6)
id=15 (not connected, accepting connect from ndb15)
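
A mismatch like this can be spotted mechanically. The following Python sketch compares connected lines from a listing like the one above against the hosts reserved in config.ini. The function name find_wrong_slots is hypothetical, and the host-to-address table is an assumption inferred from the listings in this report (the addresses are masked here, so they cannot be resolved).

```python
import re

# Matches connected lines such as "id=7    @XXX.100.1.162  (Version: 5.1.6)".
CONNECTED = re.compile(r"^id=(\d+)\s+@(\S+)")


def find_wrong_slots(show_text, reserved_host, host_addr):
    """Return (id, reserved host, actual address) for mismatched connections."""
    mismatches = []
    for line in show_text.splitlines():
        m = CONNECTED.match(line.strip())
        if not m:
            continue  # header line or a "not connected" slot
        node_id, addr = int(m.group(1)), m.group(2)
        host = reserved_host.get(node_id)
        if host is not None and host_addr.get(host) != addr:
            mismatches.append((node_id, host, addr))
    return mismatches


SHOW = """\
id=5 (not connected, accepting connect from ndb14)
id=6    @XXX.100.1.93  (Version: 5.1.6)
id=7    @XXX.100.1.162  (Version: 5.1.6)
id=8    @XXX.100.1.94  (Version: 5.1.6)
"""
RESERVED = {5: "ndb14", 6: "ndb08", 7: "ndb09", 8: "ndb09"}  # from config.ini
ADDRS = {"ndb08": "XXX.100.1.93", "ndb09": "XXX.100.1.94"}   # assumed mapping

wrong = find_wrong_slots(SHOW, RESERVED, ADDRS)
print(wrong)  # [(7, 'ndb09', 'XXX.100.1.162')]
```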

I had to stop the mysqld process on ndb15, which freed ID 7:

[mysqld(API)]   12 node(s)
id=4 (not connected, accepting connect from ndb13)
id=5 (not connected, accepting connect from ndb14)
id=6    @XXX.100.1.93  (Version: 5.1.6)
id=7 (not connected, accepting connect from ndb09)
id=8    @XXX.100.1.94  (Version: 5.1.6)
id=9    @XXX.100.1.95  (Version: 5.1.6)
id=10   @XXX.100.1.95  (Version: 5.1.6)
id=11   @XXX.100.1.96  (Version: 5.1.6)
id=12   @XXX.100.1.96  (Version: 5.1.6)
id=13   @XXX.100.1.97  (Version: 5.1.6)
id=14   @XXX.100.1.97  (Version: 5.1.6)
id=15 (not connected, accepting connect from ndb15)

Then I could start the second mysqld process on ndb09 and have it connect:

id=4 (not connected, accepting connect from ndb13)
id=5 (not connected, accepting connect from ndb14)
id=6    @XXX.100.1.93  (Version: 5.1.6)
id=7    @XXX.100.1.94  (Version: 5.1.6)
id=8    @XXX.100.1.94  (Version: 5.1.6)
id=9    @XXX.100.1.95  (Version: 5.1.6)
id=10   @XXX.100.1.95  (Version: 5.1.6)
id=11   @XXX.100.1.96  (Version: 5.1.6)
id=12   @XXX.100.1.96  (Version: 5.1.6)
id=13   @XXX.100.1.97  (Version: 5.1.6)
id=14   @XXX.100.1.97  (Version: 5.1.6)
id=15 (not connected, accepting connect from ndb15)

How to repeat:
Not sure

Suggested fix:
Ensure that hosts can connect only to their reserved slots.
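
As an illustration of the rule proposed above, here is a rough Python sketch of a node-id allocator that only hands out a slot whose reserved HostName matches the connecting host. This is not the management server's actual code. The function alloc_node_id and its arguments are hypothetical, and it compares plain host names where a real server would have to compare resolved addresses.

```python
def alloc_node_id(reserved_host, in_use, client_host):
    """Return the lowest free id reserved for client_host, or None if none."""
    for node_id in sorted(reserved_host):
        if node_id in in_use:
            continue  # slot already taken by a connected node
        if reserved_host[node_id] == client_host:
            return node_id
    return None  # no free slot reserved for this host: refuse the connection


# Slots taken from the [mysqld] sections of the config.ini in this report.
RESERVED = {7: "ndb09", 8: "ndb09", 15: "ndb15"}

first = alloc_node_id(RESERVED, set(), "ndb15")     # ndb15 may only get 15
second = alloc_node_id(RESERVED, {7}, "ndb09")      # 7 is busy, so 8
refused = alloc_node_id(RESERVED, {7, 8}, "ndb09")  # both ndb09 slots busy
print(first, second, refused)  # 15 8 None
```

Under this rule, the mysqld on ndb15 could never occupy ID 7, and the second mysqld on ndb09 would not be rejected with "Connection done from wrong host ip".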