MySQL Bugs: #47222: Forced node shutdown completed. Caused by error 2809

Bug #47222	Forced node shutdown completed. Caused by error 2809
Submitted:	9 Sep 2009 17:06	Modified:	7 Mar 2016 6:34
Reporter:	Matthew Bilek
Status:	Can't repeat
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.0.6	OS:	Linux
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	mysql cluster 7.0.6 2809 error

Description:
Whenever I start an NDB backup, data node 22 crashes and ndb_mgm displays the following error message:

"Forced node shutdown completed. Caused by error 2809: 'Temporary on access to file(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.".

I have to restart data node 22 manually.  The same error message occurs every time I issue a "start backup" command.

This has occurred before, and if I shut down the cluster it never comes up again.  I then have to initialize the data nodes, which loses the data.

This may be connected to MySQL bug http://bugs.mysql.com/bug.php?id=46985 .

How to repeat:
Refer to bug http://bugs.mysql.com/bug.php?id=46985 for configuration.

Using the NDB management client ndb_mgm, enter a backup command:

ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=21   @192.168.0.84  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0, Master)
id=22   @192.168.0.85  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0)
id=23   @192.168.0.86  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)
id=24   @192.168.0.87  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.80  (mysql-5.1.34 ndb-7.0.6)

[mysqld(API)]   7 node(s)
id=11   @192.168.0.81  (mysql-5.1.34 ndb-7.0.6)
id=12   @192.168.0.81  (mysql-5.1.34 ndb-7.0.6)
id=13   @192.168.0.81  (mysql-5.1.34 ndb-7.0.6)
id=14   @192.168.0.82  (mysql-5.1.34 ndb-7.0.6)
id=15   @192.168.0.82  (mysql-5.1.34 ndb-7.0.6)
id=16   @192.168.0.82  (mysql-5.1.34 ndb-7.0.6)
id=20 (not connected, accepting connect from any host)

ndb_mgm> start backup 7
Waiting for completed, this may take several minutes
Node 21: Backup 7 started from node 1
Node 22: Forced node shutdown completed. Caused by error 2809: 'Temporary on access to file(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 21: Backup 7 started from 1 has been aborted. Error: 1326
Backup failed
*  3001: Could not start backup
*        Backup abortet due to node failure: Permanent error: Internal error
ndb_mgm>

Hi,

1) Can you upload the full ndb_error_reporter tarball?
2) We have also released 7.0.7 (source only so far), which has many bug fixes;
   you can try that.

/Jonas

Rebuilt at 7.0.7 and restarted the cluster.  Issued a "START BACKUP 10" and received the following messages:

2009-09-15 13:26:01 [MgmSrvr] INFO     -- Node 22: Node 16: API mysql-5.1.35 ndb-7.0.7
2009-09-15 13:26:58 [MgmSrvr] INFO     -- Node 21: Backup 10 started from node 1
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 22: Forced node shutdown completed. Caused by error 2809: 'Temporary on access to file(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2009-09-15 13:27:00 [MgmSrvr] INFO     -- Node 21: Communication to Node 22 closed
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 1: Node 22 Disconnected
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 21: Arbitration check won - node group majority
2009-09-15 13:27:00 [MgmSrvr] INFO     -- Node 21: President restarts arbitration thread [state=6]
2009-09-15 13:27:00 [MgmSrvr] INFO     -- Node 23: Communication to Node 22 closed
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 24: Node 22 Disconnected
2009-09-15 13:27:00 [MgmSrvr] INFO     -- Node 24: Communication to Node 22 closed
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 21: Node 22 Disconnected
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 1: Node 22 Disconnected
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 23: Node 22 Disconnected
2009-09-15 13:27:00 [MgmSrvr] ALERT    -- Node 21: Backup 10 started from 1 has been aborted. Error: 1326
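
When the same failure recurs across several backup attempts, it can help to pull the forced-shutdown events out of the cluster log automatically. The following is a minimal Python sketch, assuming the log lines have exactly the layout shown above (timestamp, "[MgmSrvr]", severity, "--", reporting node, message). The names ForcedShutdown and find_forced_shutdowns are hypothetical helpers for illustration, not part of any MySQL Cluster tool.

```python
import re
from typing import Iterable, List, NamedTuple

# Layout assumed from the log excerpt above:
# "<date> <time> [MgmSrvr] <LEVEL>  -- Node <id>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[MgmSrvr\] (?P<level>\w+)\s+-- "
    r"Node (?P<node>\d+): (?P<msg>.*)$"
)
SHUTDOWN_RE = re.compile(
    r"Forced node shutdown completed\. Caused by error (?P<code>\d+)"
)


class ForcedShutdown(NamedTuple):
    """One forced-shutdown event (hypothetical record type)."""
    timestamp: str
    node_id: int
    error_code: int


def find_forced_shutdowns(lines: Iterable[str]) -> List[ForcedShutdown]:
    """Return every forced-shutdown event found in the given log lines."""
    events = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # not a cluster log line in the assumed layout
        shutdown = SHUTDOWN_RE.search(parsed.group("msg"))
        if shutdown is not None:
            events.append(
                ForcedShutdown(
                    parsed.group("ts"),
                    int(parsed.group("node")),
                    int(shutdown.group("code")),
                )
            )
    return events
```

Passing the lines of the cluster log file to find_forced_shutdowns yields one record per shutdown, which makes it easy to check whether the same node and error code appear each time.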

Started all of the cluster data nodes with the "--initial-start" parameter and then issued the "START BACKUP 11" command, with the following results:

ndb_mgm> start backup 11
Waiting for completed, this may take several minutes
Node 21: Backup 11 started from node 1
Node 21: Backup 11 started from node 1 completed
 StartGCP: 1283 StopGCP: 1286
 #Records: 2053 #LogRecords: 0
 Data: 51712 bytes Log: 0 bytes
ndb_mgm>

It appears that the backup works only if there is no data in the database.

Jonas,

I also ran across a problem in 7.0.7. When attempting to connect to any of the API nodes by host IP, using MySQL Administrator or the MySQL CLI, I received the error message "Can't get hostname for your address".

If I connected locally without specifying the host IP, the connection succeeded regardless of which port I gave with the "-P" option, even if the port number was invalid.

I had to fall back to 7.0.6 because of this issue.

Matt

Please upload the full ndb_error_reporter tarball.

I too found the same problem with 7.0.7 when using the IP address to connect.

I found that if you add the IP address and hostname to the /etc/hosts file, it works just fine. This isn't a good fix, as I can imagine others having MANY hosts they would have to add.

The error in the API log is:
091004 13:57:25 [Warning] IP address '10.3.1.4' could not be resolved: getnameinfo() returned error (code: -3).

What's also interesting is that after a successful connection, if you remove the IP/host entry from /etc/hosts, it continues to connect just fine.
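
The warning above comes from a failed reverse lookup of the client address. To check outside of mysqld whether a given address resolves on the API host, something like the following Python sketch may help. It calls the standard library's socket.getnameinfo with NI_NAMEREQD so that an unresolvable address raises an error rather than being echoed back numerically. The function name reverse_lookup and its resolver parameter are hypothetical, added only so the lookup can be swapped out for testing. On glibc, code -3 usually corresponds to EAI_AGAIN (temporary failure in name resolution), though that mapping is an assumption about this particular system.

```python
import socket


def reverse_lookup(ip, resolver=socket.getnameinfo):
    """Return the host name for ip, or None if reverse resolution fails.

    resolver is a hypothetical hook that defaults to socket.getnameinfo.
    """
    try:
        # NI_NAMEREQD: fail if no name is found instead of returning the
        # numeric address unchanged.
        host, _service = resolver((ip, 0), socket.NI_NAMEREQD)
        return host
    except socket.gaierror as exc:
        # exc.errno carries the getnameinfo() error code, e.g. -3.
        print(f"{ip}: reverse lookup failed: {exc}")
        return None
```

Calling reverse_lookup("10.3.1.4") on the API host before and after adding the /etc/hosts entry should show whether the resolver is the source of the warning.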

Unfortunately, I rebuilt at version 7.0.6 and rebuilt the schema from scratch, so the information you are looking for is no longer available.  If I run across this problem again, I will be sure to provide it.

Thanks,
Matt

Have you tried adding
[mysqld]
skip-name-resolve
to my.cnf/my.ini?

After a few years: this looks like storage corruption. A number of bugs have been fixed since 2009, and the problem is not reproducible on any of the current 7.3/7.4 releases.

kind regards
bogdan kecman