Bug #21929 Slave DN gets unknown PGMAN Error: 2341 during replication of mixed DBT2 testing
Submitted: 30 Aug 2006 20:26
Modified: 30 Nov 2007 19:40
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Disk Data
Severity: S3 (Non-critical)
Version: 5.1.12
OS: Linux (AMD64)
Assigned to: Pekka Nousiainen
CPU Architecture: Any

[30 Aug 2006 20:26] Jonathan Miller
Description:
Late last night I started a mixed (disk/memory) DBT2 load and received Error 233 on the slave. I increased MaxNoOfConcurrentOperations, restarted the management node and each data node, and restarted the slave.

This morning I checked that the slave had caught up with the master and that the data on the slave matched the data on the master.

After that I started running several DBT2 tests:
1111.72 new-order transactions per minute (NOTPM)
1041.88 new-order transactions per minute (NOTPM)

During the third test I checked the mysqld and cluster error logs and found that the second data node (ID 3) had failed on the slave:

2006-08-30 20:16:36 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-08-30 20:16:36 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-08-30 20:16:36 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The error log on DN ID#3:

Time: Wednesday 30 August 2006 - 20:16:36
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1463) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 5836
Trace: /space/run/ndb_3_trace.log.1
Version: Version 5.1.12 (beta)

I tried to restart the data node, but it hung in phase 4, using no CPU on the host. After an hour or so I issued killall -6 ndbd and tried to restart the DN using --initial, but got the same result: it hung in phase 4.

How to repeat:
Not sure
[15 Sep 2006 21:56] Jonas Oreland
Hi,

I am quite certain that this is a duplicate of http://bugs.mysql.com/bug.php?id=20612
Can you retest once Pekka has pushed the fix for that?
[15 Oct 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[1 Jun 2007 7:51] Pekka Nousiainen
The context:

void
Pgman::fsreadreq(Signal* signal, Ptr<Page_entry> ptr)
{
  File_map::ConstDataBufferIterator it;
  bool ret = m_file_map.first(it) && m_file_map.next(it, ptr.p->m_file_no);
  ndbrequire(ret);
  Uint32 fd = * it.data;