Bug #41193 Heavy simple inserts can cause random data node failures lgman.cpp
Submitted: 3 Dec 2008 3:09
Modified: 20 Dec 2008 19:54
Reporter: Jonathan Miller
Status: Duplicate
Category: MySQL Cluster: Disk Data
Severity: S2 (Serious)
Version: 5.1-telco-6.3
OS: Linux
Assigned to: Assigned Account
CPU Architecture: Any

[3 Dec 2008 3:09] Jonathan Miller
Description:
While running a simple insert test using NDBAtomics, one of the data nodes failed with the following:

Time: Tuesday 2 December 2008 - 08:07:21
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: lgman.cpp
Error object: LGMAN (Line: 1468) 0x00000006
Program: /data0/cr_autotest/libexec/ndbd
Pid: 19788
Trace: ./ndb_2_trace.log.1
Version: mysql-5.1.30 ndb-6.3.20-GA
***EOM***

2008-12-02 08:07:22 [MgmSrvr] ALERT    -- Node 1: Node 2 Disconnected
2008-12-02 08:07:22 [MgmSrvr] ALERT    -- Node 3: Node 2 Disconnected
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Node 3: Communication to Node 2 closed
2008-12-02 08:07:22 [MgmSrvr] ALERT    -- Node 3: Network partitioning - arbitration required
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Node 3: President restarts arbitration thread [state=7]
2008-12-02 08:07:22 [MgmSrvr] ALERT    -- Node 3: Arbitration won - positive reply from node 1
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Node 3: GCP Take over started
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Node 3: GCP Take over completed
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Node 3: kk: 3214/61 2 0
2008-12-02 08:07:22 [MgmSrvr] ALERT    -- Node 3: Node 5 Disconnected
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Node 3: Communication to Node 5 closed
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Mgmt server state: nodeid 5 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000012.
2008-12-02 08:07:22 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2008-12-02 08:07:22 [MgmSrvr] INFO     -- Node 3: Started arbitrator node 1 [ticket=75200002db419c05]

Note: 900 insert operations per transaction were being issued.
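
For anyone trying to approximate the shape of that load without the NDBAtomics driver, here is a minimal sketch that emits one transaction of 900 single-row inserts as SQL. This is only an assumption-laden stand-in: the real test goes through NDBAtomics, not SQL, and the function name `gen_insert_batch`, the table `t1`, and its columns `id` and `payload` are all hypothetical.

```shell
#!/bin/sh
# Emit one transaction containing N single-row INSERTs on stdout.
# Hypothetical: table t1 and columns (id, payload) are not from the real test.
gen_insert_batch() {
    first_id=$1     # id of the first row in this batch
    batch_size=$2   # rows per transaction (900 in this report)
    echo "BEGIN;"
    i=0
    while [ "$i" -lt "$batch_size" ]; do
        row_id=$((first_id + i))
        printf "INSERT INTO t1 (id, payload) VALUES (%d, 'row %d');\n" "$row_id" "$row_id"
        i=$((i + 1))
    done
    echo "COMMIT;"
}

# One batch of 900 inserts, starting at id 1.
gen_insert_batch 1 900
```

The output can be piped into the `mysql` client against a disk data table; repeating the call with increasing `first_id` values gives a sustained stream of 900-row transactions.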

How to repeat:
I have not repeated it yet.

ACRT

In one terminal:
/space/cluster_rep_auto>sh -x scripts/boot.sh --clone=mysql-5.1-telco-6.3 --CONF=/space/cluster_rep_auto/cr-autotest.conf --start-and-exit 2-dn

In another terminal:
/space/cluster_rep_auto>sh -x drivers/ndbatomics-dd-tester.sh ./cr-autotest.conf

[atrt]
basedir=CHOOSE_dir
baseport=15000
clusters= .master

[ndb_mgmd]

[mysqld]
skip-grant-tables
skip-innodb
ndb_use_exact_count=0
loose-join_cache_level=6

[cluster_config]
MaxNoOfSavedMessages = 1000

[cluster_config.master]
NoOfReplicas = 2
DataMemory = 4000M
IndexMemory = 400M
RedoBuffer=200M
NoOfFragmentLogFiles=10
FragmentLogFileSize=256M
MaxNoOfConcurrentOperations = 250000
MaxNoOfLocalOperations = 275000
MaxNoOfConcurrentIndexOperations = 20000
MaxNoOfAttributes=2048
MaxNoOfOrderedIndexes=512
MaxNoOfUniqueHashIndexes=512
DiskPageBufferMemory=1048MB
LockPagesInMainMemory=1
DiskCheckpointSpeed=16M

ndb_mgmd = CHOOSE_host2
ndbd = CHOOSE_host2,CHOOSE_host3
mysqld = CHOOSE_host1
ndbapi=  CHOOSE_host1,CHOOSE_host1

[cluster_config.ndbd.1.master]
FileSystemPath=/data1/

[cluster_config.ndbd.2.master]
FileSystemPath=/data1/

CREATE LOGFILE GROUP $our_lfg_name
    ADD UNDOFILE 'undofile.dat'
    INITIAL_SIZE 2000M
    UNDO_BUFFER_SIZE = 4M
    ENGINE=NDB;
ALTER LOGFILE GROUP $our_lfg_name
    ADD UNDOFILE '$file'
    INITIAL_SIZE 2500M
    ENGINE=NDB;

CREATE TABLESPACE $our_ts_name
    ADD DATAFILE 'datafile.dat'
    USE LOGFILE GROUP $our_lfg_name
    INITIAL_SIZE 50M
    ENGINE=NDB;

plus 39 more of:
ALTER TABLESPACE $our_ts_name
    ADD DATAFILE '$file'
    INITIAL_SIZE 50M
    ENGINE=NDB;
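
The 39 repeated ALTER TABLESPACE statements can be generated rather than typed out. A minimal sketch, assuming a POSIX shell; the function names `gen_datafile_ddl` and `total_tablespace_mb`, the tablespace name `ts1`, and the `datafile_N.dat` naming scheme are hypothetical, since the driver's actual file names are not shown in this report.

```shell
#!/bin/sh
# Emit N "ALTER TABLESPACE ... ADD DATAFILE" statements on stdout.
# Hypothetical helper: the real driver's file naming is not known.
gen_datafile_ddl() {
    ts_name=$1      # tablespace name, e.g. the value of $our_ts_name
    count=$2        # number of extra data files (39 in this report)
    i=1
    while [ "$i" -le "$count" ]; do
        file="datafile_${i}.dat"   # assumed naming scheme
        printf "ALTER TABLESPACE %s ADD DATAFILE '%s' INITIAL_SIZE 50M ENGINE=NDB;\n" "$ts_name" "$file"
        i=$((i + 1))
    done
}

# Total tablespace size in MB: the initial data file plus the extras, 50M each.
total_tablespace_mb() {
    extra_files=$1
    echo $(( (1 + extra_files) * 50 ))
}

gen_datafile_ddl ts1 39
```

With 39 extra files, `total_tablespace_mb 39` gives 2000, so the tablespace totals 2000M across 40 data files, against 4500M of undo log (2000M + 2500M) in the logfile group.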
[4 Dec 2008 18:18] Jonathan Miller
Reproduced on mysql-5.1.30 ndb-6.4

Time: Thursday 4 December 2008 - 21:06:00
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: lgman.cpp
Error object: LGMAN (Line: 1540) 0x00000006
Program: /data0/cr_autotest/libexec/ndbd
Pid: 1581
Trace: ./ndb_2_trace.log.1
Version: mysql-5.1.30 ndb-6.4.0-alpha
[18 Dec 2008 12:01] Jonas Oreland
http://bugs.mysql.com/bug.php?id=28077
[20 Dec 2008 19:54] Jonas Oreland
Closing as a duplicate of bug #28077,
but given that I just fixed that,
you're most welcome to retest it.