Bug #16193 DD: drop logfile group corrupts NDB File system and cause NDBD crash
Submitted: 4 Jan 2006 16:21 Modified: 20 Jan 2006 16:05
Reporter: Jonathan Miller
Status: Closed
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:5.1 OS:Linux (Linux)
Assigned to: Jon Stephens CPU Architecture:Any

[4 Jan 2006 16:21] Jonathan Miller
Description:
Test Cases: 

1) Start Cluster
2) From MySQL Client > 
 CREATE LOGFILE GROUP lg1 ADD UNDOFILE '/space/run/undofile.dat' INITIAL_SIZE 16M UNDO_BUFFER_SIZE = 1M ENGINE=NDB;
3) Client> 
drop logfile group lg1 engine = ndb;
4) Client>
CREATE LOGFILE GROUP lg1 ADD UNDOFILE '/space/run/undofile.dat' INITIAL_SIZE 16M UNDO_BUFFER_SIZE = 1M ENGINE=NDB;
ERROR 1502 (HY000): Failed to create UNDOFILE

NDBD Error Log:
Time: Wednesday 4 January 2006 - 17:09:30
Status: Ndbd file system error, restart node initial
Message: File not found (Ndbd file system inconsistency error, please report a bug)
Error: 2815
Error data: LGMAN: File system close failed. OS errno: 2
Error object: LGMAN (Line: 1820) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 6085
Trace: /space/run/ndb_5_trace.log.1
Version: Version 5.1.5 (alpha)
***EOM***

Time: Wednesday 4 January 2006 - 17:12:58
Status: Temporary error, restart node
Message: Internal program error (failed ndbassert) (Internal error, programming error or missing error message, please report a bug)
Error: 2343
Error data: lgman.cpp
Error object: LGMAN (Line: 1672) 0x00000008
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 6274
Trace: /space/run/ndb_5_trace.log.2
Version: Version 5.1.5 (alpha)
***EOM***

Time: Wednesday 4 January 2006 - 17:13:09
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: ArrayPool<T>::getPtr
Error object: ../../../../../storage/ndb/src/kernel/vm/ArrayPool.hpp line: 330 (block: LGMAN)
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 6376
Trace: /space/run/ndb_5_trace.log.3
Version: Version 5.1.5 (alpha)
***EOM***

Time: Wednesday 4 January 2006 - 17:13:17
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: ArrayPool<T>::getPtr
Error object: ../../../../../storage/ndb/src/kernel/vm/ArrayPool.hpp line: 330 (block: LGMAN)
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 6446
Trace: /space/run/ndb_5_trace.log.4
Version: Version 5.1.5 (alpha)
***EOM***

How to repeat:
see above
[4 Jan 2006 16:51] Jonathan Miller
Seems like you need to shut down and restart the cluster between the create and the drop to cause this problem.
[6 Jan 2006 17:07] Jonas Oreland
I've fixed the error(s) in tracefiles 2, 3, 4 and 5 (in my unpushed clone).
I can't reproduce the error in tracefile 1.

Question: Reading your config, you have both ndb nodes on ndb08.
That means that this statement "CREATE LOGFILE GROUP lg1 ADD UNDOFILE '/space/run/undofile.dat' INITIAL_SIZE" should _always_ fail,

since both nodes try to create the file at '/space/run/undofile.dat', and they can't both succeed.

Can you try to reproduce it?
[6 Jan 2006 18:19] Jonathan Miller
Hi,

Your "statement":
Question: When I read your config you have both ndb nodes on ndb08 That means that this statement "CREATE LOGFILE GROUP lg1 ADD UNDOFILE '/space/run/undofile.dat' INITIAL_SIZE" should _always_ fail.

Logically, I understand what you are saying, but there are a couple of issues with this statement that I need to understand.

1) If the above won't work for me, then how does your ndb_basic_disk.test work? The mysql-test-run config.ini is set up the same way:
[ndbd default]
NoOfReplicas= 2
MaxNoOfConcurrentTransactions= 64
MaxNoOfConcurrentOperations= 5000
DataMemory= 10M
IndexMemory= 1M
Diskless= 0
TimeBetweenWatchDogCheck= 30000
DataDir= /home/ndbdev/jmiller/clones/mysql-5.1-dd-new/mysql-test/var/ndbcluster-9350
MaxNoOfOrderedIndexes= 32
MaxNoOfAttributes= 2048
TimeBetweenGlobalCheckpoints= 500
NoOfFragmentLogFiles= 3

[ndbd]
HostName= localhost

[ndbd]
HostName= localhost

2) Is this not a defect/bug/implementation shortcoming? Many customers do run two data nodes on one system; can they now run only one if they use disk data?

Maybe I am missing some information here that you can provide. 

I will work on reproducing this bug and post the findings here.

Best Regards.
[6 Jan 2006 18:24] Jonas Oreland
ndb_basic_disk uses the filename 'undofile.dat'.
Note that there is no "/" at the beginning, so the file will be put relative to FileSystemPath.
Using "/" as the first character in the filename makes it an absolute path,
  and then it's not possible to use two nodes on one computer (unless they run chroot).
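A minimal Python sketch of the path-resolution point above, assuming (as the comment describes) that a relative UNDOFILE name is resolved against each data node's own FileSystemPath, while an absolute name is used unchanged. The per-node directories `/space/run/ndb_4_fs` and `/space/run/ndb_5_fs` and the helper names are hypothetical; this models only the path arithmetic, not ndbd's actual file handling.

```python
import posixpath

# Hypothetical per-node file-system directories for two data nodes on one host.
NODE_FS_PATHS = ["/space/run/ndb_4_fs", "/space/run/ndb_5_fs"]


def resolve_datafile(fs_path: str, name: str) -> str:
    """Resolve a file name against a node's (hypothetical) FileSystemPath.

    posixpath.join discards fs_path when `name` starts with "/", so an
    absolute name is used as-is, while a relative name lands under fs_path.
    """
    return posixpath.normpath(posixpath.join(fs_path, name))


def colliding_paths(fs_paths, name) -> bool:
    """Return True if two or more nodes would resolve `name` to the same file."""
    resolved = [resolve_datafile(p, name) for p in fs_paths]
    return len(set(resolved)) < len(resolved)


if __name__ == "__main__":
    for name in ("undofile.dat", "/space/run/undofile.dat"):
        paths = [resolve_datafile(p, name) for p in NODE_FS_PATHS]
        print(name, "->", paths, "collision:", colliding_paths(NODE_FS_PATHS, name))
```

With the relative name each node gets its own file; with the absolute name both nodes resolve to '/space/run/undofile.dat', which is why the CREATE cannot succeed on both.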
[6 Jan 2006 18:25] Jonathan Miller
Setting back to Verified until resolved.
[6 Jan 2006 21:52] Jonathan Miller
Update to test case:
mysql> CREATE LOGFILE GROUP lg1 ADD UNDOFILE '/space/run/undofile.dat' INITIAL_SIZE 16M UNDO_BUFFER_SIZE = 1M ENGINE=NDB;
ERROR 1502 (HY000): Failed to create UNDOFILE

mysql> drop logfile group lg1 engine = ndb;
Query OK, 0 rows affected (0.12 sec)

mysql> CREATE LOGFILE GROUP lg1 ADD UNDOFILE '/space/run/undofile.dat' INITIAL_SIZE 16M UNDO_BUFFER_SIZE = 1M ENGINE=NDB;
Query OK, 0 rows affected (1.43 sec)

mysql> drop logfile group lg1 engine = ndb;
ERROR 1503 (HY000): Failed to drop LOGFILE GROUP

Time: Friday 6 January 2006 - 22:50:06
Status: Ndbd file system error, restart node initial
Message: File not found (Ndbd file system inconsistency error, please report a bug)
Error: 2815
Error data: LGMAN: File system close failed. OS errno: 2
Error object: LGMAN (Line: 1820) 0x0000000c
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 25833
Trace: /space/run/ndb_4_trace.log.1
Version: Version 5.1.5 (alpha)
***EOM***

Current byte-offset of file-pointer is: 568

Time: Friday 6 January 2006 - 22:50:07
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: ArrayPool<T>::getPtr
Error object: ../../../../../../storage/ndb/src/kernel/vm/ArrayPool.hpp line: 365 (block: DBDICT)
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 25837
Trace: /space/run/ndb_5_trace.log.1
Version: Version 5.1.5 (alpha)
[10 Jan 2006 9:39] Jonas Oreland
Fixed...please retest
[10 Jan 2006 14:15] Jonathan Miller
patch works.
[18 Jan 2006 19:01] Mike Hillyer
No version numbers, cannot make changelog entry.
[20 Jan 2006 16:05] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented bugfix in 5.1.6 changelog. Closed.