Bug #18000 DD: New test now takes 100X more space, reports out of space and cores NDBD
Submitted: 6 Mar 2006 22:48 Modified: 13 Apr 2006 11:18
Reporter: Jonathan Miller Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.1.8 OS:Linux (Linux 32 Bit OS)
Assigned to: Assigned Account CPU Architecture:Any

[6 Mar 2006 22:48] Jonathan Miller
Description:
Calvin found where I had made a mistake in the new Free Space test where one data file was created as 8MB and the other was created as 12MB.

CREATE TABLESPACE ts1
ADD DATAFILE 'datafile.dat'
USE LOGFILE GROUP lg1
INITIAL_SIZE 8M
ENGINE NDB;

ALTER TABLESPACE ts1
ADD DATAFILE 'datafile2.dat'
INITIAL_SIZE 12M
ENGINE NDB;

After my pull today and rebuild of 5.1 I had adjusted the first create from 8MB to 12MB and started the test. The test stated shortly after starting that it was out of table space.

anager disabled, skipping manager start.
Loading Standard Test Databases
Starting Tests

TEST                            RESULT
-------------------------------------------------------
ndb_1                          [ fail ]

Errors are (from /home/ndbdev/jmiller/clones/mysql-5.1-new/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 176: query 'LOAD DATA INFILE '../std_data_ln/words.dat' INTO TABLE t1 (word)' failed: 1114: The table 't1' is full
(the last lines may be the most important ones)

So I increased both to 200 MB and still got the same error. I then increased to 1000M each and still got the same just further down in the test case. No other changes had been made to the test case since the last time I ran it.

Files are being created at the right size:
-rw-r--r--  1 ndbdev ndbdev 1048641536 Mar  6 20:28 datafile2.dat
-rw-r--r--  1 ndbdev ndbdev 1048641536 Mar  6 20:28 datafile.dat

Further investigations shows that NDBD had been creating a core file on each run.

-rw-------  1 ndbdev ndbdev 26710016 Mar  6 20:13 core.11297
-rw-------  1 ndbdev ndbdev 26869760 Mar  6 20:15 core.12926
-rw-------  1 ndbdev ndbdev 26869760 Mar  6 20:22 core.14619
-rw-------  1 ndbdev ndbdev 26710016 Mar  6 20:28 core.16279
-rw-------  1 ndbdev ndbdev 26701824 Mar  6 19:09 core.2580
-rw-------  1 ndbdev ndbdev 26701824 Mar  6 19:11 core.4203
-rw-------  1 ndbdev ndbdev 26701824 Mar  6 19:12 core.5832
-rw-------  1 ndbdev ndbdev 26701824 Mar  6 20:09 core.9652

Error Logs:
Time: Monday 6 Mars 2006 - 22:28:08
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dblqh/DblqhMain.cpp
Error object: DBLQH (Line: 6708) 0x0000000e
Program: /home/ndbdev/jmiller/clones/mysql-5.1-new/storage/ndb/src/kernel/ndbd
Pid: 16279
Trace: ./ndb_2_trace.log.1
Version: Version 5.1.8 (beta)
***EOM***

2006-03-06 22:28:06 [MgmSrvr] INFO     -- Node 2: Index usage increased to 100%(160 8K pages of total 160)
2006-03-06 22:28:09 [MgmSrvr] ALERT    -- Node 1: Node 2 Disconnected
2006-03-06 22:28:09 [MgmSrvr] INFO     -- Node 1: Communication to Node 2 closed2006-03-06 22:28:09 [MgmSrvr] ALERT    -- Node 1: Network partitioning - arbitration required
2006-03-06 22:28:09 [MgmSrvr] INFO     -- Node 1: President restarts arbitration thread [state=7]
2006-03-06 22:28:09 [MgmSrvr] INFO     -- Node 3: Node 2 Connected
2006-03-06 22:28:09 [MgmSrvr] ALERT    -- Node 1: Arbitration won - positive reply from node 3
2006-03-06 22:28:09 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Back Trace from one of the core files:

#0  0x0015e402 in __kernel_vsyscall ()
#1  0x008db118 in raise () from /lib/libc.so.6
#2  0x008dc888 in abort () from /lib/libc.so.6
#3  0x0809f250 in childAbort (code=0, currentStartPhase=0) at main.cpp:105
#4  0x0827b978 in NdbShutdown (type=NST_ErrorHandler, restartType=NRT_Default)
    at Emulator.cpp:248
#5  0x08272606 in SimulatedBlock::progError (this=0xbfc93658, line=6708,
    err_code=0, extra=0x0) at SimulatedBlock.cpp:737
#6  0x08199fb3 in Dblqh::execACCKEYREF (this=0x8af5e60, signal=0x84087e4)
    at dblqh/DblqhMain.cpp:6706
#7  0x0819d8a3 in Dblqh::exec_acckeyreq (this=0x8af5e60, signal=0x84087e4,
    regTcPtr={p = 0x3e41f98, i = 12}) at dblqh/DblqhMain.cpp:3815
#8  0x0819e197 in Dblqh::prepareContinueAfterBlockedLab (this=0x8af5e60,
    signal=0x84087e4) at dblqh/DblqhMain.cpp:3741
#9  0x081a0072 in Dblqh::execATTRINFO (this=0x8af5e60, signal=0x84087e4)
    at dblqh/DblqhMain.cpp:2865
#10 0x08278df2 in FastScheduler::doJob (this=0x8405960)
    at ./SimulatedBlock.hpp:570
#11 0x08279cdf in ThreadConfig::ipControlLoop (this=0x89f1820)
    at ThreadConfig.cpp:175

How to repeat:
I will attach test case to this bug.

Command line used was:

./mysql-test-run --do-test=ndb_1
[6 Mar 2006 23:05] Jonathan Miller
New test with 2GB data file

Attachment: ndb_1.test (application/octet-stream, text), 8.71 KiB.

[6 Mar 2006 23:06] Jonathan Miller
Original file with 8 and 12 MB data files

Attachment: ndb_1.org (application/octet-stream, text), 9.05 KiB.

[13 Mar 2006 8:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/3756
[13 Mar 2006 11:08] Jonas Oreland
Since the crash part is handled by bug17854, i'll handle that there.
[13 Mar 2006 11:09] Jonas Oreland
Smaller testcase wo/ drop table

Attachment: ndb_bug18000.test (application/octet-stream, text), 819 bytes.

[13 Mar 2006 11:18] Jonas Oreland
Hi,

I tested a smaller testcase (attached) and got exact same result in 5.1.7 (tag "clone-5.1.7-build") and current 5.1-bk.

As I explained in mail, the testcase that you supplied uses drop table,
  and the spaced freed by drop table can't currently be reused until a LCP has been
  performed. The test that you supplied will therefore work sometimes
  (if LCP is being run) and fail sometimes, see also bug#17605, giving the
  impression that 100 times more 
  space is needed...

I therefore think that this is not a bug.
I do think that behaviour in bug#17605 should be improved.
  (but as I stated in email, even so, there will be a timeframe that extents dropped
   can be reused...) 

/Jonas

ps. there can ofcourse be something else going on...but I don't think so :-) ds.
[13 Apr 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".