| Bug #34102 | Creating LOGFILE group crashes cluster | | |
|---|---|---|---|
| Submitted: | 28 Jan 2008 12:53 | Modified: | 19 Feb 2009 14:27 |
| Reporter: | Johan Andersson | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Disk Data | Severity: | S3 (Non-critical) |
| Version: | 5.1.23 ndb 6.3.7; ndb 6.3.13 | OS: | Linux |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
| Tags: | cluster disk data, disk data, diskdata | | |
[13 May 2008 22:00]
Hartmut Holzgraefe
still reproducible with ndb-6.3.13
[13 May 2008 22:06]
Hartmut Holzgraefe
Restart problem reported as new bug #36702
[19 Feb 2009 10:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

    http://lists.mysql.com/commits/66857

2846 Jonas Oreland 2009-02-19
    ndb - bug#34102 - lgman crashed if using more that 150M undo-buffer-memory, increase limit to 600M and don't crash
[19 Feb 2009 10:32]
Bugs System
Pushed into 5.1.32-ndb-6.2.17 (revid:jonas@mysql.com-20090219100101-thq39n075vk91jj2) (version source revid:jonas@mysql.com-20090219100101-thq39n075vk91jj2) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)
[19 Feb 2009 10:33]
Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:jonas@mysql.com-20090219101945-mi9ni9z66ctoswbi) (version source revid:jonas@mysql.com-20090219101945-mi9ni9z66ctoswbi) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)
[19 Feb 2009 10:36]
Bugs System
Pushed into 5.1.32-ndb-6.3.23 (revid:jonas@mysql.com-20090219103357-fcemygrfinsopjmp) (version source revid:jonas@mysql.com-20090219100413-a1hp7s0agpgl9nxk) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)
[19 Feb 2009 14:27]
Jon Stephens
Documented in the NDB-6.2.17, 6.3.23, and 6.4.3 changelogs as follows:
Trying to execute a CREATE LOGFILE GROUP statement using a value
greater than 150M for UNDO_BUFFER_SIZE caused data nodes to
crash.
As a result of this fix, the upper limit for UNDO_BUFFER_SIZE is
now 600M.
Also noted the before-and-after limits under "CREATE LOGFILE GROUP Syntax".
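
To make the before-and-after limits concrete, here is a minimal sketch of a size check against the two documented values. It is not the server's code: `parse_size`, `within_limit`, and the constant names are hypothetical, and it assumes the `M` and `G` suffixes mean 2^20 and 2^30 bytes. Overflow handling is left out for brevity.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of MySQL): parse "192M" or "1G" into bytes.
// Assumes M = 2^20 bytes and G = 2^30 bytes; a bare number means bytes.
std::optional<std::uint64_t> parse_size(const std::string& text) {
    if (text.empty()) return std::nullopt;
    std::uint64_t multiplier = 1;
    std::string digits = text;
    switch (text.back()) {
        case 'M': case 'm': multiplier = 1024ULL * 1024ULL; digits.pop_back(); break;
        case 'G': case 'g': multiplier = 1024ULL * 1024ULL * 1024ULL; digits.pop_back(); break;
        default: break;
    }
    if (digits.empty()) return std::nullopt;
    std::uint64_t value = 0;
    for (char c : digits) {
        if (c < '0' || c > '9') return std::nullopt;  // reject anything non-numeric
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
    }
    return value * multiplier;
}

// Limits taken from the changelog text above: 150M before the fix, 600M after.
constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;
constexpr std::uint64_t kOldUndoBufferLimit = 150 * kMiB;
constexpr std::uint64_t kNewUndoBufferLimit = 600 * kMiB;

// True when the requested UNDO_BUFFER_SIZE does not exceed the given limit.
bool within_limit(std::uint64_t bytes, std::uint64_t limit) {
    return bytes <= limit;
}
```

With these definitions, the `UNDO_BUFFER_SIZE=192M` from the report exceeds the old 150M limit but fits under the new 600M one.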

Description:
2 data node vanilla setup:

    [ndbd default]
    NoOfReplicas=2
    LockPagesInMainMemory=1
    DataMemory=2000M
    IndexMemory=200M
    ODirect=1
    NoOfFragmentLogFiles=50
    FragmentLogFileSize=64M
    datadir=/data1/johan/mysqlcluster
    MaxNoOfConcurrentOperations=500000
    MaxNoOfConcurrentTransactions=32768
    SchedulerSpinTimer=400
    SchedulerExecutionTimer=80
    RealTimeScheduler=1
    TimeBetweenGlobalCheckpoints=1000
    TimeBetweenEpochs=200
    Diskcheckpointspeed=10M
    Diskcheckpointspeedinrestart=100M
    RedoBuffer=32M
    SharedGlobalMemory=256M

I create a logfile group (with undo buffer size = 192M):

    mysql> CREATE LOGFILE GROUP lg_1 ADD UNDOFILE '/data0/johan/mysqlcluster/undo_1.dat' INITIAL_SIZE=4096M UNDO_BUFFER_SIZE=192M ENGINE=ndb;
    ERROR 1528 (HY000): Failed to create LOGFILE GROUP
    mysql> show errors;
    +-------+------+-------------------------------------------+
    | Level | Code | Message                                   |
    +-------+------+-------------------------------------------+
    | Error | 1296 | Got error 4009 'Cluster Failure' from NDB |
    | Error | 1528 | Failed to create LOGFILE GROUP            |
    +-------+------+-------------------------------------------+
    2 rows in set (0.00 sec)

Both data nodes are down...

    Time: Monday 28 January 2008 - 13:37:30
    Status: Temporary error, restart node
    Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
    Error: 2341
    Error data: lgman.cpp
    Error object: LGMAN (Line: 912) 0x0000000e
    Program: ndbd
    Pid: 24033
    Trace: /data1/johan/mysqlcluster/ndb_2_trace.log.5
    Version: mysql-5.1.23 ndb-6.3.7-beta
    ***EOM***

lgman.cpp (line 912) has this code:

    Page_map map(m_data_buffer_pool, ptr.p->m_buffer_pages);
    while(pages)
    {
      Uint32 ptrI;
      Uint32 cnt = pages > 64 ? 64 : pages;
      m_ctx.m_mm.alloc_pages(RG_DISK_OPERATIONS, &ptrI, &cnt, 1);
      if (cnt)
      {
        Buffer_idx range;
        range.m_ptr_i= ptrI;
        range.m_idx = cnt;
        ###line 912###
        ndbrequire(map.append((Uint32*)&range, 2));
        pages -= range.m_idx;
      }

So it seems to fail to allocate disk operations.
However, I think it should return an error message instead of crashing the data nodes.

Moreover, a subsequent system restart fails:

    Time: Monday 28 January 2008 - 13:45:42
    Status: Temporary error, restart node
    Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
    Error: 2341
    Error data: dbdict/Dbdict.cpp
    Error object: DBDICT (Line: 3560) 0x0000000a
    Program: ndbd
    Pid: 24121
    Trace: /data1/johan/mysqlcluster/ndb_2_trace.log.7
    Version: mysql-5.1.23 ndb-6.3.7-beta
    ***EOM***

How to repeat:
Have:

    SharedGlobalMemory=256M

Create a logfile group with a quite big undo_buffer_size:

    CREATE LOGFILE GROUP lg_1 ADD UNDOFILE '/data0/johan/mysqlcluster/undo_1.dat' INITIAL_SIZE=4096M UNDO_BUFFER_SIZE=192M ENGINE=ndb;

Suggested fix:
-
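
The report asks for an error to be returned instead of a crash. The following is only a sketch of that idea, not the committed patch and not the NDB API: `MockPagePool`, `PageRange`, `AllocResult`, `allocate_undo_buffer`, and the `max_ranges` capacity are all hypothetical stand-ins for the types in the lgman.cpp excerpt. It keeps the 64-page chunking from the excerpt, but when either the pool runs dry or the range map is full, it rolls back what it took and reports failure rather than asserting.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a (first page, page count) range entry.
struct PageRange {
    std::uint32_t first_page;
    std::uint32_t count;
};

// Hypothetical page pool; the real allocator is not modelled here.
struct MockPagePool {
    std::uint32_t free_pages;
    std::uint32_t next_page;

    explicit MockPagePool(std::uint32_t pages) : free_pages(pages), next_page(0) {}

    // Grants up to `wanted` pages and returns how many were actually granted.
    std::uint32_t alloc(std::uint32_t wanted, std::uint32_t* first) {
        std::uint32_t granted = std::min(wanted, free_pages);
        *first = next_page;
        next_page += granted;
        free_pages -= granted;
        return granted;
    }

    void release(std::uint32_t count) { free_pages += count; }
};

enum class AllocResult { Ok, OutOfBufferMemory };

// Allocates `pages` pages in chunks of at most 64. On any failure, every page
// taken so far is released and an error code is returned to the caller.
AllocResult allocate_undo_buffer(MockPagePool& pool, std::uint32_t pages,
                                 std::vector<PageRange>& map,
                                 std::size_t max_ranges) {
    std::vector<PageRange> taken;
    while (pages > 0) {
        std::uint32_t first = 0;
        std::uint32_t cnt = pool.alloc(std::min<std::uint32_t>(pages, 64), &first);
        if (cnt == 0 || taken.size() == max_ranges) {
            // Roll back instead of aborting the process.
            for (const PageRange& r : taken) pool.release(r.count);
            pool.release(cnt);
            return AllocResult::OutOfBufferMemory;
        }
        taken.push_back(PageRange{first, cnt});
        pages -= cnt;
    }
    map = std::move(taken);
    return AllocResult::Ok;
}
```

A caller could then map `AllocResult::OutOfBufferMemory` to an error for the client's `CREATE LOGFILE GROUP` statement, leaving the data nodes running.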