Bug #57057 MRR scan + delete causes leak in NDB stored procedure pool
Submitted: 28 Sep 2010 9:38 Modified: 25 Jan 2011 10:16
Reporter: yuan chaohua
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: any OS: Linux
Assigned to: Pekka Nousiainen CPU Architecture: Any
Tags: cluster, dbtup, ndbd

[28 Sep 2010 9:38] yuan chaohua
Description:
Hi, could anyone help or give some information about this error?

Version: mysql-5.1.44 ndb-7.1.4b
Trace: /usr/local/mysql/data/ndb_3_trace.log.2 /usr/local/mysql/data/ndb_3_trace.log.2_t1 /usr/local/mysql/data/ndb_3_tra
Time: Tuesday 28 September 2010 - 15:24:58
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupStoredProcDef.cpp
Error object: DBTUP (Line: 114) 0x00000008
Program: ndbd
Pid: 9376
Version: mysql-5.1.44 ndb-7.1.4b
Trace: /usr/local/mysql/data/ndb_3_trace.log.3

My cluster config and cnf files:

config.ini

# Options affecting ndbd processes on all data nodes:
[ndbd default]
NoOfReplicas=2    # Number of replicas
#DataMemory=3072M    # How much memory to allocate for data storage
#DataMemory=20480M
#DataMemory=18432M
DataMemory=15360M

IndexMemory=3413M   # How much memory to allocate for index storage
                  # For DataMemory and IndexMemory, we have used the
                  # default values. Since the "world" database takes up
                  # only about 500KB, this should be more than enough for
                  # this example Cluster setup.
StringMemory=25

ODirect=1
MaxNoOfLocalScans=64

MaxNoOfTables=5120
MaxNoOfOrderedIndexes=5120
MaxNoOfUniqueHashIndexes=5120
MaxNoOfAttributes=12000
#MaxNoOfAttributes=24576
MaxNoOfTriggers=14336
MaxNoOfConcurrentOperations=5000000
# 1.1 * MaxNoOfConcurrentOperations
#MaxNoOfLocalOperations=5500000
#MaxAllocate=50M

LockPagesInMainMemory=1

MaxNoOfConcurrentTransactions=16384

NoOfFragmentLogFiles=48

#### New Add ##############
DiskCheckpointSpeedInRestart=100M
FragmentLogFileSize=256M
#TimeBetweenLocalCheckpoints=20
TimeBetweenGlobalCheckpoints=1000
TimeBetweenEpochs=100
InitFragmentLogFiles=SPARSE
MemReportFrequency=30
BackupReportFrequency=10

### Watchdog 
#TimeBetweenWatchDogCheck =60000
TimeBetweenWatchdogCheckInitial=60000

### TransactionInactiveTimeout  - should be enabled in Production 
TransactionInactiveTimeout=60000
SharedGlobalMemory=384M          
LongMessageBuffer=1024M 
BatchSizePerLocalScan=512
#############################

#InitFragmentLogFiles=FULL
RedoBuffer=32M
#

MaxNoOfConcurrentScans=500

TransactionBufferMemory=10M
TimeBetweenLocalCheckpoints=4

DiskPageBufferMemory=256M
DiskCheckpointSpeed=100M

LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15
LogLevelError=15

BackupWriteSize=1M
BackupDataBufferSize=16M
BackupLogBufferSize=4M
BackupMemory=20M

UndoIndexBuffer=64M
UndoDataBuffer=256M

StopOnError=0            

#NoOfDiskPagesToDiskAfterRestartTUP=40
#NoOfDiskPagesToDiskAfterRestartACC=20
#NoOfDiskPagesToDiskDuringRestartTUP=40
#NoOfDiskPagesToDiskDuringRestartACC=20

## modify at 6.28
MaxNoOfExecutionThreads=8

TotalSendBufferMemory=20M

# New Add ##
HeartbeatIntervalDbDb=10000
HeartbeatIntervalDbApi=10000
TimeBetweenWatchDogCheck=15000
ArbitrationTimeout=15000              
TransactionDeadLockDetectionTimeOut=100000

#############################################################################################

# Management process options:
[ndb_mgmd]
id=1
hostname=10.192.136.117           # Hostname or IP address of management node
datadir=/var/lib/mysql-cluster  # Directory for management node log files

ArbitrationRank=1
ArbitrationDelay=0

# Options for data node "A":
[ndbd]
id=2                                # (one [ndbd] section per data node)
hostname=10.192.136.118           # Hostname or IP address
datadir=/usr/local/mysql/data   # Directory for this data node's data files
#TotalSendBufferMemory=200M

# Options for data node "B":
[ndbd]
id=3
hostname=10.192.136.116           # Hostname or IP address
datadir=/usr/local/mysql/data   # Directory for this data node's data files
#TotalSendBufferMemory=200M

######################################################################################################

# SQL node options:

[mysqld]
ArbitrationRank=0
hostname=10.192.136.108           
id=18
MaxScanBatchSize=16M
ArbitrationDelay=0

[mysqld]
ArbitrationRank=0
hostname=10.192.136.108
id=19
MaxScanBatchSize=16M
ArbitrationDelay=0

[mysqld]
id=20
ArbitrationRank=0
#hostname=10.192.136.108
MaxScanBatchSize=16M
ArbitrationDelay=0

[mysqld]
id=38
ArbitrationRank=0
hostname=10.192.136.110
MaxScanBatchSize=16M
ArbitrationDelay=0

[mysqld]
id=39
ArbitrationRank=0
hostname=10.192.136.110
MaxScanBatchSize=16M
ArbitrationDelay=0

[mysqld]
id=40
ArbitrationRank=0
#hostname=10.192.136.110
MaxScanBatchSize=16M
ArbitrationDelay=0
###########################################################################################

# TCP/IP options:
[tcp default]
#portnumber=1186   # This is the default; however, you can use any port that is free
                  # for all the hosts in the cluster
                  # Note: It is recommended that you do not specify the port
                  # number at all and allow the default value to be used instead
SendBufferMemory=20480K
ReceiveBufferMemory=20480K

[tcp]
NodeId1=2
NodeId2=3
HostName1=192.168.0.12
HostName2=192.168.0.11

my.cnf file:
# Options for mysqld process:
[client]
port = 3306
socket = /tmp/mysql.sock

[mysqld]
port = 3306
socket = /tmp/mysql.sock
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
max_allowed_packet = 160M 
init_connect='set transaction_allow_batching=1;set autocommit=0'

#log = /tmp/mysqld.sql
log=/usr/local/mysql/log/mysqld.sql
log-error=/usr/local/mysql/log/localhost.err

#bind-address=10.192.140.110

skip-locking
skip-innodb
#skip-merge
skip-name-resolve
query_cache_type = 2
query_cache_size = 128M
net_buffer_length = 20000
lower_case_table_names = 1
thread_cache_size = 200
ndb_batch_size = 15728640
ndb_use_transactions = 0

# Add new parameter ##########
table_open_cache=2048
default-storage-engine=NDBCLUSTER

# Buffers ##
join_buffer_size=512K
sort_buffer_size=512K
read_buffer_size=256K
read_rnd_buffer_size=256K
# Specific to MyISAM and on-disk temp tables
key_buffer_size = 256M 

#transaction_alloc_block_size=81920
#transaction_prealloc_size=40960
#profiling_history_size=100
#profiling=1
max_connections=2000

#ait_timeout = 10
#nteractive_timeout = 10

long_query_time = 1
log-slow-queries = /usr/local/mysql/log/slow.log
log-queries-not-using-indexes 

#log-bin=mysql-bin
#binlog_format=mixed

ndb_autoincrement_prefetch_sz = 1024
#ndb_cluster_connection_pool = 4 

ndbcluster                      # run NDB storage engine
ndb-connectstring=10.192.136.117  # location of management server
ndb-use-exact-count=0
ndb-index-stat-enable=0
ndb-force-send=1
engine-condition-pushdown=1

[ndbd]
connect-string=10.192.136.118
[ndbd]
connect-string=10.192.136.116

# Options for ndbd process:
[mysql_cluster]
ndb-connectstring=10.192.136.117  # location of management server

[mysqldump]
max_allowed_packet = 160M

[mysqlhotcopy]
interactive-timeout

Thanks guys!!!!

How to repeat:
I can repeat it when running the test.
[28 Sep 2010 9:59] Hartmut Holzgraefe
To simplify our investigation please make sure to run ndb_error_reporter and attach the resulting collection of configuration, trace and log files to this bug report or upload the file to ftp://ftp.mysql.com/pub/mysql/upload and let us know the file name.

For more information about the error reporter please refer to http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-programs-ndb-error-reporter.html

If you have problems running ndb_error_reporter you can also collect your cluster configuration file (config.ini) and ndb_*_cluster.log, ndb_*_out.log, ndb_*_error.log and ndb_*_trace.* files manually.
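
If ndb_error_reporter is not an option, the manual collection described above could be scripted roughly as follows. This is a minimal sketch, not an official tool: the function name `collect_ndb_files` and the archive name are hypothetical, and only the file patterns come from the comment above.

```python
import glob
import os
import tarfile

# File patterns named in the comment above, plus config.ini.
PATTERNS = [
    "config.ini",
    "ndb_*_cluster.log",
    "ndb_*_out.log",
    "ndb_*_error.log",
    "ndb_*_trace.*",
]


def collect_ndb_files(directories, archive_path):
    """Bundle matching files from the given directories into a .tar.gz.

    Returns the sorted list of files that were added (hypothetical helper).
    """
    found = set()
    for directory in directories:
        for pattern in PATTERNS:
            found.update(glob.glob(os.path.join(directory, pattern)))
    files = sorted(found)
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            archive.add(path)
    return files
```

It would have to be run on each host separately, for example `collect_ndb_files(["/usr/local/mysql/data"], "ndb_report.tar.gz")` on a data node, since the script only sees local files.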
[28 Sep 2010 10:00] Magnus Blåudd
Please submit a reproducible test case and the ndb_error_reporter tar file.
[28 Sep 2010 10:07] yuan chaohua
Hi, I cannot give the steps to reproduce. The test tool is for real production, and I cannot attach files because I have no permission to transfer files to the internet.
I looked into the source code and could not find the reason. The error seems related to SPs (stored procedures), but our test case does not use any SPs...

Hope someone can help....
[29 Sep 2010 12:16] Jørgen Austvik
Can you please say what your tests are doing?
[11 Oct 2010 7:13] yuan chaohua
After change  1. MaxNoOfLocalScans from 64 to 512
               2. ndb_use_transactions from 0 to 1
The error disappeared.
[1 Jan 2011 15:38] Pekka Nousiainen
This is a real bug.
It came up now while testing bug#58277.

There is a pool of "stored procs" in NDB
(not related to MySQL SPs) associated with scans.
The crash is caused by a memory leak in this pool when
MRR (multi-range) scans are mixed with deletes.

>After change  1. MaxNoOfLocalScans from 64 to 512
>              2. ndb_use_transactions from 0 to 1

1. increases the pool but does not fix the crash.
2. makes the memory leak less likely if the main cause
   was scan deletes from mysql with multiple
   ranges like "delete from .. where x < 10 or x > 20".
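
A toy model of point 1, to show why a bigger pool only delays the failure. This is a sketch under loud assumptions: `FixedPool`, `scans_until_exhausted` and the leak rate are hypothetical names and numbers, and none of this is the actual DBTUP code.

```python
class FixedPool:
    """Toy fixed-size pool; NOT the real DBTUP stored-proc pool."""

    def __init__(self, size):
        self.size = size
        self.in_use = 0

    def seize(self):
        # Stand-in for the failed ndbrequire when no free record is left.
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1

    def release(self):
        assert self.in_use > 0
        self.in_use -= 1


def scans_until_exhausted(pool_size, leak_every):
    """Count scans completed before seize() fails.

    Every `leak_every`-th scan skips release(), modelling the buggy path
    (hypothetical rate; the real trigger was MRR scans mixed with deletes).
    """
    pool = FixedPool(pool_size)
    completed = 0
    while True:
        try:
            pool.seize()
        except RuntimeError:
            return completed
        completed += 1
        if completed % leak_every != 0:
            pool.release()
```

With an assumed leak on every 10th scan, `scans_until_exhausted(64, 10)` returns 640 and `scans_until_exhausted(512, 10)` returns 5120: the larger pool survives eight times longer but still runs out.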
[1 Jan 2011 15:50] Pekka Nousiainen
Assigned to self, fix under bug#58277.
[3 Jan 2011 10:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127769

3373 Pekka Nousiainen	2011-01-03
      bug#58277,bug#57057 a06_fix3.diff
      MRR scan and drop or delete fail to release stored proc
[3 Jan 2011 11:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/127770

3374 Pekka Nousiainen	2011-01-03
      bug#58277,bug#57057 a07_fix4.diff
      ifdef on ERROR_INSERT, add to daily-basic-tests
[3 Jan 2011 16:01] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.21 (revid:pekka@mysql.com-20110103150841-7tgh4tfp10wsvz9k) (version source revid:pekka@mysql.com-20110103150841-7tgh4tfp10wsvz9k) (merge vers: 5.1.51-ndb-7.0.21) (pib:24)
[3 Jan 2011 16:02] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.40 (revid:pekka@mysql.com-20110103110242-3wsx0gxnnnoa0ue0) (version source revid:pekka@mysql.com-20110103110242-3wsx0gxnnnoa0ue0) (merge vers: 5.1.51-ndb-6.3.40) (pib:24)
[5 Jan 2011 17:01] Pekka Nousiainen
update Synopsis
[25 Jan 2011 10:16] Jon Stephens
See BUG#58277 for docs info/changelog entry. Closed.