Bug #16895 cluster fails with unknown error under load
Submitted: 30 Jan 2006 11:45 Modified: 29 Aug 2010 14:05
Reporter: Matt Gregory
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: 5.0.18 OS: Linux (RedHat Enterprise 4)
Assigned to: Assigned Account CPU Architecture: Any

[30 Jan 2006 11:45] Matt Gregory
Description:
Using 2 identical Dell 1850s (dual Xeon, 8GB RAM), each running mysqld, ndbd and ndb_mgmd. During load testing the cluster fails sporadically with the error:

2006-01-30 11:04:44,133 ERROR [http-8080-Processor60]:xmlRequest line:383 java.sql.SQLException: Got error 1 'Unknown error code' from ndbcluster

The load test consists of selects, updates and inserts over four tables, each using the ndbcluster engine. There doesn't seem to be a problem when not running under load; the errors occur with about 100 concurrent web users, each web request spawning 2 selects and 3 inserts/updates. Using InnoDB tables in place of ndbcluster solves the problem and also makes the whole app run approximately twice as fast, but we need the clustering.

We're using the latest version of Connector/J from Tomcat 5.5.12.

Any help would be appreciated as we launch our product the day after tomorrow!

the cluster config file is as follows:
[NDBD DEFAULT]
NoOfReplicas=2    # Number of replicas
DataMemory=3500M    # How much memory to allocate for data storage
IndexMemory=1000M   # How much memory to allocate for index storage
MaxNoOfConcurrentOperations=1048576
MaxNoOfConcurrentTransactions= 1048576
MaxNoOfLocalOperations=1048576
MaxNoOfConcurrentIndexOperations=16384
MaxNoOfConcurrentScans=500
LockPagesInMainMemory=Y
# TCP/IP options:
[TCP DEFAULT]
portnumber=2202   # This is the default; however, you can use any

# Management process options:
[NDB_MGMD]
hostname=192.168.254.1          # Hostname or IP address of MGM node
datadir=/var/lib/mysql-cluster  # Directory for MGM node logfiles

[NDB_MGMD]
hostname=192.168.254.2          # Hostname or IP address of MGM node
datadir=/var/lib/mysql-cluster  # Directory for MGM node logfiles

# Options for data node "A":
[NDBD]
                                # (one [NDBD] section per data node)
hostname=192.168.254.1          # Hostname or IP address
datadir=/usr/local/mysql/data   # Directory for this data node's datafiles

# Options for data node "B":
[NDBD]
hostname=192.168.254.2          # Hostname or IP address
datadir=/usr/local/mysql/data   # Directory for this data node's datafiles

# SQL node options:
[MYSQLD]
hostname=192.168.254.1
[MYSQLD]
hostname=192.168.254.2
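
For reference, the memory that the [NDBD DEFAULT] settings above reserve on each data node can be estimated with a minimal Python sketch. It assumes that the M suffix means 1024 x 1024 bytes and that each host has 8 GiB of RAM, as the description states. The helper names (`parse_size`, `read_section`, `ndbd_memory_bytes`) are made up for illustration, and any additional allocations driven by the MaxNoOf... settings are not counted.

```python
import re

# Excerpt of the [NDBD DEFAULT] section quoted in this report.
CONFIG_TEXT = """\
[NDBD DEFAULT]
NoOfReplicas=2    # Number of replicas
DataMemory=3500M    # How much memory to allocate for data storage
IndexMemory=1000M   # How much memory to allocate for index storage
LockPagesInMainMemory=Y
"""

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
HOST_RAM_BYTES = 8 * 1024 ** 3  # assumption: "8Gb ram" means 8 GiB per host


def parse_size(value):
    """Convert a size such as '3500M' to bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, suffix = match.groups()
    return int(number) * UNITS.get(suffix, 1)


def read_section(text, section):
    """Return the key/value pairs of the first [section] found in text."""
    values = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            if in_section:
                break  # the next section header ends the one we wanted
            in_section = line[1:-1].strip().upper() == section.upper()
            continue
        if in_section and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def ndbd_memory_bytes(text):
    """Sum DataMemory and IndexMemory for one data node."""
    ndbd = read_section(text, "NDBD DEFAULT")
    return parse_size(ndbd["DataMemory"]) + parse_size(ndbd["IndexMemory"])


total = ndbd_memory_bytes(CONFIG_TEXT)
print(f"DataMemory + IndexMemory: {total // 1024 ** 2}M "
      f"({total / HOST_RAM_BYTES:.0%} of host RAM)")
```

With the values quoted here the sum is 4500M, a little over half of an 8 GiB host that also runs mysqld and ndb_mgmd.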

The my.cnf file is:

# Example MySQL config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MySQL.
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /usr/local/mysql/data) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
#password       = your_password
port            = 3306
socket          = /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysql_cluster]
ndb-connectstring=192.168.254.2
[ndbd]
connect-string=192.168.254.2
[ndb_mgm]
connect-string=192.168.254.2
[ndb_mgmd]
config-file=/var/lib/mysql-cluster/config.ini
[mysqld]
set-variable=max_connections=1500
ndbcluster
ndb-connectstring=192.168.254.2
port            = 3306
socket          = /tmp/mysql.sock
skip-locking
key_buffer = 384M
max_allowed_packet = 1M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id       = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slave server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id       = 2
#
# The replication master for this slave - required
#master-host     =   <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user     =   <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =   <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port     =  <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Point the following paths to different dedicated disks
#tmpdir         = /tmp/
#log-update     = /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
#bdb_cache_size = 384M
#bdb_max_lock = 100000

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /usr/local/mysql/data/
#innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
#innodb_log_group_home_dir = /usr/local/mysql/data/
#innodb_log_arch_dir = /usr/local/mysql/data/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 384M
#innodb_additional_mem_pool_size = 20M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 100M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

How to repeat:
Create a cluster across 2 RedHat Enterprise Linux machines, each with 8GB RAM and dual Xeon CPUs. Use Tomcat with approximately 100 concurrent users, each user spawning 2 selects and 3 inserts/updates across 4 tables on the cluster.
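
The shape of that workload can be sketched in Python as follows. This is only an illustration of the concurrency pattern described above (100 users, each request issuing 2 selects and 3 inserts/updates), not the reporter's test harness. `StatementCounter` is a hypothetical stand-in for a database connection that merely counts statements; to reproduce the bug it would have to be replaced by a real client running the attached queries.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

USERS = 100              # concurrent web users, from the report
SELECTS_PER_REQUEST = 2  # selects per web request, from the report
WRITES_PER_REQUEST = 3   # inserts/updates per web request, from the report


class StatementCounter:
    """Hypothetical stand-in for a connection: it only counts statements."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = {"select": 0, "write": 0}

    def execute(self, kind):
        # A real harness would send SQL to mysqld here.
        with self._lock:
            self.counts[kind] += 1


def simulate_request(db):
    """Issue the statements that one web request generates."""
    for _ in range(SELECTS_PER_REQUEST):
        db.execute("select")
    for _ in range(WRITES_PER_REQUEST):
        db.execute("write")


def run_round(db, users=USERS):
    """Run one request per user, all submitted concurrently."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulate_request, db) for _ in range(users)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return db.counts


counts = run_round(StatementCounter())
print(counts)
```

One round therefore produces 200 selects and 300 inserts/updates in total.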
[30 Jan 2006 11:47] Matt Gregory
Forgot to mention we're using dual gigabit NICs aggregated to a single IP
[30 Jan 2006 12:18] Valeriy Kravchuk
Thank you for the problem report. We need the SHOW CREATE TABLE results for those tables, some sample data, and the exact statements one has to execute to repeat the problem you described.

Is there anything unusual in the error logs of your cluster nodes?
[30 Jan 2006 15:07] Matt Gregory
sample data

Attachment: sampledata.txt (text/plain), 26.00 KiB.

[30 Jan 2006 15:08] Matt Gregory
queries run to cause the error

Attachment: queries.txt (text/plain), 1.25 KiB.

[30 Jan 2006 15:08] Matt Gregory
table sql

Attachment: createtables.txt (text/plain), 2.23 KiB.

[30 Jan 2006 15:10] Matt Gregory
Hi,
   Please find attached the queries, tables and sample data as text files. We are not seeing anything unusual in the ndb logs. As you can see from the queries, we are using Hibernate; would this have any bearing?

thanks
[31 Jan 2006 8:31] Hartmut Holzgraefe
Can you try to run this with just one NIC assigned to the address and the other one temporarily disabled?
[31 Jan 2006 14:36] Matt Gregory
Unfortunately, disabling one of the NICs on all the boxes had no effect; the problem still exists.
[31 Jan 2006 16:12] Hartmut Holzgraefe
Well, at least we've ruled out a possible cause ...
[1 Feb 2006 12:51] Matt Gregory
We've noticed that the number of open tables is very high when the error occurs, and that issuing a FLUSH TABLES command drops this down and seems to delay the error.
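
One way to track that observation is to compare the Open_tables status value against the table_cache setting (512 in the my.cnf above). The Python sketch below assumes the rows come from `SHOW STATUS LIKE 'Open%tables'` as (name, value) pairs; the function name and the sample numbers are hypothetical, and whether a full table cache is related to the ndbcluster error has not been established in this report.

```python
TABLE_CACHE = 512  # table_cache value from the reporter's my.cnf


def open_table_pressure(status_rows, table_cache=TABLE_CACHE):
    """Summarise how full the table cache is.

    status_rows is an iterable of (Variable_name, Value) pairs, as a
    client library would return them for SHOW STATUS.
    """
    status = {name: int(value) for name, value in status_rows}
    open_tables = status["Open_tables"]
    return {
        "open_tables": open_tables,
        "table_cache": table_cache,
        "fraction_used": open_tables / table_cache,
        "cache_full": open_tables >= table_cache,
    }


# Hypothetical sample values, not taken from the reporter's servers.
sample_rows = [("Open_tables", "512"), ("Opened_tables", "9000")]
print(open_table_pressure(sample_rows))
```

Logging this value alongside the application error log would show whether the errors start only once the cache is full.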
[3 Feb 2006 12:59] Hartmut Holzgraefe
Could you provide the mysql and cluster error logs so that
we can check them for ourselves?
[4 Mar 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[15 Jul 2010 9:16] Tsutsui T
Me 2!
[29 Jul 2010 14:05] Jørgen Austvik
Tsutsui T: Please add mysql and cluster error logs using http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-programs-ndb-error-reporter.html
[29 Aug 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".