Bug #43069 GCP stop in rather simple setup
Submitted: 20 Feb 2009 22:24 Modified: 25 Mar 2009 11:10
Reporter: Hartmut Holzgraefe
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: mysql-5.1-telco-7.0 OS: Linux
Assigned to: Assigned Account CPU Architecture:Any

[20 Feb 2009 22:24] Hartmut Holzgraefe
Description:
I ran into a reproducible GCP stop case today that, unlike bug #41292 or #37227,
does not involve disk-based tables or concurrency. It happens reliably with a rather basic configuration that uses mostly defaults and a simple query script.

The test script first populates a simple table until data memory
is about 90% full, then it deletes batches of rows with even
auto_increment values. 

One data node dies pretty quickly with a "GCP stop" error message
while executing the batch deletes; the other node continues to
work, though ...

(So far I've been running this on a single dual-core machine,
so it may be an overload issue: losing one data node
cuts the active processes from two ndbds and one mysqld
to just one ndbd and one mysqld, so that each gets a core
to itself ...)

How to repeat:
== config.ini ==

[NDBD DEFAULT]
NoOfReplicas= 2
datadir=/usr/local/mysql-ndb-6.3.22/cluster
DataMemory=40M
IndexMemory=20M

[MYSQLD DEFAULT]

[NDB_MGMD DEFAULT]
datadir=/usr/local/mysql-ndb-6.3.22/cluster

[COMPUTER]
Id= 1
HostName= 127.0.0.1

[NDB_MGMD]
Id= 1
ExecuteOnComputer= 1

[NDBD]
Id= 2
ExecuteOnComputer= 1

[NDBD]
Id= 3
ExecuteOnComputer= 1

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]

== test.sql ==

drop table if exists t1;

create table t1(
  id int auto_increment, 
  msg varchar(255),
  primary key(id)
) engine=ndb;

insert into t1 select null, md5(rand());

insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;
insert into t1 select null, md5(rand()) FROM t1;

insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;
insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;

DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
DELETE FROM t1 WHERE id%2=0 LIMIT 20000;
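
The script above is highly repetitive, so for anyone who wants to vary the repeat counts, here is a minimal Python sketch that regenerates it. The function names (build_test_sql, expected_rows) and their parameters are illustrative assumptions, not part of the original report; the defaults are the statement counts from the listing above.

```python
import sys


def build_test_sql(doublings=14, bulk_inserts=37, delete_batches=12):
    """Return the text of test.sql with the given repeat counts."""
    lines = [
        "drop table if exists t1;",
        "",
        "create table t1(",
        "  id int auto_increment,",
        "  msg varchar(255),",
        "  primary key(id)",
        ") engine=ndb;",
        "",
        # Seed row; every following unlimited insert doubles the row count.
        "insert into t1 select null, md5(rand());",
        "",
    ]
    lines += ["insert into t1 select null, md5(rand()) FROM t1;"] * doublings
    lines.append("")
    # Fixed-size batches that fill the table up further.
    lines += [
        "insert into t1 select null, md5(rand()) FROM t1 LIMIT 10000;"
    ] * bulk_inserts
    lines.append("")
    # Batch deletes of rows with even auto_increment values.
    lines += ["DELETE FROM t1 WHERE id%2=0 LIMIT 20000;"] * delete_batches
    return "\n".join(lines) + "\n"


def expected_rows(doublings=14, bulk_inserts=37, batch=10000):
    """Rows in t1 before the deletes, assuming every insert succeeds."""
    return 2 ** doublings + bulk_inserts * batch


if __name__ == "__main__":
    # Usage (hypothetical file name): python gen_test_sql.py > test.sql
    sys.stdout.write(build_test_sql())
```

With the default counts, and only if every insert succeeds, the table holds 2^14 + 37 * 10000 = 386384 rows before the first DELETE runs. Whether all inserts actually fit into DataMemory=40M is not something this sketch checks.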
[20 Feb 2009 22:27] Hartmut Holzgraefe
ndb_error_reporter results

Attachment: ndb_error_report_20090220232550.tar.bz2 (application/x-bzip, text), 73.07 KiB.

[23 Feb 2009 15:52] Jonas Oreland
FYI, I ran this 25 times in a loop on my desktop machine (a quad-core)
without any problems.

One note: one should probably increase SendBufferMemory
(to avoid the warnings).

---

I'm thinking about adding significantly more printouts for when this happens.
Can you then retest and see if they give some useful information?

/Jonas
[24 Feb 2009 12:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67365

2878 Jonas Oreland	2009-02-24
      ndb - bug#43069 - add more printouts in case of gcp stop
[24 Feb 2009 12:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67366

2882 Jonas Oreland	2009-02-24
      ndb - bug#43069 - add more printouts in case of gcp stop
[24 Feb 2009 13:03] Bugs System
Pushed into 5.1.32-ndb-6.3.24 (revid:jonas@mysql.com-20090224130145-2b7pzs3s5si2y3is) (version source revid:jonas@mysql.com-20090224125843-4ctsncrwgh7c96jo) (merge vers: 5.1.32-ndb-6.3.23) (pib:6)
[24 Feb 2009 13:04] Bugs System
Pushed into 5.1.32-ndb-6.4.4 (revid:jonas@mysql.com-20090224130242-97fj3euks0530ge1) (version source revid:jonas@mysql.com-20090224130025-azsxmcsqr11ipd02) (merge vers: 5.1.32-ndb-6.4.4) (pib:6)
[25 Feb 2009 11:59] Hartmut Holzgraefe
ndb_error_reporter logs from test run with patched ndb-6.3.22

Attachment: ndb_error_report_20090225125741.tar.bz2 (application/x-bzip, text), 74.15 KiB.

[26 Feb 2009 9:59] Hartmut Holzgraefe
logs from additional test runs

Attachment: ndb_error_reporter_logs.tar.gz (application/x-gzip, text), 254.21 KiB.

[25 Mar 2009 11:10] Jonathan Miller
see also http://bugs.mysql.com/bug.php?id=37227