Bug #43882 ndb GCP stop(3) while importing tables with big (up to 470KB) longtext columns
Submitted: 26 Mar 14:25 Modified: 6 Oct 15:18
Reporter: José Luis Gordo Romero
Status: Verified
Category:Server: Cluster Severity:S1 (Critical)
Version:mysql-5.1-telco-6.3 OS:Linux (ubuntu server 7.04)
Assigned to: Frazer Clement Target Version:
Tags: mysql-5.1-telco-6.3.20, ndb, hang, longtext
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[26 Mar 14:25] José Luis Gordo Romero
Description:
I have a setup with two data nodes + two API nodes (on two servers) + one separate
management server.

When importing a table with a big longtext column (up to 470KB), ndb hangs:

moving extents (1364 1410) to real free list 1454
Detected GCP stop(3)...sending kill to [SignalCounter: m_count=1 0000000000000010]
start_resend(1, empty bucket (11471/53 11471/52) -> active
REMOVING lcp: 6 from table: 0 frag: 0 node: 4
REMOVING lcp: 6 from table: 0 frag: 1 node: 4
2009-03-25 16:09:10 [ndbd] INFO     -- dbtc/DbtcMain.cpp
2009-03-25 16:09:10 [ndbd] INFO     -- DBTC (Line: 4420) 0x0000000a
2009-03-25 16:09:10 [ndbd] INFO     -- Error handler shutting down system
2009-03-25 16:09:11 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-03-25 16:09:11 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Caused by
error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error
or missing error message, please report a bug). Temporary error, restart node'.
2009-03-25 16:12:30 [ndbd] INFO     -- Angel pid: 5393 ndb pid: 5394

Sometimes mysqld also hangs with:
*** glibc detected *** /usr/local/mysql/libexec/mysqld: double free or corruption (top):
0x00002aaaab48cbc0 ***
start_resend(0, min: 1207/7 - max: 1207/8) page: 6

And it only starts again after an initial start of the ndbd nodes.

I tried the longtext column both in memory and on disk, with the same hang, but when it is
stored on disk the server sometimes freezes completely (it answers pings but accepts no
connections, and after a reboot there is nothing in syslog).

I also tried this in two separate environments (in order to rule out hardware problems).

How to repeat:
Import a table with a longtext column holding values of up to 470KB.
[8 Apr 18:03] Magnus Blaudd
We would be very happy if you could provide some additional information about your table
schema, how many rows you loaded, etc. Has it occurred more than once?

Please also use the ndb_error_reporter script to create a tar file and upload it to this bug.
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-utilities-ndb-error-reporter.html
[16 Apr 14:56] Jonathan Miller
Please respond to Magnus's request.

Also, how is the table imported (mysql load, restore), and with transactions on or off?

Possible workaround.
[16 Apr 15:28] Jonathan Miller
http://bugs.mysql.com/bug.php?id=39498
http://bugs.mysql.com/bug.php?id=37227
[22 Apr 10:48] José Luis Gordo Romero
First, sorry for the delay. The system is now in production (we were able to import the data
with a custom script that adds minimal delays between inserts), and the development machine
was being used for other tests.

I reinstalled the test environment, but I'm having other non-ndb problems with it and I'm
not able to reproduce the problem, sorry.

Some info: the table has 11,000 records with a longtext column of up to 470KB (200KB
average):

CREATE TABLE `x` (
  `id` int(11) NOT NULL AUTO_INCREMENT STORAGE MEMORY,
  `document_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `version` int(11) DEFAULT NULL STORAGE MEMORY,
  `name` varchar(255) DEFAULT '' STORAGE MEMORY,
  `creator_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `content` longtext STORAGE DISK,
  `created_at` datetime DEFAULT NULL STORAGE MEMORY,
  `updated_at` datetime DEFAULT NULL STORAGE MEMORY,
  `editor_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `reference_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `digest` varchar(40) DEFAULT NULL STORAGE MEMORY,
  `locked` tinyint(1) DEFAULT NULL STORAGE MEMORY,
  PRIMARY KEY (`id`)
);
[28 Apr 10:55] Magnus Blaudd
Thanks for that info. Unfortunately, it is currently possible to cause a "GCP stop" (i.e.
it takes too long for the global checkpoint protocol to complete) by committing a large
number of rows in one transaction, see BUG#43069, especially when you have a blob on disk
like you do. Behind the scenes, the INSERT to the blob column is split into 8k INSERTs
into a blob table, so all in all it adds up to quite a large transaction just because of
that.
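
To give a rough sense of scale, the arithmetic can be sketched as follows. This is only an estimate: it takes the 8k part size from the comment above at face value (the real part size may differ by version and column type), ignores any bytes stored inline in the main row, and uses the sizes and row count reported earlier in this bug. The function name is made up for illustration.

```python
import math

# Assumed from the comment above ("8k INSERTs"); the real part size may differ.
PART_SIZE = 8 * 1024


def blob_part_rows(value_bytes, part_size=PART_SIZE):
    """Estimate how many blob-table rows one value of the given size needs."""
    return math.ceil(value_bytes / part_size)


max_parts = blob_part_rows(470 * 1024)  # largest value reported: 470KB
avg_parts = blob_part_rows(200 * 1024)  # reported average: 200KB
total_parts = 11_000 * avg_parts        # whole table, if it were loaded in one transaction

print(max_parts, avg_parts, total_parts)  # prints: 59 25 275000
```

So under these assumptions a single large row already means about 59 extra row inserts, and loading all 11,000 rows in one transaction would mean roughly 275,000.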

We have some ideas on how to fix it, but for now I would suggest you configure the system
differently to cope with the load or rewrite the INSERTs slightly.

/ Magnus
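
One possible reading of "rewrite the INSERTs slightly", in line with the reporter's custom script that added small delays between inserts, is to commit in small batches instead of one large transaction. The sketch below is not taken from this bug report: the function name, batch size and pause are illustrative, untuned assumptions, and sqlite3 is used only as a stand-in so the example runs without a cluster. With a MySQL driver the placeholder style would be `%s` instead of `?`.

```python
import sqlite3
import time


def insert_in_batches(conn, sql, rows, batch_size=10, pause_s=0.05):
    """Insert rows via a DB-API connection, committing every batch_size rows.

    Returns the number of commits issued. batch_size and pause_s are
    illustrative defaults, not values recommended anywhere in this report.
    """
    cur = conn.cursor()
    pending = 0
    commits = 0
    for row in rows:
        cur.execute(sql, row)
        pending += 1
        if pending >= batch_size:
            conn.commit()        # keep each transaction small
            commits += 1
            pending = 0
            time.sleep(pause_s)  # brief pause between batches
    if pending:
        conn.commit()            # commit the final partial batch
        commits += 1
    cur.close()
    return commits


# Stand-in demonstration with an in-memory SQLite database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, content TEXT)")
demo_rows = [(i, "text " * 100) for i in range(25)]
demo_commits = insert_in_batches(
    demo_conn, "INSERT INTO x (id, content) VALUES (?, ?)", demo_rows,
    batch_size=10, pause_s=0,
)
demo_count = demo_conn.execute("SELECT COUNT(*) FROM x").fetchone()[0]
```

Smaller batches keep each transaction short at the cost of more commits, so a suitable batch size would have to be found by testing against the actual cluster.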