Bug #43882 ndb GCP stop(3) while importing tables with big (up to 470KB) longtext columns
Submitted: 26 Mar 2009 13:25 Modified: 6 Oct 2009 13:18
Reporter: José Luis Gordo Romero
Status: Verified
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:mysql-5.1-telco-6.3 OS:Linux (ubuntu server 7.04)
Assigned to: Assigned Account CPU Architecture:Any
Tags: hang, longtext, mysql-5.1-telco-6.3.20, ndb

[26 Mar 2009 13:25] José Luis Gordo Romero
I have a setup with two data nodes + two API nodes (on two servers) + 1 separate management server.

When importing a table with a big longtext column (values up to 470KB), ndb hangs:

moving extents (1364 1410) to real free list 1454
Detected GCP stop(3)...sending kill to [SignalCounter: m_count=1 0000000000000010]
start_resend(1, empty bucket (11471/53 11471/52) -> active
REMOVING lcp: 6 from table: 0 frag: 0 node: 4
REMOVING lcp: 6 from table: 0 frag: 1 node: 4
2009-03-25 16:09:10 [ndbd] INFO     -- dbtc/DbtcMain.cpp
2009-03-25 16:09:10 [ndbd] INFO     -- DBTC (Line: 4420) 0x0000000a
2009-03-25 16:09:10 [ndbd] INFO     -- Error handler shutting down system
2009-03-25 16:09:11 [ndbd] INFO     -- Error handler shutdown completed - exiting
2009-03-25 16:09:11 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2009-03-25 16:12:30 [ndbd] INFO     -- Angel pid: 5393 ndb pid: 5394

Sometimes the mysqld also hangs with:
*** glibc detected *** /usr/local/mysql/libexec/mysqld: double free or corruption (top): 0x00002aaaab48cbc0 ***
start_resend(0, min: 1207/7 - max: 1207/8) page: 6

and it only starts again after an initial restart of the data nodes (ndbd --initial).

I tried storing the longtext column both in memory and on disk, with the same hang; but when it is stored on disk, the server sometimes freezes completely (it answers pings but accepts no connections, and after a reboot there is nothing in syslog).

I also tried two separate environments (to rule out hardware problems).

How to repeat:
Import a table with a longtext column holding values of up to 470KB.
[8 Apr 2009 16:03] Magnus Blåudd
We would be very happy if you could provide some additional information about your table schema, how many rows you loaded, etc. Has it occurred more than once?

Please also use ndb_error_reporter script to create a tarfile and upload it to this bug.
[16 Apr 2009 12:56] Jonathan Miller
Please respond to Magnus's request.

Also, how is the table imported (mysql load, restore)? Are transactions on or off?

Possible workaround.
[16 Apr 2009 13:28] Jonathan Miller
[22 Apr 2009 8:48] Jos? Luis Gordo Romero
First, sorry for the delay: the system is in production (we were able to import the data with a custom script that adds minimal delays between inserts), and the development machine was being used for other tests.

I reinstalled the test environment, but I'm having other, non-ndb problems with it and have not been able to reproduce the problem, sorry.

Some info: the table has 11,000 records with a longtext column of up to 470KB (200KB on average):

  `document_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `version` int(11) DEFAULT NULL STORAGE MEMORY,
  `name` varchar(255) DEFAULT '' STORAGE MEMORY,
  `creator_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `content` longtext STORAGE DISK,
  `created_at` datetime DEFAULT NULL STORAGE MEMORY,
  `updated_at` datetime DEFAULT NULL STORAGE MEMORY,
  `editor_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `reference_id` int(11) DEFAULT NULL STORAGE MEMORY,
  `digest` varchar(40) DEFAULT NULL STORAGE MEMORY,
  `locked` tinyint(1) DEFAULT NULL STORAGE MEMORY,
  PRIMARY KEY (`id`);
[28 Apr 2009 8:55] Magnus Blåudd
Thanks for that info. Unfortunately, it is currently possible to cause a "GCP stop" (i.e. the global checkpoint protocol takes too long to complete) by committing a large number of rows in one transaction; see BUG#43069. This is especially likely with a blob on disk, as in your case. Behind the scenes, an INSERT into the blob column is split into 8k INSERTs into a blob table, so that alone adds up to quite a large transaction.
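To get a feel for how large that transaction becomes, here is a rough back-of-the-envelope sketch using the figures reported above (11,000 rows, 200KB average, 470KB maximum). It assumes "8k" means 8192-byte parts and ignores any inline portion kept in the main row, so the real NDB part size and counts may differ; `blob_parts` and `transaction_ops` are hypothetical helpers for illustration only.

```python
import math

# Assumption: "8k" is read as 8192-byte blob parts. The actual NDB part
# size depends on the column type and server version.
PART_SIZE_BYTES = 8 * 1024


def blob_parts(value_bytes: int, part_size: int = PART_SIZE_BYTES) -> int:
    """Rows needed in the blob table for one value (ceiling division).

    Rough: ignores the inline head stored in the main table row.
    """
    return math.ceil(value_bytes / part_size)


def transaction_ops(rows: int, avg_value_bytes: int,
                    part_size: int = PART_SIZE_BYTES) -> int:
    """Approximate operation count if every row is inserted in ONE
    transaction: one main-table row plus its blob parts, per row."""
    return rows * (1 + blob_parts(avg_value_bytes, part_size))


if __name__ == "__main__":
    # Figures from this bug report.
    ROWS = 11_000
    AVG_BYTES = 200 * 1024
    MAX_BYTES = 470 * 1024

    print(blob_parts(MAX_BYTES))             # 59 parts for the largest value
    print(blob_parts(AVG_BYTES))             # 25 parts for an average value
    print(transaction_ops(ROWS, AVG_BYTES))  # 286000 operations in one transaction
```

Under these assumptions, a single-transaction import touches on the order of a quarter of a million blob-part rows, which illustrates why committing everything at once can stall the global checkpoint.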

We have some ideas on how to fix it, but for now I would suggest configuring the system differently to cope with the load, or rewriting the INSERTs slightly.
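One possible way to "rewrite the INSERTs", in the spirit of the reporter's custom script with delays between inserts, is to commit in small batches so that no single transaction carries the blob parts of thousands of rows. The sketch below is an assumption, not the fix proposed here: it works with any Python DB-API connection (`cursor()`, `execute()`, `commit()`), and the function name, batch size, pause, and the `docs` table are hypothetical. The demo uses the standard-library `sqlite3` driver only as a stand-in; against MySQL Cluster you would pass a MySQL driver connection and use its placeholder style (e.g. `%s`).

```python
import sqlite3
import time
from typing import Iterable, Sequence


def import_in_batches(conn, insert_sql: str, rows: Iterable[Sequence],
                      batch_size: int = 50, pause_s: float = 0.0) -> int:
    """Insert rows, committing every `batch_size` rows so that each
    transaction stays small. Optionally sleeps `pause_s` seconds after
    each commit. Returns the number of commits issued."""
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute(insert_sql, row)
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
            if pause_s:
                time.sleep(pause_s)  # give the cluster breathing room
    if pending:  # commit the final partial batch
        conn.commit()
        commits += 1
    return commits


if __name__ == "__main__":
    # Stand-in demo with sqlite3; "docs" is a hypothetical table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, content TEXT)")
    rows = ((i, "x" * 1000) for i in range(1, 121))
    n = import_in_batches(conn, "INSERT INTO docs (id, content) VALUES (?, ?)",
                          rows, batch_size=50)
    print(n)  # 3 commits: 50 + 50 + 20 rows
```

Smaller batches trade some throughput for shorter transactions; with large blobs, a batch size of a few rows may be more appropriate than the illustrative default.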

/ Magnus