MySQL Bugs: #17788: LCP should start on out of Redo, ndb

Bug #17788	LCP should start on out of Redo, ndb_restore should retry more
Submitted:	28 Feb 2006 15:57	Modified:	11 Sep 2009 7:33
Reporter:	Johan Andersson
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-4.1	OS:	Any (*)
Assigned to:		CPU Architecture:	Any
Tags:	4.1->

Description:
ndb_restore is not adaptive and can easily cause redo-log-related errors.
ndb_restore should take the tuple size into account and adapt its parallelism so that it does not push too much data at once and overflow either the redo buffers or the send buffers.
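
As a rough illustration of what "adapting the parallelism to the tuple size" could mean, here is a minimal sketch in C. It is not taken from ndb_restore: the function name `adapt_parallelism`, its parameters, and the idea of a single byte budget standing in for the redo or send buffer are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of ndb_restore: pick how many transactions
 * to keep in flight so that the rows they carry fit inside a byte budget
 * (for example, some fraction of a redo or send buffer).
 */
static unsigned adapt_parallelism(size_t tuple_bytes,
                                  unsigned tuples_per_txn,
                                  size_t budget_bytes,
                                  unsigned max_parallelism)
{
    assert(max_parallelism >= 1);

    /* Treat degenerate inputs as the smallest possible transaction. */
    if (tuple_bytes == 0) tuple_bytes = 1;
    if (tuples_per_txn == 0) tuples_per_txn = 1;

    /* Tuples that fit in the budget, then whole transactions that fit.
       Dividing twice avoids overflowing tuple_bytes * tuples_per_txn. */
    size_t tuples_that_fit = budget_bytes / tuple_bytes;
    size_t txns_that_fit = tuples_that_fit / tuples_per_txn;

    if (txns_that_fit < 1) return 1;   /* always make some progress */
    if (txns_that_fit > max_parallelism) return max_parallelism;
    return (unsigned)txns_that_fit;
}
```

With 8000-byte tuples, 10 tuples per transaction and a 1,000,000-byte budget, this returns 12; small tuples are capped at `max_parallelism`, and tuples larger than the budget fall back to a parallelism of 1.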

Also, when restoring, it would be nice if ndb_restore does not terminate with "aborted".

How to repeat:
N/A

Suggested fix:
Adaptiveness

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18534

ChangeSet@1.2539, 2007-01-22 20:11:07+08:00, gni@dev3-221.dev.cn.tlan +7 -0
  BUG#17788 ndb_restore has more 'adaptive' functions. When the 410 temporary error occurs,
  it will immediately send a signal to start an LCP.

How to reproduce it:
1. Create a table and insert records into it.
   The table must be larger than the redo log. (For example, with the default values of TimeBetweenLocalCheckpoints and NoOfFragmentLogFiles, the table must be larger than 800M.)
   Alternatively, set a large value for TimeBetweenLocalCheckpoints and a small value for NoOfFragmentLogFiles; a smaller table is then sufficient.
2. Start a backup in ndb_mgm.
3. Restart the NDB cluster with the --initial option.
4. Run ndb_restore with the backup data.
  During the process of ndb_restore, you will get the error message
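
To pick a table size for step 1, it can help to estimate the redo log capacity implied by NoOfFragmentLogFiles. The sketch below assumes each unit of NoOfFragmentLogFiles is a set of four 16 MB files, as the MySQL Cluster documentation describes, and that the file size is fixed. The function names are hypothetical, and the result is only a rough per-data-node figure, since the restored data is spread across the data nodes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one NoOfFragmentLogFiles unit = a set of 4 files of 16 MB. */
#define FILES_PER_SET  4u
#define BYTES_PER_FILE (16ull * 1024 * 1024)

/* Approximate redo log capacity, in bytes, for one data node. */
static uint64_t redo_log_bytes(unsigned no_of_fragment_log_files)
{
    return (uint64_t)no_of_fragment_log_files * FILES_PER_SET * BYTES_PER_FILE;
}

/* Returns 1 if data_bytes is larger than the estimated redo log, else 0. */
static int exceeds_redo_log(uint64_t data_bytes,
                            unsigned no_of_fragment_log_files)
{
    assert(no_of_fragment_log_files > 0);
    return data_bytes > redo_log_bytes(no_of_fragment_log_files);
}
```

For instance, NoOfFragmentLogFiles=3 gives an estimate of 192 MB per node, so a much smaller table is enough to fill the redo log in a test setup.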

What's the status with this bug?

Last conversation I can find is at the end of January (and I think we had some IRC discussion too). Basically saying that we should be able to trigger the start LCP from kernel on error instead... with me not liking the use of the dump interface here.

current status?

Hi Jonas,
   Stewart suggests that I define a new signal to start an LCP. If I use the NDB API, I would need to add a new interface to the Ndb class; the alternative is to trigger the
start of the LCP from the kernel on error instead. Whichever method I adopt, new signals must be defined.
  What do you think? Please give me your suggestions.

  thanks!

/Guangbao Ni

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/29661

ChangeSet@1.2473, 2007-06-27 09:36:59+08:00, gni@dev3-221.dev.cn.tlan +9 -0
  BUG#17788 ndb_restore is too static in its behavior.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30420

ChangeSet@1.2473, 2007-07-06 11:49:16+08:00, gni@dev3-221.dev.cn.tlan +9 -0
  BUG#17788 ndb_restore is too static in its behavior.

Looks okay to me.

Since Jonas is on vacation, Pekka - can you have a quick look too?

I think we should only apply this to 5.1 though.

I would still like a test case.

Please also check what ndb_restore does in the temporary error situation. Does it retry for long enough, or does it give up at some point? If it gives up, that is a problem with a large LCP.

Hi Stewart,
  Before the fix, it aborts after 10 retries of the same transaction.
The patch solves this problem by making ndb_restore recover by itself from the temporary error.

I think we should continue to retry (not limit it to 10). Naturally displaying some kind of warning though.
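
The behaviour suggested here (keep retrying on temporary errors, but warn) could look roughly like the following C sketch. It is not the code from the patch: the `attempt_result` enum, `retry_until_done`, `fail_n_times`, and the `warn_every` parameter are all hypothetical names for illustration, and a real tool would also sleep or back off between attempts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome of one attempt; not the NDB API's error types. */
enum attempt_result { ATTEMPT_OK, ATTEMPT_TEMPORARY, ATTEMPT_PERMANENT };

typedef enum attempt_result (*attempt_fn)(void *ctx);

/*
 * Call `attempt` until it succeeds or fails permanently. Temporary errors
 * never end the loop; a warning goes to stderr after every `warn_every`
 * temporary failures (0 disables warnings). Returns 0 on success and -1 on
 * a permanent error. If `retries_out` is non-NULL it receives the number
 * of temporary failures seen.
 */
static int retry_until_done(attempt_fn attempt, void *ctx,
                            unsigned warn_every, unsigned *retries_out)
{
    assert(attempt != NULL);
    unsigned retries = 0;

    for (;;) {
        enum attempt_result r = attempt(ctx);
        if (r == ATTEMPT_OK || r == ATTEMPT_PERMANENT) {
            if (retries_out) *retries_out = retries;
            return r == ATTEMPT_OK ? 0 : -1;
        }
        retries++;
        if (warn_every != 0 && retries % warn_every == 0)
            fprintf(stderr,
                    "warning: %u temporary errors so far, still retrying\n",
                    retries);
        /* A real tool would sleep or back off here before retrying. */
    }
}

/* Example attempt: fails temporarily until the counter in ctx hits zero. */
static enum attempt_result fail_n_times(void *ctx)
{
    unsigned *remaining = ctx;
    if (*remaining == 0) return ATTEMPT_OK;
    (*remaining)--;
    return ATTEMPT_TEMPORARY;
}
```

Calling `retry_until_done(fail_n_times, &remaining, 10, &retries)` with `remaining` set to 25 gets past the old limit of 10 retries, prints a warning at 10 and 20 failures, and then succeeds.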

Setting back to In Progress as still something to be done.

Hi Stewart,
   If a test case wants to insert an error into the ndbd kernel, should it use NdbTamper() (NDB API) and the NDB_TAMPER signal?
   Should the test case be put in the ndb/test/ndbapi directory?

There's an mgmapi function to do it:

  /**
   * Provoke an error.
   *
   * @param handle the NDB management handle.
   * @param nodeId the node id.
   * @param errorCode the errorCode.
   * @param reply the reply message.
   * @return 0 if successful or an error code.
   */
  int ndb_mgm_insert_error(NdbMgmHandle handle,
                           int nodeId, 
                           int errorCode,
                           struct ndb_mgm_reply* reply);

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/33774

ChangeSet@1.2473, 2007-09-06 09:36:12+08:00, gni@dev3-221.dev.cn.tlan +11 -0
  BUG#17788 LCP should start on out of Redo, ndb_restore should retry more.

I think the test program is missing from the patch.

Also, this makes retries == 100 in restore, not "infinite".

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/37720

ChangeSet@1.2473, 2007-11-14 10:24:52+08:00, gni@dev3-221.dev.cn.tlan +12 -0
  BUG#17788 LCP should start on out of Redo, ndb_restore should retry more.