MySQL Bugs: #25119: Data nodes died during inserting 1M records through INSERT INTO ... SELECT FROM

Submitted: 17 Dec 2006 16:36
Modified: 12 Mar 2007 12:10
Reporter: Serge Kozlov
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.1.15-bk
OS: Linux (Linux FC4)
CPU Architecture: Any

Description:
The attached script (sqe.pl and aa.txt) creates an ndb_dd table with 1M records and then, in a loop, copies these records into a second table via INSERT INTO ... SELECT FROM. Although all relevant options are set to high values (DataMemory, MaxNoOfConcurrentOperations, MaxNoOfLocalOperations, undo files, data files), the data nodes crashed:

Current byte-offset of file-pointer is: 568

Time: Sunday 17 December 2006 - 16:59:36
Status: Permanent error, external action needed
Message: Signal lost, out of send buffer memory, please increase SendBufferMemory or lower the load (Resource configuration error)
Error: 6052
Error data: Remote note id 2.
Error object: TransporterCallback.cpp
Program: ./builds/libexec/ndbd
Pid: 13239
Trace: /space/run/ndb_3_trace.log.1
Version: Version 5.1.15 (beta)
***EOM***

Current byte-offset of file-pointer is: 568

Time: Sunday 17 December 2006 - 17:10:17
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 11 received; Segmentation fault
Error object: main.cpp
Program: ./builds/libexec/ndbd
Pid: 13232
Trace: /space/run/ndb_2_trace.log.1
Version: Version 5.1.15 (beta)
***EOM***

How to repeat:
1. Use the configuration from the attached file. Its main options are:
DataMemory: 1G
IndexMemory: 500M
MaxNoOfConcurrentOperations: 2M
MaxNoOfLocalOperations: 2M
2. Start cluster.
3. Run the script:
 ./sqe.pl -q aa.txt -p 127.0.0.1:3306:root::test
4. Wait until the script shows error '4009', then look at the error log files.

Trace files, log files, config.ini, Perl script

Attachment: bug25119.tar.gz (application/gzip, text), 161.25 KiB.

Trace files from node 2 are missing.

Trace files for node 2:

Attachment: bug25119-trace-node-2.tar.gz (application/gzip, text), 141.87 KiB.

Hi,

Could you test whether the problem is related to the relatively small undo_buffer_size
  by increasing it to, say, 8M?

/Jonas

I tried undo_buffer_size=8M, 15M, and 20M and got the same result (crash).

Hi,

I tried this today, but I already failed when trying to use such a big configuration.

My machine has only 2G of RAM; how big a machine are you using?
Can you run this with LockPagesInMemory? (Note: this requires running as root, or a correctly set ulimit.)

Also, a 1M-row transaction is very much not recommended.
Some algorithms are not designed for big transactions.
Does this work for MM (main-memory) tables?

Anyway, I added LIMIT clauses here and there, and then it worked like a charm
(also using a configuration that I could run without swapping on my machine).

---

So, conclusion: there is probably a bug somewhere.
But without a more realistic test case, it's very hard to estimate how likely
  this is to be hit "in real life".

My guess is that you might be able to reproduce a similar bug with reasonably
  sized transactions and a small value for SharedGlobalMemory.

(I added LIMIT 50000; you can ask Johan what I suggest as the maximum for customers to use.)

/Jonas
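The workaround above, splitting one 1M-row INSERT INTO ... SELECT FROM into several smaller transactions, can be sketched as a generator of batched statements. This is a minimal illustration, not the attached sqe.pl script. The table names t1/t2 and the integer key column id (assumed to run from 1 to 1,000,000) are hypothetical. Instead of plain LIMIT 50000 it uses primary-key ranges of 50000 keys each. Swapping LIMIT for key ranges is a deliberate choice: LIMIT without ORDER BY does not say which rows are returned, whereas key ranges copy each row exactly once. With autocommit on, each statement should commit as its own transaction, assuming the 50000 rows per batch that Jonas used.

```python
def batched_copy_statements(src, dst, key, min_key, max_key, batch_size=50000):
    """Build INSERT ... SELECT statements that copy rows with key in
    [min_key, max_key] in consecutive ranges of at most batch_size keys.

    src, dst and key are hypothetical names used only for illustration.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    statements = []
    lo = min_key
    while lo <= max_key:
        # Inclusive upper bound of this batch, clamped to the last key.
        hi = min(lo + batch_size - 1, max_key)
        statements.append(
            f"INSERT INTO {dst} SELECT * FROM {src} "
            f"WHERE {key} BETWEEN {lo} AND {hi}"
        )
        lo = hi + 1
    return statements


if __name__ == "__main__":
    stmts = batched_copy_statements("t1", "t2", "id", 1, 1_000_000)
    print(len(stmts))  # prints 20
    print(stmts[0])    # INSERT INTO t2 SELECT * FROM t1 WHERE id BETWEEN 1 AND 50000
```

If the key has gaps, some batches simply contain fewer rows. No batch ever exceeds batch_size rows, so the per-transaction operation count stays bounded regardless of the table size.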

I also got the same error for the query CREATE TABLE ... SELECT FROM ... when the source table has 1M rows. I expected that, because the same kind of transaction is used for it as for INSERT INTO ... SELECT FROM ...