Bug #66104 | MySQL-Cluster Online backup error | ||
---|---|---|---|
Submitted: | 30 Jul 2012 20:48 | Modified: | 7 Sep 2012 6:31 |
Reporter: | jose ferrero | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 7.2.7 | OS: | Linux (Debian 6) |
Assigned to: | Ole John Aske | CPU Architecture: | Any |
Tags: | 3001: Could not start backup, Backup, mysql-cluster |
[30 Jul 2012 20:48]
jose ferrero
[3 Aug 2012 10:53]
jose ferrero
Jay Ward did a wonderful job debugging the same problem (http://forums.mysql.com/read.php?25,563119,563457#msg-563457). Please find the traces attached.
[3 Aug 2012 10:54]
jose ferrero
Debugging traces made by Jay Ward
Attachment: debug.txt (text/plain), 8.58 KiB.
[12 Aug 2012 15:49]
Jay Ward
ndb_error_report for second cluster built with this same problem
Attachment: ndb_error_report_20120812114527.tar.bz2 (application/octet-stream, text), 265.14 KiB.
[13 Aug 2012 9:41]
Mehmet Onur YALAZI
I also have this problem on Solaris amd64 mc-7.2.7. The crash occurs if I use ndbmtd but will not occur if I use ndbd. Using 8 threads.
[13 Aug 2012 14:18]
Jay Ward
This happens whenever I start the cluster with more than one LDM thread (which makes sense: with only one LDM thread, that thread can talk to all LDM threads). I was able to reproduce it reliably using the 'Recommended Starting Configuration for MySQL Cluster' (http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-config-starting.html) and adding these lines:

SharedGlobalMemory=2G // the default value causes out of job buffer memory
DiskPageBufferMemory=1G // Just in case
// To use multicores efficiently. 12 core machine:
// 11 - Main/IO Thread
// 10 - Rep
// 9 - TC
// 8 - Left to OS (shown to receive most interrupts)
// 7 - Left to OS (shown to receive second most interrupts)
// 6 - TC
// 5 - Recv
// 4 - Send
// 3 - LDM
// 2 - LDM
// 1 - LDM
// 0 - LDM
ThreadConfig=main={count=1,cpubind=11},io={count=1,cpubind=11},rep={count=1,cpubind=10},tc={count=2,cpubind=6,9},recv={count=1,cpubind=5},send={count=1,cpubind=4},ldm={count=4,cpubind=0-3}

Taking out the ThreadConfig line makes the problem go away. Auto assignment then looks like this:

ThreadConfig: input: LockExecuteThreadToCPU: => parsed: main,ldm,recv,rep
NDBMT: MaxNoOfExecutionThreads=4
NDBMT: workers=1 threads=1 tc=0 send=0 receive=1

With only one worker, all workers can talk to all workers, and so the backup succeeds. I can supply a core dump if needed.

Jay
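For anyone checking whether their own configuration falls into the failing case, the following is a minimal Python sketch that counts threads per type in a ThreadConfig value. The function names `parse_thread_config` and `backup_likely_affected` are hypothetical, the parser only handles the `name={...}` and bare `name` forms seen in this report, and the version check reflects only what is stated in this thread (multiple LDM threads on 7.2.7).

```python
import re


def parse_thread_config(value):
    """Return {thread_type: count} for a ThreadConfig string.

    Handles only the two forms seen in this report:
    `name={count=N,...}` and a bare `name` (counted as one thread).
    """
    counts = {}
    for match in re.finditer(r"(\w+)(?:=\{([^}]*)\})?", value):
        name = match.group(1)
        params = match.group(2) or ""
        count_match = re.search(r"\bcount=(\d+)", params)
        count = int(count_match.group(1)) if count_match else 1
        counts[name] = counts.get(name, 0) + count
    return counts


def backup_likely_affected(version, thread_config):
    """Assumption drawn from this thread only: 7.2.7 with more than one LDM."""
    ldm_threads = parse_thread_config(thread_config).get("ldm", 0)
    return version == "7.2.7" and ldm_threads > 1


# The ThreadConfig value quoted in the comment above.
REPORTED = (
    "main={count=1,cpubind=11},io={count=1,cpubind=11},"
    "rep={count=1,cpubind=10},tc={count=2,cpubind=6,9},"
    "recv={count=1,cpubind=5},send={count=1,cpubind=4},"
    "ldm={count=4,cpubind=0-3}"
)

if __name__ == "__main__":
    print(parse_thread_config(REPORTED))
    print(backup_likely_affected("7.2.7", REPORTED))
```

The auto-assigned layout from the log (`main,ldm,recv,rep`) parses to a single LDM thread, which matches the case where the backup succeeds.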
[13 Aug 2012 15:59]
Jay Ward
Jose posted his config.ini in his original forum thread: http://forums.mysql.com/read.php?25,563119,564783#msg-564783 He is just using the line MaxNoOfExecutionThreads=8 to generate this error.
[13 Aug 2012 18:53]
Ole John Aske
Jay: Thank you for a very detailed bug description. Thanks to that, the bug was very easy to identify. It appears to be a regression introduced in 7.2.7. As you have already indicated, a possible (though poor) workaround is to avoid using ndbmtd, or at least to restrict it to a single LDM thread.
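As an illustration of the single-LDM workaround, here is a small Python sketch that rewrites the `ldm` entry of a ThreadConfig string to one thread, keeping only the first bound CPU. The helper name `limit_to_single_ldm` is hypothetical, and the sketch assumes the `ldm={count=N,cpubind=...}` form used earlier in this report. Whether such a change can be applied with a rolling restart is not covered in this thread.

```python
import re


def limit_to_single_ldm(thread_config):
    """Return thread_config with the ldm entry reduced to a single thread.

    Keeps the first CPU listed in cpubind, if any; other entries are untouched.
    """
    def rewrite(match):
        params = match.group(1)
        bind = re.search(r"\bcpubind=(\d+)", params)
        new_params = "count=1"
        if bind:
            new_params += ",cpubind=" + bind.group(1)
        return "ldm={" + new_params + "}"

    return re.sub(r"\bldm=\{([^}]*)\}", rewrite, thread_config)


if __name__ == "__main__":
    original = "tc={count=2,cpubind=6,9},ldm={count=4,cpubind=0-3}"
    print(limit_to_single_ldm(original))
```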
[13 Aug 2012 19:39]
Jay Ward
Ole, Thank you very much. I did post this as a question on Mikael Ronstrom's blog and he said, "No, you missed nothing, you hit a bug. We also discovered this very recently and actually discussed the fix today :) A fix is in the works." So... Thanks guys for being on top of this! I'm reducing our LDM thread count to 1 temporarily until the fix is produced. Again, thank you! Jay
[4 Sep 2012 14:03]
Jay Ward
I hate to be a bother, but is there any update on this? Do we have a time frame for it getting resolved?
[7 Sep 2012 6:31]
Ole John Aske
This bug has been fixed in MySQL Cluster 7.2.8, which is now available at http://dev.mysql.com/downloads/cluster/