Bug #44195 | NDBMTD - DD - PGMAN (Line: 1470) - Internal program error (failed ndbrequire) | ||
---|---|---|---|
Submitted: | 9 Apr 2009 20:26 | Modified: | 9 Oct 2009 13:14 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Disk Data | Severity: | S2 (Serious) |
Version: | mysql-5.1-telco-7.0 | OS: | Linux |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[9 Apr 2009 20:26]
Jonathan Miller
Error and trace logs
Attachment: pgman.tgz (application/x-compressed-tar, text), 244.91 KiB.
[9 Apr 2009 20:54]
Jonathan Miller
Not sure this will help or not, but... dbt2 was doing a new_order transaction and hit a (Lock wait timeout exceeded)

Thu Apr 9 11:54:36 2009 Microseconds : 405658 tid:1095936320 mysql/dbc_new_order.c:88 mysql reports: SQL: call new_order(9, 1, 72, 1, 15, 48856, 9, 5, 67518, 9, 1, 73728, 9, 1, 96256, 9, 2, 53248, 9, 3, 81406, 9, 3, 26592, 9, 4, 38892, 9, 9, 57343, 9, 6, 84911, 9, 6, 36352, 9, 9, 60387, 9, 3, 98303, 9, 8, 47102, 9, 8, 40911, 9, 6, @rc), ERROR: 1205 Lock wait timeout exceeded; try restarting transaction
Thu Apr 9 11:54:36 2009 Microseconds : 405680 tid:1095936320 mysql/dbc_common.c:97 ROLLBACK INITIATED

Next.... DN Failure

Thu Apr 9 11:54:42 2009 Microseconds : 366238 tid:1075546432 mysql/dbc_new_order.c:88 mysql reports: SQL: call new_order(3, 7, 928, 0, 9, 81920, 3, 9, 46588, 3, 5, 24576, 3, 2, 89536, 3, 7, 10172, 3, 2, 52592, 7, 8, 72702, 3, 2, 69613, 3, 4, 97607, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, @rc), ERROR: 1297 Got temporary error 286 'Node failure caused abort of transaction' from NDBCLUSTER
Thu Apr 9 11:54:42 2009 Microseconds : 366309 tid:1075546432 mysql/dbc_common.c:97 ROLLBACK INITIATED

The New-Order Transaction

The New-Order business transaction consists of entering a complete order through a single database transaction. It represents a mid-weight, read-write transaction with a high frequency of execution and stringent response time requirements to satisfy on-line users. This transaction is the backbone of the workload. It is designed to place a variable load on the system to reflect on-line database activity as typically found in production environments.

Entering a new order is done in a single database transaction with the following steps:

1. Create an order header, comprised of: 2 row selections with data retrieval, 1 row selection with data retrieval and update, 2 row insertions.
2. Order a variable number of items (average ol_cnt = 10), comprised of: (1 * ol_cnt) row selections with data retrieval, (1 * ol_cnt) row selections with data retrieval and update, (1 * ol_cnt) row insertions.
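
To make the row counts in the two steps above concrete, here is a small sketch that tallies the row operations of one New-Order transaction for a given ol_cnt. The names `RowOps` and `new_order_row_ops` are made up for this illustration and are not part of dbt2 or MySQL Cluster; the counts come only from the description quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowOps:
    """Hypothetical tally of row operations (illustration only)."""
    selects: int          # row selections with data retrieval
    select_updates: int   # row selections with data retrieval and update
    inserts: int          # row insertions

    @property
    def total(self) -> int:
        return self.selects + self.select_updates + self.inserts


def new_order_row_ops(ol_cnt: int) -> RowOps:
    """Count row operations for one New-Order transaction with ol_cnt items."""
    if ol_cnt < 1:
        raise ValueError("ol_cnt must be at least 1")
    # Step 1: order header (fixed cost per transaction).
    header = RowOps(selects=2, select_updates=1, inserts=2)
    # Step 2: one select, one select+update and one insert per ordered item.
    items = RowOps(selects=ol_cnt, select_updates=ol_cnt, inserts=ol_cnt)
    return RowOps(
        selects=header.selects + items.selects,
        select_updates=header.select_updates + items.select_updates,
        inserts=header.inserts + items.inserts,
    )


if __name__ == "__main__":
    ops = new_order_row_ops(10)  # the average ol_cnt quoted above
    print(ops, "total:", ops.total)
```

With the average ol_cnt = 10 this gives 12 selections, 11 selections with update and 12 insertions, i.e. 35 row operations in a single transaction, which gives a rough idea of how much Disk Data page traffic each new_order call can generate.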
[10 Apr 2009 3:43]
Jonathan Miller
Sorry, meant to add this sooner:

clone:mysql-5.1-telco-7.0
rundate_2009-04-09_09:29
build_script:/space/cluster_rep_auto/scripts/boot.sh
------------------------------------------------------------
revno: 2890
revision-id: frazer@mysql.com-20090408222804-ve6xi3f1gaunzzxy
parent: frazer@mysql.com-20090408181037-8hsxlwqqydgl8f72
parent: frazer@mysql.com-20090408222513-d9ddlkxomto876ld
committer: Frazer Clement
branch nick: mysql-5.1-telco-6.4
timestamp: Wed 2009-04-08 23:28:04 +0100
message: Merge 6.3->7.0
[27 Apr 2009 12:59]
Jonathan Miller
Hi, testing hit this error again, but with a different test. This time the framework was running TPC-"B" on disk data using 4 DN in "MT Mixed" mode (i.e. 2 single-threaded NDB data nodes and 2 multi-threaded NDB data nodes). The crashing DN was an NDBMTD.

Time: Sunday 26 April 2009 - 23:05:18
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1470) 0x00000006
Program: /data0/cr_autotest/libexec/ndbmtd
Pid: 18068
Trace: ./ndb_3_trace.log.1 ./ndb_3_trace.log.1_t1 ./ndb_3_trace.log.1_t2 ./ndb_3_trace.log.1_t3 ./ndb_3_trace.log.1_t4 ./ndb_3_

I will upload the trace files shortly.
[27 Apr 2009 13:06]
Jonathan Miller
Trace Files from latest crash
Attachment: tpcb-run.tgz (application/x-compressed-tar, text), 306.09 KiB.
[20 Aug 2009 6:31]
Jonas Oreland
see bug#46507
[9 Oct 2009 11:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/86332

3105 Jonas Oreland 2009-10-09
ndb - bug#44195 - increase size of protected area, to make sure that multiple threads dont access data in parallel
[9 Oct 2009 11:18]
Bugs System
Pushed into 5.1.39-ndb-7.1.0 (revid:jonas@mysql.com-20091009111806-bi2i4o4kychhf3e8) (version source revid:jonas@mysql.com-20091009111806-bi2i4o4kychhf3e8) (merge vers: 5.1.39-ndb-7.1.0) (pib:12)
[9 Oct 2009 11:21]
Jonas Oreland
pushed to 7.0.9 and 7.1
[9 Oct 2009 13:14]
Jon Stephens
Documented bugfix in the NDB-7.0.9 changelog as follows:

Multi-threaded data nodes could in some cases attempt to access the same memory structure in parallel, in a non-safe manner. This could result in data node failure when running ndbmtd while using Disk Data tables.

Closed.
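
For readers unfamiliar with this class of bug, the following is a rough sketch of the general idea behind "increasing the size of the protected area": every read and write of a shared structure happens inside one lock, so no thread can see it half-updated. This is not the actual patch and not PGMAN code; `PageEntry`, `add_request` and `run` are hypothetical names, and Python's `threading.Lock` stands in for whatever protection ndbmtd really uses.

```python
import threading


class PageEntry:
    """Hypothetical shared structure; not PGMAN's real data layout."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.requests = 0   # number of requests recorded
        self.queue = []     # one element per recorded request

    def add_request(self, req_id) -> None:
        # Both fields are updated inside the same protected area, so
        # another thread can never observe them out of step.
        with self.lock:
            self.requests += 1
            self.queue.append(req_id)
            # Plays the role of a consistency check such as ndbrequire.
            assert self.requests == len(self.queue)


def run(n_threads: int = 4, per_thread: int = 1000) -> PageEntry:
    """Hammer one PageEntry from several threads and return it."""
    entry = PageEntry()

    def worker(tid: int) -> None:
        for i in range(per_thread):
            entry.add_request((tid, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entry


if __name__ == "__main__":
    entry = run()
    print(entry.requests, len(entry.queue))
```

If only one of the two updates were inside the lock, a second thread could run between them and find the counter and the queue disagreeing, which is the kind of broken invariant that a failed consistency check reports.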