Bug #52058 | ndbd crashes in Pgman::get_page_entry during backup | |
---|---|---|---
Submitted: | 15 Mar 2010 9:58 | Modified: | 19 Sep 2010 9:49 |
Reporter: | Niklas Hallqvist | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | mysql-5.1-telco-6.3 | OS: | Linux (rhel5) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | 6.3.27a |
[15 Mar 2010 9:58]
Niklas Hallqvist
[15 Mar 2010 11:53]
Pekka Nousiainen
This looks like an unhandled resource problem. A workaround should be to set DiskPageBufferMemory to something larger than the default 64M (see ref manual 5.1). Context:

```
Pgman::get_page_entry()
  if (find_page_entry(..
  if (m_page_entry_pool.getNoOfFree() == 0 ...
  if (seize_page_entry(..   // fail
  ndbrequire(false);
```
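For anyone hitting the same assertion, a minimal config.ini sketch of that workaround; the 256M figure is only illustrative (it is the value tried later in this thread), not a tuned recommendation:

```ini
# config.ini -- illustrative value only
[NDBD DEFAULT]
# Default is 64M; a larger disk page buffer gives PGMAN more page entries,
# the resource that runs out in Pgman::get_page_entry() above
DiskPageBufferMemory=256M
```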
[16 Mar 2010 23:22]
Niklas Hallqvist
I raised that parameter to 256M, but tonight one of the nodes failed at almost exactly the same point in time as last time, that is, 26 minutes into the backup process:

```
Time: Tuesday 16 March 2010 - 22:26:21
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 484) 0x0000000a
Program: /usr/sbin/ndbd
Pid: 1097
Trace: /var/lib/mysql-cluster/data/ndb_3_trace.log.8
Version: mysql-5.1.37 ndb-6.3.27a-GA
***EOM***
```

Or do I need to restart ndb_mgmd as well as the other nodes after changing config.ini? I just restarted the ndbd nodes. Our ordinary DBA is away for a week at the moment :-( I'll restart all the nodes ASAP.
[17 Mar 2010 9:16]
Pekka Nousiainen
Yes, ndb_mgmd must be restarted first and then the DB nodes.
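A rough sketch of that order of operations, assuming node 1 is the management node and nodes 2 and 3 are the data nodes (as the logs in this report suggest); the config.ini path is an assumption for this setup:

```sh
# 1. Restart the management node so it rereads config.ini
ndb_mgm -e "1 STOP"
ndb_mgmd -f /var/lib/mysql-cluster/config.ini

# 2. Then restart the data nodes one at a time (rolling restart);
#    wait for each to report "started" (ndb_mgm -e "SHOW") before the next
ndb_mgm -e "2 RESTART"
ndb_mgm -e "3 RESTART"
```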
[18 Mar 2010 7:04]
Niklas Hallqvist
OK, this time the nodes were properly restarted in a rolling restart. But at about the same point in time (i.e. 26 minutes into the backup), both nodes stopped with another error:

```
Time: Wednesday 17 March 2010 - 22:26:33
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Node 2 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 257) 0x0000000a
Program: /usr/sbin/ndbd
Pid: 2030
Trace: /var/lib/mysql-cluster/data/ndb_2_trace.log.21
Version: mysql-5.1.37 ndb-6.3.27a-GA
***EOM***
```

The log has details about the GCP that do not fit in this comment; if they are relevant I can attach them separately. The other node looks similar, but with some extra messages during the 20 seconds it stayed alive after the first node went down.

I realize there are configurable parameters for the GCP timeout and interval, but I have trouble understanding what to set them to. I am reading http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#mysql-cluster-nd... and will try to figure things out. I guess we should write the backup to another place or something, since this seems to be related to I/O throughput (which surprises me, as the system used is rather high profile, but I'll check things out).

Anyway, I do understand there are newer versions, although not packaged for RHEL5, at least none that I have found. Are there known fixes in this area in them? I have no trouble compiling things myself, I just don't want to waste time if there are no relevant fixes. Or are there other ways to accomplish backups without disturbing this online system (it validates credit card transactions in real time for mobile terminals)?
[18 Mar 2010 7:38]
Niklas Hallqvist
I found a message that crops up every so often in the cluster log: "send buffer was full, but sleeping solved it", or similar. I have therefore raised the SendBufferMemory limit from 2M to 64M. I have no idea whether this helps with the problem at hand, but hopefully it relieves the pressure somewhat. Is there an analysis tool that can help figure out which configurables are approaching their limits ahead of time?
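For completeness, SendBufferMemory is a transporter parameter, so the change described above would look roughly like this in config.ini (illustrative only):

```ini
[TCP DEFAULT]
# Default is 2M; a larger send buffer only absorbs peaks, it does not
# fix a sustained overload (as noted in the next comment)
SendBufferMemory=64M
```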
[18 Mar 2010 8:43]
Pekka Nousiainen
Looking at the config:

IndexMemory at 25% of DataMemory may be unnecessarily large if memory is needed for other things.

TimeBetweenLocalCheckpoints=6 is an unusual setting and causes continuous writing of LCPs; it may be behind the problem. Setting it back to the default now can cause other issues, such as a need for more REDO and UNDO space (see the sketch below). Here are a couple of links:
http://umangg.blogspot.com/2009/10/mysql-cluster-errors-faced-while.html
http://lists.mysql.com/cluster/3119

GCP stop is a well-known problem. With disk data it may be caused by too-slow disks (plenty of material to google). High-performance setups usually require careful work by our *support engineers*.

Send buffer full is usually associated with replication; the problem is sending all changed data back to mysql. A larger buffer helps only to handle peaks, not constant overload.
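To spell out why TimeBetweenLocalCheckpoints=6 means continuous LCPs: per the 5.1 reference manual (as I read it), the value is not a time at all but the base-2 logarithm of the number of 4-byte words written before a new LCP is started:

```ini
[NDBD DEFAULT]
# TimeBetweenLocalCheckpoints is the base-2 log of 4-byte words written
# between LCP starts (5.1 manual):
#   6  -> 4 * 2^6  bytes = 256 bytes  => a new LCP starts almost immediately
#                                        after the previous one finishes
#   20 -> 4 * 2^20 bytes = 4 MB       => the default
TimeBetweenLocalCheckpoints=20
```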
[18 Mar 2010 9:25]
Niklas Hallqvist
Well, memory isn't a scarce resource here; there are 26 GB per node, and we have lots to spare to throw at NDB if needed. Also, indexing will be added in the future, due to new demands on the application. Is there really a problem with a high index-to-data memory ratio?

I will raise the LCP parameter, but that value was hinted at by other error messages we have gotten in the past. In fact I believe the exact value we chose may well have come from the first link you supplied. I see he also raises the redo log fragment size; I'll try that. Thanks for your prompt answer anyway.

We are not reluctant to use your support engineers if needed. However, as experienced database engineers ourselves, although coming from other engine backgrounds, our first approach was to tune this ourselves with the help of the net. This is also part of our own learning curve. If it turns out too hard to fix this quickly, we'll certainly pay for help.
[18 Mar 2010 9:52]
Pekka Nousiainen
IndexMemory is unfortunately named. It stores only hash indexes, basically primary keys, and only a "pointer" is stored. Normal (i.e. ordered) indexes use DataMemory.

About versions: no newer RHEL5 rpms exist. There have been many fixes after 6.3.27a; with disk data, at least bug#47832 and bug#48910.

You have >= 26G of RAM and still need disk data? It's best to put tables in memory when possible, such as small but frequently accessed tables. Our disk buffer cache is not advanced.
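A small, hypothetical example of what that means for an NDB table: only the hash index backing the primary key lives in IndexMemory, while row data and any ordered (secondary) indexes live in DataMemory:

```sql
CREATE TABLE txn_example (                   -- hypothetical table
  id         BIGINT NOT NULL PRIMARY KEY,    -- PK hash index: IndexMemory
  created_at DATETIME,
  amount     DECIMAL(12,2),
  KEY ix_created (created_at)                -- ordered index: DataMemory
) ENGINE=NDBCLUSTER;
```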
[18 Mar 2010 10:08]
Niklas Hallqvist
Thanks for the IndexMemory explanation. No, we don't need disk data (yet, anyway). At the start we had just 6G per node and had to go down that path; after upgrading we have not reverted to just in-core data. Do you think that moving away from disk data makes things more stable? We can well do that.

Another nagging little bit of ignorance: pure in-core data does not mean less data integrity, does it? Things get stored on disk anyhow, right, but are always readily available in-core as well?
[18 Mar 2010 10:28]
Pekka Nousiainen
Yes, avoid disk data if you can. NDB was originally a memory-only db with only hash indexes. Disk data has lots of new code (therefore bugs) and makes some issues worse, such as GCP stop.

In-memory tables do not exist on disk as such. Their data goes only to LCPs and REDO logs, and to the backup replica. So there is no "loading" of a table from disk into memory. Data integrity is based on storing the data in two main memories (one replica per node).
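For reference, one version-agnostic way to move a disk-data table back into memory is to re-create it without the TABLESPACE ... STORAGE DISK clause and copy the rows over. This is only a sketch with made-up table names, not a tested procedure for this cluster:

```sql
-- New table with no TABLESPACE/STORAGE DISK clause => pure in-memory NDB table
CREATE TABLE txn_mem (
  id      BIGINT NOT NULL PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=NDBCLUSTER;

INSERT INTO txn_mem SELECT * FROM txn_disk;                  -- copy the data
RENAME TABLE txn_disk TO txn_disk_old, txn_mem TO txn_disk;  -- swap names
```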
[18 Mar 2010 10:52]
Niklas Hallqvist
Well, I sort of understand. But what worries me is this: if both nodes go away and I restart one of them, are the LCP + REDO log available to get the DB close to the state it was in when it crashed? Or is the backup replica the last snapshot of the database?
[18 Mar 2010 12:46]
Pekka Nousiainen
The GCP commits to disk (REDO log) on each node about once a second. So if both nodes crash at the same time, you can lose some of the last transactions that had already been reported to the application as committed.
[18 Mar 2010 13:05]
Niklas Hallqvist
Good! This is good enough for us, since the critical last transactions that may be lost can be recreated by application log inspection, for which we already have automated routines. We will likely revert to memory-only in a few weeks. Until then I will do some tuning in order to get the current setup stable again. I assume you still want bug reports if new (as in, not previously reported) crashes turn up?
[18 Mar 2010 13:11]
Pekka Nousiainen
Yes, we want bug reports and are thankful for them (community testing). This particular bug# remains open because there is an unhandled resource problem (the page entries we started with).
[19 Mar 2010 15:34]
Niklas Hallqvist
Further analysis has shown a pretty interesting pattern. The crashes now occur consistently during every attempt to take a backup, about 26 minutes into it. Looking at the last working backups, they finished at about 25-26 minutes with just around 4 GB of data. Actually, the Data figure of the last working backups shows a 32-bit wraparound. Maybe the wraparound causes data corruption that does not trigger assertions until later... Yesterday's backup hit yet another assertion, in getPtr of ArrayPool.hpp, so it looks like we suffer from somewhat random crashes now, but always at the same point in the backup process. From the log:

```
../ndb_1_cluster.log.4:2010-02-23 22:25:10 [MgmSrvr] INFO -- Node 3: Backup 128 started from node 1 completed. StartGCP: 16461312 StopGCP: 16462037 #Records: 11150462 #LogRecords: 3827 Data: 4176110276 bytes Log: 957984 bytes
../ndb_1_cluster.log.4:2010-02-24 22:24:27 [MgmSrvr] INFO -- Node 2: Backup 129 started from node 1 completed. StartGCP: 16503444 StopGCP: 16504149 #Records: 11272317 #LogRecords: 4240 Data: 4193757492 bytes Log: 999888 bytes
../ndb_1_cluster.log.4:2010-02-25 22:24:30 [MgmSrvr] INFO -- Node 2: Backup 130 started from node 1 completed. StartGCP: 16546247 StopGCP: 16546953 #Records: 11412007 #LogRecords: 4021 Data: 4214566668 bytes Log: 1012688 bytes
../ndb_1_cluster.log.4:2010-02-26 22:25:11 [MgmSrvr] INFO -- Node 2: Backup 131 started from node 1 completed. StartGCP: 16589015 StopGCP: 16589741 #Records: 11566933 #LogRecords: 4443 Data: 4238895484 bytes Log: 1104588 bytes
../ndb_1_cluster.log.4:2010-02-27 22:25:01 [MgmSrvr] INFO -- Node 2: Backup 132 started from node 1 completed. StartGCP: 16631771 StopGCP: 16632492 #Records: 11702368 #LogRecords: 4410 Data: 4258702508 bytes Log: 1088188 bytes
../ndb_1_cluster.log.4:2010-02-28 22:25:16 [MgmSrvr] INFO -- Node 2: Backup 133 started from node 1 completed. StartGCP: 16674510 StopGCP: 16675239 #Records: 11823760 #LogRecords: 2652 Data: 4276084372 bytes Log: 731628 bytes
../ndb_1_cluster.log.4:2010-03-01 22:25:22 [MgmSrvr] INFO -- Node 2: Backup 134 started from node 1 completed. StartGCP: 16717243 StopGCP: 16717975 #Records: 11878794 #LogRecords: 3219 Data: 4285519892 bytes Log: 859792 bytes
../ndb_1_cluster.log.4:2010-03-02 22:24:52 [MgmSrvr] INFO -- Node 2: Backup 135 started from node 1 completed. StartGCP: 16759926 StopGCP: 16760642 #Records: 11984669 #LogRecords: 3473 Data: 5888348 bytes Log: 904656 bytes
../ndb_1_cluster.log.4:2010-03-03 22:25:10 [MgmSrvr] INFO -- Node 2: Backup 136 started from node 1 completed. StartGCP: 16802592 StopGCP: 16803317 #Records: 12099903 #LogRecords: 3667 Data: 22498020 bytes Log: 942116 bytes
../ndb_1_cluster.log.4:2010-03-04 22:25:07 [MgmSrvr] INFO -- Node 2: Backup 137 started from node 1 completed. StartGCP: 16845309 StopGCP: 16846033 #Records: 12232455 #LogRecords: 4877 Data: 41521248 bytes Log: 1113348 bytes
../ndb_1_cluster.log.4:2010-03-05 22:25:11 [MgmSrvr] INFO -- Node 2: Backup 138 started from node 1 completed. StartGCP: 16888084 StopGCP: 16888810 #Records: 12362710 #LogRecords: 5209 Data: 63175608 bytes Log: 1478320 bytes
../ndb_1_cluster.log.4:2010-03-06 22:25:15 [MgmSrvr] INFO -- Node 2: Backup 139 started from node 1 completed. StartGCP: 16930850 StopGCP: 16931578 #Records: 12476982 #LogRecords: 5858 Data: 201887820 bytes Log: 3788572 bytes
../ndb_1_cluster.log.4:2010-03-07 22:26:10 [MgmSrvr] INFO -- Node 2: Backup 140 started from node 1 completed. StartGCP: 16973540 StopGCP: 16974294 #Records: 12632474 #LogRecords: 4418 Data: 514356848 bytes Log: 4947736 bytes
../ndb_1_cluster.log.4:2010-03-08 22:26:25 [MgmSrvr] INFO -- Node 2: Backup 141 started from node 1 completed. StartGCP: 17016214 StopGCP: 17016975 #Records: 12893621 #LogRecords: 3238 Data: 733051660 bytes Log: 845640 bytes
../ndb_1_cluster.log.4:2010-03-09 22:25:59 [MgmSrvr] INFO -- Node 2: Backup 142 started from node 1 completed. StartGCP: 17058877 StopGCP: 17059627 #Records: 13033168 #LogRecords: 4636 Data: 753255496 bytes Log: 1051148 bytes
../ndb_1_cluster.log:2010-03-10 22:26:58 [MgmSrvr] INFO -- Node 2: Backup 143 started from node 1 completed. StartGCP: 17101528 StopGCP: 17102304 #Records: 13195566 #LogRecords: 4688 Data: 776490128 bytes Log: 1129676 bytes
```
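To make the wraparound explicit: the Data counter in these log lines behaves like a 32-bit byte count, so once a backup passes 2^32 bytes the reported value wraps around. Using backups 134 and 135 above:

```
2^32                    = 4294967296 bytes
Backup 134 (reported)   = 4285519892 bytes   (just under the 32-bit limit)
Backup 135 (reported)   =    5888348 bytes
Backup 135 + 2^32       = 4300855644 bytes   (~4.3 GB, which fits the growth trend)
```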
[22 Mar 2010 8:02]
Pekka Nousiainen
Please attach ndb_error_reporter output.
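For anyone unfamiliar with it, ndb_error_reporter is pointed at the cluster configuration file and collects logs and trace files from all nodes over ssh; a rough invocation (the config.ini path and ssh user are assumptions for this setup):

```sh
# Bundles cluster logs and trace files from the nodes listed in config.ini
# (adding --fs would also include the data nodes' file systems, at the cost
#  of a much larger archive)
ndb_error_reporter /var/lib/mysql-cluster/config.ini root
```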
[22 Mar 2010 8:27]
Niklas Hallqvist
OK, the report is there. Currently we have stopped trying to take backups. While I am here, may I ask if it's feasible to split the cluster, having one of the nodes act as a read-only snapshot (e.g. by adding a firewall rule emulating a network failure between the nodes), use mysqldump to get backups instead of the NDB backup, and afterwards just restart the read-only node with connectivity restored, so that that node will resync with the one that has remained r/w?
[22 Mar 2010 9:03]
Pekka Nousiainen
For the record, the ArrayPool crash is somewhere in DbtupExecQuery.cpp:

```cpp
if (Roptype == ZREAD) {
  jam();
  if (setup_read(&req_struct, regOperPtr, regFragPtr, regTabPtr,
                 disk_page != RNIL)) {
    if (handleReadReq(signal, regOperPtr, regTabPtr, &req_struct) != -1) {
      req_struct.log_size= 0;
      sendTUPKEYCONF(signal, &req_struct, regOperPtr);
```
[22 Mar 2010 9:11]
Pekka Nousiainen
> may i ask if it's feasible to split the cluster, having one
> of the nodes acting as a readonly snapshot (e.g. by adding a fw rule

It's not an operation supported by us. Maybe it was never implemented because there is already a snapshot backup. I guess it might work if you disable arbitration, stop API ops between commits, and cut the cable.
[19 Aug 2010 9:49]
Jonas Oreland
This could be caused by http://bugs.mysql.com/bug.php?id=54986. Can this be retried with e.g. 6.3.36?
[19 Sep 2010 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".