| Bug #46585 | Inconsistent cluster or crash during SR following a table-reorg | ||
|---|---|---|---|
| Submitted: | 6 Aug 2009 14:06 | Modified: | 16 Oct 2009 11:33 |
| Reporter: | Jonas Oreland | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
| Version: | mysql-5.1-telco-7.0 | OS: | Any |
| Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[15 Oct 2009 13:00]
Jonas Oreland
patch ready...now "only" need to write test-prg
[16 Oct 2009 6:33]
Jonas Oreland
pushed to 7.0.9
[16 Oct 2009 11:33]
Jon Stephens
Documented bugfix in the NDB-7.0.9 changelog as follows:
Performing a system restart of the cluster after having
performed a table reorganization which added partitions caused
the cluster to become inconsistent, possibly leading to a forced
shutdown, in either of the following cases:
1. When a local checkpoint was in progress but had not yet
completed, new partitions were not restored; that is, data
that was supposed to be moved could be lost instead, leading
to an inconsistent cluster. This was due to an issue whereby
the DBDIH kernel block did not save the new table definition
and instead used the old one (the version having fewer
partitions).
2. When the most recent LCP had completed, ordered indexes and
unlogged tables were still not saved (since these did not
participate in the LCP). In this case, the cluster crashed
during a subsequent system restart, due to the inconsistency
between the main table and the ordered index.
Now, DBDIH is forced to use the version of the table definition
held by the DBDICT kernel block, which was (already) correct and
up to date.
Closed.

Description:
After performing a table reorg that adds partitions, a system restart (SR) fails in one of two ways:
1) If an LCP has not completed, the new partitions are not restored, i.e. all data that was moved is lost. This is because DIH has not saved its table definition and still has the old version, which has fewer partitions.
2) If an LCP has completed, ordered indexes and no-logging tables are still not saved, as they don't participate in the LCP. For an ordered index, this means the cluster crashes during SR when it finds the inconsistency between the main table and the ordered index.

How to repeat:
1. Add a node group.
2. Do a table reorg.
3. Perform a system restart.

Suggested fix:
DICT has the correct information; force DIH to use it instead of relying on its own copy.
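
The data loss in the first case (restart before an LCP completes) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the function names, the modulo-based partitioning, and the in-memory "on disk" dictionary are all hypothetical stand-ins, not NDB internals. It shows only why restoring with a stale partition count drops the rows that were moved to the new partitions, and why using the up-to-date count (as held by DICT) restores everything.

```python
# Toy model only: names and the hashing scheme are illustrative assumptions,
# not how the NDB kernel blocks are actually implemented.

def partition_of(key: int, partition_count: int) -> int:
    """Map a row key to a partition (modulo stand-in for real hashing)."""
    return key % partition_count


def reorganize(rows, new_count):
    """Redistribute rows over new_count partitions, as a table reorg would."""
    partitions = {p: [] for p in range(new_count)}
    for key in rows:
        partitions[partition_of(key, new_count)].append(key)
    return partitions


def system_restart(partitions, known_partition_count):
    """Restore only the partitions the table definition in use knows about."""
    restored = []
    for p in range(known_partition_count):
        restored.extend(partitions.get(p, []))
    return sorted(restored)


rows = list(range(8))
on_disk = reorganize(rows, new_count=4)  # reorg grows the table from 2 to 4 partitions

dih_partition_count = 2   # stale definition: the new one was never saved
dict_partition_count = 4  # up-to-date definition

before_fix = system_restart(on_disk, dih_partition_count)
after_fix = system_restart(on_disk, dict_partition_count)

print("restored with stale definition:     ", before_fix)
print("restored with up-to-date definition:", after_fix)
```

With the stale count, the rows that the reorg placed in partitions 2 and 3 are never read back, which mirrors the "data that moved will be lost" symptom described above.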