Bug #36993 | Falcon reports Index SCHEDULE..PRIMARY_KEY in SYSTEM.SCHEDULE damaged | ||
---|---|---|---|
Submitted: | 26 May 2008 21:01 | Modified: | 15 May 2009 14:12 |
Reporter: | Philip Stoev | Status: | Closed |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0-falcon-team | OS: | Any |
Assigned to: | Vladislav Vaintroub | CPU Architecture: | Any |
Tags: | F_STARTUP |
[26 May 2008 21:01]
Philip Stoev
[4 Jun 2008 16:44]
Philip Stoev
Philip needs to reproduce this with falcon_gopher_threads > 0
[26 Jan 2009 10:28]
Philip Stoev
This continues to happen when Falcon is killed and restarted before it has been used much. I am increasing the triage values of this bug because the SCHEDULE table is nothing special; the same corruption may happen on other Falcon system or user tables.
[20 Feb 2009 11:30]
Lars-Erik Bjørk
Related to recovery, Vlad's specialty
[10 Apr 2009 22:13]
Christopher Powers
Vlad Vaintroub:
Suspected cause: kill -9 before system tables were completely created.
Suggested fix: Won't fix (good workaround).
Workaround: Delete all Falcon tablespaces and serial logs.
[10 Apr 2009 22:13]
Christopher Powers
Philip Stoev: Note that this is just an error printed in the log; the database continues to run. Therefore "delete all falcon tablespaces" is not a good workaround, because a person may not even notice the problem, since it does not reveal itself in a crash. God knows what else is also damaged.
Also, the kill -9 did not happen while the server was starting up. The server had already started, and databases and tables had been created by the time the kill -9 arrived. Therefore, it is not about "killing before system tables were completely created"; it may be about "killing before gophers applied all serial log events related to system tables".
So, this remains a valid bug for me. I do intend to test recovery systematically with kill -9 immediately after server startup, so a decision must be made and a solution implemented for that case. Maybe the solution is to do extra checkpoints after creating the system tables and to wait for the gophers to write everything to disk.
[10 Apr 2009 22:19]
Christopher Powers
Vlad: And what do you do if you kill before a checkpoint has run?
Philip: It appears to me that the current behavior is as follows:
1. Falcon starts up, system tables are created in memory
2. Server becomes available for connections
3. Queries start arriving
4. A scheduled checkpoint arrives, the gophers write the system tables to disk, etc.
If there is a crash in Step #3, you cannot use the workaround "delete tablespaces and start from scratch", because you would lose the transactions that were issued by the users. So, instead, maybe this will work:
1. Falcon starts up
2. System tables are created and flushed to disk: force two checkpoints, wait for gophers to complete, whatever is needed
3. Server becomes available for connections
4. Queries start arriving
This way, for crashes in Step #2, the workaround can be "delete tablespaces and start from scratch". Crashes in Step #4 should recover properly without waivers.
Vlad: If Step #3 took < 30 seconds, I'd think "delete tablespaces and start from scratch" is still a reasonable workaround. We are not talking about lost terabytes of user data, are we?
Philip: I do not think a 30-second data loss is very acceptable :-) If two consecutive forced checkpoints or some other (simple) trick will reduce the window, then let's go for it. Note that by default mysqld is automatically restarted after every crash by the safe_mysqld script. This means that a customer could easily rack up repeated restarts and recoveries without even noticing.
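The two startup orderings in this exchange can be compared with a small model. This is only an illustrative sketch: the `Step` class, the outcome labels, and the two flags are hypothetical and do not correspond to anything in the Falcon source. It assumes that recovery is reliable once the system tables are durably on disk, and that wiping the tablespaces is harmless only while no user writes could have happened. List indexes are 0-based, so index 2 is Step #3 in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One startup phase (hypothetical model, not a Falcon structure)."""
    name: str
    makes_system_tables_durable: bool = False  # phase ends with system tables on disk
    user_writes_possible: bool = False         # user transactions may exist in this phase


# Ordering described as the current behavior.
CURRENT_ORDER = [
    Step("Falcon starts up, system tables created in memory"),
    Step("server becomes available for connections"),
    Step("queries start arriving", user_writes_possible=True),
    Step("scheduled checkpoint, gophers write system tables",
         makes_system_tables_durable=True),
]

# Ordering proposed as a fix.
PROPOSED_ORDER = [
    Step("Falcon starts up"),
    Step("system tables created and flushed, checkpoints forced",
         makes_system_tables_durable=True),
    Step("server becomes available for connections"),
    Step("queries start arriving", user_writes_possible=True),
]


def crash_outcome(order, crash_at):
    """Classify a crash that interrupts order[crash_at]."""
    completed = order[:crash_at]     # phases that finished before the crash
    reached = order[:crash_at + 1]   # phases that had at least begun
    if any(s.makes_system_tables_durable for s in completed):
        return "recover"             # system tables are on disk
    if any(s.user_writes_possible for s in reached):
        return "user data at risk"   # wiping would discard user transactions
    return "wipe and restart"        # nothing of value exists yet


def outcomes(order):
    """Outcome for a crash in each phase, in order."""
    return [crash_outcome(order, i) for i in range(len(order))]


if __name__ == "__main__":
    for label, order in (("current", CURRENT_ORDER), ("proposed", PROPOSED_ORDER)):
        print(label, outcomes(order))
```

Under these assumptions the current ordering leaves two phases in which neither recovery nor a wipe is safe, while the proposed ordering has none.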
[19 Apr 2009 20:15]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.
You can access the patch from: http://lists.mysql.com/commits/72480
3129 Vladislav Vaintroub 2009-04-19
Bug #36993 Falcon reports Index SCHEDULE..PRIMARY_KEY in SYSTEM.SCHEDULE damaged
The problem here is that mysqld was killed before the database was completely created (i.e. before the whole data dictionary was written to disk). Falcon cannot handle such situations gracefully yet, and recovery after such a point is not guaranteed to succeed.
The patch improves the situation a little bit by disabling user queries until the database is fully created and written to disk. Also, this patch introduces a clean Falcon shutdown: waiting for background threads to complete their work, followed by flushing the page cache. This will eliminate the need for recovery after a clean shutdown.
[19 Apr 2009 20:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.
You can access the patch from: http://lists.mysql.com/commits/72481
3129 Vladislav Vaintroub 2009-04-19
Bug #36993 Falcon reports Index SCHEDULE..PRIMARY_KEY in SYSTEM.SCHEDULE damaged
The problem here is that mysqld was killed before the database was completely created (i.e. before the whole data dictionary was written to disk). Falcon cannot handle such situations gracefully yet, and recovery after such a point is not guaranteed to succeed.
The patch improves the situation a little bit by disabling user queries until the database is fully created and written to disk. Also, this patch introduces a clean Falcon shutdown: waiting for background threads to complete their work, followed by flushing the page cache. This will eliminate the need for recovery after a clean shutdown.
[19 Apr 2009 20:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.
You can access the patch from: http://lists.mysql.com/commits/72482
3129 Vladislav Vaintroub 2009-04-19
Bug #36993 Falcon reports Index SCHEDULE..PRIMARY_KEY in SYSTEM.SCHEDULE damaged
The problem here is that mysqld was killed before the database was completely created (i.e. before the whole data dictionary was written to disk). Falcon cannot handle such situations gracefully yet, and recovery after such a point is not guaranteed to succeed.
The patch improves the situation a little bit by disabling user queries until the database is fully created and written to disk. Also, this patch introduces a clean Falcon shutdown: waiting for background threads to complete their work, followed by flushing the page cache. This will eliminate the need for recovery after a clean shutdown.
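The two behaviors named in the commit message (refusing user queries until the database is fully on disk, and a clean shutdown that waits for background threads before flushing the page cache) can be sketched roughly as follows. This is a toy illustration, not the patch itself: the `Engine` class, its method names, and the dictionaries standing in for the page cache and the disk are all hypothetical, and a single worker thread stands in for Falcon's background threads.

```python
import queue
import threading


class Engine:
    """Toy stand-in for a storage engine (hypothetical, not Falcon's classes)."""

    def __init__(self):
        self._ready = threading.Event()  # set once the database is fully on disk
        self._log = queue.Queue()        # pending records for the background worker
        self.page_cache = {}             # in-memory pages
        self.disk = {}                   # stand-in for durable storage
        self._worker = threading.Thread(target=self._apply_log)

    def start(self):
        """Create system tables, flush them, and only then admit user queries."""
        self._worker.start()
        self.page_cache["SYSTEM.SCHEDULE"] = "created"
        self._flush_page_cache()
        self._ready.set()

    def execute(self, key, value):
        """Accept a user write, or refuse it while the database is not ready."""
        if not self._ready.is_set():
            raise RuntimeError("engine is still creating the database")
        self._log.put((key, value))

    def shutdown(self):
        """Clean shutdown: drain background work, then flush the page cache."""
        self._ready.clear()      # stop admitting new queries
        self._log.put(None)      # sentinel: worker exits after earlier records
        self._worker.join()      # wait for the background thread to finish
        self._flush_page_cache()

    def _apply_log(self):
        # Background worker: apply queued records to the page cache in order.
        while True:
            record = self._log.get()
            if record is None:
                return
            key, value = record
            self.page_cache[key] = value

    def _flush_page_cache(self):
        # Copy every cached page to the durable store.
        self.disk.update(self.page_cache)


if __name__ == "__main__":
    engine = Engine()
    engine.start()
    engine.execute("t1", "row")
    engine.shutdown()
    print(sorted(engine.disk))
```

Because the sentinel is queued behind all earlier records and the flush happens only after `join()`, everything accepted before `shutdown()` is on the simulated disk when it returns, which is the property that would make recovery unnecessary after a clean shutdown.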
[23 Apr 2009 7:22]
Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090423071213-afmyrzvolemph7mz) (version source revid:hky@sun.com-20090421195958-j33v1cuo3yer9niu) (merge vers: 6.0.11-alpha) (pib:6)
[15 May 2009 14:12]
MC Brown
An entry has been added to the 6.0.11 changelog: Trying to recover Falcon tables after a crash when the corresponding tables and tablespaces have not been created before the crash could cause a recovery failure.