Bug #43498 FRM corruption when crashing during ONLINE ALTER
Submitted: 9 Mar 2009 10:43
Reporter: Philip Stoev
Status: Verified
Category: MySQL Server: DDL
Severity: S3 (Non-critical)
Version: 6.0-falcon-team
OS: Any
Assigned to:
CPU Architecture: Any

[9 Mar 2009 10:43] Philip Stoev
Description:
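When the server is killed while executing ONLINE ALTER operations, the FRM file of the table being altered can be left corrupted. After restart, the server reports "Incorrect information in file" for that FRM.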

Svoj suspects that ONLINE ALTER FRM operations are not atomic, that is, they do not use an atomic rename to replace the FRM. If this is the case and FRMs are written in multiple small writes, then corruption on a crash is inevitable.

According to Wlad, this is more likely on Windows.

Here is an example log:

http://clustra.norway.sun.com/~bteam/pb2/web.py?action=archive_download&archive_id=374613&...

# 17:36:39 2: CommitNoUpdates transaction 509
# 17:36:39 Verifying table: o; database: d
# 17:36:39 090306 17:36:39 [ERROR] G:\pb2\test\sb_1-372364-1236356544.56\mysql-6.0.11-alpha-win-x86-test\mysql-test\../sql/RelWithDebInfo/mysqld.exe: Incorrect information in file: '.\d\o.frm'
# 17:36:39 090306 17:36:39 [ERROR] G:\pb2\test\sb_1-372364-1236356544.56\mysql-6.0.11-alpha-win-x86-test\mysql-test\../sql/RelWithDebInfo/mysqld.exe: Incorrect information in file: '.\d\o.frm'
# 17:36:39 090306 17:36:39 [ERROR] G:\pb2\test\sb_1-372364-1236356544.56\mysql-6.0.11-alpha-win-x86-test\mysql-test\../sql/RelWithDebInfo/mysqld.exe: Incorrect information in file: '.\d\o.frm'
# 17:36:39 090306 17:36:39 [ERROR] G:\pb2\test\sb_1-372364-1236356544.56\mysql-6.0.11-alpha-win-x86-test\mysql-test\../sql/RelWithDebInfo/mysqld.exe: Incorrect information in file: '.\d\o.frm'
# 17:36:39 090306 17:36:39 [ERROR] G:\pb2\test\sb_1-372364-1236356544.56\mysql-6.0.11-alpha-win-x86-test\mysql-test\../sql/RelWithDebInfo/mysqld.exe: Incorrect information in file: '.\d\o.frm'

How to repeat:
Clone the mysql-test-extra-6.0 tree and then run:

$ cd mysql-test/gentest
$ perl pb2gentest.pl /path/to/6.0-falcon-team /tmp/vardir - falcon_ddl

This will run a Random Query Generator test with a lot of ONLINE ALTER statements. After 20 minutes, the framework will kill the server and initiate recovery. To speed things up, you may wish to kill the server yourself; the framework will then initiate recovery for you.

Suggested fix:
Unless FRM operations are atomic, there is no way to avoid this type of corruption.
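
For illustration only, here is a minimal sketch of the write-to-temporary-file-then-rename pattern that "atomic rename" refers to above. It is written in Python rather than in the server's own code, and the function name atomic_write and the temporary file naming are hypothetical. It is not how the server writes FRMs. The point is that the complete new contents are written and synced to a separate file first, and only then moved over the original in a single rename, so a crash leaves either the old file or the new one.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via a temp file and one rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory (same filesystem) as the
    # target, otherwise the final rename could not be a single operation.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the new contents to disk before the rename makes them visible.
            os.fsync(tmp.fileno())
        # Single rename over the old file; on POSIX this is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the original untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A call such as atomic_write("o.frm", new_contents) would then never expose a partially written file under the name o.frm. On POSIX systems a fully durable version would also fsync the containing directory after the rename. On Windows, os.replace overwrites the target in one call, but its crash-safety guarantees are weaker than POSIX rename, which may be related to Wlad's remark about Windows.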