MySQL Bugs: #769: Newer Versions of MySQL cannot RESTORE tables BACKUP'ed with older ones

Bug #769	Newer Versions of MySQL cannot RESTORE tables BACKUP'ed with older ones
Submitted:	1 Jul 2003 3:04	Modified:	1 Jul 2003 14:16
Reporter:	Sergei Kulakov (Candidate Quality Contributor)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: MyISAM storage engine	Severity:	S3 (Non-critical)
Version:	3.23.51+	OS:	FreeBSD (Free BSD)
Assigned to:		CPU Architecture:	Any

Description:
Good day.

We upgraded MySQL from 3.23.38 and 3.23.51 to 3.23.56 on the two of our
servers. Then I tried to RESTORE a table that I had BACKUP'ed with the older
version and got the following errors:

repair info Duplicate key 2 for record at 114416 against new record at
114260
restore status Operation failed

How to repeat:
1. Download http://www.rlsnet.ru/Pages.zip (227.8K) which contains the
backed up table "Pages"
2. Unpack Pages.zip to any directory
3. Execute "Restore Table Pages From 'The-Directory' "
with a newer version of MySQL 3.23.branch (3.23.51+)

Suggested fix:
I had the problem before and I guess I know the reason. The backed up table
"Pages" stores requests and referrers from webserver's access_log's and,
for the liberal nature of the Internet, there is some garbage in it. That is the names of the pages may contain
nonprintable characters. But the table has a unique index, and the garbage
characters look unique for one version (the old) and they look duplicates
for the other (the new). I guess the reason of the change is that some
codes in the files that describe character sets have changed.

Of course, it was out task to prevent the garbage from getting into the
table but the situation when you just can't restore a table that worked well
with an older version is quite surprising as well. My suggestion is that the
command RESTORE would get an option saying that it should somehow ignore
duplicates or even ignore some or all indices (indexes) of the restored
table, just to let the table data be restored in ANY POSSIBLE WAY - after it gets restored, duplicates can be removed and indices can be added.

Yes, it's a charset issue. I can't really think of anything in MySQL to fix it (until 4.1, from where on 
character set info will be stored in the table), but I was able to recover your table with the following 
(I admit it, a really hacky) procedure:

 * copied the Pages.frm and Pages.MYI into a "test" dir under MySQL data dir
 * ran "REPAIR TABLE test.Pages USE_FRM";

This recovered 18369 rows, and reported that 22 went missing due to violating constraints.

 * did "SHOW CREATE TABLE Pages"
 * copied the definition and created Pages2 with the exact same structure
   minus the constraints
 * did "FLUSH TABLES"
 * copied the _original_ Pages.MYD over Pages2.MYD and removed Pages2.MYI
 * ran "REPAIR TABLE test.Pages2 USE_FRM"

This resulted in 18391 rows - 18369+22, which should be the correct amount. It was important 
that I copied the ORIGINAL Pages.MYD from the .zip file, not the one from the same data directory, 
since the 22 violating rows had been removed from there.

[The author of the report, Kulakov Sergey]

Thanks for doing the hacky procedure. Later I found that I complained wrong - actually even though MySQL reports the error it DOES restore the table, but then it should be REPAIR'ed - I got 18390 records  that's I lost 1 (the one that caused the error) but I don't care - it's garbage. 

Also, it would probably easier to recover the data by temporarily substituting the character set file with the one from the older version or even manually editing it to make all characters with codes 0-31 distinct for sorting. 

I just want to add that "SELECT * INTO OUTFILE/LOAD DATA FROM FILE" are a bit slower yet more stable and flexible - when you have a text file you can do all you want and the FRM/MYD files are always a black box. Luckily, this time we had a copy of the table on the other server.