Bug #33023 Online backup could mangle object names if non-std charset is used
Submitted: 5 Dec 2007 21:33 Modified: 2 Sep 2008 18:29
Reporter: Chuck Bell
Status: Closed
Category: MySQL Server: Backup Severity: S3 (Non-critical)
Version: 6.0 OS: Any
Assigned to: Rafal Somla CPU Architecture: Any

[5 Dec 2007 21:33] Chuck Bell
Description:
Online backup needs to preserve object names when the names include characters from optional charsets. This is from WL#4060:

      - Correct table names (if non-std charsets used in them)  [1d]
        Benefit: Users which use charsets other than 
        default will be able to use online backup.

How to repeat:
N/A Feature missing

Suggested fix:
Store charsets and preserve characters in object names.
[14 Dec 2007 8:57] Rafal Somla
PROPOSED SOLUTION

When the server needs to identify a table while talking to a storage engine
(e.g. in an open() call), it uses a path string in which the table and database
names are encoded using a special character set. The path string for a given
table/database name can be created with the build_table_filename() function
(defined in sql_table.cc). The function assumes that the input table and
database names are encoded in system_charset_info.

To give access to the internal representation of a table name, the following methods will be added to the Table_ref class:

size_t Table_ref::internal_name(Iname_buf buf);
size_t Table_ref::internal_name(char *buf, size_t buflen);

The type Table_ref::Iname_buf will be a char array large enough to store the
internal table name representation (FN_REFLEN?). It can be used as follows:

Table_ref t;
Table_ref::Iname_buf tname;

size_t len= t.internal_name(tname);
DBUG_ASSERT(tname[len] == '\0');

Alternatively, the caller can choose the buffer size:

Table_ref t;
char tname[1024];
size_t len= t.internal_name(tname, sizeof(tname));
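
To make the two proposed overloads concrete, here is a minimal stand-alone sketch. Table_ref_sketch is a hypothetical stand-in, not the real backup::Table_ref; the buffer size of 512 is an assumption (the proposal only suggests FN_REFLEN), the names passed in are assumed to be already encoded, and the array overload takes a reference so that the buffer size is known to the callee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for backup::Table_ref (illustration only).
class Table_ref_sketch
{
public:
  // Assumed buffer size; the proposal suggests something like FN_REFLEN.
  typedef char Iname_buf[512];

  // db and name are assumed to be in the internal (encoded) form already.
  Table_ref_sketch(const std::string &db, const std::string &name)
    : m_db(db), m_name(name) {}

  // Writes "db/name" into buf, always NUL-terminated when buflen > 0.
  // Returns the number of characters stored, not counting the NUL.
  // Output that does not fit is truncated.
  size_t internal_name(char *buf, size_t buflen) const
  {
    if (buflen == 0)
      return 0;
    int n= std::snprintf(buf, buflen, "%s/%s", m_db.c_str(), m_name.c_str());
    if (n < 0)
    {
      buf[0]= '\0';
      return 0;
    }
    size_t len= static_cast<size_t>(n);
    return len < buflen ? len : buflen - 1;
  }

  // Convenience overload for the fixed-size buffer type.
  size_t internal_name(Iname_buf &buf) const
  {
    return internal_name(buf, sizeof(Iname_buf));
  }

private:
  std::string m_db;
  std::string m_name;
};
```

With the fixed-size buffer the caller never passes a length; with a caller-supplied buffer the returned length shows how much was actually stored, so a result equal to buflen - 1 may indicate truncation.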
[14 Dec 2007 13:28] Rafal Somla
Here is a small test script which can be used to see how a table name in a non-Latin character set is handled by the online backup system. In the current tree it already exhibits some problems:

- restore fails when executing DROP TABLE statement since table name is not
  quoted properly,
- restore fails when the connection character set settings are not UTF-8.

These issues will be addressed in the patch. The main issue, that the table name is not translated to its internal representation (in this case it should be 'test/@ff71@ff71@ff71'), does not show up in the current tree because there are no native backup engines yet.
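
To illustrate where a string such as 'test/@ff71@ff71@ff71' comes from, here is a simplified sketch of the name encoding. It is not the server's build_table_filename(): the function names are hypothetical, only ASCII letters, digits and '_' are passed through, and every other code point is written as '@' followed by four lowercase hex digits. The real encoding has additional shorter codes for many letters, and this sketch is only meant for code points up to U+FFFF. The value @ff71 corresponds to code point U+FF71.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Simplified, hypothetical encoder: safe ASCII characters are copied,
// everything else becomes "@xxxx" (hex of the Unicode code point).
std::string encode_name_sketch(const std::u32string &name)
{
  std::string out;
  for (char32_t cp : name)
  {
    bool safe= (cp >= U'0' && cp <= U'9') ||
               (cp >= U'A' && cp <= U'Z') ||
               (cp >= U'a' && cp <= U'z') ||
               cp == U'_';
    if (safe)
      out.push_back(static_cast<char>(cp));
    else
    {
      char hex[16];
      std::snprintf(hex, sizeof(hex), "@%04x", static_cast<unsigned>(cp));
      out+= hex;
    }
  }
  return out;
}

// Builds the "database/table" form used in the comment above.
std::string internal_path_sketch(const std::u32string &db,
                                 const std::u32string &table)
{
  return encode_name_sketch(db) + "/" + encode_name_sketch(table);
}
```

Because the result contains only ASCII characters, it no longer depends on the connection character set in effect when the name is used.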

------------------------------------------------------------------
SET NAMES utf8;
SET character_set_database = utf8;

USE test;
CREATE TABLE `アアア`(`キキキ` char(5)) DEFAULT CHARSET = utf8;
SHOW TABLES;
SHOW CREATE TABLE `アアア`;

INSERT INTO `アアア` VALUES ("Rafal");
SELECT * FROM `アアア`;

BACKUP DATABASE test TO "test.bak";

DROP DATABASE test;

SET NAMES latin1;

RESTORE FROM "test.bak";

SELECT @@character_set_client;
SELECT @@character_set_results;
SELECT @@character_set_connection;

SHOW TABLES IN test;

SET NAMES utf8;

SHOW TABLES IN test;
[14 Dec 2007 16:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/40014

ChangeSet@1.2752, 2007-12-14 17:17:32+01:00, rafal@quant.(none) +5 -0
  BUG#33023 (Table name mangling).
  
  This patch defines Table_ref::internal_name() method for getting an internal, 
  character set independent string identifying given table. This string is in the 
  format expected by storage engines.
  
  Apart from that, two more issues are fixed:
  
  - When constructing a DROP statement (used for dropping objects during restore), 
  the name of the object is quoted as necessary.
  
  - The default character sets of the connection executing RESTORE command are set 
  to system's default (utf8) which is used in the queries creating the objects. 
  Without that, restore was failing if non-standard characters were used in object 
  names and the default character set was different from utf8.
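
The quoting fix for DROP statements can be sketched as follows. This is an illustration under assumptions, not the code from the patch: both function names are hypothetical, and the rule applied is the usual MySQL one of wrapping an identifier in backticks and doubling any backtick inside it. Names are treated as UTF-8 byte strings, which is safe here because the backtick byte never occurs inside a multi-byte UTF-8 sequence.

```cpp
#include <cassert>
#include <string>

// Wraps an identifier in backticks, doubling embedded backticks.
std::string quote_identifier_sketch(const std::string &name)
{
  std::string out= "`";
  for (char c : name)
  {
    if (c == '`')
      out+= '`';          // escape by doubling
    out+= c;
  }
  out+= '`';
  return out;
}

// Hypothetical helper building the statement used to drop a table
// before it is re-created during restore.
std::string build_drop_table_sketch(const std::string &db,
                                    const std::string &table)
{
  return "DROP TABLE " + quote_identifier_sketch(db) + "." +
         quote_identifier_sketch(table);
}
```

Quoting every name unconditionally is simpler than deciding when quoting is "necessary", at the cost of slightly longer statements.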
[14 Dec 2007 16:35] Rafal Somla
There are small differences between the proposed solution and what the patch 
implements:

- The Table_ref::Iname_buf type is named Table_ref::name_buf. The same type is 
used for Table_ref::internal_name() and Table_ref::describe() methods.

- Method Table_ref::internal_name() returns a pointer to the resulting string,
not its length.
[22 Apr 2008 9:36] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45805

ChangeSet@1.2612, 2008-04-22 11:35:52+02:00, rafal@quant.(none) +11 -0
  BUG#33023 (Online backup could mangle object names if non-std charset is used)
  
  The following problems are fixed by this patch:
  
  1. If during restore a connection character set different than at backup time 
  was used, object names were wrongly interpreted leading to failures.
  
  2. Character set and collation settings associated with a view were not stored 
  in backup image and not restored correctly.
  
  3. When errors were detected during restore of table data, the 
  Backup_restore_ctx object was not correctly set to error state.
  
  4. Tables in the list passed from backup kernel to backup/restore drivers were 
  not identified using the same convention as used by storage engines. The 
  internal table name representation uses only US-ASCII characters and can handle 
  names written using any character set.
  
  The solutions are as follows.
  
  Ad 1) Charset settings are changed to system defaults inside obs::Obj::execute() 
  method and restored to previous values at the end.
  
  Ad 2) obs::TableObj::serialize() method is modified to prepend 
  "SET CHARACTER_SET_CLIENT" and "SET COLLATION_CONNECTION" statements in front of 
  view's serialization string.
  
  Ad 3) Backup_restore_ctx::fatal_error() method is used for reporting errors 
  which interrupt restore process.
  
  Ad 4) Method internal_name() is added to backup::Table_ref. It can be used by 
  backup/restore drivers to obtain internal table name representation.
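
Solutions 1 and 2 can be illustrated with a small sketch. None of this is the code from the patch: the session struct, the guard class and both functions are hypothetical, and the values "utf8" and "utf8_general_ci" are assumed system defaults. The guard switches the charset settings on construction and puts the previous values back on destruction, so they are restored on every exit path; the second function prepends the two SET statements to a view's serialization string.

```cpp
#include <cassert>
#include <string>

// Hypothetical session state (the server keeps this per connection).
struct Session_charsets_sketch
{
  std::string character_set_client;
  std::string collation_connection;
};

// Switches charset settings for the lifetime of the object.
class Charset_guard_sketch
{
public:
  Charset_guard_sketch(Session_charsets_sketch &s,
                       const std::string &cs, const std::string &coll)
    : m_session(s), m_saved(s)          // m_saved copies the old values
  {
    s.character_set_client= cs;
    s.collation_connection= coll;
  }
  ~Charset_guard_sketch() { m_session= m_saved; }   // restore old values

  Charset_guard_sketch(const Charset_guard_sketch &)= delete;
  Charset_guard_sketch &operator=(const Charset_guard_sketch &)= delete;

private:
  Session_charsets_sketch &m_session;
  Session_charsets_sketch m_saved;
};

// Stand-in for executing a query with system default charsets. It only
// reports which client charset was in effect while the query "ran".
std::string execute_with_system_charset_sketch(Session_charsets_sketch &s,
                                               const std::string &query)
{
  Charset_guard_sketch guard(s, "utf8", "utf8_general_ci");
  return s.character_set_client + ": " + query;
}

// Prepends the charset context of a view to its CREATE statement.
std::string serialize_view_sketch(const std::string &cs,
                                  const std::string &coll,
                                  const std::string &create_view)
{
  return "SET CHARACTER_SET_CLIENT = " + cs +
         "; SET COLLATION_CONNECTION = " + coll + "; " + create_view;
}
```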
[22 Apr 2008 9:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45806

ChangeSet@1.2612, 2008-04-22 11:37:50+02:00, rafal@quant.(none) +13 -0
  BUG#33023 (Online backup could mangle object names if non-std charset is used)
  
[22 Apr 2008 9:41] Rafal Somla
Since the backup kernel has changed considerably, a new patch for this bug had to be created. Please review the new patch.
[22 Apr 2008 9:49] Rafal Somla
BUG#33022 is a duplicate of this one.
[5 May 2008 13:54] Chuck Bell
Patch approved.
[5 May 2008 15:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46356

ChangeSet@1.2612, 2008-05-05 17:03:24+02:00, rafal@quant.(none) +13 -0
  BUG#33023 (Online backup could mangle object names if non-std charset is used)
  
[1 Sep 2008 14:01] Rafal Somla
Updating status, as this patch has already been pushed into the main 6.0.7 tree.
[2 Sep 2008 18:29] Paul DuBois
Noted in 6.0.7 changelog.

BACKUP DATABASE followed by RESTORE could mangle object names if a
non-standard charset was used.
[14 Sep 2008 5:05] Bugs System
Pushed into 6.0.7-alpha  (revid:sp1r-rafal@quant.(none)-20080505150324-05691) (version source revid:john.embretsen@sun.com-20080724122511-9c0oudz1xrdrs6y6) (pib:3)