Bug #3906 CREATE TABLE does not save the Japanese table names correctly in UTF8.
Submitted: 27 May 2004 4:40 Modified: 26 Jan 2006 2:51
Reporter: Shuichi Tamagawa Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0 OS:Windows (Window XP Professional)
Assigned to: Alexander Barkov CPU Architecture:Any

[27 May 2004 4:40] Shuichi Tamagawa
Description:
When Japanese characters are used for table names in a SJIS environment, the table names are saved in UTF8 as described in the manual. However, it seems that some SJIS codes are converted to the same UTF8 code. Thus, in some cases new table creation fails with the "ERROR 1050 (42S01): Table '***' already exists" even though the table names are different in SJIS.

How to repeat:
mysql> CREATE TABLE 0x82A0(a char(1));
Query OK, 0 rows affected (0.16 sec)

mysql> CREATE TABLE 0x82A2(a char(1));
ERROR 1050 (42S01): Table '0xE3818145 already exists
mysql> CREATE TABLE 0x82A4(a char(1));
ERROR 1050 (42S01): Table '0xE3818145 already exists
mysql> CREATE TABLE 0x82A6(a char(1));
ERROR 1050 (42S01): Table '0xE3818145 already exists
mysql> CREATE TABLE 0x82A8(a char(1));
ERROR 1050 (42S01): Table '0xE3818145 already exists

*Replace "0x82A0 - 0x82A8" with the Japanese character of the SJIS code point.
*Replace "0xE3818145" with the Japanese characters of the SJIS code points.

Character set variables are as follows.

mysql> SHOW VARIABLES LIKE 'character_%';
+--------------------------+-------------------------+
| Variable_name            | Value                   |
+--------------------------+-------------------------+
| character_set_server     | sjis                    |
| character_set_system     | utf8                    |
| character_set_database   | sjis                    |
| character_set_client     | sjis                    |
| character_set_connection | sjis                    |
| character-sets-dir       | C:/mysql/share/charsets |
| character_set_results    | sjis                    |
+--------------------------+-------------------------+

Suggested fix:
Appearently after converting the character to UTF8, if the last 2 digits of the code are less than 'A0'in HEX, the character is treated as the same character as others. 

Example:
After the operation in "How to repeat", do the following operation.

mysql> create table 0x82BC(a char(1));	#0x82BC(SJIS)=0xE3819E(UTF8)
ERROR 1050 (42S01): Table '0xE3818145 already exists
mysql> create table 0x82BD(a char(1));	#0x82BD(SJIS)=0xE3819F(UTF8)
ERROR 1050 (42S01): Table '0xE3818145 already exists

mysql> create table 0x82BE(a char(1));	#0x82BE(SJIS)=0xE381A0(UTF8)
Query OK, 0 rows affected (0.15 sec)

mysql> create table 0x82BF(a char(1));	#0x82BF(SJIS)=0xE381A1(UTF8)
Query OK, 0 rows affected (0.17 sec)

mysql> create table 0x82C0(a char(1));  #0x82C0(SJIS)=0xE381A2(UTF8)
Query OK, 0 rows affected (0.14 sec)

mysql> create table 0x82C1(a char(1));	#0x82C1(SJIS)=0xE381A3(UTF8)
Query OK, 0 rows affected (0.15 sec)

*Replace "0x82BC - 0x82C1" with the Japanese character of SJIS code point.
*Replace "0xE3818145" with the Japanese characters of SJIS code points.

It is suggested that these codes are converted to each UTF8 code accordingly.
[28 May 2004 14:06] Alexander Barkov
Shuichi, I could not reproduce this bug, and successfully created all these
tables simultaniously. Can you please try the follwing:
SELECT HEX(COVERT(x USING utf8));

where x in 0x82a0, 0x82a2, 0x82a4, etc. 

Does conversion return the same result for them, or different?
[28 May 2004 22:56] Shuichi Tamagawa
When I use the convert function, it returns the different resuts for these characters. But when I create tables, it looks these are treated as the same characters. See the attached file.

Note that these tables are created successfully in Version 4.1.18.
[28 May 2004 22:59] Shuichi Tamagawa
Test result - 4.1.1a-alpha-nt

Attachment: example4.1.1a.JPG (image/pjpeg, text), 73.52 KiB.

[28 May 2004 22:59] Shuichi Tamagawa
Test result - 4.0.18

Attachment: example4.0.18.JPG (image/pjpeg, text), 35.88 KiB.

[3 Jun 2004 8:22] Alexander Barkov
Shuichi, It looks like Windows doesn't allow to create files
with these names at the same time. Can you please try to create
files with these names: 

0xE38182
0xE38184
0xE38186
0xE38188
0xE3818A

Does Windows allow to do it? Thank you!
Sorry, I cannot try it, I don't have a Windows box to try on.
[3 Jun 2004 10:37] Shuichi Tamagawa
Bar,

I think this issue can be explained as follows:

1.Windows File system uses Shift-JIS.
2.When MySQL set the file name '0xE38182' in UTF8, Windows interprets it in Shift-JIS.
3.In Shift-JIS '0xE38182' is interpreted as two characters, '0xE381' and '0x82'.
4.Since there are no corresponding character for code point '0x82' in Shift-JIS, it is expressed in another character(It looks '0x8145').
5.Since there are no corresponding character for code point from '0x80' to '0x9E' these codes are expressed in the same character, which result in file name confliction.

When I tried some other characters like '0xEFBC91', '0xEFBC92', '0xEFBCA1' it behaved in the same way. If my understanding is right, the file name of table data need to be set in Shift-JIS on Windows environment to avoid this problem.
[3 Jun 2004 11:44] Alexander Barkov
I agree that it is wrong that Japanese names are
stored in UTF8 on a SJIS system. Using SJIS for
file system is not good too, we want table files
to be copied between boxes, including between boxes
with different localization. The only way is to store
names in unicode, but not in UTF8. It could be a 
pseudo character set which encodes ASCII as is, but
all national characters, including latin letters with
accents, Japanes, Cyrillic, etc, in their Unicode code
in textual representation:

For example:
ab12.frm       -> ab12, pure ASCII name
ab12%3062      -> ab12 + Japanese character corresponding to SJIS 0x82C0
ab12%00E5.frm  -> ab12 + LATIN LETTER A WITH RING ABOVE (Swedish letter)
ab12%0430.frm  -> ab12 + CYRILLIC LETTER C

What do you think?
[3 Jun 2004 15:53] Sergei Golubchik
see WL#1324

by the way, you cannot give % a special meaning, as old tables with % in the name will become unreadable (consider, e.g. CREATE TABLE `ab12%3062` (...))
[4 Jun 2004 14:53] Alexander Barkov
Serge, what is your erstimation of the priority of this task?
[7 Jun 2004 13:17] Alexander Barkov
Serge, what should I do with this bug?
Should i close it with a reference to the worklog item?
[27 Aug 2004 13:35] Alexander Barkov
A relative problem happens on MAC OSX.
This indicates that it's currently not possible to use non-ASCII characters for
identifiers on Mac OS X, as the file system would mangle the file name. 

See
http://developer.apple.com/documentation/MacOSX/Conceptual/SystemOverview/FileSy
stem/chapter_8_section_6.html
[24 Sep 2004 14:31] Alexander Barkov
We won't fix this problem in 4.1.
It's to be fixed in 5.0 or 5.1.

We should add a note about this problem into the manual.
[7 May 2005 8:20] Alexander Barkov
See also bug#5566
[12 Jan 2006 7:54] Alexander Barkov
This problem was fixed in 5.1 after pushing WL#1324 "table name to file name encoding".

We will not fix this problem in 4.1 and 5.0.
[18 Jan 2006 19:14] Mike Hillyer
Need version number for the fix.
[19 Jan 2006 7:03] Alexander Barkov
Fixed in 5.1.6.

P.S.

Please make sure "table name to file name encoding" is documenting
together with this bug. Stefan told he assigned it to Paul.
[26 Jan 2006 2:51] Mike Hillyer
Documented in 5.1.6 changelog:

   <listitem>
        <para>
          Certain Japanese table names were not properly saved during a
          <literal>CREATE TABLE</literal> statement. (Bug #3906)
        </para>
      </listitem>

Will followup with Paul.