Bug #3290 Can't convert sjis&ujis half-width katakana correctly
Submitted: 25 Mar 2004 2:23 Modified: 25 Mar 2004 4:19
Reporter: Alexander Barkov
Status: Closed
Category: MySQL Server Severity: S1 (Critical)
Version: 4.1.2-current-bk OS: Any (Any)
Assigned to: Alexander Barkov CPU Architecture: Any

[25 Mar 2004 2:23] Alexander Barkov
Description:
Hello.

I am evaluating Japanese character set conversion in MySQL 5.0.0 alpha.

When I inserted Japanese half-width katakana into tables and then
selected them back, the following problems occurred (probably bugs):

1. sjis half-width katakana characters were converted to the wrong
characters.
2. ujis half-width katakana characters were converted to the wrong
characters.

Environment:
	Version: MySQL 5.0.0 alpha (source code install)
	Platform: RedHat Linux 9
	Install option: ./configure --prefix=/usr/local/mysql50 \
		--with-unix-socket-path=/usr/local/mysql50/tmp/mysql.sock \
		--with-extra-charsets=all --with-charset=sjis

How-To-Repeat:
	1. sjis problem
	create database db1 character set utf8;
	CREATE TABLE table1(
	col1 varchar(100)
	)  default character set utf8;
	
	set names sjis;
	source /path/to/sjis-half-width-katakana-format-file;
	select * from table1;

	2. ujis problem
	truncate table table1;
	set names ujis;
	source /path/to/ujis-half-width-katakana-format-file;
	select * from table1;

Best Regards,

Matsunobu Yoshinori.

How to repeat:
Run the above.

Suggested fix:
I read the MySQL source code and found several bugs.
I fixed the following, tested the result, and confirmed that all of the
above problems were solved (in my environment).

1. ctype-sjis.c
1-1. function my_mb_wc_sjis

current:
....
  if (hi<0x80)
  {
    pwc[0]=hi;
    return 1;
  }
  if (s+2>e)
    return MY_CS_TOOFEW(0);
....

I fixed:
....
  if (hi<0x80)
  {
    pwc[0]=hi;
    return 1;
  }

  /* Added: in sjis, half-width katakana is a single byte (0xA1..0xDF) */
  if((hi>=0xA1)&&(hi<=0xDF))
  {
      pwc[0]=func_sjis_uni_onechar(hi);
      return 1;
  }

  if (s+2>e)
    return MY_CS_TOOFEW(0);
....

1-2. function my_wc_mb_sjis

current:
....
  if (!(code=func_uni_sjis_onechar(wc)))
    return MY_CS_ILUNI;
  if (s+2>e)
    return MY_CS_TOOSMALL;
  
  s[0]=code>>8;
  s[1]=code&0xFF;
  return 2;
....

I fixed:
....
  if (!(code=func_uni_sjis_onechar(wc)))
    return MY_CS_ILUNI;

  /* Added: half-width katakana codes (0xA1..0xDF) are written as one byte */
  if((code>=0xA1)&&(code<=0xDF))
  {
      s[0]=code;
      return 1;
  }

  if (s+2>e)
    return MY_CS_TOOSMALL;
  
  s[0]=code>>8;
  s[1]=code&0xFF;
  return 2;
....

2. ctype-ujis.c
function my_wc_mb_euc_jp

current:
....
  ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
  if (ret==1)
  {
    if (s+1>e)
      return MY_CS_TOOSMALL;
      
    s[0]=0x8E;
    s[1]=buf[0];
    return 1;
  }
....

I fixed:
....
  ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
  if (ret==1)
  {
    if (s+1>e)
      return MY_CS_TOOSMALL;
      
    s[0]=0x8E;
    s[1]=buf[0];
    return 2;     /* ujis half-width katakana is 2 bytes: 0x8E + JIS X 0201 byte */
  }
....
[25 Mar 2004 2:27] Alexander Barkov
The fix for SJIS is correct.
[25 Mar 2004 4:19] Alexander Barkov
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

I applied the fixes to the 4.1.2 sources.
They will soon be merged into the 5.0 tree as well.