Description:
Hello.
I am currently evaluating the Japanese character set conversion
features of MySQL 5.0.0 alpha.
When I inserted Japanese half-width katakana into tables and then
selected them back, the following problems occurred (probably bugs):
1. SJIS half-width katakana characters are converted to the wrong
characters.
2. UJIS (EUC-JP) half-width katakana characters are converted to the
wrong characters.
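For reference when checking results, the sketch below shows what a correctly converted half-width katakana character should look like once it is stored in the utf8 table. It assumes the standard mapping of JIS X 0201 katakana bytes 0xA1-0xDF (the same byte values SJIS uses) to U+FF61-U+FF9F, which is a fixed offset of 0xFEC0. The function name kana_byte_to_utf8 is hypothetical and not part of MySQL.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a MySQL function): converts one JIS X 0201
 * half-width katakana byte (0xA1..0xDF) into its UTF-8 form.
 * Returns the number of bytes written (always 3), or 0 if the byte
 * is not half-width katakana.
 */
static size_t kana_byte_to_utf8(unsigned char b, unsigned char out[3])
{
    unsigned long cp;

    if (b < 0xA1 || b > 0xDF)
        return 0;
    cp = (unsigned long) b + 0xFEC0;                       /* U+FF61..U+FF9F */
    out[0] = (unsigned char) (0xE0 | (cp >> 12));          /* 1110xxxx */
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));         /* 10xxxxxx */
    return 3;
}
```

For example, the SJIS byte 0xB1 should be stored as EF BD B1, so comparing the output of `SELECT HEX(col1) FROM table1;` against these bytes may help show whether the damage happens on input or on output.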
Environment:
Version: MySQL 5.0.0 alpha (installed from source)
Platform: Red Hat Linux 9
Configure options: ./configure --prefix=/usr/local/mysql50 \
  --with-unix-socket-path=/usr/local/mysql50/tmp/mysql.sock \
  --with-extra-charsets=all --with-charset=sjis
How-To-Repeat:
1. SJIS problem
create database db1 character set utf8;
CREATE TABLE table1(
col1 varchar(100)
) default character set utf8;
set names sjis;
source /path/to/sjis-half-width-katakana-format-file;
select * from table1;
2. UJIS problem
truncate table table1;
set names ujis;
source /path/to/ujis-half-width-katakana-format-file;
select * from table1;
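The contents of the two source files are not included in this report. As a hypothetical stand-in, the sketch below builds an equivalent INSERT statement for each client character set: in SJIS each half-width katakana character is the single byte 0xA1-0xDF, while in UJIS (EUC-JP) the same byte is preceded by the SS2 byte 0x8E. The sample text "ｱｲｳ" and the function name build_insert are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical test payload: half-width katakana "ｱｲｳ" as JIS X 0201 bytes. */
static const unsigned char KANA[] = { 0xB1, 0xB2, 0xB3 };

/*
 * Builds one INSERT statement for table1 into buf.
 * ujis == 0: SJIS, one byte per kana.
 * ujis != 0: UJIS/EUC-JP, each kana byte prefixed with SS2 (0x8E).
 * Returns the statement length, or 0 if cap is too small.
 */
static size_t build_insert(unsigned char *buf, size_t cap, int ujis)
{
    static const char head[] = "insert into table1 values('";
    static const char tail[] = "');\n";
    size_t per_char = ujis ? 2 : 1;
    size_t need = (sizeof head - 1) + per_char * sizeof KANA + (sizeof tail - 1);
    size_t n;
    size_t i;

    if (need > cap)
        return 0;
    memcpy(buf, head, sizeof head - 1);
    n = sizeof head - 1;
    for (i = 0; i < sizeof KANA; i++) {
        if (ujis)
            buf[n++] = 0x8E;          /* SS2: next byte is JIS X 0201 katakana */
        buf[n++] = KANA[i];
    }
    memcpy(buf + n, tail, sizeof tail - 1);
    n += sizeof tail - 1;
    return n;
}
```

Writing the returned bytes to a file with fwrite() and sourcing that file after the matching `set names` statement should exercise the same conversion path as the steps above.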
Best Regards,
Matsunobu Yoshinori.
How to repeat:
Run the steps above.
Suggested fix:
I read the MySQL source code and found several bugs.
I made the following fixes, tested them, and confirmed that all of the
problems above were solved (in my environment).
1. ctype-sjis.c
1-1. Function my_mb_wc_sjis (SJIS to Unicode)
Current:
  ....
  if (hi<0x80)
  {
    pwc[0]=hi;
    return 1;
  }
  if (s+2>e)
    return MY_CS_TOOFEW(0);
  ....
My fix:
  ....
  if (hi<0x80)
  {
    pwc[0]=hi;
    return 1;
  }
  /* Added: half-width katakana (0xA1-0xDF) is a single byte in SJIS */
  if ((hi>=0xA1)&&(hi<=0xDF))
  {
    pwc[0]=func_sjis_uni_onechar(hi);
    return 1;
  }
  if (s+2>e)
    return MY_CS_TOOFEW(0);
  ....
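Without the added branch, a byte such as 0xB1 falls through to the two-byte path: it is taken as the lead byte of a double-byte character, so it either swallows the following byte or, at the end of the string, is reported as too few bytes. The standalone sketch below follows the patched order of checks. It replaces the func_sjis_uni_onechar() lookup with a fixed 0xFEC0 offset (an assumption that holds for the standard JIS X 0201 mapping), leaves out the double-byte table, and uses hypothetical status codes instead of MySQL's MY_CS_* values.

```c
#include <stdint.h>

/* Hypothetical status codes for this sketch; MySQL uses its own MY_CS_* values. */
#define SKETCH_TOOFEW (-1)  /* input ends in the middle of a character */
#define SKETCH_DBCS   (-2)  /* two-byte character; table lookup omitted here */

/*
 * Decodes the first SJIS character in [s, e) into *pwc, checking
 * ASCII first, then half-width katakana, then two-byte characters.
 * Returns the number of bytes consumed, or a negative status.
 */
static int sjis_decode_one(uint32_t *pwc, const unsigned char *s,
                           const unsigned char *e)
{
    unsigned hi;

    if (s >= e)
        return SKETCH_TOOFEW;
    hi = s[0];
    if (hi < 0x80) {                   /* single byte, taken as ASCII */
        *pwc = hi;
        return 1;
    }
    if (hi >= 0xA1 && hi <= 0xDF) {    /* half-width katakana: one byte */
        *pwc = hi + 0xFEC0;            /* 0xA1..0xDF -> U+FF61..U+FF9F */
        return 1;
    }
    if (s + 2 > e)                     /* lead byte without a trail byte */
        return SKETCH_TOOFEW;
    return SKETCH_DBCS;
}
```

With this order, a lone 0xB1 decodes to U+FF71 and consumes exactly one byte.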
1-2. Function my_wc_mb_sjis (Unicode to SJIS)
Current:
  ....
  if (!(code=func_uni_sjis_onechar(wc)))
    return MY_CS_ILUNI;
  if (s+2>e)
    return MY_CS_TOOSMALL;
  s[0]=code>>8;
  s[1]=code&0xFF;
  return 2;
  ....
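My fix:
  ....
  if (!(code=func_uni_sjis_onechar(wc)))
    return MY_CS_ILUNI;
  /* Added: half-width katakana (0xA1-0xDF) is a single byte in SJIS */
  if ((code>=0xA1)&&(code<=0xDF))
  {
    if (s>=e)          /* room for one byte is needed */
      return MY_CS_TOOSMALL;
    s[0]=code;
    return 1;
  }
  if (s+2>e)
    return MY_CS_TOOSMALL;
  s[0]=code>>8;
  s[1]=code&0xFF;
  return 2;
  ....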
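For a half-width katakana code such as 0xB1, the unpatched path writes code>>8 as the first byte, which is 0x00, so the client receives a NUL byte followed by the kana byte. The sketch below contrasts the two behaviours. It takes the already looked-up SJIS code as a parameter (what func_uni_sjis_onechar() returns inside MySQL) so it can run on its own. The function names and the status code are hypothetical.

```c
#include <stddef.h>

/* Hypothetical status code for this sketch (MySQL uses MY_CS_TOOSMALL). */
#define SKETCH_TOOSMALL (-1)

/* Pre-patch behaviour: every looked-up code is written as two bytes. */
static int sjis_put_unpatched(unsigned code, unsigned char *s, unsigned char *e)
{
    if (s + 2 > e)
        return SKETCH_TOOSMALL;
    s[0] = (unsigned char) (code >> 8);    /* 0x00 for half-width katakana */
    s[1] = (unsigned char) (code & 0xFF);
    return 2;
}

/* Patched behaviour: half-width katakana (0xA1..0xDF) is one byte. */
static int sjis_put_patched(unsigned code, unsigned char *s, unsigned char *e)
{
    if (code >= 0xA1 && code <= 0xDF) {
        if (s >= e)
            return SKETCH_TOOSMALL;
        s[0] = (unsigned char) code;
        return 1;
    }
    if (s + 2 > e)
        return SKETCH_TOOSMALL;
    s[0] = (unsigned char) (code >> 8);    /* lead byte */
    s[1] = (unsigned char) (code & 0xFF);  /* trail byte */
    return 2;
}
```

Both functions cover only the part of my_wc_mb_sjis after the table lookup; the ASCII branch earlier in the function is unaffected.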
2. ctype-ujis.c
Function my_wc_mb_euc_jp (Unicode to EUC-JP)
Current:
  ....
  ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
  if (ret==1)
  {
    if (s+1>e)
      return MY_CS_TOOSMALL;
    s[0]=0x8E;
    s[1]=buf[0];
    return 1;
  }
  ....
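In the current code this branch stores two bytes (0x8E and the JIS X 0201 byte) but returns 1, so a caller that advances its output pointer by the returned length lets the next character overwrite the trail byte. The check s+1>e also allows s[1] to be written one byte past e. The sketch below reproduces only the return-value problem: its length check is tightened to s+2>e so the demo stays in bounds. All names are hypothetical, and convert_kana is a simplified model of a conversion loop, not MySQL's own.

```c
#include <stddef.h>

/* Signature shared by the two hypothetical encoders below. */
typedef int (*kana_put_fn)(unsigned char kana, unsigned char *s, unsigned char *e);

/* Models the current branch: writes SS2 + kana byte but reports 1 byte. */
static int ujis_put_kana_unpatched(unsigned char kana, unsigned char *s, unsigned char *e)
{
    if (s + 2 > e)
        return -1;
    s[0] = 0x8E;
    s[1] = kana;
    return 1;                 /* wrong: two bytes were written */
}

/* Models the patched branch: the return value matches the bytes written. */
static int ujis_put_kana_patched(unsigned char kana, unsigned char *s, unsigned char *e)
{
    if (s + 2 > e)
        return -1;
    s[0] = 0x8E;
    s[1] = kana;
    return 2;
}

/* Encodes n kana bytes, advancing by the reported length; returns bytes produced. */
static size_t convert_kana(kana_put_fn put, const unsigned char *kana, size_t n,
                           unsigned char *out, size_t cap)
{
    unsigned char *s = out;
    unsigned char *e = out + cap;
    size_t i;

    for (i = 0; i < n; i++) {
        int r = put(kana[i], s, e);
        if (r <= 0)
            break;            /* this sketch simply stops on any error */
        s += r;
    }
    return (size_t) (s - out);
}
```

Encoding "ｱｲ" (0xB1, 0xB2) with the unpatched model yields the two bytes 8E 8E, while the patched model yields 8E B1 8E B2.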
My fix:
  ....
  ret=my_wc_mb_jisx0201(c,wc,buf,buf+2);
  if (ret==1)
  {
    if (s+2>e)         /* two bytes are written below */
      return MY_CS_TOOSMALL;
    s[0]=0x8E;
    s[1]=buf[0];
    return 2;          /* UJIS half-width katakana is 2 bytes: 0x8E + JIS X 0201 byte */
  }
  ....
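With the patch, the encoder's return value matches the two bytes it writes, so a round trip through EUC-JP preserves the character. The sketch below checks that round trip for a single character. It uses a fixed 0xFEC0 offset instead of MySQL's my_wc_mb_jisx0201() lookup (an assumption that holds for the standard JIS X 0201 mapping), and its function names and status codes are hypothetical.

```c
#include <stdint.h>

/* Hypothetical status codes for this sketch. */
#define SKETCH_ILUNI    0     /* not a half-width katakana code point */
#define SKETCH_TOOSMALL (-1)  /* output buffer too short */

/* Encodes U+FF61..U+FF9F as EUC-JP: SS2 (0x8E) followed by 0xA1..0xDF. */
static int eucjp_put_kana(uint32_t wc, unsigned char *s, unsigned char *e)
{
    if (wc < 0xFF61 || wc > 0xFF9F)
        return SKETCH_ILUNI;
    if (s + 2 > e)
        return SKETCH_TOOSMALL;
    s[0] = 0x8E;
    s[1] = (unsigned char) (wc - 0xFEC0);
    return 2;                 /* bytes written, matching the patched branch */
}

/* Decodes an SS2 katakana sequence back to Unicode; returns bytes consumed or 0. */
static int eucjp_get_kana(uint32_t *pwc, const unsigned char *s, const unsigned char *e)
{
    if (s + 2 > e || s[0] != 0x8E || s[1] < 0xA1 || s[1] > 0xDF)
        return 0;
    *pwc = (uint32_t) s[1] + 0xFEC0;
    return 2;
}
```

Encoding U+FF71 gives 8E B1, and decoding those two bytes returns U+FF71 again.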