Bug #13577 HKSCS characters problem
Submitted: 28 Sep 2005 16:39 Modified: 29 Sep 2005 17:13
Reporter: Cyrus Choi
Status: Verified
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:4.1.x, 5.0.x, 5.1.x OS:Linux (Linux)
Assigned to: Assigned Account CPU Architecture:Any

[28 Sep 2005 16:39] Cyrus Choi
Description:
When using the big5 character set with an HKSCS character (a Big5 extension character), the character is not displayed, and it appears that it is not stored either.

How to repeat:
In MySQL 4.1.14:

Create a database and a table:

CREATE TABLE `storesubdistrict` (
  `subdistrictid` tinyint(4) NOT NULL auto_increment,
  `name` varchar(20) NOT NULL default '',
  PRIMARY KEY  (`subdistrictid`) 
 ) ENGINE=InnoDB DEFAULT CHARSET=big5 ;

Then insert a row:

insert into storesubdistrict ( subdistrictid, name ) values(75,'鰂魚涌');

mysql> select * from storesubdistrict;
+---------------+------+
| subdistrictid | name |
+---------------+------+
|            75 |      |
+---------------+------+

Note the empty string in the name field.

Note that the character 鰂 has the double-byte code 0x916F, which is an HKSCS character (it is not an ASCII code).

and it will 

Suggested fix:
It seems the macro

#define isbig5head(c) (0xa1<=(uchar)(c) && (uchar)(c)<=0xf9)

in 
./strings/ctype-big5.c
./libmysql_r/ctype-big5.c
./libmysql/ctype-big5.c

needs to be changed to

#define isbig5head(c) (0x81<=(uchar)(c) && (uchar)(c)<=0xfe)

Note: this has not been tested yet.
[28 Sep 2005 16:48] Cyrus Choi
The change has been compiled. A preliminary test shows that the result is displayed correctly.

Will the change have any side effects on the existing code?
[7 Oct 2005 3:52] Alexander Barkov
Dear Cyrus, your fix looks correct. It allows
to store and fetch HKSCS data into a Big5 column.

However, please note, character set conversion
to/from other character sets will not work. We cannot
just add mapping for extra HKSCS characters into
Big5 implementation, because Big5 and HKSCS have
different mapping for some characters.

Also, sorting order for extra characters will be unpredictable.

Thus, adding a full-featured HKSCS character set requires
introducing a new, separate character set "big5hkscs",
not just fixes to the "big5" implementation.

What do you think?
[20 Oct 2005 8:16] Alexander Barkov
Ok, full HKSCS support needs a separate character set
due to unicode conversion differences with regular "big5".
Changing status to feature request.
[20 Jul 2006 18:54] Valeriy Kravchuk
Bug #20960 was marked as a duplicate of this one.
[27 Sep 2006 17:59] Valeriy Kravchuk
Bug #22691 was marked as a duplicate of this one.