Bug #67739 | GBK charset is not Fully supported in mysql | | |
---|---|---|---|
Submitted: | 28 Nov 2012 9:29 | Modified: | 3 Dec 2012 7:28 |
Reporter: | vin chen | Status: | Verified |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | All | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | charset, gbk, MySQL | | |
[28 Nov 2012 9:29]
vin chen
[28 Nov 2012 19:33]
Sveta Smirnova
Thank you for the report. MySQL supports gbk fully, including your character:

    create table t_gbk(c1 int, c2 varchar(20)) engine=innodb default charset=gbk;
    set names gbk;
    insert into t_gbk values(1, _gbk X'FE80');
    insert into t_gbk values(1, _gbk X'CBDF');
    select hex(c2) from t_gbk;
    hex(c2)
    FE80
    CBDF

In your test you insert into a table with the UTF8 charset, and that is why you get the error. So technically this is not a bug but a feature request: "Add UTF support for more Chinese characters". Do you know which Unicode code point this character corresponds to?
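The distinction drawn above (a gbk column works, a utf8 column fails) can be sketched as a toy model in Python. This is not MySQL's code: `store_value` and its charset names are hypothetical. A gbk column is modeled as keeping the two bytes unchanged, and a utf8 column as requiring a GBK-to-Unicode conversion in which the 0xFE50..0xFEA0 block is assumed to have no mapping, which is what the later comments in this report confirm.

```python
def store_value(gbk_code: int, column_charset: str) -> bytes:
    """Toy model (hypothetical): what storing one 2-byte GBK character involves.

    - "gbk" column: client and column charsets match, so the bytes are kept
      as-is and no Unicode mapping is needed.
    - "utf8" column: the GBK code must first be mapped to Unicode. The GBK 1.0
      block 0xFE50..0xFEA0 is assumed to have no mapping in this model.
    """
    raw = gbk_code.to_bytes(2, "big")
    if column_charset == "gbk":
        return raw
    if column_charset == "utf8":
        if 0xFE50 <= gbk_code <= 0xFEA0:
            raise ValueError(f"no Unicode mapping for GBK 0x{gbk_code:04X}")
        # Python's own gbk codec stands in for the server's conversion table.
        return raw.decode("gbk").encode("utf-8")
    raise ValueError(f"unsupported charset: {column_charset}")


if __name__ == "__main__":
    print(store_value(0xFE80, "gbk").hex())  # fe80
    print(len(store_value(0xCBDF, "utf8").decode("utf-8")))  # 1
    try:
        store_value(0xFE80, "utf8")
    except ValueError as err:
        print(err)  # no Unicode mapping for GBK 0xFE80
```

The point of the model is only that the gbk path never consults a Unicode table, so a missing entry there is invisible until the data has to cross into a Unicode charset.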
[29 Nov 2012 14:45]
Sveta Smirnova
We discussed this internally, and I got confirmation that GBK symbols 0xFE50..0xFEA0 are not converted to U+E815..U+E864 in our Unicode implementation. What you want looks like GBK 1.0 as described at http://en.wikipedia.org/wiki/GBK :

> In 1995, China National Information Technology Standardization Technical Committee set down the Chinese Internal Code Specification (Chinese: 汉字内码扩展规范(GBK); pinyin: Hànzì Nèimǎ Kuòzhǎn Guīfàn (GBK)), Version 1.0, known as GBK 1.0, which is a slight extension of Codepage 936. The newly added 95 characters were not found in GB 13000.1-1993, and were provisionally assigned Unicode PUA code points.

We can add these conversions, but we need some kind of official document. The web site you are linking to is not the "China National Information Technology Standardization Technical Committee", so we cannot use it as such a confirmation. Do you perhaps know where the "Chinese Internal Code Specification (GBK), Version 1.0" standard is published?
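The missing conversion can be sketched as a small Python function. The function name is hypothetical, and the ordering is an assumption: the provisional PUA code points are taken to be assigned in byte order, skipping 0xFE7F because 0x7F is never a valid GBK trail byte. That is what lets the 81 byte values in 0xFE50..0xFEA0 line up with the 80 code points U+E815..U+E864.

```python
def gbk10_fe_block_to_pua(code: int) -> int:
    """Hypothetical helper: GBK 1.0 code in 0xFE50..0xFEA0 -> PUA code point.

    Assumption: the 80 codes are assigned to U+E815..U+E864 in byte order,
    skipping 0xFE7F (0x7F is not a valid GBK trail byte).
    """
    lead, trail = code >> 8, code & 0xFF
    if lead != 0xFE or not (0x50 <= trail <= 0xA0) or trail == 0x7F:
        raise ValueError(f"0x{code:04X} is outside the GBK 1.0 FE50..FEA0 block")
    # Trail bytes above 0x7F shift down by one to close the gap at 0x7F.
    index = trail - 0x50 if trail < 0x7F else trail - 0x51
    return 0xE815 + index


if __name__ == "__main__":
    for code in (0xFE50, 0xFE7E, 0xFE80, 0xFEA0):
        print(f"GBK 0x{code:04X} -> U+{gbk10_fe_block_to_pua(code):04X}")
    # GBK 0xFE50 -> U+E815
    # GBK 0xFE7E -> U+E843
    # GBK 0xFE80 -> U+E844
    # GBK 0xFEA0 -> U+E864
```

Under this assumption, the FE80 value from the `hex(c2)` output earlier in the thread would correspond to U+E844.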
[30 Nov 2012 3:23]
vin chen
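Sorry, I can't find the official document. But 0xFE50-0xFEA0 are also marked valid at http://ff.163.com/newflyff/gbk-list/, which was published by "全国信息技术标准化技术委员会" (China National Information Technology Standardization Technical Committee). And http://www.fmddlmyy.cn/text24.html says (translated from Chinese):

> When GBK was being drawn up, Unicode did not yet contain these characters, so code points in the Private Use Area were used; the code points of these 80 characters are 0xE815-0xE864. Later, Unicode added 52 of the Han characters to "CJK Unified Ideographs Extension A", and 14 of the 28 radicals to the "CJK Radicals Supplement". So in the figure above [on that page], these characters each have two Unicode code points.

This means that these characters had no corresponding Unicode code points when the GBK charset was formulated, and Unicode added them to "CJK Unified Ideographs Extension A" later. Maybe MySQL hasn't synchronized these modifications.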
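The figures in the quoted passage can be cross-checked with a few lines of Python. The sketch only does arithmetic: it enumerates the valid two-byte codes in 0xFE50..0xFEA0 (0x7F is excluded because GBK never uses it as a trail byte) and compares the count with the PUA range and with the 52 + 28 split from the quoted page. The derived counts at the end (characters with two code points, radicals with only the PUA one) are inferences from that page's numbers, not something stated elsewhere in this report.

```python
# Valid GBK codes in 0xFE50..0xFEA0: lead byte 0xFE, trail byte 0x50..0xA0,
# minus 0x7F, which GBK never uses as a trail byte.
fe_codes = [0xFE00 | trail for trail in range(0x50, 0xA1) if trail != 0x7F]

# Provisional Private Use Area block U+E815..U+E864 (inclusive).
pua_block = range(0xE815, 0xE864 + 1)

# Figures taken from the quoted fmddlmyy.cn page.
HAN_IN_EXT_A = 52            # later added to CJK Unified Ideographs Extension A
RADICALS = 28                # the remaining characters are radicals
RADICALS_IN_SUPPLEMENT = 14  # later added to CJK Radicals Supplement

assert len(fe_codes) == len(pua_block) == HAN_IN_EXT_A + RADICALS == 80

# Derived from the quoted figures: characters that have both a PUA and a
# standard code point, and radicals that still had only the PUA one.
dual_mapped = HAN_IN_EXT_A + RADICALS_IN_SUPPLEMENT
pua_only = RADICALS - RADICALS_IN_SUPPLEMENT
print(len(fe_codes), dual_mapped, pua_only)  # 80 66 14
```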
[30 Nov 2012 16:42]
Sveta Smirnova
Thank you for the feedback. I have set the report to "Verified", so we will consider whether we can implement this.

> This means that these characters had no corresponding Unicode code points when the GBK charset was formulated, and Unicode added them to "CJK Unified Ideographs Extension A" later.
> ...
> Maybe MySQL hasn't synchronized these modifications.

Yes, MySQL does not support these new additions to GBK.
[3 Dec 2012 7:28]
vin chen
GBK 0xD7FA~0xD7FE map to Unicode U+E810~U+E814; these are blank positions in GBK. These conversions should also be added to MySQL.
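Taken together with the 0xFE50..0xFEA0 block, these five codes occupy U+E810..U+E814, directly before U+E815, so the two proposed additions would cover the contiguous PUA range U+E810..U+E864. The sketch below builds that combined two-way table in Python. `build_gbk_pua_table` is a hypothetical name, and the byte-order assignment inside the FE block (skipping trail byte 0x7F) is an assumption, not something confirmed in this report.

```python
def build_gbk_pua_table() -> dict[int, int]:
    """Hypothetical GBK -> PUA table for the additions discussed in this report."""
    table = {}
    # GBK 0xD7FA..0xD7FE -> U+E810..U+E814 (the empty tail of GB2312 row 55).
    for offset, code in enumerate(range(0xD7FA, 0xD7FE + 1)):
        table[code] = 0xE810 + offset
    # GBK 0xFE50..0xFEA0 -> U+E815..U+E864, assumed in byte order, no 0xFE7F.
    fe_codes = [0xFE00 | t for t in range(0x50, 0xA1) if t != 0x7F]
    for offset, code in enumerate(fe_codes):
        table[code] = 0xE815 + offset
    return table


if __name__ == "__main__":
    gbk_to_pua = build_gbk_pua_table()
    # Reverse direction, needed to convert Unicode text back to GBK.
    pua_to_gbk = {cp: code for code, cp in gbk_to_pua.items()}
    print(len(gbk_to_pua))  # 85
    print(sorted(pua_to_gbk) == list(range(0xE810, 0xE865)))  # True
```

Keeping both directions matters for the dump/restore case raised later in this thread: a value has to survive GBK to Unicode and back unchanged.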
[8 Sep 2013 18:18]
Peter Laursen
Also see http://bugs.mysql.com/bug.php?id=70270 and http://bugs.mysql.com/bug.php?id=70271.
[8 Sep 2013 18:20]
Peter Laursen
And this is DEFINITELY more than a feature request: it means that a dump cannot be restored. This is CRITICAL (at least 'S2' in the severity categories available here).
[27 Apr 2015 7:24]
Chiranjeevi Battula
http://bugs.mysql.com/bug.php?id=76822 has been marked as a duplicate of this one.