Bug #11987 mysql will truncate the text when the text contains the GBK chars "0xA3A0" and "0xA1"
Submitted: 17 Jul 2005 3:56 Modified: 3 Aug 2005 20:46
Reporter: haka haka
Status: Closed
Category: MySQL Server Severity: S3 (Non-critical)
Version: 4.1 .. 5.0 OS: Any (*)
Assigned to: Alexander Barkov CPU Architecture: Any
Tags: corruption, myisam

[17 Jul 2005 3:56] haka haka
Description:
The Chinese GBK characters "0xA3A0" and "0xA1A1" are displayed as blanks, like "0x20".
If you try to insert a GBK string that contains "0xA3A0" or "0xA1A1" into a TEXT column, the server truncates the string at that character. The same string can, however, be inserted into a BLOB column.

PS: Firefox automatically converts 0xA3A0 to "??", so the string below will be altered by Firefox. You may need to use another tool to edit it.

How to repeat:
1. We have a table with these columns:
create table if not exists C_ARTICLE
(
   POST_ID                        int unsigned                   not null AUTO_INCREMENT,
   CONTENT                        LONGTEXT,
   PRIMARY KEY (POST_ID)
);

2. We have a string that needs to be inserted (the string contains the character "0xA3A0"):
INSERT INTO C_ARTICLE(CONTENT) VALUES("编剧:廖一梅人物:马路 明明");

3. While this script runs, the MySQL server's CPU usage goes up to 100%. After it
finishes, only the part of the string before "0xA3A0" has been inserted into the table. This means
we can only get "编剧:廖一梅" back from the table.
[17 Jul 2005 12:21] Aleksey Kishkin
Could it be a duplicate of http://bugs.mysql.com/bug.php?id=10903 ?
[18 Jul 2005 10:56] Aleksey Kishkin
Test case 
DROP TABLE IF EXISTS `test`.`gbtest`;
CREATE TABLE `gbtest` (
  `content` longtext
) ENGINE=InnoDB DEFAULT CHARSET=gb2312;

INSERT INTO gbtest VALUES(_gb2312 0xB0B0B0B0A3C1B0C0B0C0);
INSERT INTO gbtest VALUES(_gb2312 0xB0B0B0B0A3A0B0C0B0C0);

As a result, gbtest should contain 2 rows of the same length. (But as a matter of fact, the first row is longer.)
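
A quick way to confirm that the two payloads in this test case really are the same size is to compare them outside the server. The following is only a sketch in Python; the helper name `differing_offsets` is hypothetical and not part of MySQL. It shows that both literals are 10 bytes long and differ in a single byte.

```python
# Hex payloads copied from the two INSERT statements in the test case.
ROW_1 = bytes.fromhex("B0B0B0B0A3C1B0C0B0C0")
ROW_2 = bytes.fromhex("B0B0B0B0A3A0B0C0B0C0")


def differing_offsets(a, b):
    """Return the byte offsets at which two equal-length byte strings differ.

    Hypothetical helper used only for this illustration.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print(len(ROW_1), len(ROW_2))           # 10 10
    print(differing_offsets(ROW_1, ROW_2))  # [5]
```

The only difference is the trail byte of the fifth two-byte pair (0xC1 versus 0xA0), so any difference in stored length comes from the server, not from the input.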
[19 Jul 2005 6:47] Alexander Barkov
I cannot reproduce this problem with 0xA1A1,
but can with A3A0:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a longtext) CHARSET=gb2312;
INSERT INTO t1 VALUES(_gb2312 0xB0B0B0B0A1A1B0C0B0C0);
INSERT INTO t1 VALUES(_gb2312 0xB0B0B0B0A3A0B0C0B0C0);
SELECT hex(a) from t1;

This is the result:

hex(a)
B0B0B0B0A1A1B0C0B0C0
B0B0B0B0
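
To make the cut position explicit, the inserted and stored hex values can be compared offline. This is a minimal sketch in Python, assuming only the values shown above; `truncation_point` is a hypothetical helper name, not a MySQL function.

```python
# (inserted_hex, stored_hex) pairs copied from the statements and result above.
CASES = [
    ("B0B0B0B0A1A1B0C0B0C0", "B0B0B0B0A1A1B0C0B0C0"),
    ("B0B0B0B0A3A0B0C0B0C0", "B0B0B0B0"),
]


def truncation_point(inserted_hex, stored_hex):
    """Return (byte_offset, next_two_bytes_hex) where the stored value stops,
    or None if the value was stored in full."""
    inserted = bytes.fromhex(inserted_hex)
    stored = bytes.fromhex(stored_hex)
    if not inserted.startswith(stored):
        raise ValueError("stored value is not a prefix of the inserted value")
    if len(stored) == len(inserted):
        return None
    cut = len(stored)
    return cut, inserted[cut:cut + 2].hex().upper()


if __name__ == "__main__":
    for inserted_hex, stored_hex in CASES:
        print(inserted_hex, "->", truncation_point(inserted_hex, stored_hex))
```

For the first row the helper returns None (nothing lost); for the second it returns offset 4 with the pair A3A0 immediately after the cut.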

So, A1A1 does not cut the string. A3A0 does.
However, according to this page, A3A0 is an undefined character in GBK:

http://www.microsoft.com/globaldev/reference/dbcs/936/936_A1.mspx

Can you please confirm that A3A0 is an undefined character in GBK?
If yes, what is the reason to store undefined characters?
If not, can you please give some URLs proving this character to be defined?
Thanks!
[20 Jul 2005 14:19] haka haka
On this page, http://www.microsoft.com/globaldev/reference/dbcs/936/936_A3.mspx,
MS says that A3A0 means nothing.
But in the GBK standard, the encoding rule is:
0x81 <= ch1 <= 0xFE
0x40 <= ch2 <= 0x7E, 0x80 <= ch2 <= 0xFE (0x7F does not exist)
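
The range rule above can be sketched as a structural check on a two-byte pair. This is only an illustration in Python with hypothetical helper names; it uses inclusive bounds, as the GBK byte ranges are commonly documented, and it tests byte ranges only, not whether a code point has an assigned glyph. The second function is an addition for comparison: it applies the narrower GB2312 (EUC-CN) ranges, because the test cases in this report declare CHARSET=gb2312. The thread does not establish whether that difference is related to the truncation.

```python
def is_gbk_pair(lead, trail):
    """True if (lead, trail) falls inside the GBK two-byte ranges.

    Structural check only: lead 0x81..0xFE, trail 0x40..0x7E or 0x80..0xFE.
    """
    return 0x81 <= lead <= 0xFE and (
        0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE
    )


def is_euc_cn_pair(lead, trail):
    """True if (lead, trail) falls inside the GB2312 (EUC-CN) ranges,
    where both bytes lie in 0xA1..0xFE. Added here for comparison only."""
    return 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE


if __name__ == "__main__":
    for lead, trail in [(0xA1, 0xA1), (0xA3, 0xA0), (0xA3, 0xC1)]:
        print(
            "%02X%02X" % (lead, trail),
            "gbk:", is_gbk_pair(lead, trail),
            "euc-cn:", is_euc_cn_pair(lead, trail),
        )
```

Under these assumptions, A1A1 and A3C1 pass both checks, while A3A0 passes the GBK range check but falls outside the EUC-CN trail-byte range.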

I have downloaded a GBK character definition file in which 0xA3A0 is defined, and I can
type the "A1A1" character using MS PINGYIN2003.

You can get the GBK character definition file in the attachment.
[20 Jul 2005 14:21] haka haka
gbk charset

Attachment: GBK1.rar (octet/stream, text), 32.20 KiB.

[22 Jul 2005 16:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/27483
[22 Jul 2005 16:19] Alexander Barkov
Fixed in 4.1.14 and 5.0.11.
[3 Aug 2005 20:46] Mike Hillyer
Documented in 5.0.11 and 4.1.14 changelogs:

<listitem><para>Character data truncated when GBK characters <literal>0xA3A0</literal> and <literal>0xA1</literal> are present. (Bug #11987)</para></listitem>