| Bug #11987 | MySQL truncates text when it contains the GBK characters "0xA3A0" and "0xA1" | ||
|---|---|---|---|
| Submitted: | 17 Jul 2005 3:56 | Modified: | 3 Aug 2005 20:46 | 
| Reporter: | haka haka | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) | 
| Version: | 4.1 .. 5.0 | OS: | Any (*) | 
| Assigned to: | Alexander Barkov | CPU Architecture: | Any | 
| Tags: | corruption, myisam | ||
   [17 Jul 2005 12:21]
   Aleksey Kishkin        
  Could this be a duplicate of http://bugs.mysql.com/bug.php?id=10903 ?
   [18 Jul 2005 10:56]
   Aleksey Kishkin        
  Test case:
  DROP TABLE IF EXISTS `test`.`gbtest`;
  CREATE TABLE `gbtest` ( `content` longtext ) ENGINE=InnoDB DEFAULT CHARSET=gb2312;
  INSERT INTO gbtest VALUES(_gb2312 0xB0B0B0B0A3C1B0C0B0C0);
  INSERT INTO gbtest VALUES(_gb2312 0xB0B0B0B0A3A0B0C0B0C0);
  As a result, gbtest should contain 2 rows of the same length. (In fact, the first row is longer.)
   [19 Jul 2005 6:47]
   Alexander Barkov        
  I cannot reproduce this problem with 0xA1A1, but I can with 0xA3A0:
  DROP TABLE IF EXISTS t1;
  CREATE TABLE t1 (a longtext) CHARSET=gb2312;
  INSERT INTO t1 VALUES(_gb2312 0xB0B0B0B0A1A1B0C0B0C0);
  INSERT INTO t1 VALUES(_gb2312 0xB0B0B0B0A3A0B0C0B0C0);
  SELECT hex(a) from t1;
  This is the result:
  hex(a)
  B0B0B0B0A1A1B0C0B0C0
  B0B0B0B0
  So, A1A1 does not cut the string; A3A0 does. However, according to this page, A3A0 is an undefined character in GBK: http://www.microsoft.com/globaldev/reference/dbcs/936/936_A1.mspx
  Can you please confirm that A3A0 is an undefined character in GBK? If yes, what is the reason to store undefined characters? If not, can you please give some URLs proving that this character is defined? Thanks!
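  The result above is what one would expect if the string were cut at the first byte pair that falls outside the GB2312 (EUC-CN) two-byte range, where the lead byte is 0xA1..0xF7 and the trail byte is 0xA1..0xFE. The following Python sketch illustrates that reading only; the function name `well_formed_prefix` is hypothetical and this is not the server's actual code.

```python
def well_formed_prefix(data: bytes) -> bytes:
    """Return the longest prefix made of ASCII bytes and byte pairs
    inside the GB2312 (EUC-CN) range: lead 0xA1..0xF7, trail 0xA1..0xFE.
    Illustration only; an assumption about why the string is cut."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII character.
            i += 1
        elif (0xA1 <= b <= 0xF7 and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # Well-formed two-byte GB2312 character.
            i += 2
        else:
            # First pair outside the range: stop here.
            break
    return data[:i]


ok_row = bytes.fromhex("B0B0B0B0A1A1B0C0B0C0")
cut_row = bytes.fromhex("B0B0B0B0A3A0B0C0B0C0")

print(well_formed_prefix(ok_row).hex().upper())   # B0B0B0B0A1A1B0C0B0C0
print(well_formed_prefix(cut_row).hex().upper())  # B0B0B0B0
```

  In 0xA3A0 the trail byte 0xA0 lies just below 0xA1, so the scan stops after the first four bytes, which matches the second row of the hex(a) output.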
   [20 Jul 2005 14:19]
   haka haka        
  On this page, http://www.microsoft.com/globaldev/reference/dbcs/936/936_A3.mspx, MS says that A3A0 means nothing. But in the GBK standard, the encoding rule is:
  0x81 <= ch1 <= 0xFE
  0x40 <= ch2 <= 0x7E, or 0x80 <= ch2 <= 0xFE (0x7F does not exist)
  I have downloaded a GBK character definition file in which 0xA3A0 is defined, and I can get the "A1A1" character by using MS PINGYIN2003. The GBK character definition file is attached.
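  The encoding rule quoted above can be written as a small range check. This is a minimal Python sketch with hypothetical function names; it tests only whether a byte pair lies inside the two-byte code space, not whether a glyph is actually assigned to it. The GB2312 range (lead 0xA1..0xF7, trail 0xA1..0xFE) is included for comparison.

```python
def in_gbk_code_space(lead: int, trail: int) -> bool:
    """Structural check for the GBK rule quoted in the comment:
    lead 0x81..0xFE, trail 0x40..0x7E or 0x80..0xFE (0x7F excluded)."""
    lead_ok = 0x81 <= lead <= 0xFE
    trail_ok = 0x40 <= trail <= 0xFE and trail != 0x7F
    return lead_ok and trail_ok


def in_gb2312_code_space(lead: int, trail: int) -> bool:
    """Structural check for GB2312 (EUC-CN): lead 0xA1..0xF7, trail 0xA1..0xFE."""
    return 0xA1 <= lead <= 0xF7 and 0xA1 <= trail <= 0xFE


for lead, trail in [(0xA1, 0xA1), (0xA3, 0xA0), (0xA3, 0xC1)]:
    print(f"0x{lead:02X}{trail:02X}",
          "GBK:", in_gbk_code_space(lead, trail),
          "GB2312:", in_gb2312_code_space(lead, trail))
```

  Under these ranges 0xA3A0 lies inside the GBK code space but outside GB2312, while 0xA1A1 and 0xA3C1 lie inside both.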
   [22 Jul 2005 16:05]
   Bugs System        
  A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/27483
   [22 Jul 2005 16:19]
   Alexander Barkov        
  Fixed in 4.1.14 and 5.0.11.
   [3 Aug 2005 20:46]
   Mike Hillyer        
  Documented in 5.0.11 and 4.1.14 changelogs: <listitem><para>Character data truncated when GBK characters <literal>0xA3A0</literal> and <literal>0xA1</literal> are present. (Bug #11987)</para></listitem>


Description:
The Chinese GBK charset characters "0xA3A0" and "0xA1A1" display as a blank, like "0x20". If you try to insert a GBK string that contains "0xA3A0" and "0xA1" into a text-type column, the server trims everything after "0xA3A0" and "0xA1A1". The same string can, however, be inserted into a blob column.

PS: Firefox automatically converts 0xA3A0 to the characters "??", so the string below will have been converted by Firefox; you may need to use another tool to edit it.

How to repeat:
1. We have a table with these columns:
create table if not exists C_ARTICLE ( POST_ID int unsigned not null AUTO_INCREMENT PRIMARY KEY, CONTENT LONGTEXT );

2. We have a string to insert (the string contains the character "0xA3A0"):
INSERT INTO C_ARTICLE(CONTENT) values("编剧:廖一梅人物:马路 明明");

3. After this script runs, the MySQL server's CPU usage rises to 100%. When it finishes, only the part of the string before "0xA3A0" has been inserted into the table, so we can only get "编剧:廖一梅" back from the table.