Bug #16072 some problem in charset(gb2312)
Submitted: 29 Dec 2005 14:56 Modified: 19 May 2006 8:11
Reporter: zhang min Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0.19-BK, 5.0.17 OS:Linux (Linux, Windows)
Assigned to: Alexander Barkov CPU Architecture:Any

[29 Dec 2005 14:56] zhang min
Description:
i used in mysql with insert row in database which has been created with charactor gb2312. Then i create a table with the SQL:
CREATE TABLE `dm_syscatalogue` (
  `ID` int(10) unsigned NOT NULL auto_increment COMMENT '标示',
  `TYPE` enum('GEOMETRY','POINT','LINESTRING','POLYGON','IMAGE','GRID','3DM','3DV') NOT NULL COMMENT '类型',
  `Name` varchar(256) NOT NULL COMMENT '名称',
  `aliasName` varchar(256) NOT NULL COMMENT '别名',
  `tableName` varchar(256) NOT NULL COMMENT '表名',
  `minX` double NOT NULL COMMENT '最小空间X值',
  `minY` double NOT NULL COMMENT '最小空间Y值',
  `minZ` double NOT NULL COMMENT '最小空间Z值',
  `maxX` double NOT NULL COMMENT '最大空间X值',
  `maxY` double NOT NULL COMMENT '最大空间Y值',
  `maxZ` double NOT NULL COMMENT '最大空间Z值',
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `AliasNameIndex` (`aliasName`),
  UNIQUE KEY `NameIndex` (`Name`),
  KEY `SpIndex` (`minX`,`minY`,`maxX`,`maxY`)
) ENGINE=MyISAM DEFAULT CHARSET=gb2312 COMMENT='数据目录';

Then i insert this row is failed:
insert into l_kkk(Data,oID,TYPE,SUBTYPE,NAME,TELEPHONE,ADDRESS,POSTAL,COMMENT,UPDATED) values(0x0000000001010000007EE36BCF2C155D40802A6EDC62FE4340,"        977","         50","         93","新陆峰汌家常菜 ","64956391","海淀区东升园公寓12号","","","2005/10");

mysql told me : Data too long for column 'NAME' at row 1.

the problem is at "新陆峰汌家常菜 ",and these chars is chinese.
if i modify these chars to "新陆峰汌家常菜 "(Drop the "汌"),the problem is not happened.

How to repeat:
mysql> CREATE TABLE `dm_syscatalogue` (
  `ID` int(10) unsigned NOT NULL auto_increment COMMENT '标示',
  `TYPE` enum('GEOMETRY','POINT','LINESTRING','POLYGON','IMAGE','GRID','3DM','3DV') NOT NULL COMMENT '类型',
  `Name` varchar(256) NOT NULL COMMENT '名称',
  `aliasName` varchar(256) NOT NULL COMMENT '别名',
  `tableName` varchar(256) NOT NULL COMMENT '表名',
  `minX` double NOT NULL COMMENT '最小空间X值',
  `minY` double NOT NULL COMMENT '最小空间Y值',
  `minZ` double NOT NULL COMMENT '最小空间Z值',
  `maxX` double NOT NULL COMMENT '最大空间X值',
  `maxY` double NOT NULL COMMENT '最大空间Y值',
  `maxZ` double NOT NULL COMMENT '最大空间Z值',
  PRIMARY KEY  (`ID`),
  UNIQUE KEY `AliasNameIndex` (`aliasName`),
  UNIQUE KEY `NameIndex` (`Name`),
  KEY `SpIndex` (`minX`,`minY`,`maxX`,`maxY`)
) ENGINE=MyISAM DEFAULT CHARSET=gb2312 COMMENT='数据目录';

mysql>insert into l_kkk(Data,oID,TYPE,SUBTYPE,NAME,TELEPHONE,ADDRESS,POSTAL,COMMENT,UPDATED) values(0x0000000001010000007EE36BCF2C155D40802A6EDC62FE4340,"        977","         50","         93","新陆峰汌家常菜 ","64956391","海淀区东升园公寓12号","","","2005/10");
[30 Dec 2005 15:46] Valeriy Kravchuk
Sorry, but your INSERT statement is NOT for table `dm_syscatalogue`. So, please, provide all the statements to demonstrate the problem.
[1 Jan 2006 2:05] zhang min
sorry,it's my mistake.

CREATE TABLE `l_kkk` (
  `ID` int(10) unsigned NOT NULL auto_increment,
  `Data` geometry NOT NULL,
  `oID` double default NULL,
  `TYPE` double default NULL,
  `SUBTYPE` double default NULL,
  `NAME` varchar(100) default NULL,
  `TELEPHONE` varchar(32) default NULL,
  `ADDRESS` varchar(100) default NULL,
  `POSTAL` varchar(10) default NULL,
  `COMMENT` varchar(200) default NULL,
  `UPDATED` varchar(10) default NULL,
  PRIMARY KEY  (`ID`),
  SPATIAL KEY `SpIndex` (`Data`(32))
) ENGINE=MyISAM DEFAULT CHARSET=gb2312;

the INSERT statement is successed:
INSERT INTO `l_kkk` (`Data`,`oID`,`TYPE`,`SUBTYPE`,`NAME`,`TELEPHONE`,`ADDRESS`,`POSTAL`,`COMMENT`,`UPDATED`) VALUES 
 (0x000000000101000000C9C859D8D30C5D403620425C39FB4340,1,50,70,'珠联大酒楼','82591834','海淀区香南黄土坡9号','','','2005/06/16');

But this is failed:
insert into
l_kkk(Data,oID,TYPE,SUBTYPE,NAME,TELEPHONE,ADDRESS,POSTAL,COMMENT,UPDATED)
values(0x0000000001010000007EE36BCF2C155D40802A6EDC62FE4340,"        977","     
   50","         93","新陆峰汌家常菜","64956391","海淀区东升园公寓12号","","","2005/10");

If i change the "新陆峰汌家常菜" to "新陆峰家常菜" , the insert become successed.
[1 Jan 2006 2:08] zhang min
And the other SQL Before create table and insert:

set character_set_results=gb2312;
set character_set_client=gb2312;
set character_set_connection=gb2312;
set character_set_server=gb2312;

create database CHARACTER SET 'gb2312';
[1 Jan 2006 2:09] zhang min
And the other SQL Before create table and insert:

set character_set_results=gb2312;
set character_set_client=gb2312;
set character_set_connection=gb2312;
set character_set_server=gb2312;

create database ddd CHARACTER SET 'gb2312';
[11 Jan 2006 13:37] Valeriy Kravchuk
Sorry, but I was not able to repeat with latest 5.0.19-BK build on Linux:

mysql> set character_set_results=gb2312;
Query OK, 0 rows affected (0.00 sec)

mysql> set character_set_client=gb2312;
Query OK, 0 rows affected (0.00 sec)

mysql> set character_set_connection=gb2312;
Query OK, 0 rows affected (0.00 sec)

mysql> set character_set_server=gb2312;
Query OK, 0 rows affected (0.01 sec)

mysql> create database ddd CHARACTER SET 'gb2312';
Query OK, 1 row affected (0.00 sec)

mysql> use ddd;
Database changed
mysql> CREATE TABLE `l_kkk` (
    ->   `ID` int(10) unsigned NOT NULL auto_increment,
    ->   `Data` geometry NOT NULL,
    ->   `oID` double default NULL,
    ->   `TYPE` double default NULL,
    ->   `SUBTYPE` double default NULL,
    ->   `NAME` varchar(100) default NULL,
    ->   `TELEPHONE` varchar(32) default NULL,
    ->   `ADDRESS` varchar(100) default NULL,
    ->   `POSTAL` varchar(10) default NULL,
    ->   `COMMENT` varchar(200) default NULL,
    ->   `UPDATED` varchar(10) default NULL,
    ->   PRIMARY KEY  (`ID`),
    ->   SPATIAL KEY `SpIndex` (`Data`(32))
    -> ) ENGINE=MyISAM DEFAULT CHARSET=gb2312;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO `l_kkk`
    -> (`Data`,`oID`,`TYPE`,`SUBTYPE`,`NAME`,`TELEPHONE`,`ADDRESS`,`POSTAL`,`COMMENT`,
    -> `UPDATED`) VALUES
    ->  (0x000000000101000000C9C859D8D30C5D403620425C39FB4340,1,50,70,'?????',
    ->  '82591834','????????9?','','','2005/06/16');
Query OK, 1 row affected (0.04 sec)

mysql> insert into `l_kkk`
    -> (Data, oID, TYPE, SUBTYPE, NAME, TELEPHONE, ADDRESS, POSTAL, COMMENT, UPDATED)
    -> values(0x0000000001010000007EE36BCF2C155D40802A6EDC62FE4340, "       977",
    -> "       50","93","???????","64956391","????????12?","","","2005/10");
Query OK, 1 row affected (0.04 sec)

mysql> select * from l_kkk\G
*************************** 1. row ***************************
       ID: 1
     Data:
      oID: 1
     TYPE: 50
  SUBTYPE: 70
     NAME: ?????
TELEPHONE: 82591834
  ADDRESS: ????????9?
   POSTAL:
  COMMENT:
  UPDATED: 2005/06/16
*************************** 2. row ***************************
       ID: 2
     Data:
      oID: 977
     TYPE: 50
  SUBTYPE: 93
     NAME: ???????
TELEPHONE: 64956391
  ADDRESS: ????????12?
   POSTAL:
  COMMENT:
  UPDATED: 2005/10
2 rows in set (0.01 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.19    |
+-----------+
1 row in set (0.00 sec)

So, there is either a difference in my test (please, check) or something was fixed since 5.0.17, or it is Windows-only bug. Please, try to repeat with latest version on Windows, 5.0.18, and inform about the results.
[13 Jan 2006 14:45] zhang min
o , i have some founds.
the bug happened with the some special chars : '焗'[0x9F68],'汌'[0x9AF6].
if the ascii is 0xCDFE,it has no problem.
something happend with the ascii with the head '0x9...'.

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.17-nt |
+-----------+
mysql> set character_set_results=gb2312;
Query OK, 0 rows affected (0.01 sec)
mysql> set character_set_client=gb2312;
Query OK, 0 rows affected (0.00 sec)
mysql> select charset('威威焗鸡王酒楼');
+---------------------------+
| charset('威威?h鸡王酒楼') |
+---------------------------+
| latin1                    |
+---------------------------+
1 row in set (0.00 sec)

then the problem is at '?h'. That is not correctly.
[21 Jan 2006 16:03] Valeriy Kravchuk
I was not able to verify everything you described completely (as had not found any way to insert and display that real gb2312 characters on my FC1 or Windows, not simply their codes), but I had verified at least that some codes are problematic, and it is not obvious why. Look:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6 to server version: 5.0.19
 
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
 
mysql> create table tt_gb2312 (
    -> id int,
    -> name varchar(100)) engine=MyISAM default charset=gb2312;
Query OK, 0 rows affected (0.01 sec)
 
mysql> set character_set_results=gb2312;
Query OK, 0 rows affected (0.03 sec)
 
mysql> set character_set_client=gb2312;
Query OK, 0 rows affected (0.03 sec)
 
mysql> set character_set_connection=gb2312;
Query OK, 0 rows affected (0.02 sec)
 
mysql> set character_set_server=gb2312;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into tt_gb2312 values(1, 0x9F689AF6);
Query OK, 1 row affected, 1 warning (0.08 sec)
 
mysql> show warnings;
+---------+------+-------------------------------------------+
| Level   | Code | Message                                   |
+---------+------+-------------------------------------------+
| Warning | 1265 | Data truncated for column 'name' at row 1 |
+---------+------+-------------------------------------------+
1 row in set (0.00 sec)
 
mysql> insert into tt_gb2312 values(1, 0xCDFECDFE);
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from tt_gb2312;
+------+------+
| id   | name |
+------+------+
|    1 |      |
|    1 | ���� |
+------+------+
2 rows in set (0.00 sec)

mysql> select charset(name) from tt_gb2312;
+---------------+
| charset(name) |
+---------------+
| gb2312        |
| gb2312        |
+---------------+
2 rows in set (0.00 sec)
 
mysql> select hex(name) from tt_gb2312;
+-----------+
| hex(name) |
+-----------+
|           |
| CDFECDFE  |
+-----------+
2 rows in set (0.08 sec)

So, why the first value gives that warning, whyle second, of the same length, inserted without any problem? Looks like a bug that should be invetigated by developers.
[19 May 2006 8:11] Alexander Barkov
Dear Zhang Min,

This is not a bug. A correct gb2312 double byte character can only
consist of these bytes:

[0xA1..0xF7][0xA1..0xFE]

0x9AF6 is not a correct gb2312 character.

Most likely you need to use the "gbk" character set instead.
[1 Nov 2006 5:07] Jiangtao Zhou
Hi,
I've the similiar problem and change into gbk indeed helps. But for those Chinese chars,in gbk it will replace with some different chars such as "?", but why in IE and choose "encoding" as "gb2312", those chars can be displayed correctly? what charset IE browser used for gb2312? any equivalent in mysql?