Bug #9357 TEXT columns break string with special word in BIG5 charset.
Submitted: 23 Mar 2005 12:57 Modified: 26 Apr 2005 0:10
Reporter: Sasimicat Wang
Status: Closed
Category: MySQL Server  Severity: S3 (Non-critical)
Version: 4.1.10  OS: Any (all)
Assigned to: Alexander Barkov  CPU Architecture: Any

[23 Mar 2005 12:57] Sasimicat Wang
Description:
I found that the character "裏" in the big5 charset causes problems in TEXT columns.
The byte code of this multibyte character "裏" is 0D 0A (the same as \r\n, or CRLF).

BLOB column test (everything is OK):
CREATE TABLE ctest1(aaa blob);
INSERT INTO ctest1 VALUES('test word:裏.1234567890abcdefghijk');
mysql> SELECT * from ctest1;
+------------------------------------+
| aaa                                |
+------------------------------------+
| test word:裏.1234567890abcdefghijk |
+------------------------------------+

TEXT column test:
CREATE TABLE ctest(aaa text);
INSERT INTO ctest VALUES('test word:裏.1234567890abcdefghijk');
mysql> SELECT * from ctest;
+------------+
| aaa        |
+------------+
| test word: |
+------------+

I tested this on MySQL server 4.1.10 and 4.1.7, running on several different operating systems (the server and client character sets are both big5). A MySQL server running with the latin1 character set does not show this problem.

How to repeat:
On MySQL Server 4.1.7 or 4.1.10, with character set BIG5:

CREATE TABLE ctest(aaa text);
INSERT INTO ctest VALUES('test word:裏.1234567890abcdefghijk');

The byte code of this multibyte character "裏" is 0D 0A (the same as \r\n, or CRLF).

mysql> SELECT * from ctest;
+------------+
| aaa        |
+------------+
| test word: |
+------------+
[23 Mar 2005 17:43] Jose Alejandro Guizar
Looks to me like MySQL is trying to convert your input into some other charset before inserting it into the table, and it finds your input invalid somehow. Try doing:

SET NAMES BIG5;
CREATE TABLE T (A TEXT) CHARSET=BIG5;

and then try inserting your data again to see if it works. Also make sure that whatever program you're using to insert the data can handle BIG5.

The reason it works in latin1 is that every byte value is valid there, so MySQL saves your data 'as is' and doesn't perform any conversions on it.
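
The "every byte value is valid" point can be sketched outside the server. This is only an illustration: Python's latin-1 codec stands in for MySQL's latin1 (an assumption, the two are not guaranteed to be identical), and the helper name latin1_round_trips is made up for this example.

```python
def latin1_round_trips(data: bytes) -> bool:
    """Return True if `data` decodes as latin-1 and re-encodes to the same bytes."""
    return data.decode("latin-1").encode("latin-1") == data


# All 256 possible byte values, each taken as one single-byte character.
all_bytes = bytes(range(256))
print(latin1_round_trips(all_bytes))

# A two-byte sequence from a multibyte charset is just two separate
# characters when the bytes are interpreted as latin-1.
two_bytes = bytes([0xF9, 0xD8])
print(len(two_bytes.decode("latin-1")))
```

Because nothing can be rejected, a single-byte charset of this kind never has a reason to cut a string short.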
[24 Mar 2005 2:31] Sasimicat Wang
Sorry, I made a mistake:
the byte code of the character "裏" is F9 D8, not 0D 0A.
I tried creating the table with the CHARSET=BIG5 setting,
but the result is still the same.
I also found that VARCHAR and CHAR columns have the same problem.

It seems to be a problem in strings/ctype-big5.c.
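
One way to double-check the byte values of a character is to encode it and print the result in hex. A minimal sketch, assuming Python's cp950 codec (Microsoft's Big5 variant) as the reference table; hex_bytes is a hypothetical helper, not part of MySQL or of this report.

```python
def hex_bytes(text: str, codec: str) -> str:
    """Return the encoded bytes of `text` as space-separated upper-case hex."""
    return " ".join(f"{b:02X}" for b in text.encode(codec))


# U+88CF is the character 裏 discussed in this report.
print(hex_bytes("\u88cf", "cp950"))

# For comparison, a carriage return followed by a line feed.
print(hex_bytes("\r\n", "ascii"))
```

The first line of output should match the corrected value F9 D8 given above, and the second shows the 0D 0A pair it was first mistaken for.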
[24 Mar 2005 3:55] Jorge del Conde
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

mysql> CREATE TABLE ctest(aaa text);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO ctest VALUES('test word:裏.1234567890abcdefghijk');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * from ctest;
+------------------------------------+
| aaa                                |
+------------------------------------+
| test word:裏.1234567890abcdefghijk |
+------------------------------------+
1 row in set (0.00 sec)

mysql> drop table ctest;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE ctest(aaa text) charset=big5;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO ctest VALUES('test word:裏.1234567890abcdefghijk');
Query OK, 1 row affected (0.00 sec)

mysql> select * from ctest;
+------------------------------------+
| aaa                                |
+------------------------------------+
| test word:裏.1234567890abcdefghijk |
+------------------------------------+
1 row in set (0.00 sec)

mysql>
[24 Mar 2005 12:06] Sergei Golubchik
You are right: MySQL does not consider 0xF9D8 a valid Big5 character because, strictly speaking, this character is not in Big5 according to

ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT

Looks like it's part of a Big5 extension.

I assigned this bug to our charset developer for comments.
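
The symptom in the report (the value stops at "test word:") is consistent with a string being kept only up to the first character that the charset rejects. The sketch below only models that idea. It is not MySQL's code: well_formed_prefix and strict_table are hypothetical names, and the rejected range F9D6..F9FE is an assumption chosen to include 0xF9D8.

```python
def well_formed_prefix(data: bytes, is_accepted) -> bytes:
    """Return the leading part of `data` up to the first rejected character.

    Bytes below 0x80 are treated as single-byte ASCII. Any other byte is
    taken as the lead byte of a two-byte character and is passed, together
    with the byte after it, to `is_accepted`.
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
            continue
        # Stop on a dangling lead byte or on a pair the table rejects.
        if i + 1 >= len(data) or not is_accepted(data[i], data[i + 1]):
            break
        i += 2
    return data[:i]


def strict_table(lead: int, trail: int) -> bool:
    """Hypothetical table that lacks the codes F9D6..F9FE (a simplification:
    every other two-byte code is accepted without further checks)."""
    code = (lead << 8) | trail
    return not (0xF9D6 <= code <= 0xF9FE)


raw = b"test word:" + bytes([0xF9, 0xD8]) + b".1234567890abcdefghijk"
print(well_formed_prefix(raw, strict_table))
```

With a predicate that accepts every pair, the same function returns the whole input, which corresponds to the BLOB case where no charset check applies.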
[25 Mar 2005 12:49] Alexander Barkov
Fixed in 4.1.11 and 5.0.4.
[25 Mar 2005 12:52] Alexander Barkov
Information for documenting:

Fixed that extra HKSCS and cp950 characters were not accepted into a Big5 column.
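
To see how strict Big5 and its extensions can differ on the same bytes, the sequence F9 D8 can be tried against several codecs. A rough sketch using Python's big5, cp950 and big5hkscs codecs: these are Python's own tables rather than MySQL's, so the result is an analogy and not a statement about the server. decode_with is a hypothetical helper.

```python
def decode_with(codecs_to_try, data: bytes) -> dict:
    """Map each codec name to the decoded text, or None if decoding fails."""
    results = {}
    for name in codecs_to_try:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


report = decode_with(["big5", "cp950", "big5hkscs"], bytes([0xF9, 0xD8]))
for name, text in report.items():
    # None means that codec's table has no character for these bytes.
    print(name, repr(text))
```

A codec that prints None here treats the sequence as invalid, which is the kind of rejection described in this report.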
[26 Apr 2005 0:10] Paul DuBois
Noted in 4.1.11, 5.0.4 changelogs.