Bug #1414 UTF8 AND CHAR_LENGTH()
Submitted: 26 Sep 2003 17:31 Modified: 7 Oct 2003 20:25
Reporter: Pigo Chu Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1 OS:Linux (Linux)
Assigned to: Alexander Barkov CPU Architecture:Any

[26 Sep 2003 17:31] Pigo Chu
Description:
if the table define utf8 , i input a Chinese-UTF8 string in a varchar field,the string is "浓ª3".
then I type command : select CHAR_LENGTH(field_name) FROM table_name

It return length is 7 ... but my string is two chinese words + "3" . The length should be 3 right ? but it return 7

How to repeat:
email
[30 Sep 2003 2:47] Alexander Barkov
To give you full answer, I need the following output:

SHOW CREATE TABLE table_name;
SELECT CHAR_LENGTH(field_name), HEX(field_name), charset(field_name)
FROM table_name;

Thanks!
[1 Oct 2003 11:32] Pigo Chu
CREATE TABLE `user_data` (
  `data_id` int(10) unsigned NOT NULL auto_increment,
  `user_id` int(11) unsigned NOT NULL default '0',
  `real_name` varchar(64) NOT NULL default '',
  `email` varchar(128) NOT NULL default '',
  `nickname` varchar(64) NOT NULL default '',
  `point` int(10) unsigned NOT NULL default '0',
  `contact_tel` varchar(32) NOT NULL default '',
  `contact_address` varchar(255) NOT NULL default '',
  `cellular_phone` varchar(32) NOT NULL default '',
  `identity_card` varchar(10) NOT NULL default '',
  PRIMARY KEY  (`data_id`),
  UNIQUE KEY `user_id` (`user_id`),
  UNIQUE KEY `email` (`email`)
) TYPE=InnoDB CHARSET=utf8 PACK_KEYS=0

+-----------------------+----------------------------+-------------------+
| CHAR_LENGTH(nickname) | HEX(nickname)              | charset(nickname) |
+-----------------------+----------------------------+-------------------+
|                     7 | C3A8C2B1C2ACC3A9C2A0C2AD33 | utf8              |
|                     5 | 5269766572                 | utf8              |
+-----------------------+----------------------------+-------------------+
[2 Oct 2003 23:03] Alexander Barkov
The data doesn't look right. How do you insert it?
What is the output of:

SHOW VARIABLES LIKE 'character_set%';
SELECT CHARSET("浓ª3");
[5 Oct 2003 7:16] Pigo Chu
I am a Taiwiner .
I don't know how to tell u.

My string is big5 , ASCII is BDDEC05933 .
But my web page is defined "utf-8"
So the big5 string will auto convert to utf8 in IE6 , and send to php.
I don't use PHP iconv, I just insert the utf8 string to mysql.
[5 Oct 2003 7:20] Pigo Chu
SHOW VARIABLES LIKE 'character_set%';

character_sets | big5 latin2_czech_ci dec8_swedish_ci cp850_general_ci latin1_german1_ci hp8_english_ci koi8r_general_ci latin1_swedish_ci latin2_general_ci swe7_swedish_ci ascii_general_ci ujis sjis cp1251_bulgarian_ci latin1_danish_ci hebrew tis620 euckr latin7_estonian_ci latin2_hungarian_ci koi8u_general_ci cp1251_ukrainian_ci gb2312 greek cp1250_general_ci latin2_croatian_ci gbk cp1257_lithuanian_ci latin5_turkish_ci latin1_german2_ci armscii8_general_ci utf8 cp1250_czech_ci ucs2 cp866_general_ci keybcs2 macce macroman cp852_general_ci latin7_general_ci latin7_general_cs macce_bin macce_ci macce_cs latin1_bin latin1_general_ci latin1_general_cs cp1251_bin cp1251_general_ci cp1251_general_cs macroman_bin macroman_ci macroman_cs cp1256_general_ci cp1257_bin cp1257_ci_ai cp1257_ci cp1257_cs binary armscii_bin ascii_bin cp1250_bin cp1256_bin cp866_bin dec8_bin greek_bin hebrew_bin hp8_bin keybcs2_bin koi8r_bin koi8u_bin latin2_bin latin5_bin latin7_bin cp850_bin cp852_bin swe7_bin utf8_bin

mysql> SELECT CHARSET("aì“a3");
+--------------------+
| CHARSET("aì“a3") |
+--------------------+
| latin1             |
+--------------------+
[6 Oct 2003 3:41] Alexander Barkov
Server thinks your client works in latin1 character set, not in utf8.
There are several workarounds.
1. Try to use character set introducer:

INSERT INTO user_data (nickname)  VALUES (_utf8'aЛ⌠a3')

2. Try to specify client's character set just after connecting using:

SET NAMES utf8;

I'm not sure the last will work for you, there were several charset
related bugs in 4.1.0 which have already been fixed in 4.1.1.

Please give feedback.
[7 Oct 2003 14:58] Pigo Chu
Yes . I can work now , Thank Q.

U teach me 2 functions can work.

If I don't use "SET NAMES utf8",I will get the result(from PHP) ASCII  is "3f3f33".
So I must use "SET NAMES uft8" after connection , and CHAR_LENGTH() will return "3".

Thank Q ^^
[7 Oct 2003 15:00] Pigo Chu
In console mode

SELECT CHAR_LENGTH(nickname), HEX(nickname), charset(nickname) from user_data;
+-----------------------+----------------+-------------------+
| CHAR_LENGTH(nickname) | HEX(nickname)  | charset(nickname) |
+-----------------------+----------------+-------------------+
|                     3 | E8B1ACE9A0AD33 | utf8              |
+-----------------------+----------------+-------------------+

Its ok now