Bug #5081 UCS2 fields are filled with '0x2020' after extending field length
Submitted: 18 Aug 2004 4:13 Modified: 19 Aug 2004 13:13
Reporter: Shuichi Tamagawa
Status: Closed
Category: MySQL Server Severity: S2 (Serious)
Version: 4.1.3-beta OS: Windows (WinXP Pro / SuSE Linux 9.0)
Assigned to: Alexander Barkov CPU Architecture: Any

[18 Aug 2004 4:13] Shuichi Tamagawa
Description:
If the character set of a field is UCS2 and the field length is extended with an 'ALTER TABLE' statement, the extended part is filled with '0x2020'.

How to repeat:
mysql> show variables like 'char%';
+--------------------------+----------------------------------------------+
| Variable_name            | Value                                        |
+--------------------------+----------------------------------------------+
| character_set_client     | ujis                                         |
| character_set_connection | ujis                                         |
| character_set_database   | ucs2                                         |
| character_set_results    | ujis                                         |
| character_set_server     | ucs2                                         |
| character_set_system     | utf8                                         |
| character_sets_dir       | /usr/local/mysql/41030/share/mysql/charsets/ |
+--------------------------+----------------------------------------------+
7 rows in set (0.01 sec)

mysql> create table t1(a char(1)) default charset = ucs2;
Query OK, 0 rows affected (0.14 sec)

mysql> insert into t1 values ('a'),('b'),('c');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0
mysql> select * from t1;
+------+
| a    |
+------+
| a    |
| b    |
| c    |
+------+
3 rows in set (0.00 sec)

mysql> alter table t1 modify a char(5);
Query OK, 3 rows affected (0.48 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select a, hex(a) from t1;
+-----------+----------------------+
| a         | hex(a)               |
+-----------+----------------------+
| a†††† | 00612020202020202020 |
| b†††† | 00622020202020202020 |
| c†††† | 00632020202020202020 |
+-----------+----------------------+
3 rows in set (0.05 sec)

Suggested fix:
Leave the extended part blank.
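
For anyone checking this outside the server, the hex(a) values from the output above can be decoded by hand. This is only a sketch in Python, not MySQL code: it assumes UCS2 can be read with Python's "utf-16-be" codec (true for these BMP characters), and decode_ucs2 is a helper name made up for illustration.

```python
import unicodedata

# hex(a) values copied from the SELECT output in this report
rows = [
    "00612020202020202020",
    "00622020202020202020",
    "00632020202020202020",
]

def decode_ucs2(hex_value):
    """Decode a hex dump as UCS2 (2 big-endian bytes per character).

    Assumption: utf-16-be is used as a stand-in for UCS2, which is
    equivalent for characters in the Basic Multilingual Plane.
    """
    return bytes.fromhex(hex_value).decode("utf-16-be")

for h in rows:
    text = decode_ucs2(h)
    # Show the Unicode name of every character in the stored value
    print(repr(text), [unicodedata.name(ch) for ch in text])
```

Each row decodes to the original letter followed by four U+2020 characters, which is why the client shows daggers instead of blanks.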
[18 Aug 2004 21:10] MySQL Verification Team
This is expected behaviour.

0x20202020 is blanks.

Take a look at Field_string::store()
[18 Aug 2004 22:27] Shuichi Tamagawa
Example: character set = ucs2

Attachment: ucs2.JPG (image/pjpeg, text), 59.12 KiB.

[18 Aug 2004 22:27] Shuichi Tamagawa
Example: character set = utf8

Attachment: utf8.JPG (image/pjpeg, text), 52.54 KiB.

[18 Aug 2004 22:27] Shuichi Tamagawa
Example: character set = latin1

Attachment: latin1.JPG (image/pjpeg, text), 56.12 KiB.

[18 Aug 2004 22:40] Shuichi Tamagawa
Please take a look at the attached files (ucs2.jpg). You can see that 0x2020 is not a blank. There should be no filler like that, as in the case of the other character sets (utf8.jpg, latin1.jpg).
[18 Aug 2004 23:51] Sergei Golubchik
I still think that the filler should be a space as in any other charset - that is 0x0020 in UCS2 :)

0x2020 is "Dagger" character
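
The difference between the two fillers can be sketched as follows. This is a Python illustration under stated assumptions, not the server's Field_string::store() code: pad_bytewise and pad_charwise are hypothetical names, "utf-16-be" stands in for UCS2, and the character-wise version assumes a fixed-width encoding.

```python
def pad_bytewise(value, char_len, bytes_per_char=2):
    """Fill the remaining bytes with the single byte 0x20.

    Hypothetical model of the behaviour seen in this report: in a
    2-byte charset, each pair of fill bytes reads as 0x2020.
    """
    total_bytes = char_len * bytes_per_char
    return value + b"\x20" * (total_bytes - len(value))

def pad_charwise(value, char_len, encoding="utf-16-be"):
    """Fill with the charset's own space character.

    Assumes a fixed-width encoding; in UCS2 the space is b"\\x00\\x20".
    """
    space = " ".encode(encoding)
    missing_chars = char_len - len(value) // len(space)
    return value + space * missing_chars

stored = "a".encode("utf-16-be")         # b"\x00\x61"
print(pad_bytewise(stored, 5).hex())     # matches hex(a) in the report
print(pad_charwise(stored, 5).hex())     # padded with 0x0020 instead
```

With the character-wise filler the value decodes to 'a' followed by four ordinary spaces, the same as in the single-byte charsets.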
[18 Aug 2004 23:52] Sergei Golubchik
Oops, I was replying to Sinisa.
Shuichi, you're right, of course.
[19 Aug 2004 13:13] Alexander Barkov
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html