Bug #40698 after binary to varchar convertion varchar isn't variable
Submitted: 13 Nov 2008 13:43 Modified: 17 Nov 2008 17:13
Reporter: Susanne Ebrecht Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Documentation Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Paul DuBois CPU Architecture:Any

[13 Nov 2008 13:43] Susanne Ebrecht
Description:
CREATE TABLE t(v VARCHAR(100) CHARSET latin1);

Let's say you environment is set to UTF8 but MySQL Client is set to LATIN1 (because you forgot to use SET NAMES utf8).

Produce wrong data:
INSERT INTO t VALUES('Bär');

Now use proper session settings:

SET NAMES utf8;

SELECT * FROM t;
+--------+
| v      |
+--------+
| Bär | 
+--------+

SELECT length(v) FROM t;
+-----------+
| length(v) |
+-----------+
|         4 | 
+-----------+

Because column is latin1 this should be 3 and not 4. 2 bytes are used for the the 'ä' instead of one byte.

Now follow the documentation:
http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html
http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html

ALTER TABLE t MODIFY v BINARY(100);
ALTER TABLE t MODIFY v VARCHAR(100) CHARSET utf8;

SELECT * FROM t;
+------------------------------------------------------+
| v                                                                                                    |
+------------------------------------------------------+
| Bär                                                                                                 | 
+------------------------------------------------------+

Seems that conversion worked as expected.

SELECT length(v) FROM t;
+-----------+
| length(v) |
+-----------+
|       100 | 
+-----------+

Why is it 100 and not 4?

For comparing:

INSERT INTO t VALUES('Bär');

SHOW CREATE TABLE t;
...
`v` varchar(100) CHARACTER SET utf8 DEFAULT NULL
...

So the 'ä' should have two bytes ...

SELECT length(v) FROM t;

+-----------+
| length(v) |
+-----------+
|       100 | 
|         4 | 
+-----------+

SELECT hex(v) FROM t;
+--------------------------------------------------------+
| hex(v)                                                                                                                                                                                                   |
+--------------------------------------------------------+
| 42C3A4720000000000000000000000000000000000000000000000 | 
| 42C3A472                                                                                                                                                                                                 | 
+--------------------------------------------------------+

To make a proper display here I just cutted tons of '0' and tons of '-' signs.

Result for the binary to varchar converted text is:
42C3A472 followed from 192 times '0' (I hope that I counted right here)

This means that at the conversion from Binary into Varchar it just converted from Binary into Char and the non used signed wouldn't get removed.

How to repeat:
See above

Suggested fix:
I have no clue if this is data type related, conversion related or ALTER TABLE related.
[13 Nov 2008 13:46] Susanne Ebrecht
This is related to http://bugs.mysql.com/bug.php?id=21508.

There you will read that you should use VARBINARY instead of BINARY.

The documentation is confusing here:
http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html
http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html

It would be nice when there would not only be an example for BINARY and CHAR. An example for VARBINARY and VARCHAR would help here to not confuse the users.
[17 Nov 2008 17:13] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

I've done two things:

1) Updated the examples to use variable-length binary types, thus not introducing the pad-with-0x00 problem in the first place.

2) Pointed out that if you do use BINARY, you can remove trailing 0x00 after the conversion like this:

UPDATE t SET col1 = TRIM(TRAILING 0x00 FROM col1);