| Bug #40698 | after binary to varchar convertion varchar isn't variable | ||
|---|---|---|---|
| Submitted: | 13 Nov 2008 13:43 | Modified: | 17 Nov 2008 17:13 |
| Reporter: | Susanne Ebrecht | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Documentation | Severity: | S3 (Non-critical) |
| Version: | OS: | Any | |
| Assigned to: | Paul DuBois | CPU Architecture: | Any |
[13 Nov 2008 13:46]
Susanne Ebrecht
This is related to http://bugs.mysql.com/bug.php?id=21508. There you will read that you should use VARBINARY instead of BINARY. The documentation is confusing here: http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html It would be nice when there would not only be an example for BINARY and CHAR. An example for VARBINARY and VARCHAR would help here to not confuse the users.
[17 Nov 2008 17:13]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products. I've done two things: 1) Updated the examples to use variable-length binary types, thus not introducing the pad-with-0x00 problem in the first place. 2) Pointed out that if you do use BINARY, you can remove trailing 0x00 after the conversion like this: UPDATE t SET col1 = TRIM(TRAILING 0x00 FROM col1);

Description: CREATE TABLE t(v VARCHAR(100) CHARSET latin1); Let's say you environment is set to UTF8 but MySQL Client is set to LATIN1 (because you forgot to use SET NAMES utf8). Produce wrong data: INSERT INTO t VALUES('Bär'); Now use proper session settings: SET NAMES utf8; SELECT * FROM t; +--------+ | v | +--------+ | Bär | +--------+ SELECT length(v) FROM t; +-----------+ | length(v) | +-----------+ | 4 | +-----------+ Because column is latin1 this should be 3 and not 4. 2 bytes are used for the the 'ä' instead of one byte. Now follow the documentation: http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html ALTER TABLE t MODIFY v BINARY(100); ALTER TABLE t MODIFY v VARCHAR(100) CHARSET utf8; SELECT * FROM t; +------------------------------------------------------+ | v | +------------------------------------------------------+ | Bär | +------------------------------------------------------+ Seems that conversion worked as expected. SELECT length(v) FROM t; +-----------+ | length(v) | +-----------+ | 100 | +-----------+ Why is it 100 and not 4? For comparing: INSERT INTO t VALUES('Bär'); SHOW CREATE TABLE t; ... `v` varchar(100) CHARACTER SET utf8 DEFAULT NULL ... So the 'ä' should have two bytes ... SELECT length(v) FROM t; +-----------+ | length(v) | +-----------+ | 100 | | 4 | +-----------+ SELECT hex(v) FROM t; +--------------------------------------------------------+ | hex(v) | +--------------------------------------------------------+ | 42C3A4720000000000000000000000000000000000000000000000 | | 42C3A472 | +--------------------------------------------------------+ To make a proper display here I just cutted tons of '0' and tons of '-' signs. Result for the binary to varchar converted text is: 42C3A472 followed from 192 times '0' (I hope that I counted right here) This means that at the conversion from Binary into Varchar it just converted from Binary into Char and the non used signed wouldn't get removed. How to repeat: See above Suggested fix: I have no clue if this is data type related, conversion related or ALTER TABLE related.