| Bug #65676 | character_set_connection probably useless | ||
|---|---|---|---|
| Submitted: | 19 Jun 2012 18:50 | Modified: | 30 Jan 2013 18:23 |
| Reporter: | Loic Etienne | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 5.1.49 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[20 Jun 2012 8:23]
Peter Laursen
I posted this: http://bugs.mysql.com/bug.php?id=65681 (as I wanted to extend the discussion to include "character_set_results" as well)
[20 Jun 2012 18:32]
Sveta Smirnova
Thank you for the report. Please provide output of SELECT s , HEX(s) FROM t; in your environment: I want to compare with mine.
[21 Jun 2012 8:28]
Loic Etienne
Output with character_set_connection = latin1
Attachment: con_lat1.txt (text/plain), 32 bytes.
[21 Jun 2012 8:34]
Loic Etienne
I just uploaded con_lat1.txt. I did not upload con_utf8.txt (with character_set_connection = utf8), since it is absolutely identical (I verified it with hexdump). Note, however, that using characters not in latin1 (for instance u tilde, 'ũ'), would produce different results, because the intermediary conversion fails and uses '?' as replacement character.
[30 Jan 2013 18:23]
Sveta Smirnova
Thank you for the feedback. This is not a bug. See at http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_character_set_c...: "The character set for the current connection. Used for literals that do not have an explicit character set specification and for number-to-string conversion. " So you would see effect of this option only if run tests like this: DROP TABLE IF EXISTS t; CREATE TABLE t ( s VARCHAR(20) CHARACTER SET utf8 collate utf8_swedish_ci ) ENGINE=InnoDB; INSERT INTO t (s) VALUES ('Zürich'); SELECT s , HEX(s) FROM t; SET character_set_connection = utf8; select * from t where s < cast(10 as char); Note custom collation in the table.
[31 Jan 2013 3:51]
Peter Laursen
Couldn't we ask for a documentation update covering both this and my bug report? It is absolutely not clear to most users what exactly c_collection and c_result do, when and where conversions take place etc.
[15 Feb 2013 13:46]
Sveta Smirnova
Peter, of course you could, but I think that http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_character_set_c...: "The character set for the current connection. Used for literals that do not have an explicit character set specification and for number-to-string conversion. " clearly describes purpose of this variable.

Description: Logically, only two character sets are needed (for writing): * character_set_client (source) * character set of each pertained column (target) An intermediary conversion with character_set_connection seems to serve no purpose. It seems (blogs, docu) that character_set_connection and character_set_client should be identical anyway. If character_set_connection is more restrictive than necessary, unmappable characters are replaced with '?'. It damages the data unnecessarily (and silently). Anyway, it is really confusing. How to repeat: SET sql_mode = 'STRICT_ALL_TABLES'; SET character_set_client = utf8; SET character_set_results = utf8; SET character_set_connection = latin1; DROP TABLE IF EXISTS t; CREATE TABLE t ( s VARCHAR(20) CHARACTER SET utf8 ) ENGINE=InnoDB; INSERT INTO t (s) VALUES ('Zürich'); -- editor in utf8 required SELECT s , HEX(s) FROM t; -- and repeat it with character_set_connection = utf8 instead of latin1 -- the result are identical Suggested fix: Remove character_set_connection Or explain what character_set_connection is good for, or, in other words, why character_set_client alone is not sufficient.