Bug #65676 character_set_connection probably useless
Submitted: 19 Jun 2012 18:50 Modified: 30 Jan 2013 18:23
Reporter: Loic Etienne Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1.49 OS:Any
Assigned to: CPU Architecture:Any

[19 Jun 2012 18:50] Loic Etienne
Description:
Logically, only two character sets are needed (for writing):
 * character_set_client (source)
 * character set of each pertained column (target)

An intermediary conversion with character_set_connection seems to serve no purpose.

It seems (blogs, docu) that character_set_connection and character_set_client should be identical anyway.

If character_set_connection is more restrictive than necessary, unmappable characters are replaced with '?'. It damages the data unnecessarily (and silently).

Anyway, it is really confusing.

How to repeat:
SET sql_mode = 'STRICT_ALL_TABLES';

SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = latin1;

DROP TABLE IF EXISTS t;
CREATE TABLE t (
    s VARCHAR(20) CHARACTER SET utf8 
) ENGINE=InnoDB;

INSERT INTO t (s)
VALUES ('Zürich'); -- editor in utf8 required

SELECT s , HEX(s) FROM t;

-- and repeat it with character_set_connection = utf8 instead of latin1
-- the result are identical

Suggested fix:
Remove character_set_connection

Or explain what character_set_connection is good for, or, in other words, why character_set_client alone is not sufficient.
[20 Jun 2012 8:23] Peter Laursen
I posted this:
http://bugs.mysql.com/bug.php?id=65681

(as I wanted to extend the discussion to include "character_set_results" as well)
[20 Jun 2012 18:32] Sveta Smirnova
Thank you for the report.

Please provide output of SELECT s , HEX(s) FROM t; in your environment: I want to compare with mine.
[21 Jun 2012 8:28] Loic Etienne
Output with character_set_connection = latin1

Attachment: con_lat1.txt (text/plain), 32 bytes.

[21 Jun 2012 8:34] Loic Etienne
I just uploaded con_lat1.txt.
I did not upload con_utf8.txt (with character_set_connection = utf8), since it is absolutely identical (I verified it with hexdump).

Note, however, that using characters not in latin1 (for instance u tilde, 'ũ'), would produce different results, because the intermediary conversion fails and uses '?' as replacement character.
[30 Jan 2013 18:23] Sveta Smirnova
Thank you for the feedback.

This is not a bug.

See at http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_character_set_c...: "The character set for the current connection. Used for literals that do not have an explicit character set specification and for number-to-string conversion. "

So you would see effect of this option only if run tests like this:

DROP TABLE IF EXISTS t;
CREATE TABLE t (
    s VARCHAR(20) CHARACTER SET utf8 collate utf8_swedish_ci
) ENGINE=InnoDB;

INSERT INTO t (s)
VALUES ('Zürich');

SELECT s , HEX(s) FROM t;

SET character_set_connection = utf8;
select * from t where s < cast(10 as char);

Note custom collation in the table.
[31 Jan 2013 3:51] Peter Laursen
Couldn't we ask for a documentation update covering both this and my bug report?

It is absolutely not clear to most users what exactly c_collection and c_result do, when and where conversions take place etc.
[15 Feb 2013 13:46] Sveta Smirnova
Peter,

of course you could, but I think that http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_character_set_c...: "The character set for the current connection. Used for literals that do not have an explicit character set specification and for number-to-string conversion. " clearly describes purpose of this variable.