MySQL Bugs: #65676: character_set_connection probably useless

Bug #65676	character_set_connection probably useless
Submitted:	19 Jun 2012 18:50	Modified:	30 Jan 2013 18:23
Reporter:	Loic Etienne	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Charsets	Severity:	S3 (Non-critical)
Version:	5.1.49	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Logically, only two character sets are needed (for writing):
 * character_set_client (source)
 * character set of each pertained column (target)

An intermediary conversion with character_set_connection seems to serve no purpose.

It seems (blogs, docu) that character_set_connection and character_set_client should be identical anyway.

If character_set_connection is more restrictive than necessary, unmappable characters are replaced with '?'. It damages the data unnecessarily (and silently).

Anyway, it is really confusing.

How to repeat:
SET sql_mode = 'STRICT_ALL_TABLES';

SET character_set_client = utf8;
SET character_set_results = utf8;
SET character_set_connection = latin1;

DROP TABLE IF EXISTS t;
CREATE TABLE t (
    s VARCHAR(20) CHARACTER SET utf8 
) ENGINE=InnoDB;

INSERT INTO t (s)
VALUES ('Zürich'); -- editor in utf8 required

SELECT s , HEX(s) FROM t;

-- and repeat it with character_set_connection = utf8 instead of latin1
-- the result are identical

Suggested fix:
Remove character_set_connection

Or explain what character_set_connection is good for, or, in other words, why character_set_client alone is not sufficient.

I posted this:
http://bugs.mysql.com/bug.php?id=65681

(as I wanted to extend the discussion to include "character_set_results" as well)

Thank you for the report.

Please provide output of SELECT s , HEX(s) FROM t; in your environment: I want to compare with mine.

Output with character_set_connection = latin1

Attachment: con_lat1.txt (text/plain), 32 bytes.

I just uploaded con_lat1.txt.
I did not upload con_utf8.txt (with character_set_connection = utf8), since it is absolutely identical (I verified it with hexdump).

Note, however, that using characters not in latin1 (for instance u tilde, 'ũ'), would produce different results, because the intermediary conversion fails and uses '?' as replacement character.

Thank you for the feedback.

This is not a bug.

See at http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_character_set_c...: "The character set for the current connection. Used for literals that do not have an explicit character set specification and for number-to-string conversion. "

So you would see effect of this option only if run tests like this:

DROP TABLE IF EXISTS t;
CREATE TABLE t (
    s VARCHAR(20) CHARACTER SET utf8 collate utf8_swedish_ci
) ENGINE=InnoDB;

INSERT INTO t (s)
VALUES ('Zürich');

SELECT s , HEX(s) FROM t;

SET character_set_connection = utf8;
select * from t where s < cast(10 as char);

Note custom collation in the table.

Couldn't we ask for a documentation update covering both this and my bug report?

It is absolutely not clear to most users what exactly c_collection and c_result do, when and where conversions take place etc.

Peter,

of course you could, but I think that http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_character_set_c...: "The character set for the current connection. Used for literals that do not have an explicit character set specification and for number-to-string conversion. " clearly describes purpose of this variable.