MySQL Bugs: #36502: Description of "useUnicode" and "characterEncoding" options unclear

Bug #36502	Description of "useUnicode" and "characterEncoding" options unclear
Submitted:	5 May 2008 9:55	Modified:	19 Jan 2010 16:32
Reporter:	David Tonhofer	Email Updates:
Status:	Closed	Impact on me:	None
Category:	Connector / J Documentation	Severity:	S3 (Non-critical)
Version:		OS:	Any
Assigned to:	Tony Bedford	CPU Architecture:	Any

Description:
At 

http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html

the description of option "useUnicode" is:

"Should the driver use Unicode character encodings when handling strings? Should only be used when the driver can't determine the character set mapping, or you are trying to 'force' the driver to use a character set that MySQL either doesn't natively support (such as UTF-8), true/false, defaults to 'true'"

So it seems that part of the sentence is missing. Moreover, the default for this option is "true", which contradicts the text.

Also, it is not clear what the implications are on the "characterEncoding" option:

"If 'useUnicode' is set to true, what character encoding should the driver use when dealing with strings?"

I would say that if "useUnicode" is true, then you do not need "characterEncoding". Should the text perhaps say: if "useUnicode" is set to false?
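
For concreteness, both options are connection properties and are commonly given as query parameters on the JDBC URL. The following is a minimal sketch that only assembles such a URL string and does not open a connection. The host `localhost`, port `3306`, database `test`, and the class and method names are hypothetical placeholders, not taken from the documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EncodingUrlSketch {

    // Assembles a URL of the form jdbc:mysql://host:port/database?key=value&key=value
    public static String buildUrl(String host, int port, String database,
                                  Map<String, String> options) {
        String base = "jdbc:mysql://" + host + ":" + port + "/" + database;
        if (options.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> option : options.entrySet()) {
            query.add(option.getKey() + "=" + option.getValue());
        }
        return base + "?" + query;
    }

    // The two properties discussed in this report, set explicitly.
    // LinkedHashMap keeps insertion order, so the URL is deterministic.
    public static Map<String, String> utf8Options() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("useUnicode", "true");
        options.put("characterEncoding", "UTF-8");
        return options;
    }

    public static void main(String[] args) {
        // Hypothetical host, port and database name.
        System.out.println(buildUrl("localhost", 3306, "test", utf8Options()));
    }
}
```

Whether "characterEncoding" has any effect when "useUnicode" is false is exactly the point the description leaves open, so the sketch sets both explicitly rather than relying on defaults.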

How to repeat:
n/a

Suggested fix:
n/a

Assigning to Docs team for checking.

This change is required in the Java source, not the documentation team's XML source.

Hi,
Like David, I found the definition difficult to read.
Following your reply, I reviewed the statement in the documentation again.

Question 1: Could you confirm whether I understand correctly? Is the description written in a Q & A format?

useUnicode 
<>Question
Should the driver use Unicode character encodings when handling strings? 
<>Answer
Should only be used when 
<>Case (i)
the driver can't determine the character set mapping, or 
<>Case (ii)
you are trying to 'force' the driver to use a character set that MySQL either doesn't natively support (such as UTF-8), 
true/false, defaults to 'true' 

characterEncoding 
<>Question
If 'useUnicode' is set to true, what character encoding should the driver use when dealing with strings? 
<>Answer
(defaults is to 'autodetect')

Question 2: Operation of “useUnicode=true”
This parameter is set to true by default. Does that mean:
in Case (i), if the driver can’t determine the string’s character set, Unicode will be used?
in Case (ii), if the driver is told to use another character set* (other than UTF-8) and can’t determine the string’s character set, Unicode will be used?

Question 3: Operation of “characterEncoding=XX” or of leaving characterEncoding unspecified
As I understand it, useUnicode serves as a fallback for the string handling specified by characterEncoding: if the initial handling fails, Unicode is used afterwards.
Does that mean that setting ‘useUnicode’ does not interfere with characterEncoding?

Question 4: Regarding “MySQL either doesn't natively support (such as UTF-8)”: I find that we still need to specify “characterEncoding=UTF-8”.
Is that right?

Thank you for your kind attention.
Yours,
Grace
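
The combination asked about in Question 4 can also be supplied through a `java.util.Properties` object rather than URL parameters. Below is a minimal sketch, assuming the connection would be opened with the standard JDBC call `DriverManager.getConnection(String, Properties)`. The class name, method name, user name, and password are hypothetical. The connection call itself is left as a comment, since it needs the driver on the classpath and a running server.

```java
import java.util.Properties;

public class EncodingPropsSketch {

    // Collects credentials plus the two encoding-related properties.
    public static Properties encodingProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical credentials, for illustration only.
        Properties props = encodingProps("appuser", "secret");

        // With the driver available and a server running, the properties
        // would be passed along these lines (hypothetical URL):
        // Connection conn = DriverManager.getConnection(
        //         "jdbc:mysql://localhost:3306/test", props);

        System.out.println("useUnicode=" + props.getProperty("useUnicode"));
        System.out.println("characterEncoding=" + props.getProperty("characterEncoding"));
    }
}
```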

The properties files that contain this information have been updated. The updated versions should appear in overnight rebuilds of the documentation.

Dear Mr. Brown,

Thanks for your prompt & detailed explanation. I have viewed it at 
http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html

I now understand why. The encoding of the data I used was UTF-8.

"MySQL either doesn't natively support (such as UTF-8),"
=> "the driver can't determine the character set mapping ", i.e. UTF-8

As UTF-8 is not natively supported, I have to explicitly set the
characterEncoding property to "UTF-8" and useUnicode to "false".

Thanks a lot.

Yours,
Grace