Bug #36502 Description of "useUnicode" and "characterEncoding" options unclear
Submitted: 5 May 2008 9:55
Modified: 19 Jan 2010 16:32
Reporter: David Tonhofer
Status: Closed
Category: Connector / J Documentation
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Tony Bedford
CPU Architecture: Any

[5 May 2008 9:55] David Tonhofer
Description:
At 

http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-configuration-properties.html

the description of option "useUnicode" is:

"Should the driver use Unicode character encodings when handling strings? Should only be used when the driver can't determine the character set mapping, or you are trying to 'force' the driver to use a character set that MySQL either doesn't natively support (such as UTF-8), true/false, defaults to 'true'"

So it seems that part of the sentence is missing. Moreover, the default for this option is "true", which contradicts the text.

Also, it is not clear what the implications are on the "characterEncoding" option:

"If 'useUnicode' is set to true, what character encoding should the driver use when dealing with strings?"

I would say that if "useUnicode" is true, then you do not need "characterEncoding". Should the text perhaps say: if "useUnicode" is set to false?
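
For reference, both options are normally passed together as connection properties. The following is a minimal sketch of how they might appear in a Connector/J JDBC URL. The class name, the helper `buildUrl`, the host `localhost:3306` and the database `test` are hypothetical placeholders, and the code only assembles the URL string; it does not load the driver or open a connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConnectorJUrlSketch {

    // Hypothetical helper: appends key=value pairs to a jdbc:mysql:// URL.
    // Values are appended as-is (no URL escaping), which is enough for this example.
    static String buildUrl(String host, String database, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:mysql://")
                .append(host).append('/').append(database);
        char separator = '?';
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(separator).append(entry.getKey()).append('=').append(entry.getValue());
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the URL is deterministic.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("useUnicode", "true");          // option discussed in this report
        props.put("characterEncoding", "UTF-8");  // option discussed in this report
        System.out.println(buildUrl("localhost:3306", "test", props));
    }
}
```

The resulting string would be handed to `DriverManager.getConnection(...)`; how the driver interprets the combination of the two properties is exactly what the documentation text above leaves unclear.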

How to repeat:
n/a

Suggested fix:
n/a
[20 Jul 2009 10:20] Tonci Grgin
Assigning to Docs team for checking.
[18 Jan 2010 14:22] Tony Bedford
This change is required in the Java source, not the documentation team's XML source.
[18 Jan 2010 15:17] Grace Chan
Hi,
Like David, I found the definition difficult to read.
Following your reply, I reviewed the statement in the documentation again.

Question 1: Could you confirm whether I understand correctly? Is the description written in a Q & A format?

useUnicode 
<>Question
Should the driver use Unicode character encodings when handling strings? 
<>Answer
Should only be used when 
<>Case (i)
the driver can't determine the character set mapping, or 
<>Case (ii)
you are trying to 'force' the driver to use a character set that MySQL either doesn't natively support (such as UTF-8), 
true/false, defaults to 'true' 

characterEncoding 
<>Question
If 'useUnicode' is set to true, what character encoding should the driver use when dealing with strings? 
<>Answer
(defaults is to 'autodetect')

Question 2: Operation of “useUnicode=true”
By default, this parameter is set to true. Does that mean:
in Case (i), if the driver can’t determine the string’s character set, Unicode will be used?
in Case (ii), if the driver is told to use another character set* (other than UTF-8) and can’t determine the string’s character set, Unicode will be used?

Question 3: Operation with “characterEncoding=XX” or without specifying characterEncoding
As I understand it, useUnicode acts as a fallback after the string handling specified by characterEncoding: if that initial handling fails, Unicode is used afterwards.
Does that mean the setting of ‘useUnicode’ does not interfere with that of characterEncoding?

Question 4: Regarding “MySQL ... natively support (such as UTF-8)”: I find that we still need to specify “characterEncoding=UTF-8”.
Is that right?
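
To illustrate why the choice of character encoding matters when strings are converted to bytes, here is a small sketch using only the standard Java library. It does not involve the driver at all, and the class name and the sample string are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytesSketch {

    // Returns how many bytes the string occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9"; // "café": three ASCII letters plus one accented letter

        // UTF-8 encodes the accented letter as two bytes, ISO-8859-1 as one.
        System.out.println("UTF-8:      " + byteLength(sample, StandardCharsets.UTF_8) + " bytes");
        System.out.println("ISO-8859-1: " + byteLength(sample, StandardCharsets.ISO_8859_1) + " bytes");
    }
}
```

The same text produces different byte sequences under different encodings, so the client and the server have to agree on one; this is presumably what the characterEncoding property is meant to control.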

Thank you for your kind attention.
Yours,
Grace
[19 Jan 2010 16:32] MC Brown
The properties files that contain this information have been updated. The updated versions should appear in overnight rebuilds of the documentation.
[21 Jan 2010 1:21] Grace Chan
Dear Mr. Brown,

Thanks for your prompt & detailed explanation. I have viewed it at 
http://dev.mysql.com/doc/refman/5.1/en/connector-j-reference-configuration-properties.html

I now understand why. The data I used was encoded in UTF-8.

"MySQL either doesn't natively support (such as UTF-8),"
=> "the driver can't determine the character set mapping ", i.e. UTF-8

As UTF-8 is not initially supported, I have to explicitly set the
characterEncoding property to "UTF-8" and also set useUnicode to "false".

Thanks a lot.

Yours,
Grace