Bug #62728 ODBC Connector 5.1 utf8 stored incorrectly
Submitted: 13 Oct 2011 22:59 Modified: 7 Nov 2011 10:51
Reporter: alon s Email Updates:
Status: Verified Impact on me:
None 
Category:Connector / ODBC Severity:S2 (Serious)
Version:5.1.8 OS:Microsoft Windows
Assigned to: Bogdan Degtyariov CPU Architecture:Any
Tags: ASP, odbc 5.1, utf8

[13 Oct 2011 22:59] alon s
Description:
In some cases utf8 data is not stored correctly in the database.

I have a varchar column and i'm using classic asp.
All table collation and server collations and charsets are utf8_general_ci

How to repeat:
I tried to input foreign language data in 4 ways as follows:
(the hard coded foreign language text in all examples is utf8 because the .asp file is saved in utf-8 encoding)

1. with recordset, without charset=utf8 parameter

    SQL = "SELECT 1 FROM test WHERE 1=0"
    Set rs1 = CreateObject ("ADODB.Recordset")
   
    rs1.CursorType = 0
    rs1.LockType = 3

    ' cn1 is a connection string WITHOUT charset=utf8 parameter
    rs1.open SQL, cn1

    rs1.addNew

    rs1("test") = "Foreign Language Text in hebrew"

    rs1.update

2. with recordset, with charset=utf8 parameter
 
    SQL = "SELECT 1 FROM test WHERE 1=0"
    Set rs1 = CreateObject ("ADODB.Recordset")
   
    rs1.CursorType = 0
    rs1.LockType = 3

    ' cn1 is a connection string WITH charset=utf8 parameter
    rs1.open SQL, cn2

    rs1.addNew

    rs1("test") = "Foreign Language Text in hebrew"

    rs1.update

3. without recordset, without charset=utf8

   SQL = "INSERT INTO test (test) VALUES ('Foreign Language Text in hebrew)"
   cn1.execute SQL

4. without recordset, with charset=utf8

   SQL = "INSERT INTO test (test) VALUES ('Foreign Language Text in hebrew)"
   cn2.execute SQL

Only option 2 is storing the utf8 data currectly.
(verified by viewing the hex data of the column and trying to decode it with a hex decoder, and also by comparing the length of the string in the column in all 4 tests)

THE QUESTION: why doesnt option 4 store the data currectly along with option 2?

by the way, the same this occours ofcourse if i have a text box that i submit data into and my page has a meta tag of utf8. its the same thing as my hard coded text example.
[13 Oct 2011 23:02] alon s
connection definitions:

cn1
----
Set cn1 = CreateObject ("ADODB.Connection")

cn1.ConnectionString = "Driver={MySQL ODBC 5.1 Driver};Port=3306;Option=3;Server=xxxxxx;User ID=xxxxx;Password=xxxxx;Database=xxxxx;"

cn1.open

connection definitions:

cn2
----
Set cn2 = CreateObject ("ADODB.Connection")

cn2.ConnectionString = "Driver={MySQL ODBC 5.1 Driver};Port=3306;Option=3;Server=xxxxxx;User ID=xxxxx;Password=xxxxx;Database=xxxxx;charset=utf8"

cn2.open
[17 Oct 2011 10:34] Lawrenty Novitsky
As for the question "why doesn't option #4 work". Since 5.1 is a unicode driver MS driver translate all non-unicode calls(SQLExecDirect) to its unicode counterpart(SQLExecDirectW) call. Doing that it converts input string parameters to SQLWCHAR. It assumes that string data is encoded in ANSI charset of the system it is being executed on. And that is unlikely to be UTF8. Thus driver receives wrong utf16 strings and thus wrong data goes to a database.
[18 Oct 2011 11:38] alon s
understood.

so my only option to work with utf8 and asp without opening a recordset object for nothing is to decode the utf8 to ansi before sending it to the execute method?
[18 Oct 2011 17:44] Lawrenty Novitsky
Yes, that should work.
[19 Oct 2011 11:20] Bogdan Degtyariov
Alon,

Can you attach a short ASP test case with some utf-8 data you intended to insert?
We have to make sure there is no double string conversion. This happens when each UTF-8 byte is interpreted as wide character (assuming UTF-16) and then converted back to UTF-8.
Thanks.
[22 Oct 2011 15:59] alon s
test case for hebrew text insertion with utf8 encoding

Attachment: test.asp (text/plain), 900 bytes.

[22 Oct 2011 16:03] alon s
I have added a test file.

thank you
[22 Oct 2011 16:04] alon s
* you will have to change the connection string info to your db
[7 Nov 2011 10:51] Bogdan Degtyariov
Thanks for uploading the test case.

Verified with Connector/ODBC 5.1.9