Bug #3348 ODBC Driver Does Not Return UTF8 Data Properly
Submitted: 31 Mar 2004 13:57
Modified: 5 Mar 2008 9:23
Reporter: Miguel Solorzano
Status: Won't fix
Category: Connector / ODBC
Severity: S1 (Critical)
Version:
OS: Any
Assigned to:
CPU Architecture: Any

[31 Mar 2004 13:57] Miguel Solorzano
Description:
I have been unable to get ODBC 3.51.06 to return UTF8 data properly using MS Access, MS Excel, MS Query
and some other ODBC tools. E.g. if I select the Greek string 'αβγΑΒΓ' using

SELECT x'CEB1CEB2CEB3CE91CE92CE93';

it comes back as the hexstring x'CE00B100CE00B200CE00B300CE009100CE009200CE009300'
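
The returned hexstring is what you would get if each byte of the UTF-8 data were treated as a separate character and widened to a 16-bit little-endian unit. The sketch below reproduces that transformation in Python. It is only an illustration consistent with the reported output, not a description of the driver's actual code path, and the helper name widen_bytes is hypothetical.

```python
# Bytes of the UTF-8 string from the report (the hex literal in the SELECT).
utf8_bytes = bytes.fromhex("CEB1CEB2CEB3CE91CE92CE93")


def widen_bytes(data: bytes) -> bytes:
    """Emit every input byte as its own 16-bit little-endian unit.

    Hypothetical helper: it mimics a client that assumes one byte equals
    one character when converting to wide characters.
    """
    return b"".join(bytes([b, 0]) for b in data)


widened = widen_bytes(utf8_bytes)

# The intended text, decoded correctly as UTF-8.
print(utf8_bytes.decode("utf-8"))
# The byte-by-byte widened form, for comparison with the report.
print(widened.hex().upper())
```

Decoding the original bytes as UTF-8 gives six Greek letters, while the widened form holds twelve 16-bit units, one per original byte.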

I have tried issuing 'set character set utf8' and setting the entry

[client]
default-character-set = utf8

in c:\my.cnf without any success.

In the download notice for the 3.52 driver, it mentions that there will be a UNICODE API. Does this
mean that utf8/unicode is broken in 3.51?

How to repeat:
see description
[22 Jun 2004 11:50] J ester
The problem is in the BLOB field. Adding data to the BLOB field causes it to go multibyte ('ab' -> 61006200). Since the Unicode signature FFFE is missing, conversion doesn't work anymore.

A workaround is changing the column to mediumtext, adding data through the connector and changing it back to blob.
[11 Jul 2004 23:28] Peter Harvey
Proper UNICODE support in MyODBC will be sometime between MyODBC v3.53 Alpha and MyODBC v3.53 Production.
[15 Jul 2004 16:59] Timothy Smith
This is a bug that we can't fix in 3.51.x - it is not written to handle multiple character sets, or to handle many of the features of MySQL 4.1.

A new version, 3.53 (3.52 will not be released), will be available as an alpha release soon, and it will eventually be fully compliant with the MyODBC spec, and will support all MySQL features.

Our priority now is 3.53 development (while solving critical bugs in 3.51.x which deal with existing functionality).
[6 Aug 2004 9:51] Alexander Kushnirenko
Hi,

I'm not sure if this bug is specific to Greek symbols. In any case, I was able to correctly extract UTF-8 encoded strings containing Russian characters into MS Access. I use MySQL 4.1 and MyODBC 3.51. What was needed was to force conversion from UTF-8 to cp1251 (the native Windows code page for Russian) using SET NAMES cp1251

After that everything worked fine including SELECTS and UPDATES.

I apologize if I'm missing the point of this discussion.

Alexander Kushnirenko
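
The effect of SET NAMES cp1251 described above can be sketched in Python: the data stays in UTF-8 on the server, and the text is converted to the single-byte client character set before it is sent. The sample string and the helper name convert_for_client are assumptions made for illustration; this models the conversion only, not the server's implementation.

```python
# Sample value as it would be stored in a utf8 column (illustrative string).
stored = "Привет".encode("utf-8")


def convert_for_client(data: bytes, client_charset: str) -> bytes:
    """Re-encode UTF-8 column data into the connection character set.

    Hypothetical helper standing in for the conversion requested by
    SET NAMES <charset>.
    """
    return data.decode("utf-8").encode(client_charset)


sent = convert_for_client(stored, "cp1251")

# UTF-8 uses two bytes per Cyrillic letter, cp1251 uses one.
print(len(stored), len(sent))
# A non-Unicode client that assumes cp1251 reads the text correctly.
print(sent.decode("cp1251"))
```

This only works when every character in the column exists in the chosen code page.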
[25 Jan 2006 2:00] Grace Coronado
SUBJECT: Unicode (MSAccess/MySQL)

Handling unicode is also one of our problems when we migrated from MS Access to MySQL which we haven’t resolved yet.  We have a field which can store different character sets.

Our current settings:
    Windows XP 5.1
    MS Access 2002
    MySQL Server 5.0.18
    MyODBC 3.51.12 
    MS Jet Engine 4.0

A post on the MySQL Forum, http://forums.mysql.com/read.php?37,57105,58616#msg-58616, suggested changing the DSN Connection Option, but we still can’t make it work.

I have tried the following setup (to at least try entering the Greek characters):
    DSN Connection Option:  “set names greek”
    MySQL: “utf8”
    MS Access:  (Arial Unicode MS or Arial Greek)

Data entered through MS Access or Navicat become “?” question marks.

Also tried different combinations for the following settings:
   DSN Connection Option:  utf8, greek, cp1250, cp1251
   MySQL:  default char, utf8, greek, cp1250
   MS Access:  utf8, Arial CE, Arial Unicode MS, Arial Greek

Is there any update about this issue after August 2004?  I also tested our application under  MyODBC 5.0 Alpha version, but when I tried to define a DSN connection the dropdown list for Database shows “garbage”.  Thus I reinstalled MyODBC 3.51.12.  So far the latest downloadable versions are: MyODBC 3.51.12 (recommended version) and 5.0 Alpha version.  But as mentioned above, 3.51.x doesn’t support Unicode characters.  So we just have to wait for MyODBC 5.0 Production. 

Thanks,
Grace
[2 Mar 2006 21:21] Alkis Balasis
OK, I solved this.
In the ODBC connection options, put this command in the "Initial Statement" field: set names greek
[17 Jan 2007 19:21] Ivan
I have a similar problem, but with Serbian characters... The columns in my table are all utf-8 and I have 'set names cp1250'. Most of the characters display properly, except ČĆčć, which are displayed as ÈÆèæ in Excel... I have tried all of the fonts but the problem remains... Interestingly enough, when I try to display those characters in Windows controls (editbox etc.) with the Arial font and Central European encoding it works great, so it must be some bug in the connector...
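
The substitution reported here matches what happens when cp1250 bytes are read as cp1252 (Western European). The Python sketch below shows the mapping. Whether Excel or the driver applies cp1252 at this point is an assumption, since the thread does not confirm where the misinterpretation occurs.

```python
# Characters from the report, encoded the way SET NAMES cp1250 would deliver them.
affected = "ČĆčć"
wire_bytes = affected.encode("cp1250")

# Reading the same bytes with the Western European code page instead.
misread = wire_bytes.decode("cp1252")
print(misread)

# Letters such as š and ž share the same byte values in both code pages,
# which would explain why most characters still display properly.
unaffected = "šž"
print(unaffected.encode("cp1250").decode("cp1252"))
```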
[29 Mar 2007 17:22] Fred Pauwels
We are now in 2007 and this utf8 case is still not solved. Has anyone had any luck with this?
[30 Mar 2007 10:16] Ivan
I've tried it in 3.51.14 and it's still there...
[11 Apr 2007 10:42] Tonci Grgin
Ivan and all.

C:\mysql507\bin>mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 218
Server version: 5.0.38-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show create table a;
| Table | Create Table                              |
| a     | CREATE TABLE `a` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(30) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 |
1 row in set (0.00 sec)

mysql> select * from a where Id = 2;
+----+------------------------+
| id | name                   |
+----+------------------------+
|  2 | 1234567890abcdefš???.- |
+----+------------------------+
1 row in set (0.03 sec)

mysql> set names cp1250;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from a where Id = 2;
+----+------------------------+
| id | name                   |
+----+------------------------+
|  2 | 1234567890abcdefšđčć.- |
+----+------------------------+
1 row in set (0.00 sec)
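
The two result sets above can be modelled in Python. The sketch assumes the session started with a latin1 (cp1252-like) connection character set, which the transcript does not state, and uses Python's cp1252 codec as an approximation of MySQL's latin1. The helper name to_client is hypothetical.

```python
# The non-ASCII tail of the stored value from the transcript.
name = "šđčć"


def to_client(text: str, charset: str) -> str:
    """Convert text to a connection character set, replacing characters
    that charset cannot represent with '?', then show what the client sees."""
    return text.encode(charset, errors="replace").decode(charset)


# Assumed default connection charset: only š exists in cp1252.
print(to_client(name, "cp1252"))
# After SET NAMES cp1250: all four characters are representable.
print(to_client(name, "cp1250"))
```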

A test in generic MS ODBC client, odbcte32.exe (non unicode aware), with SET NAMES cp1250 in "Initial Statement" of DSN config screen, shows correct characters so I would recommend to retest.

MyODBC 3.51 was never meant to be unicode aware so it maps unicode (W) functions into non-unicode counterpart thus resulting in "wrong" characters returned.

Connector/ODBC 5 is to have full Unicode support but it's still in beta. So I don't really agree that "nothing is done". We have written an entirely new connector that should solve this problem once it's out of the testing stage.

Thank you all for your interest in MySQL.
[11 Apr 2007 12:14] Ivan
I've found somewhere that 3.53 was supposed to fix this but no such version exists... Also, the problem is returning data to Excel... If I write a program that saves the same data into a DBF file and then imports it to Excel, all is fine... But if I try to import the data directly then it doesn't display properly... When will 5.x be out of beta?
[16 May 2007 12:07] Tonci Grgin
Ivan, please see the remark on using the "Initial statement" field of the DSN config and do read my entire last post... Unicode support is re-scheduled for version 5. When it will be out of beta, I really can't tell.
[16 May 2007 13:07] Ivan
Screenshot of the configuration dialog

Attachment: initial.JPG (image/jpeg, text), 35.17 KiB.

[16 May 2007 13:10] Ivan
There's a screenshot of the dialog... As you can see I've set the Initial Statement property, and the problem is still there... As I import to Excel the characters don't show correctly... I've found a workaround for this... I just save all the data in a DBF table using OEM encoding and import it from there...
[16 May 2007 18:47] Tonci Grgin
Ivan, please reconsider my example:
mysql> show create table a;
| Table | Create Table                              |
| a     | CREATE TABLE `a` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(30) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 |
and the image attached. I found no problems importing data to Excel... Please upgrade to 3.51.15 and test again.
[16 May 2007 18:48] Tonci Grgin
Excel with correct 1250 chars imported

Attachment: 3348.jpg (image/jpeg, text), 27.05 KiB.

[3 Nov 2007 1:50] Srdjan Mitrovic
I think that there is more to it.
UTF is not made for the Russians only, or for the Greek only,
but for all alphabets (in one sentence).
E.g. Πύργος Günter Байкалец,
you will not be able to put it in one
database field unless you have UTF support,
and no 'set names cpXXXX' will help
for you will need the different XXXXs
at once. That's what UTF is all about,
and if you find the ODBC driver capable of
handling this, please let me know.

Regards, hani
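
The point about mixed alphabets can be checked with a short Python sketch: the example string from this comment cannot be encoded in any one of the single-byte Windows code pages mentioned in this thread, while UTF-8 handles it. The helper name encodable is hypothetical, and Python's codecs stand in for the MySQL character sets of the same names.

```python
# The mixed Greek / German / Russian example from the comment above.
sample = "Πύργος Günter Байкалец"


def encodable(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Greek, Western European, Cyrillic, and Central European code pages, then UTF-8.
for charset in ("cp1253", "cp1252", "cp1251", "cp1250", "utf-8"):
    print(charset, encodable(sample, charset))
```

Each code page covers at most one of the three scripts in the string, so no single 'set names cpXXXX' can deliver all of it intact.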
[3 Mar 2008 10:31] Tonci Grgin
Behavior described by Srdjan is present in 5.1 branch.
[4 Mar 2008 6:49] Tonci Grgin
Tim, others, this has already been implemented in MyODBC 5.1, please see Bug#32570, and attached file http://bugs.mysql.com/file.php?id=8197. Tim, if I'm right, please revert your ruling.