| Bug #57389 | Do not force UTF8 charset when connection string has charset option set | ||
|---|---|---|---|
| Submitted: | 11 Oct 2010 21:46 | Modified: | 18 Nov 2010 7:03 |
| Reporter: | Milan Crha | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | Connector / ODBC | Severity: | S1 (Critical) |
| Version: | OS: | Any | |
| Assigned to: | CPU Architecture: | Any | |
[11 Oct 2010 21:47]
Milan Crha
proposed odbc connector patch
Attachment: conn.patch (text/plain), 490 bytes.
[12 Oct 2010 9:38]
Tonci Grgin
Milan, what is the charset_results? It should be NULL allowing c/ODBC to detect data is in different code-page. Please paste the output of SHOW VARIABLES LIKE "%charset%" after your program starts (just issue the query from your program after the connection has been established).
[12 Oct 2010 17:49]
Milan Crha
The command returns no rows, so I changed it slightly :) and I got:
before SETs:
Variable_name Value
character_set_client utf8
character_set_connection utf8
character_set_database cp1250
character_set_filesystem binary
character_set_results
character_set_server cp1250
character_set_system utf8
character_sets_dir C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\
after SETs:
Variable_name Value
character_set_client cp1250
character_set_connection cp1250
character_set_database cp1250
character_set_filesystem binary
character_set_results cp1250
character_set_server cp1250
character_set_system utf8
character_sets_dir C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\
If I add also SET character_set_results=NULL, then I endup with WideString string columns, which is something I do not want. I want to persuade the driver that there is no need for character set conversions at all, which is the only thing to use cp1250 also as character_set_system.
I realized that I can workaround this issue also by adding _utf8 prefix to my SELECT statements string, even I'm passing it in the cp1250. As an example:
the a) SELECT * FROM tab WHERE strfield='ě';
will be
SELECT * FROM tab WHERE strfield=_utf8'ě';
and similar for b) from comment #0.
[21 Oct 2010 6:32]
Tonci Grgin
Milan, there is no bug here... Each connector has it's own, internal, charset. For c/ODBC 5.1 it's UTF8. If you do not want wide strings, please use c/ODBC 3.51. And yes, if you're messing with SET NAMES you definitely need SET character_set_results=NULL. This is a special flag meaning that connector will try deciphering actual charset data is in, metadata queries will be returned in charset_server and so on. In short, SET is a nasty nasty command usually leading to double-encoding errors in data.
[21 Oct 2010 7:04]
Bogdan Degtyariov
Hi Milan, You should not look for "SET NAMES UTF8" statements in the server general query log. The MyODBC driver always sets the character set to UTF-8 for inner data processing. The CHARSET connection string option advises the driver to convert the results from UTF-8 into the character set specified. So, this is not a bug, but the way it works. Setting status "Not a Bug.
[21 Oct 2010 9:04]
Milan Crha
> You should not look for "SET NAMES UTF8" statements in the server general > query log. Sure, I didn't look for it. > The MyODBC driver always sets the character set to UTF-8 for inner data > processing. The CHARSET connection string option advises the driver to > convert the results from UTF-8 into the character set specified. What do you mean with 'advices' here? To be honest I do not care what does the connector itself behind the scene, I only understood the CHARSET parameter as a way to tell it what charset I expect from it and in what charset I will pass my commands and data to the connector. Being this true I would be more that happy. But it isn't true, and thus a bug (or my misunderstanding?). When I remove all my SET commands, which I agree are ugly and only to workaround this connector bug, and I set the connection string CHARSET to cp1250, then I expect all table string columns returned from the connector being in cp1250, but they aren't, they are in unicode. And as I said earlier, I do not want unicode columns, I want cp1250 columns, that's why I specify the CHARSET for the connection. Having there all those ugly SETs, or the patch applied in the connector to really honour my CHARSET will, makes it behave as I expect it.
[18 Nov 2010 7:03]
Bogdan Degtyariov
Milan, we discovered another bug 58038, with similar problem (just different charset) and made a patch for it. The patch is now under review by other developers. Please monitor this report when we send the hot-fix driver build: http://bugs.mysql.com/bug.php?id=58038

Description: I'm using MySQL ODBC connector through ADO components in Borland C++ Builder 2006 environment, and I realized that I cannot force it to use cp1250 charset in SQL commands. I'm setting this to highest severity, because incorrect results. How to repeat: a) Create table with cp1250_czech_cs COLLATE with a string field and add one row with letter "ě" in the string field. b) make sure that connection string contains CHARSET=cp1250 c) using the connection string and ODBC connector, invoke one of these statements: 1) SELECT * FROM tab WHERE strfield='ě'; 2) SELECT * FROM tab WHERE strfield LIKE '%ě%'; d) both above statements return empty dataset, which is incorrect. Turning on LOG_QUERY I realized that the SQL statements are automatically changed to UTF8, which breaks things, because I also force cp1250 to the connection with these SETs (in this order, invoked just after connect): SET sql_mode='MSSQL'; SET CHARACTER SET 'cp1250'; SET character_set_connection='cp1250'; SET collation_server='cp1250_czech_cs'; SET collation_connection='cp1250_czech_cs'; these are necessary, because either: - string fields are returned as Wide Strings without it, which I do not want; or - doing UNION fails with error on incompatible collations, though both tables are created with same DEFAULT COLLATE. Suggested fix: I'm attaching simple patch to force utf8 charset only when not set in the connection string (and the unicode support is available). It fixes the issue for me.