Bug #57389 | Do not force UTF8 charset when connection string has charset option set | ||
---|---|---|---|
Submitted: | 11 Oct 2010 21:46 | Modified: | 18 Nov 2010 7:03 |
Reporter: | Milan Crha | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | Connector / ODBC | Severity: | S1 (Critical) |
Version: | OS: | Any | |
Assigned to: | CPU Architecture: | Any |
[11 Oct 2010 21:46]
Milan Crha
[11 Oct 2010 21:47]
Milan Crha
proposed odbc connector patch
Attachment: conn.patch (text/plain), 490 bytes.
[12 Oct 2010 9:38]
Tonci Grgin
Milan, what is the charset_results? It should be NULL allowing c/ODBC to detect data is in different code-page. Please paste the output of SHOW VARIABLES LIKE "%charset%" after your program starts (just issue the query from your program after the connection has been established).
[12 Oct 2010 17:49]
Milan Crha
The command returns no rows, so I changed it slightly :) and I got: before SETs: Variable_name Value character_set_client utf8 character_set_connection utf8 character_set_database cp1250 character_set_filesystem binary character_set_results character_set_server cp1250 character_set_system utf8 character_sets_dir C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\ after SETs: Variable_name Value character_set_client cp1250 character_set_connection cp1250 character_set_database cp1250 character_set_filesystem binary character_set_results cp1250 character_set_server cp1250 character_set_system utf8 character_sets_dir C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\ If I add also SET character_set_results=NULL, then I endup with WideString string columns, which is something I do not want. I want to persuade the driver that there is no need for character set conversions at all, which is the only thing to use cp1250 also as character_set_system. I realized that I can workaround this issue also by adding _utf8 prefix to my SELECT statements string, even I'm passing it in the cp1250. As an example: the a) SELECT * FROM tab WHERE strfield='ě'; will be SELECT * FROM tab WHERE strfield=_utf8'ě'; and similar for b) from comment #0.
[21 Oct 2010 6:32]
Tonci Grgin
Milan, there is no bug here... Each connector has it's own, internal, charset. For c/ODBC 5.1 it's UTF8. If you do not want wide strings, please use c/ODBC 3.51. And yes, if you're messing with SET NAMES you definitely need SET character_set_results=NULL. This is a special flag meaning that connector will try deciphering actual charset data is in, metadata queries will be returned in charset_server and so on. In short, SET is a nasty nasty command usually leading to double-encoding errors in data.
[21 Oct 2010 7:04]
Bogdan Degtyariov
Hi Milan, You should not look for "SET NAMES UTF8" statements in the server general query log. The MyODBC driver always sets the character set to UTF-8 for inner data processing. The CHARSET connection string option advises the driver to convert the results from UTF-8 into the character set specified. So, this is not a bug, but the way it works. Setting status "Not a Bug.
[21 Oct 2010 9:04]
Milan Crha
> You should not look for "SET NAMES UTF8" statements in the server general > query log. Sure, I didn't look for it. > The MyODBC driver always sets the character set to UTF-8 for inner data > processing. The CHARSET connection string option advises the driver to > convert the results from UTF-8 into the character set specified. What do you mean with 'advices' here? To be honest I do not care what does the connector itself behind the scene, I only understood the CHARSET parameter as a way to tell it what charset I expect from it and in what charset I will pass my commands and data to the connector. Being this true I would be more that happy. But it isn't true, and thus a bug (or my misunderstanding?). When I remove all my SET commands, which I agree are ugly and only to workaround this connector bug, and I set the connection string CHARSET to cp1250, then I expect all table string columns returned from the connector being in cp1250, but they aren't, they are in unicode. And as I said earlier, I do not want unicode columns, I want cp1250 columns, that's why I specify the CHARSET for the connection. Having there all those ugly SETs, or the patch applied in the connector to really honour my CHARSET will, makes it behave as I expect it.
[18 Nov 2010 7:03]
Bogdan Degtyariov
Milan, we discovered another bug 58038, with similar problem (just different charset) and made a patch for it. The patch is now under review by other developers. Please monitor this report when we send the hot-fix driver build: http://bugs.mysql.com/bug.php?id=58038