Bug #58038 Some part of answer for request not converted to odbc "charset=cp1251;"
Submitted: 7 Nov 2010 18:25 Modified: 8 May 2012 0:50
Reporter: Yuri Ivanchenko Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / ODBC Severity:S3 (Non-critical)
Version:5.1.7 OS:Windows (Windows 7)
Assigned to: Bogdan Degtyariov CPU Architecture:Any
Tags: charset, cp1251, ODBC

[7 Nov 2010 18:25] Yuri Ivanchenko
Description:
version of mysql 5.0.67 (windows)
table charset cp1251
version of odbc 5.1.7 (windows), installed over windows 7 x64

odbc connection string looks like "bla-bla-bla; charset=cp1251; bla-bla-bla;"
(also OPTION = 335544323 to switch on batch, NO_BINARY_RESULTS, etc)
NO ANY additional commands applied (like "set names", etc)...

show variables like 'char%':
character_set_client = utf8
character_set_connection = utf8
character_set_database = cp1251
character_set_filesystem = binary
character_set_results = null
character_set_server = cp1251
character_set_system = utf8

All works fine except requests like:

select if(1=1,'mama',0) as 'papa'; 

(* - 'mama' and 'papa' written russian in cp1251)

as a result I see 'mama' in utf8 displayed as cp1251, not cp1251 as cp1251
at the same time request like 

select 'mama' as 'papa'; 

works perfect, I see cp1251 as cp1251

In SQL log we see next:
for start of session:
22 Connect     user@host on db
22 Query       SET NAMES utf8
22 Query       SET character_set_results = NULL
22 Query       SET SQL_AUTO_IS_NULL = 0
22 Query       select database()
22 Query       select database()
22 Query       SELECT @@tx_isolation
22 Query       set @@sql_select_limit=DEFAULT

for "bad" query:
select if(1=1,'мама',0) as 'папа'

for normal query:
select 'мама' as 'папа'

of course its utf8 (god blessed odbc 5.1 :) ), 
but you easily convert it to cp1251 by Notepad++ or any other app

IS THIS A BUG OR SOME KIND OF MY FALT??????????

PS: on 3.51 everything was perfect )

How to repeat:

connect to mysql over odbc 5.1.7 with "charset=cp1251"

make two request(encode it from to cp1251):

select if(1=1,'мама',0) as 'папа'
select 'мама' as 'папа'

you will see different results
[8 Nov 2010 4:25] Bogdan Degtyariov
Hi Yuri,

This is not a bug in Connector/ODBC 5.1.
The way the driver works may seem confusing, but it is the expected behavior.

Please note that the driver is always connecting to MySQL server using UTF8 character set. However, specifying CHARSET=CP1251 in the connection string will make Connector/ODBC to convert the result from UTF8 into CP1251. This conversion is performed internally in the driver, so the client application receives the result in the requested character set (which is CP1251 in your case).

That is why you see UTF8 in the server variables and general query log.
Please check the result, which your application shows, it should be correct.

I am marking this report as "Not a bug".
[8 Nov 2010 6:06] Yuri Ivanchenko
Woh-woh, Bogdan!

Please, read my bug report again and up to the end!

I am not confused, that driver connects to db using internal utf8 charset, NOT! Also I am not confused, that sqllog has information in utf8... NOT!

I mean these:
================
How to repeat:

connect to mysql over odbc 5.1.7 with "charset=cp1251"

make two request(encode it from to cp1251):

select if(1=1,'мама',0) as 'папа'
select 'мама' as 'папа'

you will see different results (NOT in sqllog, but in ADODB.Recordset)
================

Or this is still not a bug???????????????????????????????????????
[8 Nov 2010 7:29] Tonci Grgin
Yuri, can you please check the option "Always handle binary function results as character data" and retest.
[8 Nov 2010 7:39] Bogdan Degtyariov
Yuri,

You should have mentioned that it is ADODB-related problem.
I have been able to repeat it in VB.
Setting bug to verified.
[8 Nov 2010 7:54] Yuri Ivanchenko
2Tonci Grgin

My connection string looks like:
....OPTION = 335544323;...

FLAG_FIELD_LENGTH = 1
FLAG_FOUND_ROWS = 2
FLAG_MULTI_STATEMENTS = 67108864
FLAG_NO_BINARY_RESULT = 268435456

So total it is 335544323. These value I set to "OPTION=" in connection string

2Bogdan Degtyariov:

I am not sure its ADODB related, because under 3.51 the same code(*) work as it should be.

* - for 3.51 I have connection string without "charset=cp1251;", but with query "set names cp1251;" after connection complete to db.
[8 Nov 2010 8:06] Yuri Ivanchenko
2Bogdan Degtyariov:

Correct me if I am wrong, but:

imho no matter who (I mean ADODB.Recordset or some else) recieving reply from odbc driver. Driver should convert all utf8 data to charset specified by "charset" option in connection string? 

In these case driver replies 'мама' insted of 'мама'. And 'мама' is the result of function if(1=1,'мама',0)

On the other hand, we can see that server got all queries in right way, because in sqllog we see all cyrilic data in utf8 (as it should be). But answer of driver is different. Who knows - may be this is because answer on servevr is different?

select 'мама' as 'папа'
select if(1=1,'мама',0) as 'папа'
[8 Nov 2010 8:19] Bogdan Degtyariov
Yuri,

Thanks for your comment. Usually ADODB bugs are separate category of ODBC problems. I am not saying that the bug is in ADODB. It is important to us to know all details how to repeat the problem (it was not quite clear to me at the beginning).

Now when the bug is verified we will try to find the solution for it.
[8 Nov 2010 8:36] Yuri Ivanchenko
Ok! I am glad to be useful ))

So, what is procedure? How can I know when fixed driver will be available for update?

I realy need it, because 3.51.27 don't alerts for errors in batch queries like:
"select '1'; select '2'; select 3';"

There are mistake in "select 3';", but 3.51.27 doesn't report it ((((
[8 Nov 2010 8:42] Bogdan Degtyariov
Yuri,

Now the report is "verified".
The status will change to "patch pending" when we elaborate the solution and send it for review. The next stage is "patch approved", which means that the next release version will have this problem fixed. 

I think you can have one of intermediate (non-release hot fix) driver builds, which passes the test.
[8 Nov 2010 8:55] Yuri Ivanchenko
Thanks!
[8 Nov 2010 10:16] Bogdan Degtyariov
Yuri,

I have found the solution for the problem. Sorry, I overlooked it at the beginning.

Here is your current query:

SELECT if(1=1, 'мама', 0) as 'папа';

3rd parameter in IF() function is number (0). Numeric data does not belong to any character set. Therefore, the server assigns binary character set when converting numbers into characters. When parsing IF() the server reduces parameter #2 ('мама') in correspondence to parameter #3 (0). In order to avoid this reduction both parameters have to be character strings (or table columns).

My VB application returned correct result in CP1251 when I changed the parameter from 0 to empty string ('') or 'bla-bla-bla':

SELECT if(1=1, 'мама', '') as 'папа';
[8 Nov 2010 10:21] Bogdan Degtyariov
Strange why NO_BINARY_RESULT=1 did not work....
[8 Nov 2010 11:27] Yuri Ivanchenko
>Strange why NO_BINARY_RESULT=1 did not work....

That is a bug I think :)

PS: May be it depends on if() specs, what is the type of data it's return? 
If it's array or else? Reason is somewhere here I guess...
[8 Nov 2010 11:29] Tonci Grgin
Yuri, you can always check for yourself, just start mysql command line client with --column-type-info (-T for 5.0 server) and issue the same query.
[8 Nov 2010 14:00] Yuri Ivanchenko
Another solution (not bugfix):

THESE ONE GIVES WRONG RESULT:
select cast(if(1=1,'мама',0) as binary) as 'мама'

THESE ONE GIVES RIGHT RESULT:
select cast(if(1=1,'мама',0) as char) as 'мама'

PS: NO_BINARY_RESULT = 1 on both cases!!!
[8 Nov 2010 14:03] Tonci Grgin
So it is as I suspected all along... Bogdan will give a final ruling but I'd say "Always handle binary function results as character data" is malfunctioning...
[8 Nov 2010 14:04] Tonci Grgin
Ah yes, and 'binary' in "if(1=1,'мама',0) as binary" means "charset-less" so it was bound to fail imo.
[8 Nov 2010 14:07] Yuri Ivanchenko
In any case modifying all queries in project with cast( as char) - not the best idea. 

First of all - I have several tens or even hundred of queries containing if() function, spreaded over ~40 thousands lines of code :)

And second - not sure that converting everything to "char" would be nice for all code contructions...
[8 Nov 2010 14:16] Tonci Grgin
Yuri, bug is still "Verified" meaning you will not have to rewrite anything. Bogdan will pick up on this again tomorrow (his time).
[8 Nov 2010 14:19] Yuri Ivanchenko
>Ah yes, and 'binary' in "if(1=1,'мама',0) as binary" means "charset-less" so it was bound to fail imo.

Oh! Yeh! But I not asked anybody to convert my query to utf8!
So why should I care about convertion it back to cp1251?
Let that man do it :)

You right, but with that new ODBC you took away from me option to set up charset_server, charset_client and charset_result. All of them now controlled by driver. That globaly means that with new version (5.1) of driver I have to recode partly my project. 

And these is not server falt and no project falt, it is only limitation of driver, no more! If some man who made architecture of odbc 5.1 decided to change internal charset to utf8 (actualy - good idea), he had to provide some actions to make all codes of all projects written before compatible, at least in charset relation.
[8 Nov 2010 14:23] Yuri Ivanchenko
>Yuri, bug is still "Verified" meaning you will not have to rewrite anything. Bogdan will pick up on this again tomorrow (his time).

Oh, its okay, I have time and have no claims to anybody :)
Piece! :)
[8 Nov 2010 14:33] Yuri Ivanchenko
>So it is as I suspected all along... Bogdan will give a final ruling but I'd say "Always handle binary function results as character data" is malfunctioning...

But from other hand - driver shouldn't convert realy all binary results as char.
In these case - how we can store realy binary data (files, pictures, etc)? Who said that binary result of if() should be always converted according to NO_BINARY_RESULT flag?

May be we can try provide combination of settings for driver like:

character_set_client = utf8
character_set_connection = cp1251
and back to
character_set_result = utf8

Binary data could still be binary with no troubles, ah?
[8 Nov 2010 14:49] Tonci Grgin
Yuri, you got most of charset manipulation wrong... IF() has nothing to do with BLOB's. Result of IF and many other function calls is returned with binary flag set (in protocol) causing problems with all of the connectors as they can not say should they use (var)binary or utf8 for, say, "show create table" and such ad-hoc queries.

Next, please do study http://forge.mysql.com/wiki/How_to_Use_Charsets_and_Collations_Properly.

Finally, c/ODBC 5.1 is a muiltibyte charset aware driver. Ie, it uses W (wide) ODBC functions (as opposed to c/ODBC 3.51). It is not possible for 5.1 not to use them. Next, setting session encoding the way you want it is a shortcut to double encoding errors (see presentation) thus c/ODBC and all of other connectors do not set charset_results signaling server to return data in any charset data is stored in and so on and so on. But you'd really really need to dig into charsets deeper to make this conversation meaningful.
[8 Nov 2010 15:00] Yuri Ivanchenko
>setting session encoding the way you want it is a shortcut to double encoding errors

Okay, but does 5.1 connector with "charset=cp1251" string make double (on the way up to server and on the way down) covertion by itself??? 

Anyway - architectonically it's not a best idea to include some intermediator between client and server, especially stupid intermediator who makes some convertion for some good intentions, but in some group of cases it has no idea what to do :)

IMHO
[8 Nov 2010 15:10] Yuri Ivanchenko
>But you'd really really need to dig into charsets deeper to make this conversation meaningful.

Ah, no.
I want to be on position of user. Actualy user shouldn't know how odbc works inside. Agree?

In facts we have:
1. There are some reasons to choose way odbc 5.1 architectonically constructed.

2. There are some set of mySQL commands, described in manual, that unexpectedly cannot be used with odbc 5.1. I cannot use all flexibility of mysql itself due to driver internal!!! architecture.

So there are some software (mysql server), and there are some driver (driver) which limits the functions of sofware. 

But ok, it only in theory. Nothing is perfect in these world ))
[9 Nov 2010 11:30] Bogdan Degtyariov
The problem with NO_BINARY_RESULT flag was that it fixed the metadata (for SQLDescribeCol() etc), but it did not actually perform the data conversion. So, the type was correct CHAR/VARCHAR instead of BINARY, but the contents of the result set remained untouched (UTF8).

The first version of patch for this problem is ready. Now we shall check if it causes any compatibility problems with existing applications/tests.
[15 Nov 2010 9:39] Bogdan Degtyariov
patch

Attachment: bug58038.diff (text/x-diff), 3.19 KiB.

[16 Nov 2010 12:16] Bogdan Degtyariov
Patch and the test case pushed here:

http://bazaar.launchpad.net/%7Emyodbc-developers/myodbc/bug58038
[18 Nov 2010 19:15] Lawrenty Novitsky
i'm approving the patch.

2 small things. my_types is probably not the best place for the test. but not the worst on other hand. besides i'm probably the one who started the fire of extending existing tests to cover new cases.

2nd a tad bigger thing - i'd prefer that expression calculating should the value be treated as binary or not, to be moved to a separate function and used from there. we already have minimum 2 similar(and too big to be called trivial) expressions used in different places. perhaps we would not have that bug if that was done before. also i remember other functions calculate "binariness" of a value differently. that is not necessary wrong, but needs to be checked too.

bogdan, will you push the patch or want me to do that? and do you volunteer to care about that my 2nd note or want me to do that? i don't mind, but tomorrow begins earlier for you ;)
[22 Nov 2010 12:47] Yuri Ivanchenko
I am sorry, but... 
I looked through MySQL Connector/ODBC (MyODBC) Change History and got that
5.1.8 released at the date I posted bug report :)
and previous verison on 24 august...

Please let me know - when next release will be available??? Or may be there some way to get it earlier?
[22 Nov 2010 16:28] Lawrenty Novitsky
then you need bzr, cmake and VS. you can get latest revision using bzr from lp:myodbc. also you can get latest snapshot from http://downloads.mysql.com/snapshots.php

then you just need to follow instructions to build your own c/odbc.

the other way is to wait for official release.
[19 Apr 2012 16:07] Lawrenty Novitsky
heh... this patch has been released in 5.1.9. pushed as rev#936
[8 May 2012 0:50] John Russell
Added to changelog for 5.1.19: 

With the option charset=cp1251 specified in the connection string,
the results could be returned as CP1251 or as UTF8 depending on the
query. For example, these queries could give results in different
code pages:

select if (1=1,'string in cp1251 code page',0) as 'string in cp1251 code page';
select 'string in cp1251 code page' as 'string in cp1251 code page';
[21 May 2012 21:41] John Russell
Correction, it's in the changelog for 5.1.9, not .19.