Bug #31601 | decode function recognize the charset as binary | ||
---|---|---|---|
Submitted: | 15 Oct 2007 10:21 | Modified: | 29 Oct 2007 8:55 |
Reporter: | Roland LACROUTE | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S2 (Serious) |
Version: | all | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[15 Oct 2007 10:21]
Roland LACROUTE
[15 Oct 2007 13:03]
Mark Matthews
The result of the ENCODE() function is binary data. It can't reliably be represented in any character set. You need to treat it like byte[] in your Java application (i.e. by calling getBytes(), or getObject() which should return a byte[] to you in this case). The fact that it sometimes worked with MySQL-4.0 is mere coincidence, MySQL-4.1 and newer are better at telling a client what the exact type and character set of a value are.
[15 Oct 2007 13:40]
Roland LACROUTE
The result of the ENCODE() function is maybe binary data, but not the result of the DECODE() function. In the piece of code I put in example, the result should be "é" and not the encoded version of "é". If I get the result in byte[], I have to encode the result in the good charset which I have to find in the column specifications. I thought that it was the work of the connector. Moreover, the problem does not exist with ODBC connector.
[15 Oct 2007 21:07]
Mark Matthews
The ODBC driver isn't as pedantic as the JDBC driver when it comes to correct type mappings. If anything this is a server bug if you think that the result of DECODE() should be a string in some character set, because it's not: mysql> select decode(encode('abcdef', 'secret'), 'secret'); Field 1: `decode(encode('abcdef', 'secret'), 'secret')` Catalog: `def` Database: `` Table: `` Org_table: `` Type: VAR_STRING Collation: binary (63) Length: 6 Max_length: 6 Decimals: 31 Flags: NOT_NULL BINARY +----------------------------------------------+ | decode(encode('abcdef', 'secret'), 'secret') | +----------------------------------------------+ | abcdef | +----------------------------------------------+ 1 row in set (0.07 sec) (i.e. the JDBC driver is doing what it's supposed to, reporting the result as BINARY, and thus mapping it to byte[], or in the case of getString(), the closest thing to a "binary" character set, ASCII. There is no way, from the metadata given for the JDBC driver to know what "original" character set this data was given in).
[17 Oct 2007 10:48]
Roland LACROUTE
The problem seems indeed to come from the MySQL server and not from the JDBC driver though I didn't succeed to reproduce your example (starting my server with the -T option doesn't display the collation). Anyway, do I have to open a new bug on the MySQL Server ? Or can I transfer this bug to the new Category ? thanks.
[23 Oct 2007 7:56]
Roland LACROUTE
According to Mark Matthews, this problem seems to come from the mysql server. Is it possible to have a fix from the MySQL server ?
[28 Oct 2007 13:03]
Valeriy Kravchuk
Server can not "guess" correct character set of binary data returned by DECODE, in general. If you know what character set should be used, please, use CONVERT: mysql> select convert(decode(encode('abcdef', 'secret'), 'secret') using latin1) as a; Field 1: `a` Catalog: `def` Database: `` Table: `` Org_table: `` Type: VAR_STRING Collation: latin1_swedish_ci (8) Length: 6 Max_length: 6 Decimals: 31 Flags: +--------+ | a | +--------+ | abcdef | +--------+ 1 row in set (0.00 sec)
[29 Oct 2007 8:55]
Roland LACROUTE
In this request, I agree that MySQL can not guess the correct charset from the data : select convert(decode(encode('abcdef', 'secret'), 'secret') using latin1). But in the reference guide, decode function is defined as : - DECODE(crypt_str,pass_str) Decrypts the encrypted string crypt_str using pass_str as the password. crypt_str should be a string returned from ENCODE(). If the result should by a string, returning the collation as binary is the less probable ! With MySQL, one can define several level of charset (in the field, in the table, with one of the system variables : character_set_server, character_set_database, character_set_client, character_set_connection, character_set_results), why the server can not use one of these charsets definition ?