Bug #31601 decode function recognize the charset as binary
Submitted: 15 Oct 2007 10:21 Modified: 29 Oct 2007 8:55
Reporter: Roland LACROUTE Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S2 (Serious)
Version:all OS:Any
Assigned to: CPU Architecture:Any

[15 Oct 2007 10:21] Roland LACROUTE
Description:
It is possible to encrypt data with the MySQL function encode. But when using the J connector to decrypt on MySQL 4.1 or 5.*, the charset is not well recognized and all special caracters are replace by "?".

Note : it does work correctly on MySQL 4.0. But on a MySQL 4.1, it does not work with any of the J-Connector (3.0, 3.1 nor 5.0).

How to repeat:
First create a new table with an encrypted column :

CREATE TABLE test (
  label varchar(255) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO test VALUES ('é');

UPDATE test SET label = ENCODE(label, 'code');

Then using, the J-connector, decrypt the data :
try {
	String url="jdbc:mysql://127.0.0.1:3306/test?user=root&password=pwd";
	Connection conn = DriverManager.getConnection(url);
        ResultSet rs = conn.createStatement().executeQuery("SELECT decode(label, 'code') FROM test;");

        if (rs.next())
        	return rs.getString(1);
	        
} catch (SQLException e) {

}

The return value is '?'.

Suggested fix:
Actually, when the column is encrypted, the recognized charset is US-ASCII and not the charset of the column.

In the class com.mysql.jdbc.CharsetMapping of the J-Connector, replace line 122 by tempMap.put("binary", "ISO8859_1"); // closest match 

The problem could be that the column is recognized as binary and then the used charset isn't the one of the column but the one of this mapping.

The fix I suggest is obviously not the right thing to do, but actually it does work and could help to find something better.
[15 Oct 2007 13:03] Mark Matthews
The result of the ENCODE() function is binary data. It can't reliably be represented in any character set. You need to treat it like byte[] in your Java application (i.e. by calling getBytes(), or getObject() which should return a byte[] to you in this case).

The fact that it sometimes worked with MySQL-4.0 is mere coincidence, MySQL-4.1 and newer are better at telling a client what the exact type and character set of a value are.
[15 Oct 2007 13:40] Roland LACROUTE
The result of the ENCODE() function is maybe binary data, but not the result of the DECODE() function. In the piece of code I put in example, the result should be "é" and not the encoded version of "é".

If I get the result in byte[], I have to encode the result in the good charset which I have to find in the column specifications. I thought that it was the work of the connector.

Moreover, the problem does not exist with ODBC connector.
[15 Oct 2007 21:07] Mark Matthews
The ODBC driver isn't as pedantic as the JDBC driver when it comes to correct type mappings.

If anything this is a server bug if you think that the result of DECODE() should be a string in some character set, because it's not:

mysql> select decode(encode('abcdef', 'secret'), 'secret');
Field   1:  `decode(encode('abcdef', 'secret'), 'secret')`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  binary (63)
Length:     6
Max_length: 6
Decimals:   31
Flags:      NOT_NULL BINARY

+----------------------------------------------+
| decode(encode('abcdef', 'secret'), 'secret') |
+----------------------------------------------+
| abcdef                                       |
+----------------------------------------------+
1 row in set (0.07 sec)

(i.e. the JDBC driver is doing what it's supposed to, reporting the result as BINARY, and thus mapping it to byte[], or in the case of getString(), the closest thing to a "binary" character set, ASCII. There is no way, from the metadata given for the JDBC driver to know what "original" character set this data was given in).
[17 Oct 2007 10:48] Roland LACROUTE
The problem seems indeed to come from the MySQL server and not from the JDBC driver though I didn't succeed to reproduce your example (starting my server with the -T option doesn't display the collation).

Anyway, do I have to open a new bug on the MySQL Server ? Or can I transfer this bug to the new Category ?

thanks.
[23 Oct 2007 7:56] Roland LACROUTE
According to Mark Matthews, this problem seems to come from the mysql server. Is it possible to have a fix from the MySQL server ?
[28 Oct 2007 13:03] Valeriy Kravchuk
Server can not "guess" correct character set of binary data returned by DECODE, in general. If you know what character set should be used, please, use CONVERT:

mysql> select convert(decode(encode('abcdef', 'secret'), 'secret') using latin1)
 as a;
Field   1:  `a`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  latin1_swedish_ci (8)
Length:     6
Max_length: 6
Decimals:   31
Flags:

+--------+
| a      |
+--------+
| abcdef |
+--------+
1 row in set (0.00 sec)
[29 Oct 2007 8:55] Roland LACROUTE
In this request, I agree that MySQL can not guess the correct charset from the data : 
select convert(decode(encode('abcdef', 'secret'), 'secret') using latin1).

But in the reference guide, decode function is defined as :
 - DECODE(crypt_str,pass_str)
Decrypts the encrypted string crypt_str using pass_str as the password. crypt_str should be a string returned from ENCODE(). 

If the result should by a string, returning the collation as binary is the less probable !

With MySQL, one can define several level of charset (in the field, in the table, with one of the system variables : character_set_server, character_set_database, character_set_client, character_set_connection, character_set_results), why the server can not use one of these charsets definition ?