Bug #2502 Problem with Unicode in DatabaseMetaData under Windows
Submitted: 24 Jan 2004 14:24 Modified: 28 Mar 2014 13:47
Reporter: Yuriy Semevsky
Status: Closed
Category:Connector / J Severity:S3 (Non-critical)
Version:3.0.10 OS:Linux (Linux, Windows)
Assigned to: Alexander Soklakov CPU Architecture:Any

[24 Jan 2004 14:24] Yuriy Semevsky
Description:
I have MyISAM tables encoded in utf8. Server: 5.0.0-alpha-nightly-20040120, pc-linux.
Some databases, tables, and columns have Russian (Cyrillic) names.
I have two client machines:
- Windows 2000 SP3, Russian, JRE 1.4.2, Connector/J 3.0.10
- RedHat Linux 9, default locale: en_US.UTF-8, JRE 1.4.2, Connector/J 3.0.10
Under Linux my Java client application works fine. Under Windows, however,
DatabaseMetaData.getTables(...) returns a ResultSet with wrong table names, while
stmt.executeQuery("SHOW TABLES") on the same Windows machine returns a correct ResultSet.

How to repeat:
// Testcase.java
    ...
    Properties info = new Properties();

    info.put("user", user);
    info.put("password", password);
    info.put("useUnicode", "true");
    info.put("characterEncoding", "UTF-8");

    String url = "jdbc:mysql://" + host + ":" + port + "/";

    Connection connection = DriverManager.getConnection(url, info);

    DatabaseMetaData md = connection.getMetaData();

    ResultSet rs = md.getTables(anyDatabaseName, null, null, null);

    showResult(rs); /* under Windows, shows wrong Cyrillic table names */
    ...
    Statement stmt = connection.createStatement();
    ResultSet rs = stmt.executeQuery("SHOW TABLES FROM " + anyDatabaseName);

    showResult(rs); /* always shows correct Cyrillic table names */
    ...
[24 Jan 2004 17:32] Mark Matthews
What does showResult() actually do? Does it use System.out.println()?

In situations like this, I've found that some Windows consoles and GUIs can't correctly display all the character sets that Java uses. Try comparing the returned strings against the strings you expect from _within_ your Java program, rather than relying on what you see on screen, and let us know if you get values you don't expect.
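
One way to make that comparison independent of the console or GUI font is to print each returned name as Unicode escapes and compare it with the expected value inside the program. The following is only a minimal sketch: StringInspector and toEscapes are hypothetical names that are not part of Connector/J or of the test case, and the sample strings are made up for illustration.

```java
// Hypothetical helper (not part of Connector/J): renders a string as
// Unicode escapes so its content can be checked without trusting the display.
class StringInspector {

    // Produces one backslash-u escape (four hex digits) per UTF-16 code unit.
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Expected name written with escapes, so the source file encoding is irrelevant.
        String expected = "\u0442\u0435\u0441\u0442";
        // Stand-in for a value read from a ResultSet (assumed corrupted here).
        String actual = "????";

        System.out.println("expected: " + toEscapes(expected));
        System.out.println("actual:   " + toEscapes(actual));
        System.out.println("equal:    " + expected.equals(actual));
    }
}
```

If the escapes of the returned name already show 003f for every character, the data was damaged before it reached the display layer.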
[25 Jan 2004 10:50] Yuriy Semevsky
stdout log of the DatabaseMetaData.getTables(...) result

Attachment: one.log (application/octet-stream, text), 55 bytes.

[25 Jan 2004 10:51] Yuriy Semevsky
stdout log of "SHOW TABLES" query

Attachment: two.log (application/octet-stream, text), 59 bytes.

[25 Jan 2004 11:08] Yuriy Semevsky
Thanks for the fast answer.
My display method looks like this:
-----------------------------------------------------------------------------------------------------------
private void showResult(ResultSet rs) throws SQLException
{
	ResultSetMetaData rsmd = rs.getMetaData();
	tableModel = new MyTableModel();

	while(rs.next())
	{
		for(int count = 1; count <= rsmd.getColumnCount(); count++)
		{
			tableModel.setValueAt(rs.getObject(count), rs.getRow() - 1, count - 1);

			System.out.println(rs.getObject(count));
		}
	}

	myTable.setModel(tableModel);
}
-----------------------------------------------------------------------------------------------------------
As you can see in the uploaded log files:
- after DatabaseMetaData.getTables(...), the stdout log shows "????" and the GUI shows empty squares (file "one.log").
- after the "SHOW TABLES" query, both stdout and the GUI show the correct table names in cp1251 (file "two.log").
In "one.log", the Cyrillic characters have been converted into a sequence of 0x3F bytes.
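
For reference, 0x3F is the ASCII code of '?', which is the byte Java substitutes when a character cannot be represented in the target charset. The sketch below reproduces that effect in isolation; ReplacementByteDemo and toHex are hypothetical names, and ISO-8859-1 is used only as a convenient example of a charset without Cyrillic letters, not as a claim about which charset the driver actually used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how unmappable characters become 0x3F bytes.
class ReplacementByteDemo {

    // Formats bytes as space-separated, two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u0442\u0435\u0441\u0442"; // a four-letter Cyrillic word

        // ISO-8859-1 has no Cyrillic letters, so each one is replaced by '?'.
        System.out.println(toHex(name.getBytes(StandardCharsets.ISO_8859_1))); // 3F 3F 3F 3F

        // UTF-8 can represent them, two bytes per letter.
        System.out.println(toHex(name.getBytes(StandardCharsets.UTF_8))); // D1 82 D0 B5 D1 81 D1 82
    }
}
```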
[27 Jan 2004 10:08] Yuriy Semevsky
Suggested fix:
String.getBytes() (with no argument) converts strings using the system default character encoding.
The following patch works properly:
--- DatabaseMetaData.java.old 2004-01-27 19:36:17.000000000 +0300
+++ DatabaseMetaData.java 2004-01-27 19:36:28.000000000 +0300
@@ -3362,11 +3362,10 @@
             }
 
             while (results.next()) {
-                String name = results.getString(1);
                 row = new byte[5][];
                 row[0] = connectionCatalogAsBytes;
                 row[1] = null;
-                row[2] = name.getBytes();
+                row[2] = results.getBytes(1);
                 row[3] = TABLE_AS_BYTES;
                 row[4] = new byte[0];
                 tuples.add(row);
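
To illustrate why the no-argument String.getBytes() is platform dependent, the sketch below encodes a name with one charset and decodes it with another. CharsetRoundTrip and roundTrip are hypothetical names used only for this illustration; the driver's real code path is not reproduced here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: what happens when the code that produces a byte[]
// and the code that consumes it disagree about the encoding.
class CharsetRoundTrip {

    // Encodes with one charset, then decodes the bytes with another.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String name = "\u0442\u0435\u0441\u0442";

        // Same charset on both sides: the name survives.
        System.out.println(name.equals(
                roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // getBytes() with no argument behaves like passing the platform default,
        // so this result varies from machine to machine.
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println(name.equals(
                roundTrip(name, Charset.defaultCharset(), StandardCharsets.UTF_8)));
    }
}
```

Passing the charset explicitly on both sides removes the dependency on the client machine's locale.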
[27 Jan 2004 10:17] Yuriy Semevsky
Suggested patch

Attachment: DatabaseMetaData.java.patch (application/octet-stream, text), 587 bytes.

[27 Jan 2004 14:15] Mark Matthews
Unfortunately your fix will be approx 10x slower, and won't work when the character set for metadata doesn't match the JVM's system character set.

I will come up with a patch for this shortly.
[28 Jan 2004 5:49] Yuriy Semevsky
Another suggested fix:
Maybe this is better?

--- DatabaseMetaData.java.old	2004-01-25 03:00:12.000000000 +0300
+++ DatabaseMetaData.java	2004-01-28 16:33:44.000000000 +0300
@@ -28,6 +28,7 @@
 import java.util.List;
 import java.util.StringTokenizer;
 import java.util.TreeMap;
+import java.io.UnsupportedEncodingException;
 
 
 /**
@@ -3366,7 +3367,14 @@
                 row = new byte[5][];
                 row[0] = connectionCatalogAsBytes;
                 row[1] = null;
-                row[2] = name.getBytes();
+				try
+				{
+					row[2] = name.getBytes(conn.getEncoding());
+				}
+				catch(UnsupportedEncodingException ex1)
+				{
+					row[2] = name.getBytes();
+				}
                 row[3] = TABLE_AS_BYTES;
                 row[4] = new byte[0];
                 tuples.add(row);
[28 Jan 2004 5:49] Yuriy Semevsky
Another suggested fix

Attachment: DatabaseMetaData.java.patch (application/octet-stream, text), 754 bytes.

[28 Jan 2004 6:37] Mark Matthews
Sorry, still slow. Please wait until I patch this, it will be in 3.0.12.

Any time you touch Sun's character converters for single-byte character sets, it will be anywhere from 5-10x slower than doing the conversion by hand, and it will litter the heap with a bunch of short-lived converter instances.
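
As a rough illustration of what converting "by hand" can mean for a single-byte character set, the sketch below encodes ISO-8859-1 with a plain loop instead of a charset converter. It is not the code that went into the driver; Latin1ByHand and encode are hypothetical names, and no performance figures are implied.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a hand-written single-byte encoder (ISO-8859-1 only).
class Latin1ByHand {

    // Maps each UTF-16 code unit below 256 to the byte with the same value and
    // everything else to '?'. Assumes the input has no surrogate pairs.
    static byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[i] = (c < 256) ? (byte) c : (byte) '?';
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9";
        // Compare the hand-written result with the library encoder.
        System.out.println(Arrays.equals(
                encode(sample), sample.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Other single-byte character sets would need a lookup table in place of the direct cast.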
[28 Mar 2014 13:47] Alexander Soklakov
Fixed in 3.0.12