MySQL Bugs: #2502: Problem with Unicode in DatabaseMetaData under Windows

Bug #2502	Problem with Unicode in DatabaseMetaData under Windows
Submitted:	24 Jan 2004 14:24	Modified:	28 Mar 2014 13:47
Reporter:	Yuriy Semevsky
Status:	Closed
Category:	Connector / J	Severity:	S3 (Non-critical)
Version:	3.0.10	OS:	Linux (Linux, Windows)
Assigned to:	Alexander Soklakov	CPU Architecture:	Any

Description:
I have utf8-encoded MyISAM tables. Server: 5.0.0-alpha-nightly-20040120, pc-linux.
Some databases, tables, and columns have Russian (Cyrillic) names.
I also have two client machines:
- Windows 2000 SP3, Russian, JRE 1.4.2, Connector/J 3.0.10
- RedHat Linux 9, default locale: en_US.UTF-8, JRE 1.4.2, Connector/J 3.0.10
Under Linux my Java client application works fine. Under Windows, however,
DatabaseMetaData.getTables(...) returns a ResultSet with wrong table names, while
stmt.executeQuery("SHOW TABLES") (also on Windows) returns a correct ResultSet.

How to repeat:
// Testcase.java
    ...
    Properties info = new Properties();

    info.put("user", user);
    info.put("password", password);
    info.put("useUnicode", "true");
    info.put("characterEncoding", "UTF-8");

    String url = "jdbc:mysql://" + host + ":" + port + "/";

    Connection connection = DriverManager.getConnection(url, info);

    DatabaseMetaData md = connection.getMetaData();

    ResultSet rs = md.getTables(anyDatabaseName, null, null, null);

    showResult(rs); /* under Windows, shows wrong Cyrillic table names */
    ...
    Statement stmt = connection.createStatement();
    ResultSet showTablesRs = stmt.executeQuery("SHOW TABLES FROM " + anyDatabaseName);

    showResult(showTablesRs); /* always shows correct Cyrillic table names */
    ...

What does showResult() actually do? Does it use System.out.println()?

In situations such as this, I've found that some Windows consoles and GUIs can't correctly display all of the character sets that Java uses. Try comparing the returned strings against the strings you expect from _within_ your Java program, rather than relying on what you see on your screen, and let us know if you're getting values that you don't expect.
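One way to follow this advice is to print each string as escaped code points and compare it with the expected value in code, so the result does not depend on how the console or GUI renders Cyrillic. The following is a minimal sketch; the class `UnicodeDump`, the method `escape`, and the sample values are hypothetical and are not part of Connector/J or the reporter's test case.

```java
// Hypothetical helper (not part of Connector/J): renders a string as ASCII-only
// text so it can be inspected independently of the console encoding.
final class UnicodeDump {

    // Printable ASCII is kept as-is; every other UTF-16 code unit is written
    // as a backslash, the letter u, and four hexadecimal digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample values: a four-letter Cyrillic name and the kind of
        // damaged value described in this report.
        String expected = "\u0442\u0435\u0441\u0442";
        String actual = "????";

        System.out.println("expected: " + escape(expected));
        System.out.println("actual:   " + escape(actual));
        System.out.println("equal:    " + expected.equals(actual));
    }
}
```

In a real check, `actual` would come from `rs.getString(...)` on the ResultSet returned by `DatabaseMetaData.getTables(...)`.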

stdout log of the DatabaseMetaData.getTables(...) result

Attachment: one.log (application/octet-stream, text), 55 bytes.

stdout log of "SHOW TABLES" query

Attachment: two.log (application/octet-stream, text), 59 bytes.

Thanks for the fast answer.
My display method looks like this:
-----------------------------------------------------------------------------------------------------------
private void showResult(ResultSet rs) throws SQLException
{
	ResultSetMetaData rsmd = rs.getMetaData();
	tableModel = new MyTableModel();

	while(rs.next())
	{
		for(int count = 1; count <= rsmd.getColumnCount(); count++)
		{
			tableModel.setValueAt(rs.getObject(count), rs.getRow() - 1, count - 1);

			System.out.println(rs.getObject(count));
		}
	}

	myTable.setModel(tableModel);
}
-----------------------------------------------------------------------------------------------------------
As you can see in the uploaded log files:
- after DatabaseMetaData.getTables(...), the stdout log shows "????" and the GUI shows empty squares ("one.log" file).
- after the "SHOW TABLES" query, stdout and the GUI show the correct table names in cp1251 ("two.log" file).
In the "one.log" file the Cyrillic characters have been converted into a sequence of 0x3F bytes.
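For readers unfamiliar with the 0x3F symptom: 0x3F is the ASCII code of "?", which Java's standard single-byte encoders use as the replacement byte for characters they cannot represent. The sketch below shows one common way such bytes arise. It is only an illustration of the effect, not a claim about the exact code path inside the driver, and the class name and sample string are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: shows how unmappable characters turn into 0x3F ('?')
// when a string is encoded with a charset that cannot represent them.
final class ReplacementByteDemo {

    // Formats a byte array as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed sample: a four-letter Cyrillic name written with escapes so
        // the source file itself stays ASCII.
        String name = "\u0442\u0435\u0441\u0442";

        // US-ASCII has no Cyrillic letters, so each one is replaced by '?'.
        System.out.println(hex(name.getBytes(StandardCharsets.US_ASCII))); // 3F 3F 3F 3F

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Once the original characters have been replaced this way, no later decoding step can recover them.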

Suggested fix:
String.getBytes() (with no argument) converts strings using the system default character encoding.
The following patch works properly:
--- DatabaseMetaData.java.old 2004-01-27 19:36:17.000000000 +0300
+++ DatabaseMetaData.java 2004-01-27 19:36:28.000000000 +0300
@@ -3362,11 +3362,10 @@
             }
 
             while (results.next()) {
-                String name = results.getString(1);
                 row = new byte[5][];
                 row[0] = connectionCatalogAsBytes;
                 row[1] = null;
-                row[2] = name.getBytes();
+                row[2] = results.getBytes(1);
                 row[3] = TABLE_AS_BYTES;
                 row[4] = new byte[0];
                 tuples.add(row);

Suggested patch

Attachment: DatabaseMetaData.java.patch (application/octet-stream, text), 587 bytes.

Unfortunately your fix will be approx 10x slower, and won't work when the character set for metadata doesn't match the JVM's system character set.

I will come up with a patch for this shortly.

Another suggested fix:
Maybe this is better?

--- DatabaseMetaData.java.old	2004-01-25 03:00:12.000000000 +0300
+++ DatabaseMetaData.java	2004-01-28 16:33:44.000000000 +0300
@@ -28,6 +28,7 @@
 import java.util.List;
 import java.util.StringTokenizer;
 import java.util.TreeMap;
+import java.io.UnsupportedEncodingException;
 
 
 /**
@@ -3366,7 +3367,14 @@
                 row = new byte[5][];
                 row[0] = connectionCatalogAsBytes;
                 row[1] = null;
-                row[2] = name.getBytes();
+				try
+				{
+					row[2] = name.getBytes(conn.getEncoding());
+				}
+				catch(UnsupportedEncodingException ex1)
+				{
+					row[2] = name.getBytes();
+				}
                 row[3] = TABLE_AS_BYTES;
                 row[4] = new byte[0];
                 tuples.add(row);
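The idea behind this second patch, encoding with an explicitly named charset and falling back to the platform default only when that name is not supported, can be tried outside the driver. The sketch below is a standalone approximation: `NameEncoder`, `encode`, and the `encodingName` parameter are hypothetical stand-ins, and `encodingName` plays the role of whatever `conn.getEncoding()` returns in the patch.

```java
import java.io.UnsupportedEncodingException;

// Standalone approximation of the patch above; not the driver's code.
final class NameEncoder {

    // Encodes name with the given charset name. If the JVM does not support
    // that charset, falls back to the platform default encoding, which is
    // the behaviour the original code always had.
    static byte[] encode(String name, String encodingName) {
        try {
            return name.getBytes(encodingName);
        } catch (UnsupportedEncodingException e) {
            return name.getBytes();
        }
    }

    public static void main(String[] args) {
        // Assumed sample: one Cyrillic letter, written as an escape.
        byte[] utf8 = encode("\u0442", "UTF-8");
        System.out.println(utf8.length + " bytes when encoded as UTF-8");

        // An unsupported charset name triggers the fallback branch.
        byte[] fallback = encode("abc", "no-such-charset");
        System.out.println(fallback.length + " bytes from the fallback path");
    }
}
```

Note that the fallback branch silently reintroduces the platform-dependent behaviour this report is about, so a caller may prefer to surface the error instead.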

Another suggested fix

Attachment: DatabaseMetaData.java.patch (application/octet-stream, text), 754 bytes.

Sorry, still slow. Please wait until I patch this; the fix will be in 3.0.12.

Any time you touch Sun's character converters for single-byte character sets, the conversion will be anywhere from 5-10x slower than doing it by hand, and it will litter the heap with a bunch of short-lived converter instances.
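The thread does not show what converting "by hand" looks like. One possible reading, for a single-byte character set, is a plain loop that maps each char to one byte without going through a converter object. The sketch below does this for ISO-8859-1 only. It is an assumption about the general technique rather than the code that went into 3.0.12, the class name `ManualLatin1` is hypothetical, and it makes no claim about the performance figures quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of hand-written single-byte encoding.
final class ManualLatin1 {

    // ISO-8859-1 maps U+0000..U+00FF directly to the byte of the same value.
    // Anything outside that range is replaced with '?' (0x3F).
    // Simplification: a surrogate pair yields two '?' bytes here.
    static byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out[i] = (c <= 0xFF) ? (byte) c : (byte) '?';
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample: Latin text with one accented letter and one
        // Cyrillic letter that ISO-8859-1 cannot represent.
        String sample = "caf\u00E9 \u0442";

        byte[] manual = encode(sample);
        byte[] library = sample.getBytes(StandardCharsets.ISO_8859_1);

        // Both approaches should agree for text without surrogate pairs.
        System.out.println(Arrays.equals(manual, library));
    }
}
```

A multi-byte encoding such as UTF-8 cannot be handled with a one-char-to-one-byte loop like this, which is why the remark is limited to single-byte character sets.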

Fixed in 3.0.12