Bug #51206 length of char field return is 3x as big on utf8 then on latin
Submitted: 16 Feb 2010 10:23 Modified: 17 Feb 2010 12:11
Reporter: Peter Hagen Email Updates:
Status: Can't repeat Impact on me:
None 
Category:Connector / NET Severity:S1 (Critical)
Version:6.2.2 OS:Linux
Assigned to: CPU Architecture:Any

[16 Feb 2010 10:23] Peter Hagen
Description:
After changing the default character set of my tables to utf8, I get the wrong sizes of the character column (this happens also with the 1.0.5 connector). I read the schema with:

command.CommandText = "SELECT * FROM " + table.Name + " limit 1";
command.Connection = (MySqlConnection) conn;
rs = command.ExecuteReader();
DataTable schema = rs.GetSchemaTable();
				
foreach (DataRow dr in schema.Rows) 
{
  int length = Convert.ToInt32(dr[2]);
  ...

when the table is in latin the size is correct, but when changed to utf8 the size is 3x the size.

How to repeat:
after a "alter table default character set=utf8" the size returned is 3x as big, when using the DataTable schema = rs.GetSchemaTable();

Suggested fix:
I guess the length returned is the length in bytes, not the character length
[16 Feb 2010 10:34] Tonci Grgin
Hi Peter. This is a duplicate of Bug#14592 (and some other) and the code changes are in http://lists.mysql.com/commits/12553.
[16 Feb 2010 10:57] Tonci Grgin
I stand corrected... Original patch code
+                int maxByteCount = f.Encoding.GetMaxByteCount(1) >> 1;
+                r["ColumnSize"] = f.IsTextField ? f.ColumnLength / maxByteCount :
f.ColumnLength;

was reduced to just
r["ColumnSize"] = f.IsTextField ? f.ColumnLength / maxByteCount : f.ColumnLength;
in newer versions and I can't find out why.

So, I'd like an explanation from c/NET devs before declaring this as a regression.
[16 Feb 2010 15:05] Tonci Grgin
Peter, I need to know the MySQL server version you're using before proceeding.
[16 Feb 2010 15:17] Peter Hagen
Hi

i have confirmed this on 2 machines:

1. mysql  Ver 14.14 Distrib 5.1.37, for debian-linux-gnu (i486) using  EditLine wrapper

2. mysql  Ver 14.14 Distrib 5.1.37, for debian-linux-gnu (i486) using  EditLine wrapper

thanks for looking into this
[16 Feb 2010 15:18] Peter Hagen
sorry, this is nr 2:

mysql  Ver 14.12 Distrib 5.0.38, for pc-linux-gnu (i486) using readline 5.2
[16 Feb 2010 15:28] Tonci Grgin
Thanks Peter.
[17 Feb 2010 11:20] Tonci Grgin
create table bug51206 (id integer NOT NULL AUTO_INCREMENT PRIMARY KEY, VCCol VARCHAR(30)) DEFAULT CHARSET UTF8 produces (on 5.1.31):
============================
ColumnName = VCCol
ColumnOrdinal = 2
ColumnSize = 30
NumericPrecision = 0
NumericScale = 0
IsUnique = False
IsKey = False
BaseCatalogName = 
BaseColumnName = VCCol
BaseSchemaName = test
BaseTableName = bug51206
DataType = System.String
AllowDBNull = True
ProviderType = 253
IsAliased = 
IsExpression = 
IsIdentity = 
IsAutoIncrement = False
IsRowVersion = False
IsHidden = 
IsLong = False
IsReadOnly = False
============================
[17 Feb 2010 11:36] Tonci Grgin
Test case:
        private void bnBug51206_Click(object sender, EventArgs e)
        {
            MySqlConnection con = new MySqlConnection();
            con.ConnectionString = "DataSource=**;Database=test;UserID=**;Password=**;PORT=**;Charset=utf8";
            con.Open();
            MySqlCommand cmd = new MySqlCommand();
            cmd.Connection = con;
            cmd.CommandText = "drop table if exists bug51206";
            cmd.ExecuteNonQuery();
            cmd.CommandText = "create table bug51206 (id integer NOT NULL AUTO_INCREMENT PRIMARY KEY, VCCol VARCHAR(30)) DEFAULT CHARSET UTF8";
            cmd.ExecuteNonQuery();

            cmd.CommandText = "INSERT INTO bug51206 VALUES (NULL, '海峽定義')";
            cmd.ExecuteNonQuery();

            cmd.CommandText = "INSERT INTO bug51206 VALUES (NULL,'chinese cNET-2')";
            cmd.ExecuteNonQuery();

            cmd.CommandText = "SELECT * FROM bug51206";
            MySqlDataReader rd = cmd.ExecuteReader();
            DataTable dt = rd.GetSchemaTable();
            DisplayData(dt);
            rd.Close();
            dt.Clear();
            dt.Dispose();
            cmd.Dispose();
            con.Close();
            con.Dispose();
        }
[17 Feb 2010 11:36] Tonci Grgin
Sorry, DisplayData part:
        private static void DisplayData(System.Data.DataTable table)
        {
            foreach (System.Data.DataRow row in table.Rows)
            {
                foreach (System.Data.DataColumn col in table.Columns)
                {
                    Console.WriteLine("{0} = {1}", col.ColumnName, row[col]);
                }
                Console.WriteLine("============================");
            }
        }
[17 Feb 2010 12:01] Tonci Grgin
Peter, LIMIT 1 or no LIMIT 1, remote 5.1.31 on OpenSolaris, 5.1.43 & 5.0.90 on Win localhost... all produce same (correct) result:
ColumnName = id
ColumnOrdinal = 1
ColumnSize = 11
NumericPrecision = 0
NumericScale = 0
IsUnique = False
IsKey = True
BaseCatalogName = 
BaseColumnName = id
BaseSchemaName = test
BaseTableName = bug51206
DataType = System.Int32
AllowDBNull = False
ProviderType = 3
IsAliased = 
IsExpression = 
IsIdentity = 
IsAutoIncrement = True
IsRowVersion = False
IsHidden = 
IsLong = False
IsReadOnly = False
============================
ColumnName = VCCol
ColumnOrdinal = 2
ColumnSize = 30
NumericPrecision = 0
NumericScale = 0
IsUnique = False
IsKey = False
BaseCatalogName = 
BaseColumnName = VCCol
BaseSchemaName = test
BaseTableName = bug51206
DataType = System.String
AllowDBNull = True
ProviderType = 253
IsAliased = 
IsExpression = 
IsIdentity = 
IsAutoIncrement = False
IsRowVersion = False
IsHidden = 
IsLong = False
IsReadOnly = False
============================

Further more, the server returns the column length in *bytes* no matter what the type is, but, I agree, if a user says create table t( name varchar(20) char set utf8) and then look at column length with GetSchemaTable there should be 20 for that column. However, the server might return 60 or 80 depending on version...
So we had to factor in the charset of the column which can be seen in field.cs when the charset index is set. It looks into our charset hash table to determine the byte count for that charset and then it sets the max byte property which for utf8 on 5.x would be 3. This is the code-change I missed when first looking into this report.
So, if it is a text field then column length is col size / max byte as my test case demonstrates.
[17 Feb 2010 12:11] Peter Hagen
Thanks for the explanation. I rewrote the code to use the GetSchema for getting the size, and use the GetSchemaTable() for getting the .net type, auto increment flag and primary key flag. So, that will work for now. It got a bit slower, but thats ok.
[17 Feb 2010 12:28] Tonci Grgin
Further discussion from server side in Bug#51241.
[17 Feb 2010 14:47] Reggie Burnett
Peter

You shouldn't need to rewrite.  As Tonci has demonstrated, he can't reproduce the bug.  If you are able to reproduce the bug using a 6.x connector then please give us steps as we would like to fix it.

Reggie