Bug #47985 UTF-8 String Length Issue (guids etc)
Submitted: 12 Oct 2009 7:11 Modified: 5 Nov 2009 16:18
Reporter: Christos Pavlides Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / NET Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Reggie Burnett CPU Architecture:Any
Tags: UTF-8 Guid Length

[12 Oct 2009 7:11] Christos Pavlides
Description:
The length of a particular string column is not calculated properly in the connector. This is especially true for utf-8 character set where the db server is reporting the length of string columns (char, varchar) as (actual column length defined in db * 3 or 4).
This is normally not a big deal, but the connector does not behave correctly on a query that includes a utf-8 char column of length 12 on mySql server 5.x. Since the server will report the length of the column as 12 * 3 in this case the field will be identified as a guid, and if an attempt to read any data on this column is made the new Guid(string) initializer will throw an exception of the form: "Guid should contain 32 digits with 4 dashes ..."

How to repeat:
In my case I have a 5.1.39 server, with a default charset set to utf-8. The query is being executed on a view which includes a char column of length 12.
But this should be easily repeatable with any column set as char 12 and utf-8 encoding running a 5.1.x version of the server. I believe for 6.x the length of utf-8 characters is set to 4x, therefore on a 6.x server this should be repeatable with a column defined as char 9.

Suggested fix:
I am not sure of any other impact on the connector, if it does not have any side effects I believe the length on the field object should be set to field.ColumnLength/field.MaxLength (I believe maxlength in this case is either 1, 3 or 4, depending on the encoding of the datatype in the db)
(in function GetFieldMetaData41() of nativeDriver)

If this has any sideffects then the easy fix is to add an additional check to take into account the character set, or max length, before the decision for a guid using the new method, in Field.SetTypeAndFlags(..) to prevent the connector from taking the wrong decision to decode this field as guid.
[12 Oct 2009 9:23] Tonci Grgin
Hi Christos and thanks for your report.

Can you please attach a small, self-sufficient test case so I can check? I'm particularly interested in connection string used.
[12 Oct 2009 10:34] Christos Pavlides
I don't have a test case, since I was in a hurry to make the necessary changes and deploy a new server... but the connection string is:
"server=127.0.0.1;user id=user;persist security info=True;database=db;password=pass"

I think your concern is if oldGuids is on or off. In order to reproduce this issue you have to set the oldguids to false (the default). 

I think a test case should be simple enough, create a table on 5.1.x server with datatype char of length 12 and set its char set to utf8. Then do a simple select on it using the connector. I tried the Execute reader and the fill dataset but I think any action should work.

Please note that the previous comments are just assumption by looking at the code, since I only tested this using a View (instead of a table). Therefore this problem might be related only to views..
[12 Oct 2009 11:52] Tonci Grgin
Ok, taken into account.
[12 Oct 2009 12:15] Christos Pavlides
Thanks Tonci and sorry that I don't have the time to create a test case, or even do some manual tests. I will try to do it as soon as possible if required, but I think the particular problem should be easy to spot.
[12 Oct 2009 12:35] Tonci Grgin
DDL and test-results

Attachment: Bug47985.txt (text/plain), 3.11 KiB.

[12 Oct 2009 12:38] Tonci Grgin
Christos, using following test case:
            MySqlConnection conn = new MySqlConnection();
            conn.ConnectionString = "DataSource=**;Database=te**st;UserID=**;Password=**;PORT=**;logging=True;oldGuids = false;charset=utf8;";
            conn.Open();

            MySqlCommand cmd = conn.CreateCommand();
            cmd.CommandTimeout = 50;
            cmd.CommandText = "SELECT * FROM bug47985;";
            try
            {
                MySqlDataReader dr = cmd.ExecuteReader();
                Console.WriteLine("T DataTypeName(1): " + dr.GetDataTypeName(1));
                Console.WriteLine("T GetFieldType(1): " + dr.GetFieldType(1));
                dr.Close();
                cmd.CommandText = "SELECT * FROM vbug47985;";
                dr = cmd.ExecuteReader();
                Console.WriteLine("V DataTypeName(1): " + dr.GetDataTypeName(1));
                Console.WriteLine("V GetFieldType(1): " + dr.GetFieldType(1));
                dr.Close();
                
                cmd.Dispose();
                conn.Close();
            }
            catch (Exception ex)
            {
                Console.Out.WriteLine(DateTime.UtcNow.ToLongTimeString() + "  " + "Exception: " + ex.Message);
                throw;
            }

and DDL statements attached, I am unable to reproduce the problem...
T DataTypeName(1): VARCHAR
T GetFieldType(1): System.String

V DataTypeName(1): VARCHAR
V GetFieldType(1): System.String

Ideas?
[12 Oct 2009 12:41] Tonci Grgin
Sorry, forgot to add most important parts of DDL:
mysql> create table bug47985 (Id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, VCCol VARCHAR(12) CHARSET utf8 COLLATE utf8_bin) CHARSET utf8;
Query OK, 0 rows affected (0.04 sec)

mysql> create view vbug47985 as select * from bug47985;
Query OK, 0 rows affected (0.01 sec)
[12 Oct 2009 12:45] Christos Pavlides
Tonci the column must be Char not VarChar
[12 Oct 2009 12:47] Tonci Grgin
Ok, retesting.
[12 Oct 2009 12:54] Tonci Grgin
True, verified as described both for table and for view...
  oldGuids = false
  CHAR column of length 12, UTF8

T DataTypeName(1): CHAR(36)
T GetFieldType(1): System.Guid

V DataTypeName(1): CHAR(36)
V GetFieldType(1): System.Guid

New DDL:
mysql> create table bug47985 (Id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, CCol CHAR(12) CHARSET utf8 COLLATE utf8_bin) CHARSET utf8;
Query OK, 0 rows affected (0.03 sec)

mysql> create view vbug47985 as select * from bug47985;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from vbug47985;
Field   1:  `Id`
Catalog:    `def`
Database:   `test`
Table:      `vbug47985`
Org_table:  `vbug47985`
Type:       LONG
Collation:  binary (63)
Length:     10
Max_length: 0
Decimals:   0
Flags:      NOT_NULL PRI_KEY UNSIGNED AUTO_INCREMENT NUM PART_KEY 

Field   2:  `CCol`
Catalog:    `def`
Database:   `test`
Table:      `vbug47985`
Org_table:  `vbug47985`
Type:       STRING
Collation:  utf8_general_ci (33)
Length:     36
Max_length: 0
Decimals:   0
Flags:      BINARY 

0 rows in set (0.01 sec)

mysql> select * from bug47985;
Field   1:  `Id`
Catalog:    `def`
Database:   `test`
Table:      `bug47985`
Org_table:  `bug47985`
Type:       LONG
Collation:  binary (63)
Length:     10
Max_length: 0
Decimals:   0
Flags:      NOT_NULL PRI_KEY UNSIGNED AUTO_INCREMENT NUM PART_KEY 

Field   2:  `CCol`
Catalog:    `def`
Database:   `test`
Table:      `bug47985`
Org_table:  `bug47985`
Type:       STRING
Collation:  utf8_general_ci (33)
Length:     36
Max_length: 0
Decimals:   0
Flags:      BINARY
[12 Oct 2009 12:59] Christos Pavlides
Thanks Tonci
[12 Oct 2009 13:07] Christos Pavlides
This is my quick patch to solve the problem

Attachment: GuidIssue.patch (application/octet-stream, text), 1.71 KiB.

[23 Oct 2009 15:54] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88012

781 Reggie Burnett	2009-10-23
      - fixed guid type so that multi-byte character sets will not effect how it works.  A column would be
        considered a guid if it has a *character* length of 36, not a *byte* length of 36 (bug #47985)
[23 Oct 2009 15:55] Reggie Burnett
Fixed in 6.1.3+
[23 Oct 2009 19:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88040

791 Reggie Burnett	2009-10-23 [merge]
      - fixed guid type so that multi-byte character sets will not effect how it works.  A column would be
        considered a guid if it has a *character* length of 36, not a *byte* length of 36 (bug #47985)
[23 Oct 2009 19:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88043

790 Reggie Burnett	2009-10-23 [merge]
      - fixed guid type so that multi-byte character sets will not effect how it works.  A column would be
        considered a guid if it has a *character* length of 36, not a *byte* length of 36 (bug #47985)
[5 Nov 2009 16:18] Tony Bedford
An entry has been added to the 6.1.3 changelog:

For some character sets such as UTF-8, a CHAR column would sometimes be incorrectly interpreted as a GUID by MySQL Connector/NET.

MySQL Connector/NET was changed so that a column would only be interpreted as a GUID if it had a character length of 36, as opposed to a byte length of 36.