Bug #51927 Mapping for MySQL's latin1 character set in .NET connector
Submitted: 10 Mar 2010 18:25 Modified: 22 Apr 2010 13:31
Reporter: Anita Novello Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / NET Severity:S3 (Non-critical)
Version:6.x OS:Microsoft Windows
Assigned to: Reggie Burnett CPU Architecture:Any
Tags: .net, charset, connector, Latin1

[10 Mar 2010 18:25] Anita Novello
Description:
In .NET Connector, LoadCharsetMap() function of CharSetMap class set this mapping:

mapping.Add("latin1", new CharacterSet("latin1", 1));

but right mapping should be:

mapping.Add("latin1", new CharacterSet("windows-1252", 1));

because MySQL's latin1 is the same as the Windows cp1252 character set and it extends official ISO 8859-1 or IANA latin1.
For example, the euro sign is not handled correctly.

How to repeat:
Create in test database following table:
CREATE TABLE `charsettest` (
  `ID` int(11) NOT NULL,
  `Chars` varchar(256) NOT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Create a .NET solution with two projects, one containing MySQL.Data source code and second one (that refers to the first) containing a simple test class:

using System;
using System.Data;
using System.Text;
using MySql.Data.MySqlClient;

namespace EncodingTest
{
  class Program
  {
    static void Main(string[] args)
    {
      var connectionStringBuilder = new MySqlConnectionStringBuilder()
                        {
                          AllowBatch = true,
                          Server = "localhost",
                          Database = "test",
                          UserID = "testuser",
                          Password = "testuser",
                          CharacterSet = "latin1",
                          IgnorePrepare = false,
                          ConnectionProtocol = MySqlConnectionProtocol.Sockets,
                          AllowUserVariables = true,
                          Pooling = false
                        };

      // String with tipical characters of Windows cp1252 charset
      // But also if testString = "€€€"
      var testString = "€€€ƒƒƒŸŸŸ";

      var myConnection = new MySqlConnection(connectionStringBuilder.ToString());
      myConnection.Open();

      // Clean the table
      using (var command = myConnection.CreateCommand())
      {
        command.CommandText = "DELETE FROM `charsettest`";
        command.CommandType = CommandType.Text;
        command.ExecuteNonQuery();
      }
			
      // Insert one record with testString as value for chars field.
      using (var command = myConnection.CreateCommand())
      {

        command.CommandText = "INSERT INTO `charsettest` Values (@id, @chars)";
        command.CommandType = CommandType.Text;
        command.Parameters.Add("@id", MySqlDbType.Int32).Value = 1 ;
        command.Parameters.Add("@chars", MySqlDbType.VarChar).Value = testString;

        command.ExecuteNonQuery();
      }

      // Retrieve record just inserted and compare value of chars field with testString.
      using (var command = myConnection.CreateCommand())
      {
        command.CommandText = "SELECT `chars` FROM `charsettest`";
        command.CommandType = CommandType.Text;
        using (var reader = command.ExecuteReader())
        {
          if (reader.Read())
          {
            if (String.Compare((string)reader["chars"], testString) != 0)
              Console.WriteLine("Match not ok");
            else
              Console.WriteLine("Match ok");
          }
        }
      }

      Console.Read();
      myConnection.Close();
    }
  }
}

If in the CharSetMap.LoadCharsetMap() function I leave 

mapping.Add("latin1", new CharacterSet("latin1", 1));

match between original value and retrieved value doesn't work, otherwise if I set

mapping.Add("latin1", new CharacterSet("windows-1252", 1))

this match works.

Suggested fix:
Using 

mapping.Add("latin1", new CharacterSet("windows-1252", 1))

instead of

mapping.Add("latin1", new CharacterSet("latin1", 1));

in CharSetMap.LoadCharsetMap()
[15 Mar 2010 12:18] Tonci Grgin
Hi Anita and thanks for your report.

There is no CP1252 xml file c/NET can map to in /share/charsets so it's obsolete to talk of that. Further more, CP1252 is a superset of Latin1 so how come "...because MySQL's latin1 is the same as the Windows cp1252 character set"? As for Java, it is the *JVM* mapping that you refer to, same goes to CP932 for example.

As for your test case, I just checked something similar in Bug#52046:
    cmd.CommandText = "INSERT INTO bug52046 VALUES (2,'€',
and it works. I'll make sure I try your test case too.
[15 Mar 2010 12:23] Tonci Grgin
Ummm, I'd like Bar to comment on this:
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
[25 Mar 2010 10:38] Anita Novello
Hi Tonci,

thanks for your reply. If you see the MySQL Reference at http://dev.mysql.com/doc/refman/5.1/en/charset-we-sets.html about latin1, you'll read that "MySQL's latin1 is the same as the Windows cp1252 character set.". Hence   I would expect mapping.Add("latin1", new CharacterSet("windows-1252", 1)); as mapping in .NET Connector. Best Regards.
[6 Apr 2010 10:22] Tonci Grgin
Confirmed by Bar, verified as described.

Thanks Anita.
[19 Apr 2010 19:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/106042

803 Reggie Burnett	2010-04-19
      - changed mapping of latin1 from latin1 to windows-1252 (bug #51927)
[19 Apr 2010 19:25] Reggie Burnett
fixed in 6.0.6, 6.1.4, 6.2.4, and 6.3.2
[22 Apr 2010 13:31] Tony Bedford
An entry has been added to the 6.0.6, 6.1.4, 6.2.4 and 6.3.2 changelogs:

In MySQL Connector/NET, the LoadCharsetMap() function of the CharSetMap class set the following incorrect mapping:

mapping.Add("latin1", new CharacterSet("latin1", 1));
This meant that, for example, the Euro sign was not handled correctly.

The correct mapping should have been:

mapping.Add("latin1", new CharacterSet("windows-1252", 1));

This is because MySQL's latin1 character set is the same as the windows-cp1252 character set and it extends the official ISO 8859-1 or IANA latin1.
[12 May 2010 19:20] Janaka Abeywardhana
referring to the following in the last comment above

"An entry has been added to the 6.0.6, 6.1.4, 6.2.4 and 6.3.2 changelogs:"

By is 6.3.2 a typo?

This http://dev.mysql.com/doc/refman/5.0/en/connector-net-news-6-2-4.html says that the fix to this bug is in 6.2.4? Is there even a version 6.3.2

I'm confused.

Thanks
[12 May 2010 19:23] Janaka Abeywardhana
sorry, ignore that. Got my answer from here

http://dev.mysql.com/doc/refman/5.1/en/connector-net-news.html

Any body know when 6.2.4 will go GA?