MySQL Bugs: #7756: Extended ASCII characters being returned as odd square

Bug #7756	Extended ASCII characters being returned as odd square
Submitted:	10 Jan 2005 5:18	Modified:	13 Jan 2005 21:43
Reporter:	Kris Kimbrough
Status:	Not a Bug
Category:	Connector / NET	Severity:	S3 (Non-critical)
Version:	1.0.3	OS:	Windows (Win/XP)
Assigned to:	Reggie Burnett	CPU Architecture:	Any

Description:
I have data collected from many systems, and some of it contains an odd single-quote character with ASCII code 146 (example: God’s Way). When I read it with the MySQL Connector for .NET, it comes out as a square symbol character (&#146;).

If I read the same data using MySQLDriverCS it comes out as the correct symbol.

How to repeat:

Create a varchar field and populate it with:

God’s Way

Read the data (similar code to what I'm using):

// Reads the Title of book 1 into a DataTable so the returned string can be inspected.
private void ReadIt(MySqlConnection conn)
{
	MySqlCommand sqlCommand = new MySqlCommand("SELECT Title FROM books WHERE ID=1", conn);
	DataTable oDS = new System.Data.DataTable("TestTable");
	MySqlDataAdapter oAdapter = new MySqlDataAdapter(sqlCommand);
	oAdapter.Fill(oDS);
	// r["Title"] holds the string whose characters should be examined.
	DataRow r = oDS.Rows[0];
}

Examine the data in the row read.

It seems to display the proper character on MySQL 4.1.7; my environment is 4.0.23.

Kris

What bytes are sent back for that column? If you are using the latin1 encoding, this would be basically right, though I would expect it to come back as ASCII 0x27. The reason is that the latin1 encoding doesn't define a printable character for 146 (0x92). If you get a latin1 encoding and call GetBytes() on a string that contains a 0x92, you'll see that it converts the 0x92 to 0x27.

-reggie
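One way to check the encoding behaviour described above is a small console program. This is only a sketch: it assumes the connector's latin1 encoding behaves like .NET's "iso-8859-1" encoding, and the class and method names are made up for illustration. It prints the encoded bytes rather than asserting them, because the substitute byte for an unmappable character depends on the runtime's fallback.

```csharp
using System;
using System.Text;

public static class Latin1EncodeSketch
{
    // Encodes text as ISO-8859-1 using the runtime's default fallback
    // for characters that the encoding cannot represent.
    public static byte[] EncodeDefault(string text)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(text);
    }

    // Returns true only if every character can be represented in ISO-8859-1.
    public static bool FitsLatin1(string text)
    {
        Encoding strict = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no ISO-8859-1 representation.
            return false;
        }
    }

    public static void Main()
    {
        string title = "God\u2019s Way"; // U+2019 is the curly apostrophe
        Console.WriteLine(FitsLatin1(title));
        Console.WriteLine(BitConverter.ToString(EncodeDefault(title)));
    }
}
```

If the second line of output shows 27 in the fourth position, the runtime substituted a plain apostrophe for the curly one.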

Reggie,

I think there is something here, but not significant enough for me to pursue. The same code and the same db table produce different results on MySQL 4.0 vs. 4.1.

I see the square character both in IE and in the VS debugger. VS is a pure Unicode environment, so I'm a little confused about where latin1 comes into play. Is there a fundamental character set difference between MySQL 4.0 and 4.1?

If I assign the character in question to a char datatype, it is ASCII code 146; if I take the same character as rendered on the 4.1 database server (where it displays correctly), it is ASCII code 8217.
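The two codes can be reproduced without a database. A likely explanation, offered as an assumption rather than a confirmed diagnosis, is that the stored byte is 0x92: Windows-1252 assigns that byte to the right single quotation mark (U+2019, decimal 8217), while ISO-8859-1 treats it as a control character (U+0092, decimal 146) that has no glyph and is often drawn as a square. The sketch below uses a hypothetical class name and decodes the byte with .NET's "iso-8859-1" encoding.

```csharp
using System;
using System.Text;

public static class Byte92Sketch
{
    // Decodes one byte as ISO-8859-1 and returns the resulting character code.
    public static int DecodeLatin1(byte value)
    {
        string s = Encoding.GetEncoding("iso-8859-1").GetString(new byte[] { value });
        return s[0];
    }

    public static void Main()
    {
        Console.WriteLine(DecodeLatin1(0x92)); // prints 146
        Console.WriteLine((int)'\u2019');      // prints 8217
    }
}
```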

Hopefully the problem is just my not understanding the db engine charset issues well enough.  Thanks

Kris

Kris,

There were many character set changes between 4.0 and 4.1. 4.0 basically had a single charset for the entire database. With 4.1 you can have a different charset for the database, the tables, and the columns, plus you can specify what charset you talk to the database with. MySQL defaults to latin1. Unless you override it on the connection string, the connector uses the default of the database to create an encoding to use. So, by default, the connector will create a latin1 encoding and convert all text entered into that encoding before sending it to the database. Since latin1 doesn't include the character you were talking about, the latin1 encoding converted it to the apostrophe. You can override that by giving the charset option on the connection string, such as charset=utf8.

-reggie
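The charset override mentioned above might look like the sketch below. The server, database, user, and password values are placeholders, and WithCharset is a hypothetical helper, not part of the connector; only the charset=utf8 option comes from the thread.

```csharp
using System;

public static class ConnectionStringSketch
{
    // Appends a charset option to an existing connection string.
    public static string WithCharset(string baseConnectionString, string charset)
    {
        // Drop any trailing separator so the result has exactly one ';' before the option.
        string trimmed = baseConnectionString.TrimEnd(';', ' ');
        return trimmed + ";charset=" + charset;
    }

    public static void Main()
    {
        // Placeholder credentials; substitute real values.
        string cs = WithCharset("server=localhost;database=library;uid=reader;pwd=secret", "utf8");
        Console.WriteLine(cs);
        // The resulting string would then be passed to the connector,
        // for example: new MySqlConnection(cs)
    }
}
```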