Bug #7756 | Extended ASCII characters being returned as odd square | ||
---|---|---|---|
Submitted: | 10 Jan 2005 5:18 | Modified: | 13 Jan 2005 21:43 |
Reporter: | Kris Kimbrough | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | Connector / NET | Severity: | S3 (Non-critical) |
Version: | 1.0.3 | OS: | Windows (Win/XP) |
Assigned to: | Reggie Burnett | CPU Architecture: | Any |
[10 Jan 2005 5:18]
Kris Kimbrough
[10 Jan 2005 15:36]
Kris Kimbrough
The proper character seems to display on MySQL 4.1.7; my environment is 4.0.23.
[12 Jan 2005 20:26]
Reggie Burnett
Kris, what bytes are sent back for that column? If you are using latin1 encoding, this would be basically right, though I would expect it to come back as ASCII 0x27. The reason is that the latin1 encoding doesn't define a printable character for 146 (0x92). If you get a latin1 encoding and call GetBytes() on a string that contains 0x92, you'll see that it converts the 0x92 to 0x27. -reggie
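To make the 0x92 point concrete, here is a minimal sketch in Python rather than the connector's C#; the codec names `latin-1` and `cp1252` stand in for the .NET encodings and are an assumption of this example. It shows that the single byte 0x92 means different things depending on which table interprets it, and that Python's strict latin-1 codec cannot encode U+2019 at all. Python has no best-fit fallback, so it will not reproduce the 0x92 → 0x27 substitution described above for .NET; it raises an error, or substitutes `?` with `errors="replace"`.

```python
# Sketch: how the single byte 0x92 is interpreted under two code pages.
# Python codecs stand in for the .NET encodings discussed in this bug.

RAW = b"\x92"  # assumed: the byte stored in the 4.0 latin1 column


def describe(raw: bytes, codec: str) -> str:
    """Decode raw with codec and return the code point as U+XXXX."""
    ch = raw.decode(codec)
    return f"U+{ord(ch):04X}"


def encode_or_none(text: str, codec: str):
    """Encode text strictly; return None if the codec has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # ISO-8859-1 maps 0x80-0x9F to C1 control characters, which many
    # fonts draw as an empty box (the "odd square").
    print("latin-1:", describe(RAW, "latin-1"))   # U+0092
    # Windows-1252 maps 0x92 to RIGHT SINGLE QUOTATION MARK.
    print("cp1252: ", describe(RAW, "cp1252"))    # U+2019
    # U+2019 has no latin-1 byte: strict encoding fails in Python,
    # whereas .NET's best-fit fallback reportedly yields 0x27 (').
    print(encode_or_none("\u2019", "latin-1"))    # None
    print("\u2019".encode("latin-1", errors="replace"))  # b'?'
```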
[13 Jan 2005 4:48]
Kris Kimbrough
Reggie, I think there is something here, but not significant enough for me to pursue. The same code against the same DB table produces different results on MySQL 4.0 vs. 4.1. I see the square character both in IE and in the VS debugger. VS is a pure Unicode environment, so I'm a little confused about where latin1 comes into play. Is there a fundamental character set difference between MySQL 4.0 and 4.1? If I assign the character in question to a char datatype, it is character code 146; if I take the same character as rendered on the 4.1 database server (where it displays correctly), it is character code 8217 (U+2019). Hopefully the problem is just that I don't understand the DB engine's charset handling well enough. Thanks, Kris
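The two numbers in the comment above are the same character seen through two different tables: 146 is its byte value in Windows-1252, and 8217 is its Unicode code point (U+2019). The sketch below, again in Python and assuming Windows-1252 was the client-side code page, converts between the two and contrasts latin1, where byte 146 is simply code point 146, a control character.

```python
def cp1252_byte_to_code_point(b: int) -> int:
    """Map a single Windows-1252 byte value to its Unicode code point."""
    return ord(bytes([b]).decode("cp1252"))


def code_point_to_cp1252_byte(cp: int) -> int:
    """Inverse mapping; raises UnicodeEncodeError if cp1252 lacks cp."""
    return chr(cp).encode("cp1252")[0]


if __name__ == "__main__":
    print(cp1252_byte_to_code_point(146))   # 8217
    print(code_point_to_cp1252_byte(8217))  # 146
    # In latin1, byte 146 decodes to code point 146 (a C1 control char).
    print(ord(bytes([146]).decode("latin-1")))  # 146
```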
[13 Jan 2005 21:43]
Reggie Burnett
Kris, there were many character set changes between 4.0 and 4.1. 4.0 basically had a single charset for the entire database. With 4.1 you can have a different charset for the database, the tables, and the columns, plus you can specify which charset you use to talk to the database. MySQL defaults to latin1. Unless you override it on the connection string, the connector uses the database's default charset to create an encoding. So, by default, the connector will create a latin1 encoding and convert all text into that encoding before sending it to the database. Since latin1 doesn't include the character you were talking about, the latin1 encoding converted it to an apostrophe (0x27). You can override that by giving the charset option on the connection string, such as charset=utf8. -reggie
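As a rough illustration of why `charset=utf8` helps, here is a Python sketch under stated assumptions. `with_charset` is a hypothetical helper, not part of Connector/NET; it only rewrites a semicolon-separated connection string and handles only the `charset` key (not other aliases). The `server`/`database` values are placeholders. Note that MySQL spells the charset `utf8`, while Python's codec is named `utf-8`. The round-trip check shows that U+2019 survives UTF-8 but cannot be represented in latin1.

```python
def with_charset(conn_str: str, charset: str = "utf8") -> str:
    """Hypothetical helper: add or replace a charset=... option in a
    semicolon-separated, Connector/NET-style connection string.
    Only the literal key 'charset' (any case) is recognised."""
    parts = [p for p in conn_str.split(";") if p.strip()]
    parts = [p for p in parts
             if p.split("=", 1)[0].strip().lower() != "charset"]
    parts.append(f"charset={charset}")
    return ";".join(parts)


def round_trips(text: str, codec: str) -> bool:
    """True if text survives an encode/decode cycle unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(with_charset("server=localhost;database=test"))
    # server=localhost;database=test;charset=utf8
    s = "it\u2019s"                    # contains RIGHT SINGLE QUOTATION MARK
    print(round_trips(s, "latin-1"))   # False: no latin1 byte for U+2019
    print(round_trips(s, "utf-8"))     # True: encoded as 3 bytes
    print("\u2019".encode("utf-8"))    # b'\xe2\x80\x99'
```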