Bug #13662 | Prepare() truncates accented character input | ||
---|---|---|---|
Submitted: | 30 Sep 2005 15:34 | Modified: | 3 Oct 2005 16:05 |
Reporter: | Mark Modrall | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | Connector / NET | Severity: | S2 (Serious) |
Version: | 1.0.4 | OS: | Windows (Windows) |
Assigned to: | Reggie Burnett | CPU Architecture: | Any |
[30 Sep 2005 15:34]
Mark Modrall
[2 Oct 2005 8:13]
Valeriy Kravchuk
Thank you for a bug report. What version of MySQL server do you use?
[3 Oct 2005 14:09]
Mark Modrall
We're on 4.1.12 server on an intel linux box.
[3 Oct 2005 16:05]
Reggie Burnett
Thank you for taking the time to report a problem. Unfortunately you are not using a current version of the product your reported a problem with -- the problem might already be fixed. Please download a new version from http://www.mysql.com/downloads/ If you are able to reproduce the bug with one of the latest versions, please change the version on this bug report to the version you tested and change the status back to "Open". Again, thank you for your continued support of MySQL. Additional info: Mark I confirmed this was broken in 1.0.4 and is fixed now in 1.0.6. One thing you need to be aware of is some quirkiness with collation. In your example you are using utf8_bin. THis shouldn't cause the server to return the data to the connector as a binary blob but with current versions of 4.1 and 5.0 it does. The connectors do their best to return the right types but sometimes we can't. In your example, the columns will be returned from an executescalar as byte[]. Changing the collation to be not binary fixes it. You can always do a reader.GetString() or convert to string yourself. I am about to test some code that will handle this particular case but I just wanted you to be aware of the situation in general.
[3 Oct 2005 16:43]
Mark Modrall
Hi Reggie... Thanks for the update. I was unclear on one thing, however. You said that the problem is fixed in 1.0.6 and then had the note about what the binary collation does to fetches. Am I correct in understanding these as two separate points? The first point, on which the bug was filed, has to do with data being shipped up to the db, and the second point being the representation of the binary collation when data is fetched back out. Am I understanding you correctly? I switched to using the binary collation because the behavior of regular utf8 was kind of unpredictable. If you created an index on a text column in utf8, there is some kind of implicit normalization going on that was, at least to me, inconsistent. In terms of how the index was constructed, it seemed that diacriticals were all normalized out so resumé was the same as resume for the index. But that normalization wasn't applied to the data in the column, so whichever presentation got inserted first was the one that stuck. For example if you have a UNIQUE index on the text column and you insert resumé, then you try to insert resume, it will say it's a duplicate key because the index has stripped the diacriticals. If you fetch that column from the database, though, it comes back as resumé (the diacritical not stripped) because that was the first one inserted. Not being able to get a consistent answer, I switched to using utf8_bin so that the normalization on the index wouldn't happen. So far anyway getting the results back as a byte[] hasn't really been that hard to work with. So if I understood your points correctly, I can upgrade to 1.0.6 and get this truncation problem addressed while still keeping my utf8_bin for consistency, right? As an aside, how about but #12245? that was marked closed but I haven't seen it in any of the release notes) Thanks -Mark
[3 Oct 2005 17:12]
Reggie Burnett
Mark Correct. There are two issues in my post. There was a bug in 1.0.4 (which is now fixed). But I also wanted to bring the collation thing to your attention. It can be confusing to select a text col and get a byte[] back.