Bug #28263 INCORRECT STING VALUE WHEN INSERTING MEMO (ERR 1366)
Submitted: 5 May 2007 21:23 Modified: 22 Oct 2007 15:42
Reporter: Louis Breda van Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / ODBC Severity:S2 (Serious)
Version:5.0-20070505 OS:Windows (VISTA 64)
Assigned to: Jess Balint CPU Architecture:Any
Tags: BLOB, ODBC, text

[5 May 2007 21:23] Louis Breda van
Description:
Hello,

I did install odbc-5.0-win32-nightly-20070505 today and did a testrun with a DB.

The first thing I tried to do is to copy data from a MsAccess2003 source DB towards a MySQL 5117 destination DB via ODBC.

The copy is executed by a series of DoCmd.RunSQL "Add new records" querys.

At a certain moment after execution from 15 of this kind of querys, the process stops with an error 

1366 incorrect stringvalue for field_x at row 1 \xe2\x80\x97utl 

Why I do not know. (it simply works when using an ms-access equivalent of the same destination DB)

The source data was originally imported from an xls. So I was considering the posibility of binairy data as a problem cause. For that reason I temporary changed the TEXT field to a BLOB field. Same problem. I also checked for the sting length. No problem there as well. 
 

How to repeat:
I do not know what causes the problem, but the situation is like this:

SQL_AddNewRecord = "bla bla, memofield, bla bla" where somefield equals loopcounter

for 1 up to n do

       docmd.runsql SQL_AddNewRecord

end loop

After looping x-times (15??) the proces stops with error 1366
[5 May 2007 22:14] MySQL Verification Team
Thank you for the bug report. Could you please provide the complete test
case code which provokes the issue reported and the create table script.
Thanks in advance.
[6 May 2007 11:00] Louis Breda van
Sorry,

Of course is it common intrest to have bug fixed, but I do not see it as my task to write testcases. To be honest I already feel to much being a (early) beta tester. 
I also think that it is more effective to look at the code and try to find problems by thinking about how this error could occur. And check thinks.

I can add that the database I am using does contain a few 100.000 records and each addnew query is adding a couple of thousend records.

Still willing to help, but not to the extend to writing testcases.

Sincerely,

Louis
[14 May 2007 18:05] Jess Balint
I ran the following program in Access, where one of the fields is a memo field, but I haven't encountered any problem.

Public Sub Test_Bug_28263()
DoCmd.SetWarnings False
For i = 1 To 20000
DoCmd.RunSQL "insert into simple2 (details, age) values ('bug+28263'," & i & ")"
Next i
End Sub

Luis - Can you suggest a way to make this closer to your environment? Or provide an ODBC trace log so we can see the exact error encountered?
[23 May 2007 15:52] Bambarbia Kirkudu
I have similar problems with Java-5, JDBC, and MySQL 5.0.41 InnoDB utf8.

I am fetch the page http://www.cdw.com/shop/products/default.aspx?ProductID=927789 from Internet as a byte array, define encoding (by sniffering HTTP headers and META tags; META has higher priority):
Content-Type: text/html; charset=utf-8
<meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1'/> 

Then I create a Java String by [Byte Array, Encoding]. Then try to store it in LONGTEXT field.

===================
2007-05-22 21:40:37,541 [cdw.com ] ERROR org.tokenizer.fetcher.FetcherThread - http://www.cdw.com/shop/products/default.aspx?ProductID=927789
org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [update web_resource set url_actual=?, last_success_time=?, fail_num=?, status_code=?, status_text=?, text=?, html=? where resource_id=?]; SQL state [HY000]; error code [1366]; Incorrect string value: '\xF4\x80\x80\xB9Bu...' for column 'text' at row 1; nested exception is java.sql.SQLException: Incorrect string value: '\xF4\x80\x80\xB9Bu...' for column 'text' at row 1
Caused by:
java.sql.SQLException: Incorrect string value: '\xF4\x80\x80\xB9Bu...' for column 'text' at row 1
[23 May 2007 17:27] Bambarbia Kirkudu
Welcome to our Testing Department!

I was able to locate the problem: this is \x80 character which is "Padding Character" (ISO-8859-1) and should be used only for communication protocols (and MUST NOT be a part of text!). It still remains the same (with different code point) in UTF-8 after conversion.

In initial bug report, and in my Java case, we have same character \x80. This character must not be used as part of HTML, for instance. http://htmlhelp.com/reference/charset/

"The ISO/IEC 8859-1 standard has long been the basis of a number of character maps, also known as character sets, charsets, or code pages, the most popular being ISO-8859-1 (note the extra hyphen) and Windows-1252. Both of these maps are a superset of ISO/IEC 8859-1; they supplement the standard's 191 character assignments by mapping additional characters to at least some portion of the code value ranges 00–1F, 7F, and 80–9F."
... ... ...
"Note that most of these control characters are not made for use in portable ISO-8859-1 encoded plain text documents, but only within specific protocols or devices, except a few ones whose behavior are standardized: TAB (09), LF (0A), CR (0D) and NEL (85)..."

http://en.wikipedia.org/wiki/ISO_8859-1

Thanks
[23 May 2007 18:48] Bambarbia Kirkudu
In Windows 1252 (Windows Western), 0x80 mistakenly assigned "Euro Sign".
In ISO-8859-1 it is assigned "Padding Character".

You may check page with "EURO": 
http://www.cdw.com/shop/products/default.aspx?ProductID=927789
(CDW.com: sorry for spam)

Look at the following string in an HTML source (using Mozilla or Internet Explorer: 
valign='top'>&#149;&nbsp;</td>

Then, store an HTML in a file system. Open it and look how it was changed:
valign="top">•&nbsp;</td>

Note that 149 (or 0x95) has an Image assigned in Windows which is "Bullet". However, it is "Message Waiting" control character in ISO-8859-1. Additionally, HTTP header from website tells us to use UTF-8.

This is probably not related to MySQL Bugs Report database; anyway...

"In Windows-1252, codes between 0x80 and 0x9F are used for letters and punctuation, whereas they are control codes in ISO-8859-1. Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling but it is not a standard behaviour and care should be taken to avoid it."

Use BLOB (with additional encoding field) instead of TEXT if you need to store windows files (even text files). Better than 3-byte UTF-8 and multi-tier encoding overhead.
[23 May 2007 18:53] Bambarbia Kirkudu
"BLOB columns are treated as binary strings (byte strings)."
Why? 
Another candidate for a bug report?
[23 May 2007 23:10] Mark Matthews
Please be more specific about BLOBs being treated as binary strings, I don't understand what you mean.

Do you mean you get a byte[] when you call .getObject() on a ResultSet containing them, or something else?
[25 May 2007 15:10] Bambarbia Kirkudu
"BLOB columns are treated as binary strings (byte strings)." - quote from documentation at http://dev.mysql.com/doc/refman/5.1/en/blob.html

I tried to say the same, "please be more specific, I can't understand". What is "byte strings"?

"BLOB  columns have no character set, and sorting and comparison are based on the numeric values of the bytes in column values."

And again, why do we have BLOB and TEXT on a same page, why do we need to sort BLOBs? to compare BLOBs (limiting to 767 bytes with InnoDB)? 

"BLOB columns have no character set" - nice.
[25 May 2007 15:50] Mark Matthews
> "BLOB  columns have no character set, and sorting and comparison are based on
> the numeric values of the bytes in column values."

> And again, why do we have BLOB and TEXT on a same page, why do we need to sort
> BLOBs? to compare BLOBs (limiting to 767 bytes with InnoDB)? 

>"BLOB columns have no character set" - nice.

Because they are _binary_ data? It doesn't make logical sense for them to have a character set. Let's say you store a JPEG in a BLOB column (which is often done). What character set should be used? How about a serialized Java object?

If you use _any_ character set, you'll corrupt most binary data, because not all sequences of bytes are valid in any given character set.

I've yet to use a database system where _BLOBs_ have a character set. That's why the whole concept of CLOBS (character large objects, which MySQL calls the "TEXT" type) exist, for example see

http://www.postgresql.org/docs/8.2/interactive/datatype-binary.html
http://msdn2.microsoft.com/en-us/library/ms187993.aspx
http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14200/sql_elements001.htm#st...
(etc, etc)
[22 Oct 2007 15:42] Susanne Ebrecht
Hi all,

as far as I understand this, this is not a bug.

Louis: you mixed ISO encoding with UTF8 encoding. Please make sure, that all your data have the same encoding.

Please try also our new MyODBC connector version 5.1.

Many thanks for using MySQL.

Regards,

Susanne