Bug #28853 MySQL must replace non-supported characters with reverse question mark
Submitted: 3 Jun 2007 1:29
Modified: 8 Jun 2007 1:51
Reporter: Bambarbia Kirkudu
Status: Closed
Category: MySQL Server: Charsets
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to:
CPU Architecture: Any
Tags: MySQL

[3 Jun 2007 1:29] Bambarbia Kirkudu
Description:
Java Unicode has codepoints which are not in MySQL utf8.
MySQL throws ERRORs like
"Incorrect string value: '\xF4\x80\x81\xB1/-...' for column 'text' at row 1"

MySQL should replace unsupported characters with a special "Reverse Question Mark" codepoint.

See discussion at http://forums.mysql.com/read.php?39,155435,155435#msg-155435

It looks like the problem happens on a heavily overloaded system with a multithreaded client application.

Difficult to repeat.

How to repeat:
See discussion at http://forums.mysql.com/read.php?39,155435,155435#msg-155435

Suggested fix:
MySQL must replace non-supported characters with reverse question mark
[4 Jun 2007 19:27] Valeriy Kravchuk
Thank you for the problem report. As Mark suspects a server bug here, please send the entire test case, with the exact statements used to repeat the problem with the mysql command-line client. What exact server version did you use?
[7 Jun 2007 17:10] Bambarbia Kirkudu
Hi Valeriy,
I am trying to catch the exact byte array where it happens... I'll submit it with Java code.
(Byte Array) -windows-1252-> (Java String 'Unicode') -utf8-> MEDIUMTEXT
It's very rare... 
It could be a bug in Java 5, I hope to catch it.
[7 Jun 2007 17:17] Valeriy Kravchuk
Any additional information on how exactly one may repeat the behaviour described is greatly appreciated.
[7 Jun 2007 19:36] Bambarbia Kirkudu
Test for all range of bytes 0-255 encoded into String

Attachment: Test.java (application/octet-stream, text), 1.11 KiB.

[7 Jun 2007 19:38] Bambarbia Kirkudu
Test for all range of bytes 0-255 encoded into String (modified)

Attachment: Test.java (application/octet-stream, text), 1.11 KiB.

[7 Jun 2007 19:38] Bambarbia Kirkudu
Test of a bytearray retrieved from URL

Attachment: TestURL.java (application/octet-stream, text), 1.57 KiB.

[7 Jun 2007 19:54] Bambarbia Kirkudu
I ran those tests in a standalone Windows 2000 environment; the problem never happens.

I created byte array: 0, 1, 2, 3, ..., 255

Converted into Java String using "windows-1252", and using "ISO-8859-1". The problem does not happen.
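
A minimal sketch of the check described above, not the attached Test.java: it decodes bytes 0-255 with a given charset and reports the longest UTF-8 encoding of any resulting code point. The class and method names are made up for illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteRangeCheck {
    // Hypothetical helper: decodes bytes 0..255 with the given charset and
    // returns the longest UTF-8 encoding (in bytes) of any decoded code point.
    static int maxUtf8Length(String charsetName) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, Charset.forName(charsetName));
        int max = 0;
        for (int i = 0; i < decoded.length(); ) {
            int cp = decoded.codePointAt(i);
            int len = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8).length;
            max = Math.max(max, len);
            i += Character.charCount(cp);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("windows-1252: " + maxUtf8Length("windows-1252"));
        System.out.println("ISO-8859-1:   " + maxUtf8Length("ISO-8859-1"));
    }
}
```

Neither charset can yield a 4-byte UTF-8 sequence from a single byte, which is consistent with the isolated tests passing.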

I would suspect hardware failure or multithreading issues, but the problem was repeatable: it always happened on the same URL before. It does not happen with the same URL in an isolated test (see attached).

The only difference is that in production I use HttpClient and a multithreaded client.

Strange: if all bytes 0-255 can be converted via windows-1252 and stored as utf8, why do SQL exceptions like these sometimes happen:

Incorrect string value: '\xF4\x80\x81\xB1/-...' for column 'text' at row 1
Incorrect string value: '\xF4\x80\x81\xB1/-...' for column 'text' at row 1
Incorrect string value: '\xF4\x80\x81\xB1\x0AP...' for column 'text' at row 1
Incorrect string value: '\xF4\x80\x81\xB1\x0AP...' for column 'text' at row 1
Incorrect string value: '\xF4\x80\x81\xBA D...' for column 'text' at row 1
Incorrect string value: '\xF4\x80\x81\xB8 A...' for column 'text' at row 1

Connector/J version: 5.0.6

It could possibly be a "handshake" issue in a multithreaded environment; I need to check the server logs tonight.
[7 Jun 2007 19:59] Bambarbia Kirkudu
Test of a bytearray retrieved from URL (updated)

Attachment: TestURL.java (application/octet-stream, text), 1.45 KiB.

[7 Jun 2007 20:01] Bambarbia Kirkudu
This method creates Java String from a byte array retrieved from URL:
new String(content,"windows-1252")
I use the simplified constant "windows-1252", which is a superset of "ISO-8859-1"; in production I check HTTP headers and <meta> tags to retrieve the encoding.
[8 Jun 2007 1:20] Bambarbia Kirkudu
Please close it.

Looks like it never happens with Connector/J 5.0.6.
I don't really know where the problem was, but it does not happen anymore...
Thanks!

Posted on forum, http://forums.mysql.com/read.php?39,155435,156721#msg-156721
...Java replaces some bytes 0x80, 0x81, ... with regular question mark "???" in case of ISO-8859-1. Database receives "???" without exceptions. In case of "windows-1252", 0x80 is "Euro Currency Sign" - no problem....
[8 Jun 2007 1:51] MySQL Verification Team
Thank you for the feedback.
[31 Oct 2008 12:48] Oleg Shpak
Hi,

There is still a problem. MySQL can't save the Unicode character U+10007A, which is encoded in UTF-8 as \xF4\x80\x81\xBA. Allegedly this is a bullet character which some versions of MS Word use.
Interestingly enough, Java also does not properly decode this sequence (http://java.sun.com/javase/6/docs/api/java/io/DataInput.html#modified-utf-8), but it does not cause any problems.
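
To see which code point the bytes in the error message stand for, a small sketch like the following decodes them with the standard UTF-8 decoder (not the modified UTF-8 used by DataInput). The class and method names are hypothetical, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class DecodeSequence {
    // Hypothetical helper: decodes the bytes as standard UTF-8 and
    // returns the first code point of the result.
    static int firstCodePoint(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).codePointAt(0);
    }

    public static void main(String[] args) {
        // Byte sequence taken from the "Incorrect string value" message.
        byte[] reported = {(byte) 0xF4, (byte) 0x80, (byte) 0x81, (byte) 0xBA};
        int cp = firstCodePoint(reported);
        // Prints: U+10007A, supplementary: true, UTF-16 units: 2
        System.out.printf("U+%04X, supplementary: %b, UTF-16 units: %d%n",
                cp, Character.isSupplementaryCodePoint(cp), Character.charCount(cp));
    }
}
```

The code point lies outside the Basic Multilingual Plane, so it needs four bytes in UTF-8 and a surrogate pair in a Java String.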

I'll submit a test case to reproduce this bug later today.
[31 Oct 2008 12:57] Oleg Shpak
A test case which reproduces this bug/issue

Attachment: Test.java (application/octet-stream, text), 970 bytes.

[31 Oct 2008 12:58] Oleg Shpak
I have just added a test case and here is the output it generates:

java.sql.SQLException: Incorrect string value: '\xF4\x80\x81\xBA' for column 'text' at row 1
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:946)
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2870)
	at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1573)
	at com.mysql.jdbc.ServerPreparedStatement.serverExecute(ServerPreparedStatement.java:1169)
	at com.mysql.jdbc.ServerPreparedStatement.executeInternal(ServerPreparedStatement.java:693)
	at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:794)
	at Test.main(Test.java:31)
[14 Nov 2008 11:03] Oleg Shpak
According to http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html 4-byte utf-8 sequences are not yet supported, so the server behaviour is correct.
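
Given that limitation, one possible client-side workaround is to replace every code point above U+FFFF before the string is sent to a 3-byte utf8 column. The sketch below is an assumption about how that could look, not something MySQL or Connector/J does; the class and method names are hypothetical, and U+FFFD (REPLACEMENT CHARACTER) is chosen here as the substitute. Later MySQL versions added the utf8mb4 character set, which stores 4-byte sequences directly and makes such filtering unnecessary.

```java
public class BmpOnly {
    // Hypothetical helper: replaces every supplementary code point (above the
    // Basic Multilingual Plane) with the replacement character, so the result
    // encodes to at most 3 bytes per character in UTF-8.
    // Unpaired surrogates are copied through unchanged.
    static String replaceSupplementary(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append('\uFFFD');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String bullet = new String(Character.toChars(0x10007A));
        // Prints: true
        System.out.println(replaceSupplementary("a" + bullet + "b").equals("a\uFFFDb"));
    }
}
```

The loop walks code points rather than chars so that a surrogate pair is replaced by a single substitute character.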
[13 Aug 2019 11:04] Paul Frischknecht
There are 10 issues in my company about this, all reported to the people operating our Atlassian products, who cannot do anything to fix it. Please address this.

Links:

https://confluence.atlassian.com/confkb/saving-page-throws-unable-to-communicate-with-serv...
https://jira.atlassian.com/browse/CONFSERVER-18509?_ga=2.176522150.2028302740.1565693913-6...
https://jira.atlassian.com/browse/CONFSERVER-32453?_ga=2.176522150.2028302740.1565693913-6...
https://unicode.org/emoji/charts/full-emoji-list.html
https://bugs.mysql.com/bug.php?id=28853