Bug #34672 Unable to insert surrogate pairs into or fetch surrogate pairs from unicode coln
Submitted: 19 Feb 2008 17:11 Modified: 28 Mar 2008 18:19
Reporter: John Water
Status: Closed
Category: Connector / ODBC Severity: S2 (Serious)
Version: 5.01.02.00 OS: Windows (XP)
Assigned to: Jess Balint CPU Architecture:Any
Tags: surrogate, Unicode

[19 Feb 2008 17:11] John Water
Description:
With MyODBC driver 5.1.2, it is impossible to insert surrogate pairs into, or fetch surrogate pairs from, a varchar column defined with character set ucs2 in MySQL 5.1.22 and MySQL 6.0.3.  Here is the table definition:
	create table test (pk int not null primary key,
                           c1 varchar( 20 ) character set ucs2)
and the surrogate pair to be inserted into c1 is
	0xDC60D802
Inserting this value into c1 through mysql.exe works.  However, the MyODBC driver with MySQL 5.1 reports the following error:
[MySQL][ODBC 5.1 Driver][mysqld-5.1.22-rc-community]Incorrect string value: '\xF0\x90\xA1\xA0\xF0\x90...' for column 'c1' at row 1

and the MyODBC driver with MySQL 6.0.3 inserts a wrong value into test; the stored value is
0x003F003F003F003F
in the table.
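
The bytes quoted in the 5.1 error message can be tied back to the surrogate pair with a little arithmetic. The sketch below is not taken from the driver or from odbcbug.c; the function names are made up for illustration. It assumes the driver converts the wide-character parameter to UTF-8 before sending it, which the '\xF0\x90\xA1\xA0' prefix in the error suggests. The pair D802 DC60 combines to U+10860, whose UTF-8 form is the four bytes F0 90 A1 A0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 high/low surrogate pair into one
 * code point. Returns 0 if the inputs are not a valid pair. */
uint32_t surrogate_pair_to_code_point(uint16_t high, uint16_t low)
{
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF)
        return 0;
    return 0x10000u + (((uint32_t)(high - 0xD800) << 10)
                       | (uint32_t)(low - 0xDC00));
}

/* Hypothetical helper: encode a supplementary code point
 * (U+10000..U+10FFFF) as 4 UTF-8 bytes. Returns the number of bytes
 * written, or 0 if cp is outside that range. */
size_t supplementary_to_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Calling surrogate_pair_to_code_point(0xD802, 0xDC60) gives 0x10860, and encoding that value yields F0 90 A1 A0, the same four bytes that start the rejected string in the error above.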

The MyODBC driver has the same kind of problem when applications fetch surrogate pairs from MySQL.

A simple repro will be attached.  We have run this repro against Oracle, MS SQL Server, Sybase ASE, Sybase SQL Anywhere, MySQL 5.1.22, and MySQL 6.0.3, with the following results:

Oracle:

ORA Test Begin ...
fetch returned 0, ind = 8
Fetched value in hex:
 02 D8 60 DC 03 D8 60 DC ... Test End
 
MS SQL Server:

MSS Test Begin ...
fetch returned 0, ind = 40
Fetched value in hex:
 02 D8 60 DC 03 D8 60 DC 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... Test End
 
Sybase ASE:

ASE Test Begin ...
fetch returned 0, ind = 8
Fetched value in hex:
 02 D8 60 DC 03 D8 60 DC ... Test End
 
Sybase SQL Anywhere:

SQL Test Begin ...
fetch returned 0, ind = 8
Fetched value in hex:
 02 D8 60 DC 03 D8 60 DC ... Test End
 
MySQL 5.1.22-rc-community:

MYS Test Begin ...
error at line: 259
SQLSTATE = HY000
NATIVE ERROR = 1366
MSG = [MySQL][ODBC 5.1 Driver][mysqld-5.1.22-rc-community]Incorrect string value: '\xF0\x90\xA1\xA0\xF0\x90...' for column 'c1' at row 1

MySQL 6.0.3-alpha-community:

MYS Test Begin ...
fetch returned 0, ind = 8
Fetched value in hex:
 3F 00 00 00 00 00 00 00 ... Test End
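
To compare the dumps above, it helps to read them as UTF-16 code units. The sketch below assumes the repro prints the raw bytes of a wide-character buffer holding 16-bit little-endian units, as is usual on Windows; odbcbug.c is not shown here, so this is an assumption, and the function name is hypothetical. Under that reading, the Oracle, ASE, and SQL Anywhere output is the two pairs D802 DC60 and D803 DC60, while the MySQL 6.0.3 output is a single 003F ('?') followed by zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read little-endian UTF-16 code units from a byte
 * dump. Writes at most max_units units and returns how many were written.
 * A trailing odd byte, if any, is ignored. */
size_t le_bytes_to_utf16(const unsigned char *bytes, size_t byte_len,
                         uint16_t *units, size_t max_units)
{
    assert(bytes != NULL && units != NULL);
    size_t n = byte_len / 2;
    if (n > max_units)
        n = max_units;
    for (size_t i = 0; i < n; i++)
        units[i] = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    return n;
}
```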

How to repeat:
A repro will be attached.  To reproduce this problem,
1) unzip the attached zip file that contains odbcbug.c and odbcbug.exe for Windows;
2) run the following command
odbcbug "dsn=yguo_dsn;uid=your_uid;pwd=your_password" db_type
where db_type could be ORA, MSS, ASE, ASA, or MYS
[19 Feb 2008 18:18] Jess Balint
John, Thanks for your bug report. The characters you are trying to use are only supported in MySQL 6.0.4 and above. Please try your test using 6.0.4 and let us know the result.
[19 Feb 2008 21:51] Jess Balint
John - Sorry, I've just noticed that MySQL 6.0.4 has not yet been released.
[19 Feb 2008 21:52] Jess Balint
fix insert_param error handling, and raise an error if unsupported characters are bound

Attachment: bug34672.diff (application/octet-stream, text), 12.62 KiB.
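
For readers without access to the attachment, the kind of check the commit message describes can be sketched as follows. This is not the content of bug34672.diff; it is a minimal illustration with a hypothetical function name, assuming that "unsupported characters" means anything outside the Basic Multilingual Plane, which shows up in UTF-16 input as a code unit in the surrogate range D800..DFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the first surrogate code unit
 * in a UTF-16 buffer, or -1 if every unit is a BMP character. A caller
 * could use a non-negative result to raise an error before sending the
 * parameter to a server that cannot store such characters. */
long first_non_bmp_unit(const uint16_t *units, size_t count)
{
    assert(units != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (units[i] >= 0xD800 && units[i] <= 0xDFFF)
            return (long)i;
    }
    return -1;
}
```

With the value from this report, a buffer beginning D802 DC60 is flagged at index 0, so the bind could be refused with an error instead of storing '?' characters.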

[29 Feb 2008 21:30] Lawrenty Novitsky
patch approved
[17 Mar 2008 19:26] Jess Balint
Committed as rev 1076 and will be released in 5.1.3.
[28 Mar 2008 18:19] MC Brown
A note has been added to the 5.1.3 changelog: 

Inserting characters to a UTF8 table using surrogate pairs would fail and insert invalid data.