Bug #15678 Incorrect unescape of the binary data sent from sjis/cp932 client
Submitted: 12 Dec 2005 7:14 Modified: 4 May 2006 11:33
Reporter: Shuichi Tamagawa
Status: Not a Bug
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.x, 4.x OS:Any (Any)
Assigned to: Alexander Barkov CPU Architecture:Any

[12 Dec 2005 7:14] Shuichi Tamagawa
Description:
Binary data stored in, for example, a BLOB field can be corrupted under the following conditions:
 - The client character set is sjis or cp932.
 - The binary data contains a control byte that needs to be escaped.

Example:

If the binary data contains 0x9500, it is escaped to 0x955C30 (0x00 is the NUL byte, which becomes 0x5C30, i.e. '\0', when escaped). However, when the data is passed to the server, the byte string is not unescaped if the client character set is sjis/cp932. This is probably because sjis/cp932 has a character at code point 0x955C, so the server reads 0x95 0x5C as one double-byte character rather than as 0x95 followed by a backslash. The same can happen with other byte patterns, since sjis/cp932 has other characters whose second byte is 0x5C.
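The ambiguity described above can be sketched in a few lines of Python. This is only an illustration: `escape_binary`, `unescape`, and the lead/trail byte helpers are hypothetical simplifications written for this report, not the server's actual code, and they handle only the NUL and backslash cases.

```python
# Hypothetical, simplified model of byte-wise escaping and of a
# character-set-aware unescaper. Not MySQL's implementation.

def escape_binary(data: bytes) -> bytes:
    """Escape byte by byte: NUL -> '\\0', backslash -> '\\\\'."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"
        elif b == 0x5C:
            out += b"\\\\"
        else:
            out.append(b)
    return bytes(out)

def is_sjis_lead(b: int) -> bool:
    # Lead-byte ranges of double-byte sjis/cp932 characters.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def is_sjis_trail(b: int) -> bool:
    # Trail-byte ranges; note that 0x5C (backslash) falls inside.
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC

def unescape(data: bytes, multibyte_aware: bool) -> bytes:
    """Undo escape_binary, optionally skipping over sjis double-byte pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        has_next = i + 1 < len(data)
        if multibyte_aware and is_sjis_lead(b) and has_next and is_sjis_trail(data[i + 1]):
            out += data[i:i + 2]  # copy the pair unchanged as one character
            i += 2
        elif b == 0x5C and has_next:
            nxt = data[i + 1]
            out.append(0x00 if nxt == 0x30 else nxt)
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)

original = b"\x95\x00"
escaped = escape_binary(original)                              # 0x95 0x5C 0x30
as_binary = unescape(escaped, multibyte_aware=False)           # round-trips
as_sjis = unescape(escaped, multibyte_aware=True)              # stays escaped
```

With the byte-wise unescaper the original two bytes come back; with the sjis-aware one, 0x95 0x5C is consumed as a single character and the stored value remains 0x955C30.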

How to repeat:
Upload the attached file containing the bytes 0x9500 (0x9500.txt) using the attached PHP script.

Suggested fix:
It seems that the MySQL server unescapes the data sent from the client based on the client character set. When the client character set is sjis/cp932 and the data is sent as character data, it is correct not to unescape a 0x5C that is part of a character. However, if the data is sent as binary and stored in a BLOB field, the byte string should be unescaped even if the client character set is sjis/cp932. The problem can be avoided by sending the data in ‘0x’ + ‘hex-encoded binary string’ format, but that is troublesome for developers.
[12 Dec 2005 7:15] Shuichi Tamagawa
Test file to be used by the PHP script

Attachment: 0x9500.txt (text/plain), 2 bytes.

[12 Dec 2005 7:15] Shuichi Tamagawa
PHP script to reproduce the problem

Attachment: blob.php (application/octet-stream, text), 1.08 KiB.

[21 Dec 2005 20:31] Aleksey Kishkin
testcase I used

Attachment: tt.c (application/octet-stream, text), 1.63 KiB.

[4 May 2006 11:33] Alexander Barkov
Shuichi, this is not a bug. You need to use HEX representation
if you're going to send "dangerous" binary data, which has special
meaning in sjis. You can do it either manually, or using mysql_hex_string().

Another approach is to use "SET NAMES binary" before sending dangerous strings.
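
As a rough illustration of the manual HEX approach, the sketch below builds a hexadecimal literal on the client side. The helper `to_hex_literal`, the table `t1`, and the column `b` are hypothetical names chosen for this example; a real client would more likely use its driver's own hex or parameter-binding support.

```python
def to_hex_literal(data: bytes) -> str:
    """Render bytes as a 0x... hexadecimal literal (hypothetical helper)."""
    if not data:
        return "''"  # "0x" with no digits is not a literal; use an empty string
    return "0x" + data.hex().upper()

payload = b"\x95\x00"
# Table t1 and column b are assumptions made for this example.
sql = "INSERT INTO t1 (b) VALUES (" + to_hex_literal(payload) + ")"
```

Because the literal contains only the ASCII characters 0-9, A-F, and x, no byte in the statement can be taken for an sjis lead byte or a backslash, so the client character set no longer affects how the value is parsed.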