Bug #8378 String escaped incorrectly with client character set 'gbk'
Submitted: 8 Feb 2005 23:07 Modified: 31 May 2006 20:09
Reporter: Jim Winstead Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.9 OS:Windows (Windows)
Assigned to: Sergei Golubchik CPU Architecture:Any

[8 Feb 2005 23:07] Jim Winstead
Description:
Report from Freddie Bingham as comment to Bug #8317:

Your contention is that mysql_real_escape_string is going off of the character
set that was used to connect with, being latin1 by default.  Is the assumption
then if I was to connect with gbk, then this insert would function? It doesn't
as far as my tests go.

How to repeat:

Example:

c:\windows\my.ini:

[client]
port=3306
default-character-set=gbk

PHP Code:

$mysqli = mysqli_init();
$mysqli->options(MYSQLI_READ_DEFAULT_FILE, "C:\windows\my.ini");
$mysqli->real_connect('localhost', 'user', 'password', 'database');
echo $mysqli->character_set_name();

This returns:

gbk_chinese_ci

Insertion code (field is mediumblob):

$string = "'";
UPDATE field SET filedata = '" . $mysqli->real_escape_string($string) . "'

Result:

Errormessage: You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use near
''\''' at line 2 

This is the same error that is reported when the client character set is not set
and defaults to latin_swedish_ci
[9 Feb 2005 4:31] Jim Winstead
This is ugly. The problem is that 0xbf27 is not a valid multi-byte GBK character, but 0xbf5c is. So mysql_real_escape_string() returns 0xbf5c27 when given 0xbf27 as input, but then when the server parses the string, it sees 0xbf5c as a valid multibyte character followed by 0x27.

The workaround for this is to use the prepared statements API, and avoid this issue entirely.
[9 Feb 2005 13:47] John Percival
I think that this is probably quite a major issue. Take for example, the following SQL query:

SELECT * FROM user WHERE username = '¿\' OR userid = 1 -- '

That could easily be produced by:
$username="¿' OR userid = 1 -- ";
mysql_query("SELECT * FROM USER WHERE username = _latin1'" . mysql_real_escape_string($username) . "'");

What I'm saying is that this bug opens the door to SQL injection attacks, when running in GBK and some other multi-byte charsets.

I have posted this comment in bugs #8317 and #8378 as I think that it's pertinent to both: mysql_real_escape_string() needs to throw an error if it can't properly escape the string in the current charset; the parser needs to be able to use the _latin1 introducer to determine how it parses the following literal string.
[14 Feb 2005 14:37] Alexander Barkov
Jim, the fix looks fine. But It seems it can lose some tricky
combinations: if the first byte is a MB-head character,
but the second bite is  \0 or \r or any other character from
the following 'switch (*from) {...}', then the result will
not be correct. Will you fix it please?
Thank you!

I'm changing the status back to In-progress.
[16 Feb 2005 0:03] Jim Winstead
Fix for the problem that was pointed out: http://lists.mysql.com/internals/21971
[18 Feb 2005 0:01] Jim Winstead
Pushed, will be in 4.1.11.
[3 Mar 2005 17:24] Paul DuBois
Noted in 4.1.11 changelog.
[21 Apr 2006 5:11] xuefer tinys
$binary = "\xa3\x27"; and escape it correctly with gbk connection.

still cannot be escape, at least, in both 4.1.14 and 4.1.18

can mysql.com write a test case enuming+escaping 1/2/3/4 bytes binary in all connection character sets, to make sure all binary data works? it may take some time to complete the test.

i'd reopen this bug if i can.
[23 May 2006 17:22] Sergei Golubchik
Same problem in sjis, the character 圭 (0x8c5c)
[23 May 2006 17:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/6786
[23 May 2006 22:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/6802
[23 May 2006 22:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/33584
[25 May 2006 12:59] Sergei Golubchik
4.1.20, 5.0.22, 5.1.11
[25 May 2006 20:19] Paul DuBois
Cannot tell the nature of the problem or what was fixed.
Could you provide a sentence for the changelog entry, please?
Thanks.
[31 May 2006 20:09] Paul DuBois
Noted in 4.1.20, 5.0.22, 5.1.11 changelogs.

Security fix: An SQL-injection security hole has been found in
multi-byte encoding processing. The bug was in the server,
incorrectly parsing the string escaped with the
mysql_real_escape_string() C API function.

The changelogs also contain additional discussion of the
exploit and workarounds.
[1 Jun 2006 22:52] [ name withheld ]
Since PHP MySQL extensions don't read my.cnf, this security issue is still present in PHP, also if build against latest libmysql, right?

PHP extensions allways use the default character-set, compiled into libmysql, so you have no chance to make c-api function mysql_real_escape_string() character-set aware.
AFAIK ext/mysql and ext/pdo don't provide means to change the character-set used by PHP function mysql_real_escape_string(), respectively PDO::quote().

How could this be solved for PHP?
[4 Jun 2006 13:19] Dimitrij HIlt
Hi,

what's about 4.0? Is any fix for this production release avaible?

Dimi
[5 Jun 2006 12:01] Georg Richter
PHP 5's mysqli extension provides a function mysqli_set_charset, which fixes this problem for PHP.
[5 Jun 2006 12:02] Sergei Golubchik
4.0 is not affected
[31 Oct 2006 16:51] chuck gruber
Hello,

We have quite a large installed customer base with 4.1.14.  Outside of reinstalling with 4.1.20 or later would anybody know if simply disabling multibyte charsets would remedy this issue?  I am assuming this can be done by removing the offending characters sets from ../share/mysql/charsets/Index.xml.