Bug #369 USE_MB should not be used when escape/unescape string
Submitted: 5 May 2003 2:31 Modified: 29 May 2003 2:40
Reporter: xuefer tinys Email Updates:
Status: Won't fix Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:mysql4.0.12 OS:not realative
Assigned to: Alexander Barkov CPU Architecture:Any

[5 May 2003 2:31] xuefer tinys
Description:
notice that USE_MB(it exists in older version, all mysql 4.x) 
it use connection CHARSET to escape string differently 
but it is said mysql-4.1 can have different charset on column,table,database 
how can the client know which table it's querying on and escape properly?

i guess this is why it failed to insert BLOB data to GBK mysql servers. 
php addslashes()/stripslashes() don't care about charset, and works ok on all strings!
so why the mysql do care charset when escaping string?

so do anyone use multibyte mysql server? 
do mysql developers tested insert blob data under GBK mysql server? 

quoted from mysql-4.1/libmysql/libmysql.c 
============= 
static ulong 
mysql_sub_escape_string(CHARSET_INFO *charset_info, char *to, 
const char *from, ulong length) 
{ 
const char *to_start=to; 
const char *end; 
#ifdef USE_MB 
my_bool use_mb_flag=use_mb(charset_info); 
#endif 
for (end=from+length; from != end ; from++) 
{ 
#ifdef USE_MB 
   int l; 
   if (use_mb_flag && (l = my_ismbchar(charset_info, from, end))) 
   { 
     while (l--) 
  *to++ = *from++; 
     from--; 
     continue; 
   } 
#endif 
============= 
quoted from mysql-4.1/sql/sql_parse.cc
============= 
static void remove_escape(char *name)
{
  if (!*name)					// For empty DB names
    return;
  char *to;
#ifdef USE_MB
  char *strend=name+(uint) strlen(name);
#endif
  for (to=name; *name ; name++)
  {
#ifdef USE_MB
    int l;
/*    if ((l = ismbchar(name, name+MBMAXLEN))) { Wei He: I think it's wrong */
    if (use_mb(system_charset_info) &&
        (l = my_ismbchar(system_charset_info, name, strend)))
    {
	while (l--)
	    *to++ = *name++;
	name--;
	continue;
    }
#endif
    if (*name == '\\' && name[1])
      name++;					// Skip '\\'
    *to++= *name;
  }
  *to=0;
}
============= 

How to repeat:
<?php
$user = "";
$pass = "";
error_reporting(E_ALL);
$conn = mysql_connect('localhost', $user, $pass) or die('cant connect');

test(chr(200), $conn);
test(chr(200) . '"', $conn);
test(chr(200) . "'", $conn);
// chr(200) just to hit my_ismbhead() \' is 2nd char of GBK
// for any other charset, use other char to test instead
test(chr(200) . "\\'", $conn);

function test($str, &$conn)
{
    echo "<br>========<br>";
    dump_str('string', $str);
    $q_str = mysql_escape_string($str); // you may also try mysql_real_escape_string() (php cvs only)
    dump_str('escaped', $q_str);
    $res = mysql_query("SELECT '$q_str'"); // such a simple query will be failed?! inacceptable!
    if (!$res) {
        print('<font color=red>cant query</font>, error:#'
		. mysql_errno()
		. ' '
		. mysql_error());
		return;
	}
    $row = mysql_fetch_row($res) or die('empty result');
    dump_str('result', $row[0]);
    echo $row[0] === $str ? "* fine<br>":"*** damn!<br>";
}
function dump_str($name, $str)
{
    echo "$name -&gt; $str (";
    for ($i = 0; $i < strlen($str); $i ++)
    {
            echo ' [ ' , $str{$i}, ' ] ';
    }
    echo ")<br>";
}

?>

Suggested fix:
remove those USE_MB which used in escape/unescape functions
but it will be a BC breaker :X
[5 May 2003 2:42] xuefer tinys
note that the above script run on mysql server with GBK charset
php script, run under both win32(binary package php) and linux(source package php, with source package of mysql which enabled GBK support with the server)
it was several months ago i found this problem. problem still until i found this bug report system now.

and i guess isn't so much servers currently running in GBK charset
and it may not be a serious BC breaker, just recompile clients with new server protocol will be ok

this problem should exists in other charset that make use of USE_MB
hope this fix soon
[29 May 2003 2:40] Michael Widenius
The escaping must be done differently for different charactersets because
for some multi-byte character sets \ or ' may be part of a multi-byte character and should not be escaped.

The function mysql_real_escape_string() takes care of escaping the string properly according to the current character set and is the recommended way to escape a string.

In MySQL 4.1 we will provide a better mysql_real_escape_string() that should be able to handle all character sets.  Other ways to solve this problem in 4.1 are:

- You can prefix a string with _N"" to tell the server that it's using
  latin1, in which case the addslashes() should work correctly.
- You can specify the character set used for the strings in your SQL
  statement. (SET CHARACTER SET)
- We plan to provide a sql mode where \ doesn't have to be escaped in
  strings (Good for compatibility purposes)
- You can use prepared statements to insert strings (in which case you
  don't have to quote anything)

For 4.0 the only solution for people using a MB character set is to the quoting multi-byte safe either with mysql_real_escape_string() or with an own function.
[29 May 2003 3:32] xuefer tinys
i don't see any point that make escaping have to care about charset
unless one escape string by hand, he can not see the \ and " in multi-byte char

but is there anywhere already discussed this problem?
and info about the new SQL mode?

thx