Bug #8303 String literals with multi-byte characters containing \ are lexed incorrectly
Submitted: 3 Feb 2005 23:39 Modified: 12 Jun 2006 17:18
Reporter: Jim Winstead
Status: Closed
Category: MySQL Server Severity: S2 (Serious)
Version: 4.1.x OS: Any (Any)
Assigned to: Sergei Golubchik CPU Architecture:Any

[3 Feb 2005 23:39] Jim Winstead
Description:
If a string literal contains a backslash (\) followed by a multi-byte character whose second byte is 0x5c (ASCII for \), the string literal is handled incorrectly by the get_text() function of the lexer.

How to repeat:
Basically:

<literal> = UNHEX('8DB2939181408C5C')

SET NAMES sjis;
SELECT '<literal>' FROM DUAL;

Suggested fix:
When a backslash is encountered in get_text(), it needs to skip (or copy) the next character, not just the next byte.
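
To illustrate the suggestion, here is a minimal sketch of an unescape loop that advances by character rather than by byte. It is not the actual get_text() code or the pushed patch: the names sjis_char_len() and unescape_sjis() are hypothetical, the lead/trail byte ranges are a simplified sjis model, and only the \n and \t escapes are translated.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the character starting at p (simplified sjis model:
   lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC). */
size_t sjis_char_len(const unsigned char *p, const unsigned char *end)
{
    unsigned char c = p[0];
    int lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);

    if (lead && end - p >= 2) {
        unsigned char t = p[1];
        if ((t >= 0x40 && t <= 0x7E) || (t >= 0x80 && t <= 0xFC))
            return 2;
    }
    return 1;
}

/* Copy src[0..len) to dst, resolving backslash escapes one character at a
   time. dst must have room for len bytes. Returns the bytes written. */
size_t unescape_sjis(const unsigned char *src, size_t len, unsigned char *dst)
{
    const unsigned char *str = src, *end = src + len;
    unsigned char *to = dst;

    assert(src != NULL && dst != NULL);
    while (str != end) {
        size_t n = sjis_char_len(str, end);

        if (n == 1 && *str == '\\' && str + 1 != end) {
            str++;                          /* drop the backslash */
            n = sjis_char_len(str, end);    /* measure the escaped character */
            if (n == 1) {
                switch (*str) {
                case 'n': *to++ = '\n'; break;
                case 't': *to++ = '\t'; break;
                default:  *to++ = *str; break;
                }
                str++;
                continue;
            }
            /* Multi-byte character after the backslash: fall through and
               copy all of its bytes, so a 0x5c trail byte is never
               re-read as the start of another escape. */
        }
        for (; n > 0; n--)
            *to++ = *str++;
    }
    return (size_t)(to - dst);
}
```

With the input bytes 5C 8C 5C 61, this sketch writes 8C 5C 61. A loop that skipped only one byte after the backslash would instead treat the 0x5c trail byte as a second escape and write 8C 61.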
[16 Feb 2005 9:42] Alexander Barkov
Perhaps it would be better to move str++ out of the loop header here:
for (to=start ; str != end ; str++)
and into the loop body in the several places that need it, to avoid doing str--.
That would make the function more readable and a bit more
efficient. But I think it's OK to push as is.

In the future we should consider moving this get_text()
function into the CHARSET_INFO structure: not all multibyte
charsets can have the backslash character as part of a multi-byte
character, and get_text() can be simplified for them.
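
A rough sketch of that idea, under stated assumptions: struct charset_sketch is a hypothetical stand-in and does not reflect the real CHARSET_INFO layout, the function names are invented for illustration, the lead-byte test is a simplified sjis model, and escapes are reduced to "drop the backslash and keep what follows".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-charset table; NOT the real CHARSET_INFO layout. */
struct charset_sketch {
    const char *name;
    /* Copies src[0..len) to dst with backslashes resolved.
       dst must have room for len bytes. Returns the bytes written. */
    size_t (*unescape)(const unsigned char *src, size_t len,
                       unsigned char *dst);
};

/* Byte-wise variant: safe when 0x5c can never be part of a multi-byte
   character, e.g. UTF-8, where every byte of such a character is >= 0x80. */
size_t unescape_bytewise(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        if (src[i] == '\\' && i + 1 < len)
            i++;                    /* drop the backslash, keep the next byte */
        dst[o++] = src[i++];
    }
    return o;
}

/* Character-wise variant for a charset whose trail bytes may be 0x5c. */
size_t unescape_sjis_like(const unsigned char *src, size_t len,
                          unsigned char *dst)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        int lead;

        if (src[i] == '\\' && i + 1 < len)
            i++;                    /* drop the backslash */
        lead = (src[i] >= 0x81 && src[i] <= 0x9F) ||
               (src[i] >= 0xE0 && src[i] <= 0xFC);
        dst[o++] = src[i++];
        if (lead && i < len)
            dst[o++] = src[i++];    /* keep the trail byte with its lead byte */
    }
    return o;
}

/* Each charset selects the cheapest routine that is correct for it. */
const struct charset_sketch cs_utf8 = { "utf8", unescape_bytewise };
const struct charset_sketch cs_sjis = { "sjis", unescape_sjis_like };
```

A caller would invoke cs->unescape(src, len, dst) without knowing which variant is behind it, so only charsets that can carry 0x5c inside a character pay for the character-wise scan.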
[18 Feb 2005 0:03] Jim Winstead
Pushed, will be in 4.1.11.
[2 Mar 2005 17:55] Paul DuBois
Noted in 4.1.11 changelog.
[3 Jun 2006 7:46] Sergei Golubchik
The patch broke the fix for Bug#8378 and was undone in 4.1.20, 5.0.22, and 5.1.11.
[12 Jun 2006 17:18] Paul DuBois
Noted in 4.1.20, 5.0.22, 5.1.11 changelogs.