Bug #8303 String literals with multi-byte characters containing \ are lexed incorrectly
Submitted: 4 Feb 2005 0:39 Modified: 12 Jun 2006 19:18
Reporter: Jim Winstead
Status: Closed
Category:Server Severity:S2 (Serious)
Version:4.1.x OS:Any (Any)
Assigned to: Sergei Golubchik Target Version:

[4 Feb 2005 0:39] Jim Winstead
Description:
If a string literal contains a backslash (\) followed by a multi-byte character whose
second byte is 0x5c (ASCII for \), the string literal is handled incorrectly by the
get_text() function of the lexer.

How to repeat:
Basically:

<literal> = UNHEX('8DB2939181408C5C')

SET NAMES sjis;
SELECT '<literal>' FROM DUAL;

Suggested fix:
When a backslash is encountered in get_text(), it needs to skip (or copy) the next
character, not just the next byte.
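
A minimal sketch of the character-aware skipping suggested above. It is not the patch that was pushed, and it is not the server's get_text(). The names sjis_char_len() and unescape_sjis() are hypothetical, the sjis lead-byte ranges are simplified, and the only escape handled is "backslash followed by a character means that character verbatim".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length (1 or 2) of the sjis character at p.
   Assumes simplified lead-byte ranges 0x81-0x9F and 0xE0-0xFC. */
static size_t sjis_char_len(const unsigned char *p, const unsigned char *end)
{
    unsigned char c = *p;
    int is_lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
    return (is_lead && p + 1 < end) ? 2 : 1;
}

/* Hypothetical unescape: copies src to dst, dropping each escape backslash
   and copying the whole character that follows it, not just one byte.
   dst must have room for len bytes. Returns the number of bytes written. */
static size_t unescape_sjis(const unsigned char *src, size_t len,
                            unsigned char *dst)
{
    const unsigned char *end = src + len;
    unsigned char *to = dst;

    assert(src != NULL && dst != NULL);
    while (src < end) {
        size_t n;
        if (*src == '\\' && src + 1 < end)
            src++;                      /* drop the escape backslash */
        n = sjis_char_len(src, end);    /* step by character, not byte */
        memcpy(to, src, n);
        to += n;
        src += n;
    }
    return (size_t)(to - dst);
}
```

Given the input bytes 5C 8C 5C 61, this sketch yields 8C 5C 61: the backslash escapes the whole two-byte character 8C 5C. A byte-wise loop would skip only 8C, then treat the trailing 5C as a second escape and yield 8C 61.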
[16 Feb 2005 10:42] Alexander Barkov
Perhaps it would be better to move str++ from here:
for (to=start ; str != end ; str++)
into the loop body in several places, to avoid doing str--.
It would make the function more readable and a bit more
efficient. But I think it's OK to push as is.

In the future we should consider moving this get_text()
function into the CHARSET_INFO structure: not all multibyte
charsets can have the backslash character as part of a multi-byte
character, and get_text() can be simplified for them.
[18 Feb 2005 1:03] Jim Winstead
Pushed, will be in 4.1.11.
[2 Mar 2005 18:55] Paul DuBois
Noted in 4.1.11 changelog.
[3 Jun 2006 9:46] Sergei Golubchik
The patch broke the fix for Bug#8378 and was undone in 4.1.20, 5.0.22, and 5.1.11.
[12 Jun 2006 19:18] Paul DuBois
Noted in 4.1.20, 5.0.22, 5.1.11 changelogs.