Bug #51313 Escape processing is confused by multiple backslashes.
Submitted: 19 Feb 2010 9:46 Modified: 25 Nov 2013 16:34
Reporter: Stephane Giron Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / J Severity:S2 (Serious)
Version:5.1.6 OS:Any
Assigned to: Filipe Silva CPU Architecture:Any

[19 Feb 2010 9:46] Stephane Giron
Description:
Hi

When inserting strings, it is possible to get "Not a valid escape sequence" error messages.
This occurs when there are several backslashes in a row, preceding a string marker (' for example).

EscapeTokenizer.java only handles this case when the last two characters are backslashes, but does not expect that they can be more than 2 of these.

How to repeat:
Here are two examples of strings that would fail to insert :

stmt = conn.createStatement();

stmt.executeUpdate("insert into test values (50, 'abcd \\\\\\' abcd {abcd \\\\\\' abcd { \\\\\\' abcd \\\\\\' } } abcd ')");

stmt.executeUpdate("insert into test values (56, 'This is \\\\\\\' { going to -- break escape } processing ')");

Suggested fix:
EscapeTokenizer should probably count the number of preceding backslashes instead of checking only for the 2 last characters.

I tested the following code successfully :

        public synchronized String nextToken() {
                StringBuffer tokenBuf = new StringBuffer();

                if (this.emittingEscapeCode) {
                        tokenBuf.append("{"); //$NON-NLS-1$
                        this.emittingEscapeCode = false;
                }
                int precedingBackslash = 0;
                for (; this.pos < this.sourceLength; this.pos++) {
                        char c = this.source.charAt(this.pos);

                        // Detect variable usage

                        if (!this.inQuotes && c == '@') {
                                this.sawVariableUse = true;
                        }
                        
                        if(c =='\\') {
                                precedingBackslash++;
                                tokenBuf.append(c);
                        }
                        else {
                                if (c == '\'' || c == '"') {
                                        if (this.inQuotes && c == quoteChar) {
                                                if (this.pos + 1 < this.sourceLength) {
                                                        if (this.source.charAt(this.pos + 1) == quoteChar) {
                                                                // Doubled-up quote escape
                                                                tokenBuf.append(quoteChar);
                                                                tokenBuf.append(quoteChar);
                                                                this.pos++;
                                                                continue;
                                                        }
                                                }
                                        }
                                        //if (this.lastChar != '\\') {
                                        if(precedingBackslash == 0) {
                                                if (this.inQuotes) {
                                                        if (this.quoteChar == c) {
                                                                this.inQuotes = false;
                                                        }
                                                } else {
                                                        this.inQuotes = true;
                                                        this.quoteChar = c;
                                                }
                                        //} else if (this.lastLastChar == '\\') {
                                        } else if (precedingBackslash % 2 == 0 ) {
                                        // Number of preceding backslash is even, which means that this quoting character is NOT escaped 
                                                if (this.inQuotes) {
                                                        if (this.quoteChar == c) {
                                                                this.inQuotes = false;
                                                        }
                                                } else {
                                                        this.inQuotes = true;
                                                        this.quoteChar = c;
                                                }
                                        }

                                        tokenBuf.append(c);
                                } else if (c == '-') {
                                        if ((this.lastChar == '-')
                                                //      && ((precedingBackslash % 2 > 0) && !this.inQuotes)) {
                                                && ((this.lastLastChar != '\\') && !this.inQuotes)) {
                                                this.inComment = true;
                                        }
        
                                        tokenBuf.append(c);
                                } else if ((c == '\n') || (c == '\r')) {
                                        this.inComment = false;

                                        tokenBuf.append(c);
                                } else if (c == '{') {
                                        if (this.inQuotes || this.inComment) {
                                                tokenBuf.append(c);
                                        } else {
                                                this.bracesLevel++;

                                                if (this.bracesLevel == 1) {
                                                        this.pos++;
                                                        this.emittingEscapeCode = true;
        
                                                        return tokenBuf.toString();
                                                }

                                                tokenBuf.append(c);
                                        }
                                } else if (c == '}') {
                                        tokenBuf.append(c);

                                        if (!this.inQuotes && !this.inComment) {
                                                this.lastChar = c;

                                                this.bracesLevel--;

                                                if (this.bracesLevel == 0) {
                                                        this.pos++;

                                                        return tokenBuf.toString();
                                                }
                                        }
                                } else {
                                        tokenBuf.append(c);
                                }
                                precedingBackslash = 0;
                        }
                        this.lastLastChar = this.lastChar;
                        this.lastChar = c;
                }

                return tokenBuf.toString();
        }
[25 Feb 2010 14:07] Stephane Giron
Hi,

Here is another example that confuse the escape tokenizer.
It is not directly related to this bug, but I feel this can be considered as part of it. Anyway, please let me know if you prefer that I open a new bug.

Here is the example :

stmt.executeUpdate("insert into testData values (56, 'This is \\'', '{ going to break escape -- } processing ')");

In that last case, the tokenizer is confused by the '' (double quote)... the current implementation does not check if the first quote is escaped or not...

From my previous suggested fix, this would require the following change :

                    if (this.inQuotes && c == quoteChar) {
                        if (this.pos + 1 < this.sourceLength && precedingBackslash % 2 == 0) {
                            // This should not happen if current character is escaped
                            if (this.source.charAt(this.pos + 1) == quoteChar) {
                                // Doubled-up quote escape
                                tokenBuf.append(quoteChar);
                                tokenBuf.append(quoteChar);
                                this.pos++;
                                continue;
                            }
                        }
                    }

A double quote should not be considered as a double quote if the first quote is escaped (if it is preceded by an odd number of backslashes). Instead, this should be considered as an escaped quote (within the string) followed by the string ending quote.

Cheers
Stephane
[25 Feb 2010 14:12] Tonci Grgin
Stephane, I understand :-)

Please tell me your MysQL server SQL_MODE. Does it include "no backslashes escape"?
[25 Feb 2010 14:23] Stephane Giron
Hi Tonci,

Thanks for your reply.
Here is my server sql_mode :

mysql> show variables like '%sql_mode%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sql_mode      |       |
+---------------+-------+
1 row in set (0.00 sec)

The thing is in that case that I can insert these strings using mysql client. The issue just occurs when I try to insert them using connector/j.
Please let me know if you need other details (I can provide a whole test instead of just the failing statements, if you want, and also the connector/j jar file that I built in order to check my fixes).

Cheers

Stephane
[25 Feb 2010 14:28] Tonci Grgin
Thanks for info provided Stephane. One more thing, although not mandatory, can you try with snapshot build (5.1.13 I think).
[25 Feb 2010 18:42] Stephane Giron
Hi Tonci,

I built against head of branch 5.1 (from launchpad).
It seems it behaves a bit better. However, this is still possible to get errors, mainly due to the fact that backslashes are not properly counted. Only the last two \ are taken into account.

If you want to insert using the following :
insert into testData values (57, 'This is \\\'''', '{ going to break escape -- } processing ')
This works using mysql client.

Using jdbc, it would translate into :
stmt.executeUpdate("insert into testData values (57, 'This is \\\\\\'''', '{ going to break escape -- } processing ')");

And this gives this error message :
Not a valid escape sequence: { going to break escape -- } processing ')

This comes from the fact that some quote characters are interpreted as string markers whereas they actually are inside the string itself.

Cheers

Stephane
[4 Mar 2010 14:10] Tonci Grgin
Stephane, now I'm puzzled:

If you want to insert using the following :
insert into testData values (57, 'This is \\\'''', '{ going to break escape -- }
processing ')
This works using mysql client.

Using jdbc, it would translate into :
stmt.executeUpdate("insert into testData values (57, 'This is \\\\\\'''', '{ going to
break escape -- } processing ')");

mysql> SELECT 57, 'This is \\\'''', '{ going to break escape -- } processing ';
+----+-------------+------------------------------------------+
| 57 | This is \'' | { going to break escape -- } processing  |
+----+-------------+------------------------------------------+
| 57 | This is \'' | { going to break escape -- } processing  |
+----+-------------+------------------------------------------+
1 row in set (0.00 sec)

Now, I know from Bug#25399 that EscapeProcessor gets confused by multiple backslashes so we now push the responsibility of syntax errors back on to the server for most escape sequences.

So, can you please elaborate the sentence "Works in cl client" as it does not in mine, obviously neither of your examples. My SQL_MODE is also empty.
[4 Mar 2010 14:28] Stephane Giron
Hi Tonci,

I am not sure that I fully understand what you mean.

Here is what I get using mysql client :

Server version: 5.1.39-log MySQL Community Server (GPL)

mysql> insert into testData values (57, 'This is \\\'''', '{ going to break escape -- }processing ');
Query OK, 1 row affected (0.00 sec)

mysql> select * from testData;
+----+-------------+-----------------------------------------+
| i  | val         | val2                                    |
+----+-------------+-----------------------------------------+
| 57 | This is \'' | { going to break escape -- }processing  |
+----+-------------+-----------------------------------------+
1 row in set (0.00 sec)

No syntax error. It just inserts.
The problem stands when I try to insert the same using JDBC when escape processing is on.

Please let me know if that answers your question.

Cheers
Stephane
[4 Mar 2010 14:33] Stephane Giron
Other examples :

mysql> insert into test values (56, 'This is \\\' { going to -- break escape } processing ');
Query OK, 1 row affected (0.00 sec)

mysql> insert into test values (50, 'abcd \\\' abcd {abcd \\\' abcd {\\\' abcd \\\' } } abcd ');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test;
+------+-----------------------------------------------------+
| i    | v                                                   |
+------+-----------------------------------------------------+
|   56 | This is \' { going to -- break escape } processing  |
|   50 | abcd \' abcd {abcd \' abcd {\' abcd \' } } abcd     |
+------+-----------------------------------------------------+
2 rows in set (0.00 sec)

Cheers
Stephane
[4 Mar 2010 14:33] Tonci Grgin
Stephane, I thought you were able to get 'This is \\\'''', '{ going to break escape -- }processing ' in one field in cl client.
[4 Mar 2010 14:38] Stephane Giron
Tonci,

Sorry about that.
My third example was using a 3 columns table instead, whereas previous 2 only used a 2 columns table.

Please let me know if you need more details or have other questions. ;)

Cheers
Stephane
[5 Mar 2010 8:22] Tonci Grgin
Stephane, sorry, I am unable to make c/J fail where cl client succeeds...

So:
mysql> drop table if exists bug51313;
Query OK, 0 rows affected (0.00 sec)

mysql> create table bug51313 (Id INT UNSIGNED NOT NULL PRIMARY KEY,
    ->  VCh1 VARCHAR(50),
    ->  VCh2 VARCHAR(50));
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO bug51313 VALUES (56, 'This is \\\'', '{ going to break escape
 -- } processing');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO bug51313 VALUES (58, 'This is \\\'''', '{ going to break esca
pe -- } processing');
Query OK, 1 row affected (0.00 sec)

mysql> insert into testData values (60, 'This is \\\\\\'''', '{ going to break e
scape -- } processing ';
    -> ;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use near '{ goi
ng to break escape' at line 1

Java code is rather simple:
assertEquals(1, this.stmt.executeUpdate("INSERT INTO bug51313 VALUES (50, 'This is \\\'''', '{ going to break escape -- } processing ')"));

From what I see, there is no bug here as \\ means just a \ character. So \\\ will work, while \\\\\\ will not, anywhere.
This is in accordance with Mark's statement that server does the parsing.
[5 Mar 2010 9:12] Stephane Giron
Hi Tonci,

I think that your java code is not equivalent to the mysql client request you are running. You need to get the same result from java code as from mysql client.

If you want to insert the same from java as :

mysql> INSERT INTO bug51313 VALUES (58, 'This is \\\'''', '{ going to
break escape -- } processing');
Query OK, 1 row affected (0.00 sec)

you have to run the following java code

stmt.executeUpdate("insert into bug51313 values (59, 'This is \\\\\\'''', '{ going to break escape -- } processing ')");

As it will fail (with "SQLException: Not a valid escape sequence: { going to break escape -- } processing ')"), just disable escape processing (stmt.setEscapeProcessing(false)) before executing the update statement and you will get the following result :

mysql> select * from bug51313;
+----+-------------+------------------------------------------+
| Id | VCh1        | VCh2                                     |
+----+-------------+------------------------------------------+
| 58 | This is \'' | { going to break escape -- } processing  |
| 59 | This is \'' | { going to break escape -- } processing  |
+----+-------------+------------------------------------------+
2 rows in set (0.00 sec)

With your java statement, this is what is inserted :

stmt.executeUpdate("INSERT INTO bug51313 VALUES(60, 'This is \\\'''', '{ going to break escape -- } processing ')");

select * from bug51313;
+----+-------------+------------------------------------------+
| Id | VCh1        | VCh2                                     |
+----+-------------+------------------------------------------+
| 58 | This is \'' | { going to break escape -- } processing  | 
| 59 | This is \'' | { going to break escape -- } processing  | 
| 60 | This is ''  | { going to break escape -- } processing  | 
+----+-------------+------------------------------------------+
3 rows in set (0.00 sec)

This is not the same inserted string...

Setting EscapeProcessing to false is just a workaround, as there are cases where you might need it. As I said, in that case, problem is that the Escape Tokenizer falsely detect escape sequences (there are none that should be detected here, as this should just insert strings into the database).

Cheers

Stephane
[8 Mar 2010 14:57] Tonci Grgin
Verified as described thanks to Mark.
[8 Apr 2010 7:36] Tonci Grgin
Escalated to Mark.
[20 May 2010 9:35] Tonci Grgin
Bug#53672 was marked as duplicate of this report.

Todd?
[25 Nov 2013 16:34] Daniel So
Added the following entry to the Connector/J 5.1.28 changelog:

"Escape processing for strings is confused by multiple backslashes. The EscapeTokenizer of Connector/J has been re-factored to process multiple backslashes properly."