Bug #34220 Bug in MySql.Data.MySqlClient.Statement TokenizeSql + Performance issue
Submitted: 1 Feb 2008 11:42 Modified: 1 Mar 2008 12:11
Reporter: Christos Pavlides
Status: Closed
Category:Connector/Net Severity:S3 (Non-critical)
Version:5.1.5 OS:Any
Assigned to: Target Version:
Tags: MySQL Connector/Net
Triage: D4 (Minor)

[1 Feb 2008 11:42] Christos Pavlides
Description:
While running some Performance Profilers on my project I found a performance issue in the
TokenizeSql. When I tried to improve it I stumbled upon a possible bug.
First the performance issue:
Basically almost all statement to be executed pass through the Tokenize Sql function and
as such even small enhancements can make a huge difference. The issue I found was that
inside a very long loop (for every character in the statement) there is a very complex if
statement which for most instances none of that cases evaluate to true. Therefore all the
if conditions are evaluated all the time. I tried to short circuit the if statement by
making cascading if statements. Although the code is much more complex now the
propability is that only one or two conditions will be checked therefore this change will
make the code take about a third of the time of the original.
Also the original code is using in a few places the Connection.ParameterMarker, which has
some additional performance hits. Therefore I created a local veriable and set it to
Connection.ParameterMarker before the loop so I can use the local variable through the
loop execution.

Now the bug:
In the same file line 207:
else if ((c == '\'' || c == '\"' || c == '`') & !escaped & delim == Char.MinValue)

i believe the two & should be &&
so the new line is:
else if ((c == '\'' || c == '\"' || c == '`') && !escaped && delim == Char.MinValue)

How to repeat:
In order to understand the time it takes to Tokenize an sql statement a profiler must be
run before and after the changes. Since this is a very processing intensive operation
that is incurred on almost every request it should be easily identified.

Suggested fix:
The new function is:

public ArrayList TokenizeSql(string sql)
        {
            bool batch = Connection.Settings.AllowBatch & Driver.SupportsBatch;
            char paramMarker = Connection.ParameterMarker;
            char delim = Char.MinValue;
            StringBuilder sqlPart = new StringBuilder();
            bool escaped = false;
            ArrayList tokens = new ArrayList();

            sql = sql.TrimStart(';').TrimEnd(';');
            for (int i = 0; i < sql.Length; i++)
            {
                char c = sql[i];
                if (
                    (escaped || c == delim || c == '\\' ||
                    (!escaped && delim == Char.MinValue && ((c == '\'' || c == '\"' || c
== '`') || c == paramMarker)) ||
                    (sqlPart.Length > 0 && sqlPart[0] == paramMarker &&
!Char.IsLetterOrDigit(c) && c != '_' && c != '.' && c != '$' && ((c != '@' && c !=
paramMarker) && (c != '?' && c != paramMarker))))
                   )
                {
                    if (escaped)
                        escaped = !escaped;
                    else if (c == delim)
                        delim = Char.MinValue;
                    else if (c == ';' && !escaped && delim == Char.MinValue && !batch)
                    {
                        tokens.Add(sqlPart.ToString());
                        tokens.Add(";");
                        sqlPart.Remove(0, sqlPart.Length);
                        continue;
                    }
                    else if ((c == '\'' || c == '\"' || c == '`') && !escaped && delim ==
Char.MinValue)
                        delim = c;
                    else if (c == '\\')
                        escaped = !escaped;
                    else if (c == paramMarker && delim == Char.MinValue && !escaped)
                    {
                        tokens.Add(sqlPart.ToString());
                        sqlPart.Remove(0, sqlPart.Length);
                    }
                    else if (sqlPart.Length > 0 && sqlPart[0] == paramMarker &&
                             !Char.IsLetterOrDigit(c) && c != '_' && c != '.' && c != '$'
&&
                             ((c != '@' && c != paramMarker) &&
                              (c != '?' && c != paramMarker)))
                    {
                        tokens.Add(sqlPart.ToString());
                        sqlPart.Remove(0, sqlPart.Length);
                    }
                }
                sqlPart.Append(c);
            }
            tokens.Add(sqlPart.ToString());
            return tokens;
        }
[1 Feb 2008 11:46] Tonci Grgin
Hi Christos and thanks for your report. I will set it to "Verified" and see what c/NET
team leader has to say.
[1 Feb 2008 19:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41585
[1 Feb 2008 19:58] Reggie Burnett
Fixed in 5.1.5 and 5.2+

We accepted the && bug fix and the parameter marker improvement. The rest of the change
is not accepted but other speed improvements will appear in future versions of the
connector starting with 5.2
[1 Mar 2008 12:11] MC Brown
A note has been added tot he 5.1.5 and 5.2.0 changelogs: 

Some speed improvements have been implemented in the TokenizeSql process used to identify
elements of SQL statements.