Bug #34220 Bug in MySql.Data.MySqlClient.Statement TokenizeSql + Performance issue
Submitted: 1 Feb 2008 10:42 Modified: 1 Mar 2008 11:11
Reporter: Christos Pavlides Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / NET Severity:S3 (Non-critical)
Version:5.1.5 OS:Any
Assigned to: CPU Architecture:Any
Tags: MySQL Connector/Net

[1 Feb 2008 10:42] Christos Pavlides
Description:
While running some Performance Profilers on my project I found a performance issue in the TokenizeSql. When I tried to improve it I stumbled upon a possible bug.
First the performance issue:
Basically almost all statement to be executed pass through the Tokenize Sql function and as such even small enhancements can make a huge difference. The issue I found was that inside a very long loop (for every character in the statement) there is a very complex if statement which for most instances none of that cases evaluate to true. Therefore all the if conditions are evaluated all the time. I tried to short circuit the if statement by making cascading if statements. Although the code is much more complex now the propability is that only one or two conditions will be checked therefore this change will make the code take about a third of the time of the original.
Also the original code is using in a few places the Connection.ParameterMarker, which has some additional performance hits. Therefore I created a local veriable and set it to Connection.ParameterMarker before the loop so I can use the local variable through the loop execution.

Now the bug:
In the same file line 207:
else if ((c == '\'' || c == '\"' || c == '`') & !escaped & delim == Char.MinValue)

i believe the two & should be &&
so the new line is:
else if ((c == '\'' || c == '\"' || c == '`') && !escaped && delim == Char.MinValue)

How to repeat:
In order to understand the time it takes to Tokenize an sql statement a profiler must be run before and after the changes. Since this is a very processing intensive operation that is incurred on almost every request it should be easily identified.

Suggested fix:
The new function is:

public ArrayList TokenizeSql(string sql)
        {
            bool batch = Connection.Settings.AllowBatch & Driver.SupportsBatch;
            char paramMarker = Connection.ParameterMarker;
            char delim = Char.MinValue;
            StringBuilder sqlPart = new StringBuilder();
            bool escaped = false;
            ArrayList tokens = new ArrayList();

            sql = sql.TrimStart(';').TrimEnd(';');
            for (int i = 0; i < sql.Length; i++)
            {
                char c = sql[i];
                if (
                    (escaped || c == delim || c == '\\' ||
                    (!escaped && delim == Char.MinValue && ((c == '\'' || c == '\"' || c == '`') || c == paramMarker)) ||
                    (sqlPart.Length > 0 && sqlPart[0] == paramMarker && !Char.IsLetterOrDigit(c) && c != '_' && c != '.' && c != '$' && ((c != '@' && c != paramMarker) && (c != '?' && c != paramMarker))))
                   )
                {
                    if (escaped)
                        escaped = !escaped;
                    else if (c == delim)
                        delim = Char.MinValue;
                    else if (c == ';' && !escaped && delim == Char.MinValue && !batch)
                    {
                        tokens.Add(sqlPart.ToString());
                        tokens.Add(";");
                        sqlPart.Remove(0, sqlPart.Length);
                        continue;
                    }
                    else if ((c == '\'' || c == '\"' || c == '`') && !escaped && delim == Char.MinValue)
                        delim = c;
                    else if (c == '\\')
                        escaped = !escaped;
                    else if (c == paramMarker && delim == Char.MinValue && !escaped)
                    {
                        tokens.Add(sqlPart.ToString());
                        sqlPart.Remove(0, sqlPart.Length);
                    }
                    else if (sqlPart.Length > 0 && sqlPart[0] == paramMarker &&
                             !Char.IsLetterOrDigit(c) && c != '_' && c != '.' && c != '$' &&
                             ((c != '@' && c != paramMarker) &&
                              (c != '?' && c != paramMarker)))
                    {
                        tokens.Add(sqlPart.ToString());
                        sqlPart.Remove(0, sqlPart.Length);
                    }
                }
                sqlPart.Append(c);
            }
            tokens.Add(sqlPart.ToString());
            return tokens;
        }
[1 Feb 2008 10:46] Tonci Grgin
Hi Christos and thanks for your report. I will set it to "Verified" and see what c/NET team leader has to say.
[1 Feb 2008 18:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41585
[1 Feb 2008 18:58] Reggie Burnett
Fixed in 5.1.5 and 5.2+

We accepted the && bug fix and the parameter marker improvement. The rest of the change is not accepted but other speed improvements will appear in future versions of the connector starting with 5.2
[1 Mar 2008 11:11] MC Brown
A note has been added tot he 5.1.5 and 5.2.0 changelogs: 

Some speed improvements have been implemented in the TokenizeSql process used to identify elements of SQL statements.