Description:
While running some Performance Profilers on my project I found a performance issue in the TokenizeSql. When I tried to improve it I stumbled upon a possible bug.
First the performance issue:
Basically almost all statement to be executed pass through the Tokenize Sql function and as such even small enhancements can make a huge difference. The issue I found was that inside a very long loop (for every character in the statement) there is a very complex if statement which for most instances none of that cases evaluate to true. Therefore all the if conditions are evaluated all the time. I tried to short circuit the if statement by making cascading if statements. Although the code is much more complex now the propability is that only one or two conditions will be checked therefore this change will make the code take about a third of the time of the original.
Also the original code is using in a few places the Connection.ParameterMarker, which has some additional performance hits. Therefore I created a local veriable and set it to Connection.ParameterMarker before the loop so I can use the local variable through the loop execution.
Now the bug:
In the same file line 207:
else if ((c == '\'' || c == '\"' || c == '`') & !escaped & delim == Char.MinValue)
i believe the two & should be &&
so the new line is:
else if ((c == '\'' || c == '\"' || c == '`') && !escaped && delim == Char.MinValue)
How to repeat:
In order to understand the time it takes to Tokenize an sql statement a profiler must be run before and after the changes. Since this is a very processing intensive operation that is incurred on almost every request it should be easily identified.
Suggested fix:
The new function is:
public ArrayList TokenizeSql(string sql)
{
bool batch = Connection.Settings.AllowBatch & Driver.SupportsBatch;
char paramMarker = Connection.ParameterMarker;
char delim = Char.MinValue;
StringBuilder sqlPart = new StringBuilder();
bool escaped = false;
ArrayList tokens = new ArrayList();
sql = sql.TrimStart(';').TrimEnd(';');
for (int i = 0; i < sql.Length; i++)
{
char c = sql[i];
if (
(escaped || c == delim || c == '\\' ||
(!escaped && delim == Char.MinValue && ((c == '\'' || c == '\"' || c == '`') || c == paramMarker)) ||
(sqlPart.Length > 0 && sqlPart[0] == paramMarker && !Char.IsLetterOrDigit(c) && c != '_' && c != '.' && c != '$' && ((c != '@' && c != paramMarker) && (c != '?' && c != paramMarker))))
)
{
if (escaped)
escaped = !escaped;
else if (c == delim)
delim = Char.MinValue;
else if (c == ';' && !escaped && delim == Char.MinValue && !batch)
{
tokens.Add(sqlPart.ToString());
tokens.Add(";");
sqlPart.Remove(0, sqlPart.Length);
continue;
}
else if ((c == '\'' || c == '\"' || c == '`') && !escaped && delim == Char.MinValue)
delim = c;
else if (c == '\\')
escaped = !escaped;
else if (c == paramMarker && delim == Char.MinValue && !escaped)
{
tokens.Add(sqlPart.ToString());
sqlPart.Remove(0, sqlPart.Length);
}
else if (sqlPart.Length > 0 && sqlPart[0] == paramMarker &&
!Char.IsLetterOrDigit(c) && c != '_' && c != '.' && c != '$' &&
((c != '@' && c != paramMarker) &&
(c != '?' && c != paramMarker)))
{
tokens.Add(sqlPart.ToString());
sqlPart.Remove(0, sqlPart.Length);
}
}
sqlPart.Append(c);
}
tokens.Add(sqlPart.ToString());
return tokens;
}