Description:
When using the LINQ to SQL data provider for MySQL from the .NET Connector there is a potential bottleneck.
When trying to generate a query of the form "SELECT * FROM Users WHERE UserID IN(1, 2, 3, 4, 5);" in LINQ the result becomes "SELECT * FROM Users WHERE UserID=1 OR UserID=2 OR UserID=3 OR UserID=4 OR UserID=5;" instead.
I benchmarked both SQL query directly in mysql to make sure. I consistenly found IN() to be much faster than using OR.
I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):
IN - 2.34969592094s
OR - 5.83781504631s
How to repeat:
ICollection<int> bigList = new List<int>();
for(int i=0; i<100; ++i)
bigList.Add(i);
var results = (from user in DB.Users where bigList.Contains(user.ID) select user.LastName).ToList();
This generates the following SQL statement by the .NET Connector:
SELECT `Extent1`.`LastName` FROM `Users` AS `Extent1` WHERE (0 = `Extent1`.`ID`) OR (1 = `Extent1`.`ID`) OR (2 = `Extent1`.`ID`) OR (3 = `Extent1`.`ID`)[...truncated to clarity's sake...] OR (99 = `Extent1`.`ID`)
However, the default data provider for Microsoft's MSSQL 2008 generates the (much faster) SQL:
SELECT [Extent1.LastName] FROM [Users] AS [Extent1] WHERE [Extent1.ID] IN (0, 1, 2, 3, [...truncated...], 99)
Suggested fix:
As writte in the Description - "IN" is a lot faster than doing so many ORs. Furthermore it makes more sense to convert LINQ's "bigList.Contains(user.ID)" to "Users.ID IN (...)" than converting the list to multiple ORs.
If I wanted so many ORs, I would've used the "||" in LINQ ;)