Bug #64934 LINQ to SQL's Contains() generates many "ORs" instead of the faster "IN"
Submitted: 10 Apr 2012 22:32 Modified: 19 Jul 2012 22:44
Reporter: Alex K. Email Updates:
Status: Closed Impact on me:
Category:Connector / NET Severity:S3 (Non-critical)
Version:6.5.4 OS:Microsoft Windows (7 x64)
Assigned to: Fernando Gonzalez.Sanchez CPU Architecture:Any
Tags: .net Connector, C#, contains, EF, entity framework, IN, linq, list, OR, performance

[10 Apr 2012 22:32] Alex K.
When using the LINQ to SQL data provider for MySQL from the .NET Connector there is a potential bottleneck.

When trying to generate a query of the form "SELECT * FROM Users WHERE UserID IN(1, 2, 3, 4, 5);" in LINQ the result becomes "SELECT * FROM Users WHERE UserID=1 OR UserID=2 OR UserID=3 OR UserID=4 OR UserID=5;" instead.

I benchmarked both SQL query directly in mysql to make sure. I consistenly found IN() to be much faster than using OR.

I ran a loop of 1000x the equivalent queries (for consistency, I used sql_no_cache):
IN - 2.34969592094s
OR - 5.83781504631s

How to repeat:
ICollection<int> bigList = new List<int>();
for(int i=0; i<100; ++i)

var results = (from user in DB.Users where bigList.Contains(user.ID) select user.LastName).ToList();

This generates the following SQL statement by the .NET Connector:
SELECT `Extent1`.`LastName` FROM `Users` AS `Extent1` WHERE (0 = `Extent1`.`ID`) OR (1 = `Extent1`.`ID`) OR (2 = `Extent1`.`ID`) OR (3 = `Extent1`.`ID`)[...truncated to clarity's sake...]  OR (99 = `Extent1`.`ID`)

However, the default data provider for Microsoft's MSSQL 2008 generates the (much faster) SQL:
SELECT [Extent1.LastName] FROM [Users] AS [Extent1] WHERE [Extent1.ID] IN (0, 1, 2, 3, [...truncated...], 99)

Suggested fix:
As writte in the Description - "IN" is a lot faster than doing so many ORs. Furthermore it makes more sense to convert LINQ's "bigList.Contains(user.ID)" to "Users.ID IN (...)" than converting the list to multiple ORs.

If I wanted so many ORs, I would've used the "||" in LINQ ;)
[19 Jul 2012 22:44] John Russell
Added to changelog for 6.5.5: 

The LINQ to SQL data provider for MySQL was generating inefficient
code for the Contains() method, producing a query with multiple OR
clauses instead of the more efficient IN clause.