Bug #75272 | Poor performance with Entity Framework Include Queries and Order By | ||
---|---|---|---|
Submitted: | 19 Dec 2014 8:34 | Modified: | 7 Jul 2016 15:40 |
Reporter: | Jürgen Steinblock | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | Connector / NET | Severity: | S5 (Performance) |
Version: | 6.9.5.0 | OS: | Windows |
Assigned to: | CPU Architecture: | Any | |
Tags: | entity-framework, MySql.Data.Entity |
[19 Dec 2014 8:34]
Jürgen Steinblock
[19 Dec 2014 8:56]
Jürgen Steinblock
Just for update. This sql will be generated by Entity Framework for Sql Server Compact (and I suppose for SQL Server in general). It does not include an outer query and if I think about it, the outer query is unnecessary for mysql, too. SELECT TOP (10) [Extent1].[id] AS [id], [Extent1].[name] AS [name], [Extent1].[category_id] AS [category_id], [Extent2].[id] AS [id1], [Extent2].[name] AS [name1], FROM [products] AS [Extent1] LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id] ORDER BY [Extent1].[id] ASC
[29 Dec 2014 5:35]
Chiranjeevi Battula
Hello Jürgen Steinblock, Thank you for the bug report. I tried to reproduce the issue at my end using Visual Studio 2013 (C#.Net), MySQL Connector/Net 6.9.5 and EF Version : MySql.Data.Entity.EF6 (6.9.5.0) but Couldn't trace out any issue in performance. Could you please list out the steps you tried out at your end? Please provide repeatable test case in order for us to confirm this issue at our end. Thanks, Chiranjeevi.
[29 Dec 2014 5:35]
Chiranjeevi Battula
Testing output Test Name: TestMethod1 Test Outcome: Passed Result StandardOutput: SELECT `Extent1`.`id`, `Extent1`.`name`, `Extent1`.`category_id` FROM `products` AS `Extent1` ORDER BY `Extent1`.`id` ASC LIMIT 10 -- Executing at 12/26/2014 12:43:05 PM +05:30 -- Completed in 0 ms with result: EFMySqlDataReader ****************************************** SELECT `Project1`.`C1`, `Project1`.`id`, `Project1`.`name`, `Project1`.`category_id`, `Project1`.`id1`, `Project1`.`name1` FROM (SELECT `Extent1`.`id`, `Extent1`.`name`, `Extent1`.`category_id`, `Extent2`.`id` AS `id1`, `Extent2`.`name` AS `name1`, 1 AS `C1` FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1` ORDER BY `Project1`.`id` ASC LIMIT 10
[29 Dec 2014 7:24]
Jürgen Steinblock
Hello Chiranjeevi, first of all your testing output lacks the interesing part. For the first query you see this line: Completed in 0 ms with result: EFMySqlDataReader but for the second one (the one with the poor performance) the result is missing in the output you provided. My test case contained a `Seed()` method which would insert some data if the database is empty. If you don't experience performance problems you could increse the max value in the for loop to mabybe one million, delete all rows from products and start again. Also my test case was not designed to fail but just dumps the queries and execution times to console (or test output). But, please look at the EXPLAIN statement for the second query 1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 100000, 'Using filesort' 2, 'DERIVED', 'Extent2', 'ALL', 'PRIMARY', '', '', '', 3, '' 2, 'DERIVED', 'Extent1', 'ref', 'FK_products_categories', 'FK_products_categories', '4', 'orders.Extent2.id', 25128, '' It should be obvious that selecting all 100,000 rows in combination with using filesort just to get 10 rows back does not scale well.
[29 Dec 2014 9:17]
Chiranjeevi Battula
Hello Jurgen Steinblock, Thank you for your feedback. please see the below output and explain plan. Test Name: TestMethod1 Test Outcome: Passed Result StandardOutput: SELECT `Extent1`.`id`, `Extent1`.`name`, `Extent1`.`category_id` FROM `products` AS `Extent1` ORDER BY `Extent1`.`id` ASC LIMIT 10 -- Executing at 12/29/2014 2:36:25 PM +05:30 -- Completed in 0 ms with result: EFMySqlDataReader ********************************** SELECT `Project1`.`C1`, `Project1`.`id`, `Project1`.`name`, `Project1`.`category_id`, `Project1`.`id1`, `Project1`.`name1` FROM (SELECT `Extent1`.`id`, `Extent1`.`name`, `Extent1`.`category_id`, `Extent2`.`id` AS `id1`, `Extent2`.`name` AS `name1`, 1 AS `C1` FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1` ORDER BY `Project1`.`id` ASC LIMIT 10 -- Executing at 12/29/2014 2:36:25 PM +05:30 -- Completed in 173 ms with result: EFMySqlDataReader explain plan: # id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra '1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '23682', 'Using filesort' '2', 'DERIVED', 'Extent2', 'ALL', 'PRIMARY', NULL, NULL, NULL, '3', NULL '2', 'DERIVED', 'Extent1', 'ref', 'FK_products_categories', 'FK_products_categories', '4', 'test.Extent2.id', '7894', NULL Thanks, Chiranjeevi.
[1 Jul 2016 7:31]
roarker roarker
It seems this bug still continues at 5.6.21-log, could someone please confirm it?
[7 Jul 2016 15:40]
Jürgen Steinblock
The good news is that with the current connector (MySql.Data 6.9.9 and MySql.Data.Entity 6.9.9) the generated sql looks much cleaner and doesn't use subqueries at all. The new queries for var products1 = context.products .OrderBy(x => x.id) .Take(10) .ToList(); looks like this SELECT `Extent1`.`id`, `Extent1`.`name`, `Extent1`.`category_id` FROM `product` AS `Extent1` ORDER BY `Extent1`.`id` ASC LIMIT 10 and for var products2 = context.products .Include(x => x.category) .OrderBy(x => x.id) .Take(10) .ToList(); it looks like this SELECT `Extent1`.`id`, `Extent1`.`name`, `Extent2`.`id` AS `id1`, `Extent2`.`name` AS `name1` FROM `product` AS `Extent1` INNER JOIN `category` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` ORDER BY `Extent1`.`id` ASC LIMIT 10 This is a huge improvement over connector 6.9.5 which used subselects even for a simple include query like var products1 = context.products .OrderBy(x => x.id) .Take(10) .ToList(); I have created a small test project here: Bug75272">https://github.com/Steinblock/MySql.Bug75272