Bug #75272 Poor performance with Entity Framework Include Queries and Order By
Submitted: 19 Dec 2014 8:34 Modified: 7 Jul 2016 15:40
Reporter: Jürgen Steinblock Email Updates:
Status: Verified Impact on me:
None 
Category:Connector / NET Severity:S5 (Performance)
Version:6.9.5.0 OS:Microsoft Windows
Assigned to: CPU Architecture:Any
Tags: entity-framework, MySql.Data.Entity

[19 Dec 2014 8:34] Jürgen Steinblock
Description:
I am developing an application which uses Entity Framework 6 (6.1.1) and MySql.Data (6.9.5) + MySQL.Data.Entity (6.9.5) both from nuget.

My database has a table products and a table categories. I want to load the products paged and eager loading the categories http://msdn.microsoft.com/en-us/data/jj574232.aspx

This works fine and fast. However, as soon as I apply a sort the execution time drops dramatically.

context.products
    .Include(x => x.category)
    .Take(10)
    .ToList();

produces the following output

------------------------------------------------------
Opened connection at 19.12.2014 08:51:33 +01:00

SELECT
`Limit1`.`C1`, 
`Limit1`.`id`, 
`Limit1`.`name`, 
`Limit1`.`category_id`, 
`Limit1`.`id1`, 
`Limit1`.`name1`
FROM (SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent1`.`category_id`, 
`Extent2`.`id` AS `id1`, 
`Extent2`.`name` AS `name1`, 
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id` LIMIT 10) AS `Limit1`

-- Executing at 19.12.2014 08:51:33 +01:00

-- Completed in 0 ms with result: EFMySqlDataReader

Closed connection at 19.12.2014 08:51:33 +01:00
------------------------------------------------------

But If I add an OrderBy the performance drops:

context.products
    .Include(x => x.category)
    .OrderBy(x => x.id)
    .Take(10)
    .ToList();

produces the following output

------------------------------------------------------
Opened connection at 19.12.2014 08:51:33 +01:00

SELECT
`Project1`.`C1`, 
`Project1`.`id`, 
`Project1`.`name`, 
`Project1`.`category_id`, 
`Project1`.`id1`, 
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent1`.`category_id`, 
`Extent2`.`id` AS `id1`, 
`Extent2`.`name` AS `name1`, 
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
 ORDER BY 
`Project1`.`id` ASC LIMIT 10

-- Executing at 19.12.2014 08:51:33 +01:00

-- Completed in 252 ms with result: EFMySqlDataReader

Closed connection at 19.12.2014 08:51:33 +01:00
------------------------------------------------------

In my production database with more records the result is even worse (15 seconds compared to 100ms)

How to repeat:
- Create the following tables

CREATE TABLE  `orders`.`categories` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(45) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;

CREATE TABLE  `orders`.`products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(45) NOT NULL,
  `category_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `FK_products_categories` (`category_id`),
  CONSTRAINT `FK_products_categories` FOREIGN KEY (`category_id`) REFERENCES `categories` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=113512 DEFAULT CHARSET=utf8;

- Create a new Test Project in Visual Studio
- Add the following test class

    [TestClass]
    public class ProductsTest
    {
        [TestMethod]
        public void CanGetProducts()
        {
            Seed();

            using (var context = new ProductsEntities())
            {

                context.Database.Log = Console.WriteLine;

                var products1 = context.products
                    .OrderBy(x => x.id)
                    .Take(10)
                    .ToList();

                Console.WriteLine("---");

                var products2 = context.products
                    .Include(x => x.category)
                    .OrderBy(x => x.id)
                    .Take(10)
                    .ToList();

            }
        }

        private void Seed()
        {
            using (var context = new ProductsEntities())
            {
                if (context.categories.Count() == 0)
                {
                    context.categories.Add(new category { name = "Category 1" });
                    context.categories.Add(new category { name = "Category 2" });
                    context.categories.Add(new category { name = "Category 3" });
                    context.SaveChanges();
                }

                if (context.products.Count() == 0)
                {
                    context.Configuration.AutoDetectChangesEnabled = false;

                    var rand = new Random();
                    var categories = context.categories.ToList();

                    for (int i = 1; i <= 100000; i++)
                    {
                        var product = new product
                        {
                            category = categories[rand.Next(categories.Count)],
                            name = "Product " + i.ToString(),
                        };
                        context.products.Add(product);

                        if (i % 1000 == 0)
                        {
                            context.ChangeTracker.DetectChanges();
                            context.SaveChanges();
                        }
                    }

                    context.ChangeTracker.DetectChanges();
                    context.SaveChanges();
                }
            }
        }

    }

- Run the test and look at the output

Suggested fix:
The seconds query should be optimized. Exlcuding the orderby from the inner query causes a full table select with filesort. EXPLAIN shows

1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 100000, 'Using filesort'
2, 'DERIVED', 'Extent2', 'ALL', 'PRIMARY', '', '', '', 3, ''
2, 'DERIVED', 'Extent1', 'ref', 'FK_products_categories', 'FK_products_categories', '4', 'orders.Extent2.id', 25128, ''

This query:

    SELECT
    ...
    FROM (
      SELECT
      ...
      FROM `products` AS `Extent1`
      INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`
    ) AS `Project1`
    ORDER BY 
    `Project1`.`id` ASC LIMIT 10

should include the ORDER BY and LIMIT in the inner query like the query without orderby

    SELECT
    ...
    FROM (
      SELECT
      ...
      FROM `products` AS `Extent1`
      INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`
      ORDER BY 
        `Extent1`.`id` ASC LIMIT 10
    ) AS `Project1`
[19 Dec 2014 8:56] Jürgen Steinblock
Just for update. This sql will be generated by Entity Framework for Sql Server Compact (and I suppose for SQL Server in general). It does not include an outer query and if I think about it, the outer query is unnecessary for mysql, too.

SELECT TOP (10) 
[Extent1].[id] AS [id], 
[Extent1].[name] AS [name], 
[Extent1].[category_id] AS [category_id], 
[Extent2].[id] AS [id1], 
[Extent2].[name] AS [name1], 
FROM  [products] AS [Extent1]
LEFT OUTER JOIN [categories] AS [Extent2] ON [Extent1].[category_id] = [Extent2].[id]
ORDER BY [Extent1].[id] ASC
[29 Dec 2014 5:35] Chiranjeevi Battula
Hello Jürgen Steinblock,

Thank you for the bug report.
I tried to reproduce the issue at my end using Visual Studio 2013 (C#.Net), MySQL Connector/Net 6.9.5 and EF Version : MySql.Data.Entity.EF6 (6.9.5.0) but Couldn't trace out any issue in performance.
Could you please list out the steps you tried out at your end? Please provide repeatable test case in order for us to confirm this issue at our end.

Thanks,
Chiranjeevi.
[29 Dec 2014 5:35] Chiranjeevi Battula
Testing output

Test Name:	TestMethod1
Test Outcome:	Passed
Result StandardOutput:	
SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent1`.`category_id`
FROM `products` AS `Extent1`
 ORDER BY 
`Extent1`.`id` ASC LIMIT 10

-- Executing at 12/26/2014 12:43:05 PM +05:30

-- Completed in 0 ms with result: EFMySqlDataReader

******************************************
SELECT
`Project1`.`C1`, 
`Project1`.`id`, 
`Project1`.`name`, 
`Project1`.`category_id`, 
`Project1`.`id1`, 
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent1`.`category_id`, 
`Extent2`.`id` AS `id1`, 
`Extent2`.`name` AS `name1`, 
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
 ORDER BY 
`Project1`.`id` ASC LIMIT 10
[29 Dec 2014 7:24] Jürgen Steinblock
Hello Chiranjeevi,

first of all your testing output lacks the interesing part. For the first query you see this line:

    Completed in 0 ms with result: EFMySqlDataReader

but for the second one (the one with the poor performance) the result is missing in the output you provided.

My test case contained a `Seed()` method which would insert some data if the database is empty. If you don't experience performance problems you could increse the max value in the for loop to mabybe one million, delete all rows from products and start again.

Also my test case was not designed to fail but just dumps the queries and execution times to console (or test output).

But, please look at the EXPLAIN statement for the second query

1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 100000, 'Using filesort'
2, 'DERIVED', 'Extent2', 'ALL', 'PRIMARY', '', '', '', 3, ''
2, 'DERIVED', 'Extent1', 'ref', 'FK_products_categories', 'FK_products_categories', '4', 'orders.Extent2.id', 25128, ''

It should be obvious that selecting all 100,000 rows in combination with using filesort just to get 10 rows back does not scale well.
[29 Dec 2014 9:17] Chiranjeevi Battula
Hello Jurgen Steinblock,

Thank you for your feedback.
please see the below output and explain plan.

Test Name:	TestMethod1
Test Outcome:	Passed
Result StandardOutput:	
SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent1`.`category_id`
FROM `products` AS `Extent1`
 ORDER BY 
`Extent1`.`id` ASC LIMIT 10

-- Executing at 12/29/2014 2:36:25 PM +05:30

-- Completed in 0 ms with result: EFMySqlDataReader

**********************************
SELECT
`Project1`.`C1`, 
`Project1`.`id`, 
`Project1`.`name`, 
`Project1`.`category_id`, 
`Project1`.`id1`, 
`Project1`.`name1`
FROM (SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent1`.`category_id`, 
`Extent2`.`id` AS `id1`, 
`Extent2`.`name` AS `name1`, 
1 AS `C1`
FROM `products` AS `Extent1` INNER JOIN `categories` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`) AS `Project1`
 ORDER BY 
`Project1`.`id` ASC LIMIT 10

-- Executing at 12/29/2014 2:36:25 PM +05:30

-- Completed in 173 ms with result: EFMySqlDataReader

explain plan:

# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
'1', 'PRIMARY', '<derived2>', 'ALL', NULL, NULL, NULL, NULL, '23682', 'Using filesort'
'2', 'DERIVED', 'Extent2', 'ALL', 'PRIMARY', NULL, NULL, NULL, '3', NULL
'2', 'DERIVED', 'Extent1', 'ref', 'FK_products_categories', 'FK_products_categories', '4', 'test.Extent2.id', '7894', NULL

Thanks,
Chiranjeevi.
[1 Jul 2016 7:31] roarker roarker
It seems this bug still continues at 5.6.21-log, could someone please confirm it?
[7 Jul 2016 15:40] Jürgen Steinblock
The good news is that with the current connector (MySql.Data 6.9.9 and MySql.Data.Entity 6.9.9) the generated sql looks much cleaner and doesn't use subqueries at all.

The new queries for 

                var products1 = context.products
                    .OrderBy(x => x.id)
                    .Take(10)
                    .ToList();

                
looks like this 

SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent1`.`category_id`
FROM `product` AS `Extent1`
 ORDER BY 
`Extent1`.`id` ASC LIMIT 10

and for

                var products2 = context.products
                    .Include(x => x.category)
                    .OrderBy(x => x.id)
                    .Take(10)
                    .ToList();

it looks like this

SELECT
`Extent1`.`id`, 
`Extent1`.`name`, 
`Extent2`.`id` AS `id1`, 
`Extent2`.`name` AS `name1`
FROM `product` AS `Extent1` INNER JOIN `category` AS `Extent2` ON `Extent1`.`category_id` = `Extent2`.`id`
 ORDER BY 
`Extent1`.`id` ASC LIMIT 10

This is a huge improvement over connector 6.9.5 which used subselects even for a simple include query like

                var products1 = context.products
                    .OrderBy(x => x.id)
                    .Take(10)
                    .ToList();

I have created a small test project here: Bug75272">https://github.com/Steinblock/MySql.Bug75272