Bug #67183 Malformed Query while eager loading with EF 4 due to multiple projections
Submitted: 10 Oct 2012 21:00 Modified: 28 Jun 2013 2:02
Reporter: Travis J Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / NET Severity:S1 (Critical)
Version:6.5.4 OS:Any
Assigned to: Fernando Gonzalez.Sanchez CPU Architecture:Any
Tags: eager-loading, entity-framework-4, projection, query

[10 Oct 2012 21:00] Travis J
Description:
Using a nested projection causes a malformed query to be created. The query is malformed in that it returns incorrect data (data from same named fields from different tables). I had originally thought this was the fault of entity framework. However, after multiple questions on stack overflow (http://stackoverflow.com/q/12825643/1026459 and http://stackoverflow.com/q/11676513/1026459), and a couple months of dealing with this issue, it has become apparent that it is the fault of mysql-connector. The main reason that I am faulting mysql-connector is that when using entity-framework with MSSQL this issue does not arise. In the links to stackoverflow, you can see the malformed query, and the correctly rendered mssql query (in the second one). "Does this bug report represent a security vulnerability?" I did not select Yes, however, returning data in the wrong fields could possibly be seen as that.

How to repeat:
using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

namespace EFInclude
{
    public class Harbor
    {
        public int HarborId { get; set; }
        public virtual ICollection<Ship> Ships { get; set; }

        public string Description { get; set; }
    }

    public class Ship
    {
        public int ShipId { get; set; }
        public int HarborId { get; set; }
        public virtual Harbor Harbor { get; set; }
        public virtual ICollection<CrewMember> CrewMembers { get; set; }

        public string Description { get; set; }
    }

    public class CrewMember
    {
        public int CrewMemberId { get; set; }
        public int ShipId { get; set; }
        public virtual Ship Ship { get; set; }
        public int RankId { get; set; }
        public virtual Rank Rank { get; set; }
        public int ClearanceId { get; set; }
        public virtual Clearance Clearance { get; set; }

        public string Description { get; set; }
    }

    public class Rank
    {
        public int RankId { get; set; }
        public virtual ICollection<CrewMember> CrewMembers { get; set; }

        public string Description { get; set; }
    }

    public class Clearance
    {
        public int ClearanceId { get; set; }
        public virtual ICollection<CrewMember> CrewMembers { get; set; }

        public string Description { get; set; }
    }

    public class MyContext : DbContext
    {
        public DbSet<Harbor> Harbors { get; set; }
        public DbSet<Ship> Ships { get; set; }
        public DbSet<CrewMember> CrewMembers { get; set; }
        public DbSet<Rank> Ranks { get; set; }
        public DbSet<Clearance> Clearances { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Database.SetInitializer(new DropCreateDatabaseAlways<MyContext>());

            using (var context = new MyContext())
            {
                context.Database.Initialize(true);

                var harbor = new Harbor
                {
                    Ships = new HashSet<Ship>
                    {
                        new Ship
                        {
                            CrewMembers = new HashSet<CrewMember>
                            {
                                new CrewMember
                                {
                                    Rank = new Rank { Description = "Rank A" },
                                    Clearance = new Clearance { Description = "Clearance A" },
                                    Description = "CrewMember A"
                                },
                                new CrewMember
                                {
                                    Rank = new Rank { Description = "Rank B" },
                                    Clearance = new Clearance { Description = "Clearance B" },
                                    Description = "CrewMember B"
                                }
                            },
                            Description = "Ship AB"
                        },
                        new Ship
                        {
                            CrewMembers = new HashSet<CrewMember>
                            {
                                new CrewMember
                                {
                                    Rank = new Rank { Description = "Rank C" },
                                    Clearance = new Clearance { Description = "Clearance C" },
                                    Description = "CrewMember C"
                                },
                                new CrewMember
                                {
                                    Rank = new Rank { Description = "Rank D" },
                                    Clearance = new Clearance { Description = "Clearance D" },
                                    Description = "CrewMember D"
                                }
                            },
                            Description = "Ship CD"
                        }
                    },
                    Description = "Harbor ABCD"
                };

                context.Harbors.Add(harbor);
                context.SaveChanges();
            }

            using (var context = new MyContext())
            {
                DbSet<Harbor> dbSet = context.Set<Harbor>();
                IQueryable<Harbor> query = dbSet;
                query = query.Include(entity => entity.Ships);
                query = query.Include(entity => entity.Ships.Select(s => s.CrewMembers));
                query = query.Include(entity => entity.Ships.Select(s => s.CrewMembers.Select(cm => cm.Rank)));
                query = query.Include(entity => entity.Ships.Select(s => s.CrewMembers.Select(cm => cm.Clearance)));

                var sqlString = query.ToString();
                // see below for the generated SQL query

                var harbor = query.Single();

                Console.WriteLine("Harbor {0} Description = \"{1}\"",
                    harbor.HarborId, harbor.Description);
                foreach (var ship in harbor.Ships)
                {
                    Console.WriteLine("- Ship {0} Description = \"{1}\"",
                        ship.ShipId, ship.Description);
                    foreach (var crewMember in ship.CrewMembers)
                    {
                        Console.WriteLine("-- CrewMember {0} Description = \"{1}\"", 
                            crewMember.CrewMemberId, crewMember.Description);
                        Console.WriteLine("-- CrewMember {0} Rank Description = \"{1}\"",
                            crewMember.CrewMemberId, crewMember.Rank.Description);
                        Console.WriteLine("-- CrewMember {0} Clearance Description = \"{1}\"",
                            crewMember.CrewMemberId, crewMember.Clearance.Description);
                    }
                }

                Console.ReadLine();
            }
        }
    }
}

Suggested fix:
Both questions, in the answers, have complete programs as examples. This sql comes from the second question listed above.

Make this sql:

{SELECT
[Project1].[HarborId], 
[Project1].[Description], 
[Project1].[C2] AS [C1], 
[Project1].[ShipId], 
[Project1].[HarborId1], 
[Project1].[Description1], 
[Project1].[C1] AS [C2], 
[Project1].[CrewMemberId], 
[Project1].[ShipId1], 
[Project1].[ClearanceId], 
[Project1].[RankId], 
[Project1].[Description2], 
[Project1].[RankId1], 
[Project1].[Description3], 
[Project1].[ClearanceId1], 
[Project1].[Description4], 
FROM (SELECT
[Extent1].[HarborId], 
[Extent1].[Description], 
[Join3].[ShipId], 
[Join3].[HarborId] AS [HarborId1], 
[Join3].[Description]AS [Description1], 
[Join3].[CrewMemberId], 
[Join3].[ShipId]AS [ShipId1], 
[Join3].[ClearanceId], 
[Join3].[RankId], 
[Join3].[Description] AS [Description2], 
[Join3].[RankId] AS [RankId1], 
[Join3].[Description] AS [Description3], 
[Join3].[ClearanceId] AS [ClearanceId1], 
[Join3].[Description] AS [Description4], 
CASE WHEN ([Join3].[ShipId] IS  NULL) THEN (NULL)  WHEN ([Join3].[CrewMemberId] IS  NULL) THEN (NULL)  ELSE (1) END AS [C1], 
CASE WHEN ([Join3].[ShipId] IS  NULL) THEN (NULL)  ELSE (1) END AS [C2]
FROM [Harbor] AS [Extent1] LEFT OUTER JOIN (SELECT
[Extent2].[ShipId], 
[Extent2].[HarborId], 
[Extent2].[Description], 
[Join2].[CrewMemberId], 
[Join2].[ShipId] AS [ShipID1], 
[Join2].[ClearanceId], 
[Join2].[RankId], 
[Join2].[Description] AS [DESCRIPTION1], 
[Join2].[RankID1], 
[Join2].[DESCRIPTION1] AS [DESCRIPTION11], 
[Join2].[ClearanceID1], 
[Join2].[DESCRIPTION2], 
FROM [Ship] AS [Extent2] LEFT OUTER JOIN (SELECT
[Extent3].[CrewMemberId], 
[Extent3].[ShipId], 
[Extent3].[ClearanceId], 
[Extent3].[RankId], 
[Extent3].[Description], 
[Extent4].[RankId] AS [RankID1], 
[Extent4].[Description] AS [DESCRIPTION1], 
[Extent5].[ClearanceId] AS [ClearanceID1], 
[Extent5].[Description] AS [DESCRIPTION2], 
FROM [CrewMember] AS [Extent3] INNER JOIN [Rank] AS [Extent4] ON [Extent3].[RankId] = [Extent4].[RankId] LEFT OUTER JOIN [Clearance] AS [Extent5] ON [Extent3].[ClearanceId] = [Extent5].[ClearanceId]) AS [Join2] ON [Extent2].[ShipId] = [Join2].[ShipId]) AS [Join3] ON [Extent1].[HarborId] = [Join3].[HarborId]
 WHERE [Extent1].[HarborId] = @p__linq__0) AS [Project1]
 ORDER BY 
[Project1].[HarborId] ASC, 
[Project1].[C2] ASC, 
[Project1].[ShipId] ASC, 
[Project1].[C1] ASC}

In the above, NOTE THIS LINE: [Join2].[DESCRIPTION1] AS [DESCRIPTION11],
and how the AS description is denoted as 11.

-------------------------
Into this sql (or equivalent):
-------------------------

SELECT 
[Project1].[HarborId] AS [HarborId], 
[Project1].[Description] AS [Description], 
[Project1].[C2] AS [C1], 
[Project1].[ShipId] AS [ShipId], 
[Project1].[HarborId1] AS [HarborId1], 
[Project1].[Description1] AS [Description1], 
[Project1].[C1] AS [C2], 
[Project1].[CrewMemberId] AS [CrewMemberId], 
[Project1].[ShipId1] AS [ShipId1], 
[Project1].[RankId] AS [RankId], 
[Project1].[ClearanceId] AS [ClearanceId], 
[Project1].[Description2] AS [Description2], 
[Project1].[RankId1] AS [RankId1], 
[Project1].[Description3] AS [Description3], 
[Project1].[ClearanceId1] AS [ClearanceId1], 
[Project1].[Description4] AS [Description4]
FROM ( SELECT 
    [Extent1].[HarborId] AS [HarborId], 
    [Extent1].[Description] AS [Description], 
    [Join3].[ShipId1] AS [ShipId], 
    [Join3].[HarborId] AS [HarborId1], 
    [Join3].[Description1] AS [Description1], 
    [Join3].[CrewMemberId] AS [CrewMemberId], 
    [Join3].[ShipId2] AS [ShipId1], 
    [Join3].[RankId1] AS [RankId], 
    [Join3].[ClearanceId1] AS [ClearanceId], 
    [Join3].[Description2] AS [Description2], 
    [Join3].[RankId2] AS [RankId1], 
    [Join3].[Description3] AS [Description3], 
    [Join3].[ClearanceId2] AS [ClearanceId1], 
    [Join3].[Description4] AS [Description4], 
    CASE WHEN ([Join3].[ShipId1] IS NULL) THEN CAST(NULL AS int) WHEN ([Join3].[CrewMemberId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1], 
    CASE WHEN ([Join3].[ShipId1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
    FROM  [dbo].[Harbors] AS [Extent1]
    LEFT OUTER JOIN  (SELECT [Extent2].[ShipId] AS [ShipId1], [Extent2].[HarborId] AS [HarborId], [Extent2].[Description] AS [Description1], [Join2].[CrewMemberId], [Join2].[ShipId2], [Join2].[RankId1], [Join2].[ClearanceId1], [Join2].[Description2], [Join2].[RankId2], [Join2].[Description3], [Join2].[ClearanceId2], [Join2].[Description4]
        FROM  [dbo].[Ships] AS [Extent2]
        LEFT OUTER JOIN  (SELECT [Extent3].[CrewMemberId] AS [CrewMemberId], [Extent3].[ShipId] AS [ShipId2], [Extent3].[RankId] AS [RankId1], [Extent3].[ClearanceId] AS [ClearanceId1], [Extent3].[Description] AS [Description2], [Extent4].[RankId] AS [RankId2], [Extent4].[Description] AS [Description3], [Extent5].[ClearanceId] AS [ClearanceId2], [Extent5].[Description] AS [Description4]
            FROM   [dbo].[CrewMembers] AS [Extent3]
            INNER JOIN [dbo].[Ranks] AS [Extent4] ON [Extent3].[RankId] = [Extent4].[RankId]
            LEFT OUTER JOIN [dbo].[Clearances] AS [Extent5] ON [Extent3].[ClearanceId] = [Extent5].[ClearanceId] ) AS [Join2] ON [Extent2].[ShipId] = [Join2].[ShipId2] ) AS [Join3] ON [Extent1].[HarborId] = [Join3].[HarborId]
)  AS [Project1]
ORDER BY [Project1].[HarborId] ASC, [Project1].[C2] ASC, [Project1].[ShipId] ASC, [Project1].[C1] ASC
[10 Oct 2012 21:02] Travis J
Feel free to email me for clarifications or for any other related issue.
[11 Oct 2012 18:24] Travis J
I believe this is the offending code:

protected string CreateUniqueParameterName()
    {
      return String.Format("@gp{0}", parameterCount++);
    }

It is inside:

namespace MySql.Data.Entity
{
  abstract class SqlGenerator : DbExpressionVisitor<SqlFragment>
  {

I think that in the second projection, or upon projections > 1, the parameterCount does not persist, starts over, and the problem occurs.
[11 Oct 2012 18:29] Travis J
Also note, from the original submission in the former sql statement

Description1 is used twice:

    [Join3].[Description]AS [Description1], 
    [Join2].[Description] AS [DESCRIPTION1],
[15 Oct 2012 21:26] Travis J
Similarly it may also be this: 

private string MakeColumnNameUnique(string baseName)
    {
      int i = 1;
      baseName = baseName.ToUpper();
      hasRenamedColumns = true;
      while (true)
      {
        string name = String.Format("{0}{1}", baseName, i);
        if (!columnHash.ContainsKey(name)) return name;
        i++;
      }
    }

Pretty much, somewhere there is code which is allowing duplication.
[19 Oct 2012 18:56] Travis J
I am still testing with a local version of the open source code for 6.5.4 available. Please contact me with any questions or findings.

I have tried to edit both CreateUniqueParameterName and MakeColumnNameUnique to include a Guid.NewGuid() inside of them instead of a single integer. This works somewhat, but not enough. An issue which arises from this, is that a primary key gets renamed but then referenced as the original name like this:

    Extent2.MyPrimaryId1 AS MyPrimaryId1

and then later:

    Extent2.MyPrimaryId

which throws this error:

    Exception Details: MySql.Data.MySqlClient.MySqlException: Unknown column 'Extent2.MyPrimaryId' in 'field list'
[19 Oct 2012 22:10] Fernando Gonzalez.Sanchez
Hi Travis,

Confirmed, the problem is in the description field of Rank & Clearance not propagated beyond Join3.

Thanks for the feedback, working on it.
Also raising the priority to Critical.

(And yes, the EF Sql generation code is a bit tricky)
[23 Oct 2012 16:38] Travis J
Continuation...

I am not sure what causes the Property.Name inside of RowType row = expression.ResultType.EdmType as RowType; from line ~282 in SqlGenerator.cs inside of MySql.Data.Entity to contain names which, when a duplicate would have existed in the property array, appends an integer. This leads to such issues as "Extend2.PrimaryId1". I could not figure out how to prevent this so was only able to include a hack to remove it once found like this (note that this also assigns a GUID to the alias name to prevent duplication).

This was added at line 296 after this code:

MySql.Data.Entity.SqlGenerator.cs

291        else
292        {
293          col = new ColumnFragment(null, null);
294          col.Literal = fragment;
295        }

here is the new code added after that code

        string colName = row.Properties[i].Name;
        char lastChar = colName[colName.Length - 1];
        int lastInt;
        bool colCheck = int.TryParse(lastChar.ToString(),out lastInt);
        if (colCheck)
        {
            colName = colName.Substring(0, colName.Length - 1);
        }
        col.ColumnName = colName;
        var guid = Guid.NewGuid();
        col.ColumnAlias = colName + guid.ToString();

This essentially rips the integer appended to MyPrimaryId off with the understanding that the alias is guaranteed to be unique with the GUID. Making this change is working on my local version. However, it is a hack and there are side affects.

1) When aliases are used such as C1 it will remove the 1 and just leave C. I am considering writing a test scenario to make sure that doesn't happen anymore.

2) In a complex query, when the main query has been constructed, Order By does not have access to these new aliased names and there is an exception thrown.
[23 Oct 2012 17:07] Travis J
Hi Fernando,

Thanks for noticing this as an issue, I think for multi tenant systems this could lead to some pretty egregious data leaks.

In trying to find a workaround for this on my local version, I have been reviewing as much of the code as possible. I have made a log of what changes I am making so that if I have a working version I can reproduce the changes made easily.

Right now, I am investigating the writesql visitor methods because it seems to me that some fragments sent to SqlFragment.cs contain a name, but not an alias - even though there are conditional statements which look for an alias. This is part of the issue regarding Order By as stated in my previous comment, but perhaps could be an issue affecting other features as well.
[24 Oct 2012 17:41] Travis J
Thank you for working on this, have you made any progress?

I could test a working version if you want.
[25 Oct 2012 19:26] Fernando Gonzalez.Sanchez
Hi Travis,

Using integers for alias is OK, has worked fine in the past, its a simple way of differentiating columns from different tables.

I think the problem is that some properties (from ColumnFragment.PropertyFragment) were lost, and thus SelectStatement.HasDifferentNameForColumn fails to do the proper match.

Still working on a fix (its very common that a "candidate fix" happens to broke something else) when I get a fix that passes my unit test suite, I would certainly like you to give it a try and confirm it.
[26 Oct 2012 19:35] Travis J
Hi Fernando,

Thanks for the point on using integers for naming, and for your efforts with this issue.

I was curious about the properties as well, so hopefully that provides some headway.

I would be more than willing to test your changes when complete on my application, you can send it to this email if you want, or post a link.

Thanks again!
[29 Oct 2012 14:42] Fernando Gonzalez.Sanchez
Hi Travis,

As an update this, I have a partial fix that solves this scenario, but causes or tests to get broken.

However, it is clear now that the currently broken code is related to unions, if you rewrite the LINQ this way:

//IQueryable<Harbor> query = dbSet;
        //query = query.Include(entity => entity.Ships);
        //query = query.Include(entity => entity.Ships.Select(s => s.CrewMembers));
        //query = query.Include(entity => entity.Ships.Select(s => s.CrewMembers.Select(cm => cm.Rank)));
        //query = query.Include(entity => entity.Ships.Select(s => s.CrewMembers.Select(cm => cm.Clearance)));
        var query =
          from ha in context.Harbors
          join sh in context.Ships on ha.HarborId equals sh.HarborId
          join cr in context.CrewMembers on sh.ShipId equals cr.ShipId
          join ra in context.Ranks on cr.RankId equals ra.RankId
          join cl in context.Clearances on cr.ClearanceId equals cl.ClearanceId
          select new
          {
            ha.HarborId,
            HarborDescription = ha.Description,
            sh.ShipId,
            ShipDescription = sh.Description,
            cr.CrewMemberId,
            CrewMemberDescription = cr.Description,
            ra.RankId,
            RankDescription = ra.Description,
            cl.ClearanceId,
            ClearanceDescription = cl.Description
          };

It will correctly generate the SQL query:

SELECT
`Extent1`.`HarborId`, 
`Extent1`.`Description`, 
`Extent2`.`ShipId`, 
`Extent2`.`Description` AS `Description1`, 
`Extent3`.`CrewMemberId`, 
`Extent3`.`Description` AS `Description2`, 
`Extent4`.`RankId`, 
`Extent4`.`Description` AS `Description3`, 
`Extent5`.`ClearanceId`, 
`Extent5`.`Description` AS `Description4`
FROM `Harbors` AS `Extent1` INNER JOIN `Ships` AS `Extent2` ON `Extent1`.`HarborId` = `Extent2`.`HarborId` INNER JOIN `CrewMembers` AS `Extent3` ON `Extent2`.`ShipId` = `Extent3`.`ShipId` INNER JOIN `Ranks` AS `Extent4` ON `Extent3`.`RankId` = `Extent4`.`RankId` INNER JOIN `Clearances` AS `Extent5` ON `Extent3`.`ClearanceId` = `Extent5`.`ClearanceId`

when all the columns have the right values. While this workaround may not be perfect, may temporarily help you out if are blocked for this issue.

I am still finishing the original fix, lowering the priority of this due to the workaround.
[29 Oct 2012 18:33] Travis J
Hi Fernando, 

Thanks for the update. I am glad that you have gotten to the heart of the issue :) Unfortunately, this workaround will not be viable in my current situation. The example code you are using is just a demonstration of the problem. In actual code those are being formed dynamically and cannot be explicitly written out.

Right now, what I am having to do is to leave out the duplicated area of the object graph, and then iterate to reform that piece of the graph. It makes the query run a lot slower but it "works" for now.

I would appreciate it if you left the severity level where it was, but understand if you still feel it necessary to lower.

Thanks again,
Travis
[20 Nov 2012 19:17] Travis J
Hello,

I was just following up on this issue. Have nested projections from linq been fixed, is there a patch in a coming version?

Thanks,
Travis
[21 Nov 2012 16:06] Fernando Gonzalez.Sanchez
Hi Travis, I am resuming work this one (after having finished other higher priority bugs)...
[28 Jun 2013 2:02] Philip Olson
Fixed as of the upcoming Connector/Net 6.5.7, 6.6.6, and 6.7.4 releases, and here's the changelog entry:

Using a nested projection causes a malformed query to be created.

Thank you for the bug report.