Bug #72004 Generated SQL requests column that doesnt exist
Submitted: 11 Mar 2014 10:11 Modified: 20 Oct 2014 0:01
Reporter: John Tucker Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / NET Severity:S2 (Serious)
Version:6.8.3 OS:Windows (8)
Assigned to: Fernando Gonzalez.Sanchez CPU Architecture:Any
Tags: entity, linq

[11 Mar 2014 10:11] John Tucker
Description:
We are investigating moving from .NET Connector 6.5.4 to 6.8.3 but are having problems with some queries that used to work, now failing with 'Unknown Column' error.
(Visual Studio 2010 on Windows 8 client, .NET Framework 4.0, MySQL Server 5.5.28 on Server 2008, MyISAM tables, latin1 - default collation)

This fairly simple linq query

    var results = (from pow in context.Pows
			select new PowSearchResult()
			{
				Id = pow.Id,
				Rank = pow.Rank,
				Surname = pow.Surname,
				FirstName = pow.Firstname,
				UnitName = pow.Regiment,
				Camp = pow.Camp
			}
		   );

    if (!ignoreSurname)
    {
	//I moved the Where... StartsWith clause in the statement above out as 
        //that has caused issues in other queries with 6.8.3 (thats another story, possibly related)
	//Moving it out to seperate statement resolved
        results = results.Where(r => r.Surname.StartsWith(surname));
    }

    results = results
        .OrderBy(pow => pow.Surname)
        .ThenBy(pow => pow.FirstName)
        .Skip(skip)
	.Take(pageSize);

So examining the generated SQL with ((System.Data.Objects.ObjectQuery)results).ToTraceString()
It produces a SQL query like this

    SELECT `Project1`.`id`,  `Project1`.`C1`,  `Project1`.`rank`,  `Project1`.`surname`,  `Project1`.`firstname`,  `Project1`.`regiment`,  `Project1`.`camp`
    FROM `pows` AS `Project1`  WHERE (LOCATE(@p__linq__0, `Project1`.`surname`)) = 1  ORDER BY  `Project1`.`surname` ASC,  `Project1`.`firstname` ASC LIMIT 0,20

Which will then later fail with 'Unknown column Project1.C1'

So where is it getting 'C1' from? It's not a column in the table or property of PowSearchResult. I will repeat, this did not happen with 6.5.4
(Side note - Its using LOCATE rather than LIKE, could that also be related to some collation type issue?)

How to repeat:
If it was possible to repeat with such a simple query then i'm sure many more people would have reported this problem by now. So I don't know what to suggest, or what could be different about our system and how we are using the connector. 

Suggested fix:
Make it not request Project1.C1 ?
[29 May 2014 16:37] Henrik Schnell
I am having the exact same issue. This is extremely annoying! I cannot use the mysql .net connector even for simple queries.
[1 Jul 2014 23:04] Emil Lenngren
Hi. I tried to run a similar query (with my own table designs) and it worked for me. I didn't get a C1 column.

Can you show your database table design and EF Models?
[3 Jul 2014 10:24] John Tucker
I've uploaded a create table script for the table, and a screenshot of the entity table mapping (we use a custom tool to strip underscores and camel case the field names mapped into EF).
[29 Aug 2014 14:32] Alexei Ch
I have the same issue.  It  this can be reproduced  in 6.8.3 and also in 6.9.2 !
When I run a simple linq query against my DB table:

long[] array = (from r in context.MetaDataTags where (r.ResourceId == nVal) select (long)r.Id).OrderBy(n => n).ToArray();

I am getting exception “Unknown column 'Project1.C1' in 'field list'”
However if I remove cast before r.Id so that expression looks like that:

int [] array = (from r in context.MetaDataTags where (r.ResourceId == nVal) select r.Id).OrderBy(n => n).ToArray();

everything work fine. Note also that the type of Id column in INT.
This is really a big problem. Release 6.6.5 was working properly.
[5 Sep 2014 23:51] Emil Lenngren
The bug is trigged when fusing two selects into one.
The original query is something like:

Select Project1.C1 From (Select Cast(Extent1.Id As Signed Integer) AS C1 From MetaDataTags As Extent1 Where nVal = Extent1.ResourceId) As Project1 Order By Project1.C1 Asc

(where Signed Integer means BIGINT)

First, for some reason, Cast is not implemented in the sql generator:

    public override SqlFragment Visit(DbCastExpression expression)
    {
      //TODO: handle casting
      return expression.Argument.Accept(this);
    }

which means that no cast is made on the value on the database side, so the query would look like this if the generator generates it naively:
Select Project.C1 From (Select Extent1.Id AS C1 From MetaDataTags As Extent1 Where nVal = Extent1.ResourceId) As Project1 Order By Project1.C1 Asc

Now, the generator doesn't stop there; the Mysql connector has a post-processing optimization pass which tries to combine these two selects into one.
One of many conditions for being able to combine two selects is that each projected column in the inner query mustn't be an expression; it must be a table column (like Extent1.Id).
It is further assumed that the column isn't aliased, which might not always be the case, and this triggers the bug. That means the name C1 is later used instead of the correct one, that is, Id.

The generated SQL I see is the following:
SELECT
`Project1`.`C1`
FROM `MetaDataTags` AS `Project1`
 WHERE (nVal parameter here) = `Project1`.`ResourceId`
 ORDER BY 
`Project1`.`C1` ASC

Suggested fix:
Alt 1: Simply don't do "fusing" if the column name has an alias. This is easy to fix (< 1 line of code) but misses the optimization opportunity.
Alt 2: Do fusing, but make sure all inner references are replaced not only with the table name (Extent1 -> Project1) but also all outer references with the alias (Project1.C1 -> Project1.Id). This might however be complicated, since some of the references might not have been visited yet.

To fix with alt 1, you should replace the code in Source/MySql.Data.Entity/Generators/SqlGenerator.cs in the method CanFuseSelect:

        List<ColumnFragment> cols = innerSelect.Columns;
        for (int i = 0; i < cols.Count; i++)
          if (cols[i].Literal != null)
            return false;
        return true;

with something like this:

        List<ColumnFragment> cols = innerSelect.Columns;
        for (int i = 0; i < cols.Count; i++)
          if (cols[i].Literal != null || (cols[i].ColumnAlias != null && cols[i].ColumnAlias != cols[i].ColumnName))
            return false;
        return true;
[10 Sep 2014 12:36] John Tucker
Not sure how that relates to my original problem, I have no fused queries, no casting, no sub/inner queries generated.

Still, I found a solution. Simply don't use linq to sql for Mysql. I've given up on this and other issues like bizarrely inefficient generated queries with unnecessary sub queries and projections etc.

Life is a lot simpler just building the SQL statement yourself and executing directly, most times you can map the results directly to an entity easily anyway.
[10 Sep 2014 12:46] Emil Lenngren
It's not MySql that creates all the subqueries and stuff but Microsoft's EF. The MySQL provider tries to optimize the query tree it gets from EF by "fusing" projections. The problem is that the fusing implementation is buggy. Let's hope the MySql people fix it.
[19 Sep 2014 20:09] Fernando Gonzalez.Sanchez
Developer Test case for Bug72004

Attachment: Bug72004TestCase.7z (application/octet-stream, text), 20.51 KiB.

[19 Sep 2014 20:13] Fernando Gonzalez.Sanchez
Hi,

I am working on this fix, the test case for query 
"long[] array = (from r in context.MetaDataTags where (r.ResourceId == nVal) select (long)r.Id).OrderBy(n => n).ToArray();" 

can be reproduced, but not so the test case for John Tucker.

I have uploaded my test case (tested against EF6/Cnet 6.9.3), if you guys can give feedback on it would be appreciated.

Otherwise the fix will be done against the reproducible test case.

Thanks.
[29 Sep 2014 16:28] Fernando Gonzalez.Sanchez
Posted by developer:
 
With the fix the query is generated correctly (without non-existing columns).

This fix will appear in MySQL Connector/NET 6.7.6, 6.8.4, 6.9.5.
[30 Sep 2014 0:44] Philip Olson
Fixed as of the upcoming MySQL Connector/Net 6.7.6/6.8.4/6.9.5 releases, and here's the changelog entry:

The query optimization routine would return statements with invalid table
aliases when nested queries were being optimized. This would throw an
"Unknown column" exception.

Thank you for the bug report.