Bug #73953 left join to view with <> test causing too many results
Submitted: 17 Sep 2014 16:47 Modified: 13 Jul 2015 16:51
Reporter: David Norman Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.5.14-log, 5.6.24 OS:Windows (Windows 7)
Assigned to: CPU Architecture:Any
Tags: leftjoin view inequality

[17 Sep 2014 16:47] David Norman
Description:
If you create a view with an inequality in, then left join to that view, and test the value of the inequality in the where clause, you get too many rows back.

How to repeat:
Run the following:

create table a
(
	id integer not null,
	primary key ( id )
);

create table b
(
	id integer not null,
	a_id integer not null,
	my_val integer not null,
	primary key ( id ),
	foreign key ( a_id ) references a( id )
);

insert into a values ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 ), ( 10 );

insert into b values ( 1, 1, 98 ), ( 2, 2, 99 ), ( 3, 3, 100 ), ( 4, 4, 98 ), ( 5, 5, 99 ), ( 6, 6, 100 );

create or replace view b_view as select b.id, b.a_id, b.my_val, if ( b.my_val <> 98, 0, 1 ) as is_x from b;

select * from b_view;

You get the following output:

+----+------+--------+------+
| id | a_id | my_val | is_x |
+----+------+--------+------+
|  1 |    1 |     98 |    1 |
|  2 |    2 |     99 |    0 |
|  3 |    3 |    100 |    0 |
|  4 |    4 |     98 |    1 |
|  5 |    5 |     99 |    0 |
|  6 |    6 |    100 |    0 |
+----+------+--------+------+

As expected, every row in b is listed, with is_x showing 1 for rows 1 and 4, and 0 for the other four rows. All OK so far. But

select a.id from a left join b_view on b_view.a_id = a.id where b_view.is_x = 1;

+----+
| id |
+----+
|  1 |
|  4 |
|  7 |
|  8 |
|  9 |
| 10 |
+----+

This should only output two rows - rows 1 and 4. These are the only rows for which is_x = 1 in b_view.

To prove I'm not doing something stupid, make a new table, copy the data from b_view into it, and do the equivalent query.

create table c
(
	id integer not null,
	a_id integer not null,
	my_val integer not null,
	is_x integer not null,
	primary key ( id ),
	foreign key ( a_id ) references a( id )
);

insert into c select * from b_view;

select a.id from a left join c on c.a_id = a.id where c.is_x = 1;

+----+
| id |
+----+
|  1 |
|  4 |
+----+

The result above should be the same as this, since c and b_view contain identical rows, and the query is the same.
[19 Sep 2014 14:22] MySQL Verification Team
// With CASE instead of IF

mysql> DROP TABLES b;
mysql>
mysql> create table b
    -> (
    -> id integer not null,
    -> a_id integer not null,
    -> my_val integer not null,
    -> primary key ( id ),
    -> foreign key ( a_id ) references a( id )
    -> );
Query OK, 0 rows affected (0.02 sec)

mysql> insert into a values ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 ), ( 10 );
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> insert into b values ( 1, 1, 98 ), ( 2, 2, 99 ), ( 3, 3, 100 ), ( 4, 4, 98 ), ( 5, 5, 99 ), ( 6, 6, 100 );
Query OK, 6 rows affected (0.02 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql>
mysql> select * from a;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
|  6 |
|  7 |
|  8 |
|  9 |
| 10 |
+----+
10 rows in set (0.00 sec)

mysql> select * from b;
+----+------+--------+
| id | a_id | my_val |
+----+------+--------+
|  1 |    1 |     98 |
|  2 |    2 |     99 |
|  3 |    3 |    100 |
|  4 |    4 |     98 |
|  5 |    5 |     99 |
|  6 |    6 |    100 |
+----+------+--------+
6 rows in set (0.00 sec)

mysql> -- with CASE instead of IF
mysql>
mysql> drop view b_view;
mysql> create or replace view b_view as select b.id, b.a_id, b.my_val,  CASE b.my_val <> 98  WHEN 0 THEN '1' ELSE '0' END as is_x  from b;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from b_view;
+----+------+--------+------+
| id | a_id | my_val | is_x |
+----+------+--------+------+
|  1 |    1 |     98 | 1    |
|  2 |    2 |     99 | 0    |
|  3 |    3 |    100 | 0    |
|  4 |    4 |     98 | 1    |
|  5 |    5 |     99 | 0    |
|  6 |    6 |    100 | 0    |
+----+------+--------+------+
6 rows in set (0.00 sec)

mysql> select a.id from a left join b_view on b_view.a_id = a.id where b_view.is_x = 1;
+----+
| id |
+----+
|  1 |
|  4 |
+----+
2 rows in set (0.00 sec)
[20 Sep 2014 10:20] Luuk V
mysql> select a.id from a left join b_view on b_view.a_id = a.id where if ( b_view.my_val <> 98, 0, 1 ) = 1;
+----+
| id |
+----+
|  1 |
|  4 |
|  7 |
|  8 |
|  9 |
| 10 |
+----+
6 rows in set (0.00 sec)

problem seems not to be in IF (or in CASE), but in JOIN ?
[20 Sep 2014 10:54] David Norman
Responding to a couple of points.

Umesh Umesh: The reason your version works is not that you've used case rather than if, but because you've turned the logic around. To demonstrate this, see the following:

create or replace view b_view as select b.id, b.a_id, b.my_val,  CASE b.my_val <> 98  WHEN 1 THEN '0' ELSE '1' END as is_x  from b;

select * from b_view;

+----+------+--------+------+
| id | a_id | my_val | is_x |
+----+------+--------+------+
|  1 |    1 |     98 | 1    |
|  2 |    2 |     99 | 0    |
|  3 |    3 |    100 | 0    |
|  4 |    4 |     98 | 1    |
|  5 |    5 |     99 | 0    |
|  6 |    6 |    100 | 0    |
+----+------+--------+------+

select a.id from a left join b_view on b_view.a_id = a.id where b_view.is_x = 1;

+----+
| id |
+----+
|  1 |
|  4 |
|  7 |
|  8 |
|  9 |
| 10 |
+----+

Luuk V: Your example gives the output I would expect, as can be demonstrated using intermediate stages:

select a.id, b_view.my_val, ( b_view.my_val <> 98 ) as cond from a left join b_view on b_view.a_id = a.id;

+----+--------+------+
| id | my_val | cond |
+----+--------+------+
|  1 |     98 |    0 |
|  2 |     99 |    1 |
|  3 |    100 |    1 |
|  4 |     98 |    0 |
|  5 |     99 |    1 |
|  6 |    100 |    1 |
|  7 |   NULL | NULL |
|  8 |   NULL | NULL |
|  9 |   NULL | NULL |
| 10 |   NULL | NULL |
+----+--------+------+

select if ( null, 0, 1 );

+-------------------+
| if ( null, 0, 1 ) |
+-------------------+
|                 1 |
+-------------------+

So if ( b_view.my_val <> 98, 0, 1 ) is going to be 1 when my_val is 0 or null. So for your query, it should be listing rows 1, 4, 7, 8, 9, 10.

In fact, I've just realised how to show exactly where the problem is. Consider the following two queries:

select * from b_view;

+----+------+--------+------+
| id | a_id | my_val | is_x |
+----+------+--------+------+
|  1 |    1 |     98 | 1    |
|  2 |    2 |     99 | 0    |
|  3 |    3 |    100 | 0    |
|  4 |    4 |     98 | 1    |
|  5 |    5 |     99 | 0    |
|  6 |    6 |    100 | 0    |
+----+------+--------+------+

select b_view.* from a left join b_view on b_view.a_id = a.id;

+------+------+--------+------+
| id   | a_id | my_val | is_x |
+------+------+--------+------+
|    1 |    1 |     98 | 1    |
|    2 |    2 |     99 | 0    |
|    3 |    3 |    100 | 0    |
|    4 |    4 |     98 | 1    |
|    5 |    5 |     99 | 0    |
|    6 |    6 |    100 | 0    |
| NULL | NULL |   NULL | 1    |
| NULL | NULL |   NULL | 1    |
| NULL | NULL |   NULL | 1    |
| NULL | NULL |   NULL | 1    |
+------+------+--------+------+

Why is is_x = 1 in the last four rows? It should be null, since there is no such row in b_view.
[20 Sep 2014 11:50] Luuk V
Because you do not want the NULL-values, you should not do:
select a.id from a left join b_view on b_view.a_id = a.id where b_view.is_x = 1;

But do:
select a.id from a join b_view on b_view.a_id = a.id where b_view.is_x = 1;

Which still leaves /me clueless on why b_view.is_x returns 1 ....
[20 Sep 2014 13:59] David Norman
Luuk V: Yes, there are a number of possible work-arounds. But as I think is obvious, this is a very simple example to demonstrate the problem. The actual query was much more complex, and so using join rather than left join was not an option. And as ever with bugs like these, the real problem wasn't finding the solution. It was the hour or two I spent trying to work out why my code wasn't working, when in fact it was the database not behaving as expected.

You said "Which still leaves me clueless on why b_view.is_x returns 1". I'm pretty sure it's because "is_x" is calculated as "if ( b_view.my_val <> 98, 0, 1 )". Which when b_view.my_val is null, equates to "if ( null <> 98, 0, 1 )", which equates to 1.

But it shouldn't be doing that calculation. It should be taking the rows from b_view, not doing the calculations within b_view on rows which don't exist.
[20 Sep 2014 14:53] Luuk V
I would confirm this as a bug, (if i could) ;-)
[9 Oct 2014 15:16] MySQL Verification Team
This is definitely not a bug. MySQL follows strictly SQL standards, wherever applicable. In this case it is SQL-99, which clearly states that any expression comparing NULL returns NULL. 

Now, how does IF() behave in such situation. This is, on the other hand, clearly documented in our manual.  Here it is:

---------------------------------

 IF(expr1,expr2,expr3)

If expr1 is TRUE (expr1 <> 0 and expr1 <> NULL) then IF() returns expr2; otherwise it returns expr3.

----------------------------------

Hence, in this case when my_val == NULL, so does entire expr1 == NULL, hence the result is expr3, in this particular case expr3 being 1.

Not a Bug !!!!
[9 Oct 2014 18:13] Luuk V
the linked view does cannot be found, 
therefore the other fields in this table have value NULL,
The field defined as IF(,,) should therefore also return NULL

The value of the columns of that view is the output of the IF(,,) statement.
But there is NO record that complies to the (LEFT) JOIN

Tthe value within the joined record is 0 or 1, but because the join condition is not met, it should return NULL

I hope my explanation give some more input in this because it's hard to explain in English (for /me) why i still think it's a bug....
[9 Oct 2014 23:19] David Norman
Sinisa,

To restate what Luuk has said, but with the advantage of being a native English speaker...

I agree with you. The comparison with NULL should return NULL, and if ( NULL, x, y ) should return Y. However, this is not where the bug is. The bug is in the left join. 

To quote the MySQL documentation (section 13.2.7.1. JOIN Syntax), it states:

"If there is no matching record for the right table in the ON or USING part in a LEFT JOIN, a row with all columns set to NULL is used for the right table."

However, it is not doing this. It is not using a row with all columns set to NULL for the right table, it is using a row with is_x set to 1 and all other columns set to NULL in the right table.

Therefore it is a bug.

David.
[11 Oct 2014 9:21] Luuk V
mysql> select a.id, b_view.* from a left join b_view on b_view.a_id = a.id ;
+----+------+------+--------+------+
| id | id   | a_id | my_val | is_x |
+----+------+------+--------+------+
|  1 |    1 |    1 |     98 |    1 |
|  2 |    2 |    2 |     99 |    0 |
|  3 |    3 |    3 |    100 |    0 |
|  4 |    4 |    4 |     98 |    1 |
|  5 |    5 |    5 |     99 |    0 |
|  6 |    6 |    6 |    100 |    0 |
|  7 | NULL | NULL |   NULL |    1 |
|  8 | NULL | NULL |   NULL |    1 |
|  9 | NULL | NULL |   NULL |    1 |
| 10 | NULL | NULL |   NULL |    1 |
+----+------+------+--------+------+
10 rows in set (0.00 sec)

mysql>

For `a.id` 7 to 10 the value of `is_x` should not be 1

I also started a thread about it on comp.databases.mysql
(https://groups.google.com/forum/#!topic/comp.databases.mysql/zBdfnFJyPcQ)

They found out that MariaDB and SQLite3 give the same answer, and that MySQL gives a different answer.
[7 Apr 2015 7:20] MySQL Verification Team
Thank you for the feedback.

with 5.6.24:

mysql> select version();
+---------------------------------------+
| version()                             |
+---------------------------------------+
| 5.6.24-enterprise-commercial-advanced |
+---------------------------------------+
1 row in set (0.00 sec)

mysql> use test
Database changed
mysql> create table a
    -> (
    -> id integer not null,
    -> primary key ( id )
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> create table b
    -> (
    -> id integer not null,
    -> a_id integer not null,
    -> my_val integer not null,
    -> primary key ( id ),
    -> foreign key ( a_id ) references a( id )
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> insert into a values ( 1 ), ( 2 ), ( 3 ), ( 4 ), ( 5 ), ( 6 ), ( 7 ), ( 8 ), ( 9 ), ( 10 );
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql>
mysql> insert into b values ( 1, 1, 98 ), ( 2, 2, 99 ), ( 3, 3, 100 ), ( 4, 4, 98 ), ( 5, 5, 99 ), ( 6, 6, 100 );
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> create or replace view b_view as select b.id, b.a_id, b.my_val, if ( b.my_val <> 98, 0, 1 ) as is_x from b;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from b_view;
+----+------+--------+------+
| id | a_id | my_val | is_x |
+----+------+--------+------+
|  1 |    1 |     98 |    1 |
|  2 |    2 |     99 |    0 |
|  3 |    3 |    100 |    0 |
|  4 |    4 |     98 |    1 |
|  5 |    5 |     99 |    0 |
|  6 |    6 |    100 |    0 |
+----+------+--------+------+
6 rows in set (0.00 sec)

mysql> select a.id from a left join b_view on b_view.a_id = a.id where b_view.is_x = 1;
+----+
| id |
+----+
|  1 |
|  4 |
|  7 |
|  8 |
|  9 |
| 10 |
+----+
6 rows in set (0.00 sec)

mysql> create table c
    -> (
    -> id integer not null,
    -> a_id integer not null,
    -> my_val integer not null,
    -> is_x integer not null,
    -> primary key ( id ),
    -> foreign key ( a_id ) references a( id )
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> insert into c select * from b_view;
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select a.id from a left join c on c.a_id = a.id where c.is_x = 1;
+----+
| id |
+----+
|  1 |
|  4 |
+----+
2 rows in set (0.00 sec)
[13 Jul 2015 16:51] Paul DuBois
Noted in 5.7.8, 5.8.0 changelogs.

Incorrect results could be produced tor views and derived tables on
the inner side of an outer join and from which non-nullable
expressions such as literals were selected.