MySQL Bugs: #79291: Issue with only_full_group_by and UNIQUE KEY with NULL values

Bug #79291	Issue with only_full_group_by and UNIQUE KEY with NULL values
Submitted:	16 Nov 2015 9:05	Modified:	20 Mar 2018 9:21
Reporter:	jocelyn fournier	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S4 (Feature request)
Version:	5.7.x	OS:	Any
Assigned to:	Guilhem Bichot	CPU Architecture:	Any
Tags:	ONLY_FULL_GROUP_BY

Description:
Hi,

When doing a WHERE on all the fields from an UNIQUE KEY containing nullable columns, with all the condition in the WHERE set to non null values, MySQL doesn't take into account the unicity, and only_full_group_by incorrectly detects a functionally dependent column.

It's also easily visible on JOIN, where MySQL choses a ref optimization instead of eq_ref.

BR,
  Jocelyn Fournier

How to repeat:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (`id_category` int(10) unsigned NOT NULL DEFAULT 0, id_product int(10) unsigned NOT NULL DEFAULT 0, PRIMARY KEY (`id_category`,`id_product`));
INSERT INTO t1 VALUES (2,1), (1,1), (2,2);
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (`id_product` int(10) unsigned NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id_product`));
INSERT INTO t2 VALUES (1),(2);
DROP TABLE IF EXISTS t3;
CREATE TABLE t3 ( `id_product` int(10) unsigned NOT NULL DEFAULT 0, `id_product_attribute` int(10) unsigned NOT NULL DEFAULT 0, `default_on` tinyint(1) unsigned DEFAULT NULL,  `id_shop` int(10) unsigned NOT NULL DEFAULT 0,  PRIMARY KEY (`id_product_attribute`,`id_shop`),UNIQUE KEY `id_product` (`id_product`,`id_shop`,`default_on`));
INSERT INTO t3 VALUES (1,2,1,1),(1,1,NULL,1);

SELECT * FROM `t1` cp STRAIGHT_JOIN `t2` p FORCE INDEX (PRIMARY) ON p.`id_product` = cp.`id_product` STRAIGHT_JOIN `t3` product_attribute_shop FORCE INDEX (id_product) ON (p.`id_product` = product_attribute_shop.`id_product` AND product_attribute_shop.`default_on` = 1 AND product_attribute_shop.id_shop=1) WHERE default_on IS NOT NULL AND cp.`id_category` = 2  GROUP BY cp.id_product; 

=> ERROR 1055 (42000): Expression #5 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'prestashop.product_attribute_shop.id_product_attribute' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

EXPLAIN SELECT * FROM `t1` cp STRAIGHT_JOIN `t2` p FORCE INDEX (PRIMARY) ON p.`id_product` = cp.`id_product` STRAIGHT_JOIN `t3` product_attribute_shop FORCE INDEX (id_product) ON (p.`id_product` = product_attribute_shop.`id_product` AND product_attribute_shop.`default_on` = 1 AND product_attribute_shop.id_shop=1) WHERE default_on IS NOT NULL AND cp.`id_category` = 2  GROUP BY cp.id_product;
+------+-------------+------------------------+--------+---------------+------------+---------+--------------------------------+------+--------------------------+
| id   | select_type | table                  | type   | possible_keys | key        | key_len | ref                            | rows | Extra                    |
+------+-------------+------------------------+--------+---------------+------------+---------+--------------------------------+------+--------------------------+
|    1 | SIMPLE      | cp                     | ref    | PRIMARY       | PRIMARY    | 4       | const                          |    2 | Using where; Using index |
|    1 | SIMPLE      | p                      | eq_ref | PRIMARY       | PRIMARY    | 4       | test.cp.id_product             |    1 | Using index              |
|    1 | SIMPLE      | product_attribute_shop | ref    | id_product    | id_product | 10      | test.cp.id_product,const,const |    1 | Using index              |
+------+-------------+------------------------+--------+---------------+------------+---------+--------------------------------+------+--------------------------+

SELECT * FROM t3 WHERE id_product=1 AND default_on=1 AND id_shop=1 GROUP BY id_product;
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'prestashop.t3.id_product_attribute' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

EXPLAIN SELECT * FROM t3 WHERE id_product=1 AND default_on=1 AND id_shop=1 GROUP BY id_product;
+------+-------------+-------+-------+---------------+------------+---------+-------------------+------+-------------+
| id   | select_type | table | type  | possible_keys | key        | key_len | ref               | rows | Extra       |
+------+-------------+-------+-------+---------------+------------+---------+-------------------+------+-------------+
|    1 | SIMPLE      | t3    | const | id_product    | id_product | 10      | const,const,const |    1 | Using index |
+------+-------------+-------+-------+---------------+------------+---------+-------------------+------+-------------+

Suggested fix:
No only_full_group_by error, because the result / JOIN is unique.

Bonjour Jocelyn!
I can't believe it, after so many years you're still reporting bugs (thanks), and I'm still looking at them! ;-)
Your case looks obvious to a human, of course.
I spent two hours this morning reading various parts of the SQL2011 standard (which is what I used as model to code the "functional dependency" functionality in 5.7). Functional dependencies (here abbreviated FD) spread over 10 pages in the Standard, but your case isn't covered. The Standard works "from the inside out": FDs derived from an index are defined only in the base table, from there they spread to the query's result; if I note (a,b) your unique index of t3 where "b" is nullable, and "c" another column of t3, the FD "{a,b} determine {c}" doesn't hold in the content of t3 (as "b" is nullable). It's only after filtering by WHERE that "b" cannot be NULL, in the result of the query, but the Standard doesn't "reinject that" back into the index nullability considerations. I have sympathy for the Standard, as it has to deal with lots of cases, which make the topic of FDs quite hairy: multiple nested outer joins, views, derived tables... I have been bitten several times in this work by such corner cases, which defeat any quick reasoning (NULLs cause hell here).
In your case, it's a pure inner join, so yes you're right. You have a valid feature request (I wouldn't call it a real bug, as we're fine with SQL2011).
Before I give it more thought: are the queries above a sufficient sample of your practical problem, or do you have more? More complex ones? Using outer joins, views, derived tables?
Thanks.

Bonjour Guilhem :)

It's been a while, indeed, we are getting older! And I'm definitely happy to see you still fixing bugs ;)
This code is a simplified query coming from Prestashop, which is not working properly anymore on MySQL 5.7 because of the only_full_group_by feature. As a quick fix, they are now removing the only_full_group_by flag during the mysql session init, which is sad since this feature is really useful.
AFAIK, my queries are sufficient samples, there's no views / derived tables involved. 

Thanks!

and... no left join or right join?

Actually the original query is :

SELECT p.*, product_shop.*, stock.out_of_stock, IFNULL(stock.quantity, 0) AS quantity, IFNULL(product_attribute_shop.id_product_attribute, 0) AS id_product_attribute,
product_attribute_shop.minimal_quantity AS product_attribute_minimal_quantity, pl.`description`, pl.`description_short`, pl.`available_now`,
pl.`available_later`, pl.`link_rewrite`, pl.`meta_description`, pl.`meta_keywords`, pl.`meta_title`, pl.`name`, image_shop.`id_image` id_image,
il.`legend` as legend, m.`name` AS manufacturer_name, cl.`name` AS category_default,
DATEDIFF(product_shop.`date_add`, DATE_SUB("2015-11-16 00:00:00",
INTERVAL 15 DAY)) > 0 AS new, product_shop.price AS orderprice
FROM `ps_category_product` cp
LEFT JOIN `ps_product` p
ON p.`id_product` = cp.`id_product`
 INNER JOIN ps_product_shop product_shop
ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1) LEFT JOIN `ps_product_attribute_shop` product_attribute_shop
ON (p.`id_product` = product_attribute_shop.`id_product` AND product_attribute_shop.`default_on` = 1 AND product_attribute_shop.id_shop=1)
 LEFT JOIN ps_stock_available stock
ON (stock.id_product = p.id_product AND stock.id_product_attribute = 0 AND stock.id_shop = 1  AND stock.id_shop_group = 0  )
LEFT JOIN `ps_category_lang` cl
ON (product_shop.`id_category_default` = cl.`id_category`
AND cl.`id_lang` = 5 AND cl.id_shop = 1 )
LEFT JOIN `ps_product_lang` pl
ON (p.`id_product` = pl.`id_product`
AND pl.`id_lang` = 5 AND pl.id_shop = 1 )
LEFT JOIN `ps_image_shop` image_shop
ON (image_shop.`id_product` = p.`id_product` AND image_shop.cover=1 AND image_shop.id_shop=1)
LEFT JOIN `ps_image_lang` il
ON (image_shop.`id_image` = il.`id_image`
AND il.`id_lang` = 5)
LEFT JOIN `ps_manufacturer` m
ON m.`id_manufacturer` = p.`id_manufacturer`
WHERE product_shop.`id_shop` = 1
AND cp.`id_category` = 2 AND product_shop.`active` = 1 AND product_shop.`visibility` IN ("both", "catalog") GROUP BY cp.id_product ORDER BY cp.`position` ASC

So the original query contains LEFT JOIN, not INNER JOIN.

Ok. More questions: in the query above,
what is the table with the index on a NULLable column (the equivalent of t3 in your first example),
which columns form this index,
and which column of the index is NULLable but has an equality (and is posing a problem to MySQL)?

On the query above, it's ps_product_attribute_shop which has a nullable column (default_on), and UNIQUE index on (id_product, id_shop, default_on) and also ps_image_shop (cover can be NULL), and UNIQUE index on (id_product, id_shop, cover).

EXPLAIN output : 

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: product_shop
         type: ALL
possible_keys: PRIMARY,id_shop
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: prestashop-1610.product_shop.id_product
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: product_attribute_shop
         type: ref
possible_keys: id_product
          key: id_product
      key_len: 10
          ref: prestashop-1610.product_shop.id_product,const,const
         rows: 1
        Extra: 
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: stock
         type: eq_ref
possible_keys: product_sqlstock,id_shop,id_shop_group,id_product,id_product_attribute
          key: product_sqlstock
      key_len: 16
          ref: prestashop-1610.product_shop.id_product,const,const,const
         rows: 1
        Extra: 
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: cl
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 12
          ref: prestashop-1610.product_shop.id_category_default,const,const
         rows: 1
        Extra: Using where
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: pl
         type: eq_ref
possible_keys: PRIMARY,id_lang
          key: PRIMARY
      key_len: 12
          ref: prestashop-1610.product_shop.id_product,const,const
         rows: 1
        Extra: 
*************************** 7. row ***************************
           id: 1
  select_type: SIMPLE
        table: image_shop
         type: ref
possible_keys: id_product,id_shop
          key: id_product
      key_len: 10
          ref: prestashop-1610.product_shop.id_product,const,const
         rows: 1
        Extra: Using index
*************************** 8. row ***************************
           id: 1
  select_type: SIMPLE
        table: il
         type: eq_ref
possible_keys: PRIMARY,id_image
          key: PRIMARY
      key_len: 8
          ref: prestashop-1610.image_shop.id_image,const
         rows: 1
        Extra: Using where
*************************** 9. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: prestashop-1610.p.id_manufacturer
         rows: 1
        Extra: Using where
*************************** 10. row ***************************
           id: 1
  select_type: SIMPLE
        table: cp
         type: eq_ref
possible_keys: PRIMARY,id_product
          key: PRIMARY
      key_len: 8
          ref: const,prestashop-1610.product_shop.id_product
         rows: 1
        Extra: 
10 rows in set (0.00 sec)

Hello Jocelyn, thanks for the details.

The query has GROUP BY but no aggregate functions; I guess it's because it's simplified? if it's the real query then GROUP BY could be replaced by DISTINCT, it would solve the problem :-)

If GROUP BY has to stay, the short-term solutions are:

- if many queries are affected, turn off only_full_group_by (which has
been done already, as you wrote)

- if few queries are affected, use any_value():
http://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_any-value
"This function is useful for GROUP BY queries when the
ONLY_FULL_GROUP_BY SQL mode is enabled, for cases when MySQL rejects a
query that you know is valid for reasons that MySQL cannot
determine."
It is better than the first alternative, because it documents the
problematic cases, and newly written queries can use
only_full_group_by.

Long term, it would be good if MySQL handled the problem you described.
I know for sure this can't happen in 5.7 (5.7 is GA and the problem is not trivial). It's not guaranteed for post-5.7 either, again because it's not trivial: should we spend weeks on this, or would those weeks be more profitable if spent on some other feature...
For the record, I'm writing here my design ideas.

FUNCTIONAL DEPENDENCY DETECTION IN UNIQUE CONSTRAINT WITH NULLABLE COLUMNS

Summary
=======
Feature request: inferring functional dependencies from a UNIQUE constraint having nullable columns, when these columns are actually not nullable in the query's result, due to predicates in the query.
Not required by SQL 2011.

Discussion
=========
In SQL 2011, no functional dependency can be inferred from a UNIQUE constraint having nullable columns. It's because multiple NULLs are allowed in the constrained columns, for example:
create table t(a int null, b int, c int, unique(a,b));
insert into t values (null, 1, 4);
insert into t values (null, 1, 5);
we see that the first and second fields field don't determine the third , so it would be incorrect to do:
select a,b,c from t group by a,b;

However, if we exclude null values of "a" then a human can see that there is a functional dependency in the result of the WHERE clause - group columns do determine {c}:
select a,b,c from (select * from t where a is not null) as D group by a,b;
select a,b,c from t  where a is not null group by a,b;
select a,b,c from t  where a > 3 group by a,b;
select a,b,c from t  where a = 3 group by b;

SQL 2011 does not require that this dependency be recognized. The feature request is to recognize it, including in more complex queries. Here is some more formal logic, below.

Let's note filter(x) a filtering predicate of the form "x is not null".
Let's note C a condition involving a column "x", which cannot be true if "x" is NULL: then C is equivalent to "C AND filter (x)".
Let's note T a table (base table, or result of a join, derived table, view). (T filter (X)) means "the rows of T where x is not null".

Filtering would be discovered at the "seed" place where C is, either WHERE or join condition, there are three cases:

FROM T WHERE C
is equivalent to
FROM (T filter (X)) WHERE C

TL JOIN TR ON C
is equivalent to
(TL FILTER (X)) JOIN (TR FILTER (X)) ON C

TL LEFT JOIN TR ON C
is equivalent to
TL LEFT JOIN (TR FILTER (X)) ON C

Then the filtering would propagate from the outside to the inside:
(TL [LEFT] JOIN TR ON does_not_matter) filter (X)
is equivalent to
(TL FILTER (X)) [LEFT] JOIN (TR FILTER (X)) ON does_not_matter

Assuming we only deal with base tables (the case of views, derived tables, and generated columns deserves more investigation), this propagation may finally reach the base table containing X, so at its place in the query, such table T can be replaced with (T filter (X)), so any UNIQUE constraint containing the nullable X can be treated as if X were not nullable, for the purpose of FD detection, and might thus produce new functional dependencies.

See also the SQL 2011 optional Feature T101, “Enhanced nullability determination”.

Hi Guilhem,

There are indeed cases where an aggregated function could appear in this query.
As for the ANY_VALUE() it's not really an option since it's not supported in MariaDB, and Prestashop has to work in both MySQL and MariaDB world :)
The main question is : with this kind of issues, is it still a good idea to have only_full_group_by enabled by default in 5.7 ?

BTW, do you think the filtering logic could be extended to the optimizer to change the optimize plan in an eq_ref instead of ref in case of a join ?

Good evening Jocelyn,

Regarding any_value(): I understand the operational problem. We knew that there would be existing queries which would fail; so we added any_value() for that exact scenario; but what happens in other products isn't controlled by us - what they port, what they don't...

Regarding only_full_group_by as default: the logic implemented in MySQL, even though not perfect, is the most advanced that I have seen when I looked at some other prominent DBMS' documentation. We're quite close to all functional dependencies defined in the Standard. Some DBMS' are still at "your selected columns must be in the GROUP BY", some are at "we detect FD based on a primary key". MySQL screens equalities, views, etc:
http://dev.mysql.com/doc/refman/5.7/en/group-by-functional-dependence.html
That was a big job, and should be fine for many queries (this bug report is the first we got). An existing app can indeed get problems, which would vote for not having it as default, however the existing app has workarounds: any_value(), or set only_full_group_by back to off in the session or globally. Having it on by default is very good for all our new users, those who blindly use the default settings; that's the win. In the last years we regularly got bug reports like "my query returns random data", because the user has a wrong grouped query. And MySQL was blamed for being too permissive. With only_full_group_by as default this won't happen anymore.
It's like when we turned "strict sql_mode" on by default in 5.6.
Googling for "5.7 only_full_group_by" I see, on the first page:
- positive comments, and
- reports of breaking applications, followed by a fix by the app maintainer.
So there was a decision to take, we took it... Knowing that we wouldn't satisfy 100% of existing and new users. Overall we want to be marching towards SQL compliance (which implies getting stricter).

Regarding eq_ref, you're right, I'm extending the design with:

CONTINUATION OF FEATURE REQUEST:
use the detected "non-nullability of the index in this query" to use eq_ref instead of ref, on the index.

There is another workaround, which would work with both MySQL and MariaDB: wrap the column (which MySQL complains is not dertermined) with MAX() (or MIN(), or GROUP_CONCAT()).

Bonjour Jocelyn. I started working on this bug finally. In my prototype I have reached a point where the queries in "how-to-repeat" are accepted. Do you have other examples of queries which I could test? I'd need the SQL to create necessary tables, too.

Bonjour Guilhem,

I'll take a quick look, and will let you know shortly.

Cheers,
  Jocelyn

Bonjour Jocelyn. The patch is entering its testing phase by QA (no idea yet of when it will be released). If you have the queries around, please let me know soon. Thanks.

Bonjour Guilhem,

Sorry for the late reply, no other example so far!

Documented fix in the MySQL 8.0.11 changelog as follows:

    When selecting from all columns making up a UNIQUE KEY containing
    nullable columns, with all in the WHERE condition set to non-null
    values, MySQL did not take into account their uniqueness, with the
    result that only_full_group_by incorrectly detected a functionally
    dependent column where there was none.   

Closed.