Bug #79291 | Issue with only_full_group_by and UNIQUE KEY with NULL values | ||
---|---|---|---|
Submitted: | 16 Nov 2015 9:05 | Modified: | 20 Mar 2018 9:21 |
Reporter: | jocelyn fournier | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S4 (Feature request) |
Version: | 5.7.x | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
Tags: | ONLY_FULL_GROUP_BY |
[16 Nov 2015 9:05]
jocelyn fournier
[26 Nov 2015 14:09]
Guilhem Bichot
Bonjour Jocelyn! I can't believe it, after so many years you're still reporting bugs (thanks), and I'm still looking at them! ;-) Your case looks obvious to a human, of course. I spent two hours this morning reading various parts of the SQL2011 standard (which is what I used as model to code the "functional dependency" functionality in 5.7). Functional dependencies (here abbreviated FD) spread over 10 pages in the Standard, but your case isn't covered. The Standard works "from the inside out": FDs derived from an index are defined only in the base table, from there they spread to the query's result; if I note (a,b) your unique index of t3 where "b" is nullable, and "c" another column of t3, the FD "{a,b} determine {c}" doesn't hold in the content of t3 (as "b" is nullable). It's only after filtering by WHERE that "b" cannot be NULL, in the result of the query, but the Standard doesn't "reinject that" back into the index nullability considerations. I have sympathy for the Standard, as it has to deal with lots of cases, which make the topic of FDs quite hairy: multiple nested outer joins, views, derived tables... I have been bitten several times in this work by such corner cases, which defeat any quick reasoning (NULLs cause hell here). In your case, it's a pure inner join, so yes you're right. You have a valid feature request (I wouldn't call it a real bug, as we're fine with SQL2011). Before I give it more thought: are the queries above a sufficient sample of your practical problem, or do you have more? More complex ones? Using outer joins, views, derived tables? Thanks.
[26 Nov 2015 14:30]
jocelyn fournier
Bonjour Guilhem :) It's been a while, indeed, we are getting older! And I'm definitely happy to see you still fixing bugs ;) This code is a simplified query coming from Prestashop, which is not working properly anymore on MySQL 5.7 because of the only_full_group_by feature. As a quick fix, they are now removing the only_full_group_by flag during the mysql session init, which is sad since this feature is really useful. AFAIK, my queries are sufficient samples, there's no views / derived tables involved. Thanks!
[26 Nov 2015 14:57]
Guilhem Bichot
and... no left join or right join?
[26 Nov 2015 15:18]
jocelyn fournier
Actually the original query is : SELECT p.*, product_shop.*, stock.out_of_stock, IFNULL(stock.quantity, 0) AS quantity, IFNULL(product_attribute_shop.id_product_attribute, 0) AS id_product_attribute, product_attribute_shop.minimal_quantity AS product_attribute_minimal_quantity, pl.`description`, pl.`description_short`, pl.`available_now`, pl.`available_later`, pl.`link_rewrite`, pl.`meta_description`, pl.`meta_keywords`, pl.`meta_title`, pl.`name`, image_shop.`id_image` id_image, il.`legend` as legend, m.`name` AS manufacturer_name, cl.`name` AS category_default, DATEDIFF(product_shop.`date_add`, DATE_SUB("2015-11-16 00:00:00", INTERVAL 15 DAY)) > 0 AS new, product_shop.price AS orderprice FROM `ps_category_product` cp LEFT JOIN `ps_product` p ON p.`id_product` = cp.`id_product` INNER JOIN ps_product_shop product_shop ON (product_shop.id_product = p.id_product AND product_shop.id_shop = 1) LEFT JOIN `ps_product_attribute_shop` product_attribute_shop ON (p.`id_product` = product_attribute_shop.`id_product` AND product_attribute_shop.`default_on` = 1 AND product_attribute_shop.id_shop=1) LEFT JOIN ps_stock_available stock ON (stock.id_product = p.id_product AND stock.id_product_attribute = 0 AND stock.id_shop = 1 AND stock.id_shop_group = 0 ) LEFT JOIN `ps_category_lang` cl ON (product_shop.`id_category_default` = cl.`id_category` AND cl.`id_lang` = 5 AND cl.id_shop = 1 ) LEFT JOIN `ps_product_lang` pl ON (p.`id_product` = pl.`id_product` AND pl.`id_lang` = 5 AND pl.id_shop = 1 ) LEFT JOIN `ps_image_shop` image_shop ON (image_shop.`id_product` = p.`id_product` AND image_shop.cover=1 AND image_shop.id_shop=1) LEFT JOIN `ps_image_lang` il ON (image_shop.`id_image` = il.`id_image` AND il.`id_lang` = 5) LEFT JOIN `ps_manufacturer` m ON m.`id_manufacturer` = p.`id_manufacturer` WHERE product_shop.`id_shop` = 1 AND cp.`id_category` = 2 AND product_shop.`active` = 1 AND product_shop.`visibility` IN ("both", "catalog") GROUP BY cp.id_product ORDER BY cp.`position` ASC So the original query contains LEFT JOIN, not INNER JOIN.
[26 Nov 2015 15:57]
Guilhem Bichot
Ok. More questions: in the query above, what is the table with the index on a NULLable column (the equivalent of t3 in your first example), which columns form this index, and which column of the index is NULLable but has an equality (and is posing a problem to MySQL)?
[26 Nov 2015 16:14]
jocelyn fournier
On the query above, it's ps_product_attribute_shop which has a nullable column (default_on), and UNIQUE index on (id_product, id_shop, default_on) and also ps_image_shop (cover can be NULL), and UNIQUE index on (id_product, id_shop, cover). EXPLAIN output : *************************** 1. row *************************** id: 1 select_type: SIMPLE table: product_shop type: ALL possible_keys: PRIMARY,id_shop key: NULL key_len: NULL ref: NULL rows: 7 Extra: Using where; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: p type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: prestashop-1610.product_shop.id_product rows: 1 Extra: *************************** 3. row *************************** id: 1 select_type: SIMPLE table: product_attribute_shop type: ref possible_keys: id_product key: id_product key_len: 10 ref: prestashop-1610.product_shop.id_product,const,const rows: 1 Extra: *************************** 4. row *************************** id: 1 select_type: SIMPLE table: stock type: eq_ref possible_keys: product_sqlstock,id_shop,id_shop_group,id_product,id_product_attribute key: product_sqlstock key_len: 16 ref: prestashop-1610.product_shop.id_product,const,const,const rows: 1 Extra: *************************** 5. row *************************** id: 1 select_type: SIMPLE table: cl type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 12 ref: prestashop-1610.product_shop.id_category_default,const,const rows: 1 Extra: Using where *************************** 6. row *************************** id: 1 select_type: SIMPLE table: pl type: eq_ref possible_keys: PRIMARY,id_lang key: PRIMARY key_len: 12 ref: prestashop-1610.product_shop.id_product,const,const rows: 1 Extra: *************************** 7. row *************************** id: 1 select_type: SIMPLE table: image_shop type: ref possible_keys: id_product,id_shop key: id_product key_len: 10 ref: prestashop-1610.product_shop.id_product,const,const rows: 1 Extra: Using index *************************** 8. row *************************** id: 1 select_type: SIMPLE table: il type: eq_ref possible_keys: PRIMARY,id_image key: PRIMARY key_len: 8 ref: prestashop-1610.image_shop.id_image,const rows: 1 Extra: Using where *************************** 9. row *************************** id: 1 select_type: SIMPLE table: m type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: prestashop-1610.p.id_manufacturer rows: 1 Extra: Using where *************************** 10. row *************************** id: 1 select_type: SIMPLE table: cp type: eq_ref possible_keys: PRIMARY,id_product key: PRIMARY key_len: 8 ref: const,prestashop-1610.product_shop.id_product rows: 1 Extra: 10 rows in set (0.00 sec)
[27 Nov 2015 12:44]
Guilhem Bichot
Hello Jocelyn, thanks for the details. The query has GROUP BY but no aggregate functions; I guess it's because it's simplified? if it's the real query then GROUP BY could be replaced by DISTINCT, it would solve the problem :-) If GROUP BY has to stay, the short-term solutions are: - if many queries are affected, turn off only_full_group_by (which has been done already, as you wrote) - if few queries are affected, use any_value(): http://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_any-value "This function is useful for GROUP BY queries when the ONLY_FULL_GROUP_BY SQL mode is enabled, for cases when MySQL rejects a query that you know is valid for reasons that MySQL cannot determine." It is better than the first alternative, because it documents the problematic cases, and newly written queries can use only_full_group_by. Long term, it would be good if MySQL handled the problem you described. I know for sure this can't happen in 5.7 (5.7 is GA and the problem is not trivial). It's not guaranteed for post-5.7 either, again because it's not trivial: should we spend weeks on this, or would those weeks be more profitable if spent on some other feature... For the record, I'm writing here my design ideas. FUNCTIONAL DEPENDENCY DETECTION IN UNIQUE CONSTRAINT WITH NULLABLE COLUMNS Summary ======= Feature request: inferring functional dependencies from a UNIQUE constraint having nullable columns, when these columns are actually not nullable in the query's result, due to predicates in the query. Not required by SQL 2011. Discussion ========= In SQL 2011, no functional dependency can be inferred from a UNIQUE constraint having nullable columns. It's because multiple NULLs are allowed in the constrained columns, for example: create table t(a int null, b int, c int, unique(a,b)); insert into t values (null, 1, 4); insert into t values (null, 1, 5); we see that the first and second fields field don't determine the third , so it would be incorrect to do: select a,b,c from t group by a,b; However, if we exclude null values of "a" then a human can see that there is a functional dependency in the result of the WHERE clause - group columns do determine {c}: select a,b,c from (select * from t where a is not null) as D group by a,b; select a,b,c from t where a is not null group by a,b; select a,b,c from t where a > 3 group by a,b; select a,b,c from t where a = 3 group by b; SQL 2011 does not require that this dependency be recognized. The feature request is to recognize it, including in more complex queries. Here is some more formal logic, below. Let's note filter(x) a filtering predicate of the form "x is not null". Let's note C a condition involving a column "x", which cannot be true if "x" is NULL: then C is equivalent to "C AND filter (x)". Let's note T a table (base table, or result of a join, derived table, view). (T filter (X)) means "the rows of T where x is not null". Filtering would be discovered at the "seed" place where C is, either WHERE or join condition, there are three cases: FROM T WHERE C is equivalent to FROM (T filter (X)) WHERE C TL JOIN TR ON C is equivalent to (TL FILTER (X)) JOIN (TR FILTER (X)) ON C TL LEFT JOIN TR ON C is equivalent to TL LEFT JOIN (TR FILTER (X)) ON C Then the filtering would propagate from the outside to the inside: (TL [LEFT] JOIN TR ON does_not_matter) filter (X) is equivalent to (TL FILTER (X)) [LEFT] JOIN (TR FILTER (X)) ON does_not_matter Assuming we only deal with base tables (the case of views, derived tables, and generated columns deserves more investigation), this propagation may finally reach the base table containing X, so at its place in the query, such table T can be replaced with (T filter (X)), so any UNIQUE constraint containing the nullable X can be treated as if X were not nullable, for the purpose of FD detection, and might thus produce new functional dependencies. See also the SQL 2011 optional Feature T101, “Enhanced nullability determination”.
[27 Nov 2015 13:33]
jocelyn fournier
Hi Guilhem, There are indeed cases where an aggregated function could appear in this query. As for the ANY_VALUE() it's not really an option since it's not supported in MariaDB, and Prestashop has to work in both MySQL and MariaDB world :) The main question is : with this kind of issues, is it still a good idea to have only_full_group_by enabled by default in 5.7 ? BTW, do you think the filtering logic could be extended to the optimizer to change the optimize plan in an eq_ref instead of ref in case of a join ?
[27 Nov 2015 16:40]
Guilhem Bichot
Good evening Jocelyn, Regarding any_value(): I understand the operational problem. We knew that there would be existing queries which would fail; so we added any_value() for that exact scenario; but what happens in other products isn't controlled by us - what they port, what they don't... Regarding only_full_group_by as default: the logic implemented in MySQL, even though not perfect, is the most advanced that I have seen when I looked at some other prominent DBMS' documentation. We're quite close to all functional dependencies defined in the Standard. Some DBMS' are still at "your selected columns must be in the GROUP BY", some are at "we detect FD based on a primary key". MySQL screens equalities, views, etc: http://dev.mysql.com/doc/refman/5.7/en/group-by-functional-dependence.html That was a big job, and should be fine for many queries (this bug report is the first we got). An existing app can indeed get problems, which would vote for not having it as default, however the existing app has workarounds: any_value(), or set only_full_group_by back to off in the session or globally. Having it on by default is very good for all our new users, those who blindly use the default settings; that's the win. In the last years we regularly got bug reports like "my query returns random data", because the user has a wrong grouped query. And MySQL was blamed for being too permissive. With only_full_group_by as default this won't happen anymore. It's like when we turned "strict sql_mode" on by default in 5.6. Googling for "5.7 only_full_group_by" I see, on the first page: - positive comments, and - reports of breaking applications, followed by a fix by the app maintainer. So there was a decision to take, we took it... Knowing that we wouldn't satisfy 100% of existing and new users. Overall we want to be marching towards SQL compliance (which implies getting stricter). Regarding eq_ref, you're right, I'm extending the design with: CONTINUATION OF FEATURE REQUEST: use the detected "non-nullability of the index in this query" to use eq_ref instead of ref, on the index.
[28 Nov 2015 8:36]
Guilhem Bichot
There is another workaround, which would work with both MySQL and MariaDB: wrap the column (which MySQL complains is not dertermined) with MAX() (or MIN(), or GROUP_CONCAT()).
[17 Jan 2018 12:50]
Guilhem Bichot
Bonjour Jocelyn. I started working on this bug finally. In my prototype I have reached a point where the queries in "how-to-repeat" are accepted. Do you have other examples of queries which I could test? I'd need the SQL to create necessary tables, too.
[17 Jan 2018 13:00]
jocelyn fournier
Bonjour Guilhem, I'll take a quick look, and will let you know shortly. Cheers, Jocelyn
[22 Jan 2018 9:49]
Guilhem Bichot
Bonjour Jocelyn. The patch is entering its testing phase by QA (no idea yet of when it will be released). If you have the queries around, please let me know soon. Thanks.
[25 Jan 2018 23:37]
jocelyn fournier
Bonjour Guilhem, Sorry for the late reply, no other example so far!
[20 Mar 2018 9:21]
Jon Stephens
Documented fix in the MySQL 8.0.11 changelog as follows: When selecting from all columns making up a UNIQUE KEY containing nullable columns, with all in the WHERE condition set to non-null values, MySQL did not take into account their uniqueness, with the result that only_full_group_by incorrectly detected a functionally dependent column where there was none. Closed.