Bug #91665 CTE misbehaves with 38 digit keys
Submitted: 16 Jul 2018 13:45 Modified: 19 Jul 2018 13:29
Reporter: Anthony Marston Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Data Types Severity:S3 (Non-critical)
Version:8.0.11 OS:Windows (Windows 7)
Assigned to: CPU Architecture:Other (x64)
Tags: cte

[16 Jul 2018 13:45] Anthony Marston
Description:
I have a table with keys which are decimal(38,0) and which contains a hierarchy of containers which I am extracting using a CTE (Common Table Expression). Instead of always displaying the tree with all nodes automatically expanded I use a mechanism which allows nodes to be expanded (to include all child nodes) or collapsed (exclude all child nodes). I have noticed that sometimes nodes are being expanded even when not selected.

How to repeat:
CREATE TABLE IF NOT EXISTS `container` (
  `container_id` decimal(38,0) unsigned NOT NULL,
  `container_name` varchar(80) NOT NULL,
  `container_id_snr` decimal(38,0) unsigned DEFAULT NULL,
  PRIMARY KEY (`container_id`),
  KEY `container_id_snr` (`container_id_snr`)
) ENGINE=InnoDB;

INSERT INTO `container` values
(10101000000000000000000000000001,'Cabinet #1',null),
(10101000000000000000000000000002,'Cabinet #2',null),
(10101000000000000000000000000003,'Shelf #1',10101000000000000000000000000001),
(10101000000000000000000000000004,'Shelf #2',10101000000000000000000000000002),
(10101000000000000000000000000005,'Box #1',10101000000000000000000000000003),
(10101000000000000000000000000006,'Box #2',10101000000000000000000000000004);

WITH RECURSIVE cte (level, container_id, container_name, container_id_snr, path)
AS (
  SELECT 1 AS level, container_id, container_name, container_id_snr, CAST(container.container_name AS CHAR(4000)) AS path
   FROM container
  WHERE container_id_snr IS NULL
UNION ALL
  SELECT level+1, container.container_id, container.container_name, container.container_id_snr, CONCAT(cte.path, '/', container.container_name) AS path
   FROM cte
  LEFT JOIN container ON (cte.container_id = container.container_id_snr)
-- WHERE container.container_id_snr IN ('0')  -- selects 1 and 2
-- WHERE container.container_id_snr IN ('10101000000000000000000000000001')  -- selects 1,3 and 2
WHERE container.container_id_snr IN ('10101000000000000000000000000001','10101000000000000000000000000003') -- selects 1,3,5 and 2 (but 4 and 6 show up as well)
)
SELECT SQL_CALC_FOUND_ROWS cte.*
FROM cte
ORDER BY path

This error shows up when the numbers are enclosed in single quotes, but if I remove the quotes then the error disappears.

Suggested fix:
Treat 38 digit numbers as numbers even when the values are enclosed in quotes.
[17 Jul 2018 12:48] MySQL Verification Team
Hi,

Thank you for your report.

Our Reference Manual clearly explains in chapter 11, more precisely in subchapter 11.12, that all quoted values and quoted constants are always treated as strings. If those constants are used in comparisons or in the expressions, then conversion occurs. One DIGITAL and one VARCHAR type are usually converted to the floating point, or to VARCHAR, depending on factors described in the referenced subchapter.

This is expected behaviour and this is how MySQL is designed. It is much more flexible then some other products that would return the error.

Hence, you can either remove quotes, or CAST those quoted constants into the DECIMAL(38,0) type  in order to get the results that you want.

Let us know whether both work for you or not.
[17 Jul 2018 13:22] Anthony Marston
I am aware that all quoted values are treated as strings, but it has always been possible to supply a numeric value as a quoted string as in "SET numeric_field = '12345'" or "WHERE numeric_field = '12345'" not just in MySQL but also in PostgreSQL, Oracle and SQL Server. This means that the statement:

WHERE container.container_id_snr IN ('10101000000000000000000000000001','10101000000000000000000000000003')

where container_id_snr is defined as DECIMAL(38,0) there should be no difficulty in converting each of those strings to the correct numeric value. Yet in this case it seems to be ignoring the two values supplied in the IN clause and retrieving all possible values.

This behaviour is inconsistent as it works when only a single value is supplied, but not with multiple values.

I have tried exactly the same code with SQL Server and it does not exhibit this problem.

Your comment about converting a mixture of types is irrelevant as I am only converting strings to numbers.
[17 Jul 2018 15:24] MySQL Verification Team
HI,

As I wrote already:

"
One DIGITAL and one VARCHAR type
are usually converted to the floating point, or to VARCHAR, depending on
factors described in the referenced subchapter.
"

Hence, with the single string there is one rule, while with several strings there is another. Do note that there is no floating point number that supports more then 20 digits.

However, this looks like it can be improved, so I am verifying it as a feature request.
[17 Jul 2018 15:35] Anthony Marston
I don't see how your comment "One DIGITAL and one VARCHAR type are usually converted to the floating point" has any bearing on this issue as I am only supplying a list of varchars which need to be converted to decimal numbers. I am not supplying a mixture of DIGITAL and VARCHAR types which need to be converted, just a list of VARCHARS.

If "WHERE decimal_field = '12345'" works OK then why doesn't "WHERE decimal_field IN ('12345','67890')"? It works in SQL Server, so why can't MySQL handle it?
[18 Jul 2018 12:12] MySQL Verification Team
Hi,

It was a typo. No DIGITAL , but DECIMAL.

Hence, when you have: 

decimal_type= string_type 

there is one set of conversion rules, while when you have :

decimal_type IN (string_type, string_type, ...)

there is another set of conversion rules. If it is IN (single_string_const) it is a third conversion rules.

Now, that I think of it, this only needs documenting and not asking for the change of behaviour.

I will convert this to the documentation bug.
[18 Jul 2018 13:54] Anthony Marston
This is NOT just a documentation bug. I have another example where the table has plain INTEGER keys instead of DECIMAL(38,0) keys and there is no conversion error with any of the following statements:
WHERE key='1234'
WHERE key IN ('1234')
WHERE key IN ('1234','5678')
WHERE key NOT IN ('1234','5678')

The same conversion rules from string to numeric should apply regardless of whether the numeric is INTEGER or DECIMAL.

As I said in a previous post, SQL Server does not have this error.
[18 Jul 2018 15:02] MySQL Verification Team
Thank you for the expansion of this documentation bug.

This is how conversion works by design in MySQL, since year 2000. As there is no SQL standard on this topic, each RDBMS can have its own implementation.

Verified as a documentation bug.
[18 Jul 2018 17:26] Anthony Marston
In that case can you please change it to a feature request as it removes inconsistent behaviour.
[19 Jul 2018 10:16] Anthony Marston
You said that "This is how conversion works by design in MySQL since year 2000", but this implementation is not consistent. The clause

WHERE key NOT IN ('1234','5678')

works perfectly well if 'key' is an integer column, but not if it is decimal.

The section (12.2) on type conversion in the manual does not specify that there are different rules for different operators, therefore it is wrong to suggest otherwise. In the above WHERE clause there should be no difficulty in converting each of the string values into a decimal number, so the number of string values should be irrelevant.
[19 Jul 2018 12:04] MySQL Verification Team
Hi,

Thank you for pointing it out.

What you describe is actually an example of consistency. Simply, for any possible combination of data types in an expression, there is a separate rule. Ever since when those were introduced, back in year 2000.
[19 Jul 2018 12:40] Anthony Marston
I am not trying to mix strings, decimals, integers and floats in the same expression, so your comment that there is a different rule for different combinations of data types is misleading. If there is one rule for converting a string value to a decimal value then that one rule should be followed regardless of how many string values are supplied.
[19 Jul 2018 12:42] MySQL Verification Team
I agree.

That is why updating our Manual will make the behaviour quite clear.
[19 Jul 2018 13:29] Anthony Marston
Documenting an inconsistency does not make it legitimate. It will still confuse people who rely on consistent behaviour across the board.

If you won't treat this as a bug then please change it to a feature request.
[23 Jul 2018 13:38] MySQL Verification Team
Hi,

In our testing we noticed a change of the result type, only due to the addition of the index.

Which is why this will be treated as a bug.