Bug #27869 Comparison between string and very large integer fails
Submitted: 17 Apr 2007 2:59
Modified: 4 May 2007 6:41
Reporter: Kolbe Kegel
Status: Verified
Category: MySQL Server: Data Types
Severity: S4 (Feature request)
Version: 5.0.38, 5.1.17
OS: Any
CPU Architecture: Any
Assigned to: Assigned Account
Tags: bfsm_2007_05_03, regression, wrong results

[17 Apr 2007 2:59] Kolbe Kegel
Description:
A very large integer argument in the WHERE clause will be cast in such a way that it does not compare properly with a character column in the table.

This behavior exists in 4.1.19, 5.0.38, and 5.1.17, but *not* 4.0.27.

It is impossible for me to say what the *real* problem is. Further investigation will be required to find the true cause of this behavior. It could be improper casting, utilization of an improper data type somewhere along the way, or some other cause.

How to repeat:
create table t1 (col1 char(20));
insert into t1 values ('123456789012345678');
select * from t1 where col1='123456789012345678';
select * from t1 where col1=123456789012345678;

mysql 5.0.38-enterprise-gpl-log (root) [test]> create table t1 (col1 char(20));
Query OK, 0 rows affected (0.09 sec)

mysql 5.0.38-enterprise-gpl-log (root) [test]> insert into t1 values ('123456789012345678');
Query OK, 1 row affected (0.00 sec)

mysql 5.0.38-enterprise-gpl-log (root) [test]> select * from t1 where col1='123456789012345678';
+--------------------+
| col1               |
+--------------------+
| 123456789012345678 | 
+--------------------+
1 row in set (0.00 sec)

mysql 5.0.38-enterprise-gpl-log (root) [test]> select * from t1 where col1=123456789012345678;
Empty set (0.00 sec)

Some related observations:

mysql 5.0.38-enterprise-gpl-log (root) [test]> insert into t1 values ('105555555555555555');
Query OK, 1 row affected (0.05 sec)

mysql 5.0.38-enterprise-gpl-log (root) [test]> select * from t1 where col1=105555555555555555;
+--------------------+
| col1               |
+--------------------+
| 105555555555555555 | 
+--------------------+
1 row in set (0.00 sec)

That integer is only slightly smaller than the first one, yet the comparison works correctly.
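
The table is not actually needed to reproduce this; on builds where the report reproduces, comparing the literals directly should show the same pattern (a sketch, not verified on every platform):

-- Same comparisons as above, without the table; expected to return
-- 0 and 1 respectively on affected builds:
select '123456789012345678' = 123456789012345678,
       '105555555555555555' = 105555555555555555;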

Suggested fix:
The argument to a comparison operator in the WHERE clause should be cast to the same data type as the column to which it is being compared, if relevant.

For instance, the comparison illustrated above should be executed in this manner:

mysql 5.0.38-enterprise-gpl-log (root) [test]> select * from t1 where col1=cast(123456789012345678 as char);                                   
+--------------------+
| col1               |
+--------------------+
| 123456789012345678 | 
+--------------------+
1 row in set (0.00 sec)
[18 Apr 2007 16:27] Martin Friebe
I believe this is the same as Bug #9840

Also documented on http://dev.mysql.com/doc/refman/5.0/en/type-conversion.html

"
 Comparisons that use floating-point numbers (or values that are converted to floating-point numbers) are approximate because such numbers are inexact. This might lead to results that appear inconsistent:

mysql> SELECT '18015376320243458' = 18015376320243458;
        -> 1
mysql> SELECT '18015376320243459' = 18015376320243459;
        -> 0

Such results can occur because the values are converted to floating-point numbers, which have only 53 bits of precision and are subject to rounding: 
"
[18 Apr 2007 16:32] Martin Friebe
With the numbers from Bug #9840 I also experienced this in 4.0.22, so it isn't a regression.

It does, however, depend on the platform and/or compiler used.

Most mysql executables will fail on at least one number in any range of 8 consecutive numbers; the probe below illustrates the idea.

I am not sure if there is a feature request about it yet.
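
A quick way to probe that claim, reusing the numbers from the manual excerpt above (a sketch; which of the eight comparisons fail, if any, varies by build):

-- Each string/integer pair is converted to DOUBLE before comparing,
-- so some of these may return 0 instead of 1:
select '18015376320243456' = 18015376320243456 as c0,
       '18015376320243457' = 18015376320243457 as c1,
       '18015376320243458' = 18015376320243458 as c2,
       '18015376320243459' = 18015376320243459 as c3,
       '18015376320243460' = 18015376320243460 as c4,
       '18015376320243461' = 18015376320243461 as c5,
       '18015376320243462' = 18015376320243462 as c6,
       '18015376320243463' = 18015376320243463 as c7;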
[2 May 2007 17:46] Kolbe Kegel
Why are these types of comparisons done using floating point and not precision math?
[3 May 2007 10:11] Alexey Botchkov
> Why are these types of comparisons done using floating point and not precision math?

In fact, a longlong comparison would work here. But since string fields can contain any kind of data, such as '1e-200', we always convert them to DOUBLE and then compare, and DOUBLE 'equal' comparison is unpredictable.
The recommended solution here is to use CAST, like
select * from t1 where cast(col1 as unsigned) = 123...
or 
select * from t1 where cast(col1 as DECIMAL(30,0)) = 123...
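
Applied to the table from the original report, the workaround looks like this (a sketch; note that wrapping col1 in CAST() prevents any index on col1 from being used):

-- Cast the column so the comparison is done as UNSIGNED or DECIMAL
-- instead of DOUBLE; both should return the row:
select * from t1 where cast(col1 as unsigned) = 123456789012345678;
select * from t1 where cast(col1 as decimal(30,0)) = 123456789012345678;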

My opinion is that it's not a bug.
[4 May 2007 4:49] Valeriy Kravchuk
In Oracle we have:

SQL> create table t1 (col1 char(20));

Table created.

SQL> insert into t1 values ('123456789012345678');

1 row created.

SQL> select * from t1 where col1='123456789012345678';

COL1
--------------------
123456789012345678

SQL> select * from t1 where col1=123456789012345678;

COL1
--------------------
123456789012345678
[4 May 2007 4:54] Valeriy Kravchuk
I don't agree that this is NOT a bug, even if the DOUBLE usage is documented. See my previous comment with the example from Oracle. When we compare a CHAR(20) representing an integer to an INT, a correct result is possible, so we should get it. As it stands, we get different results from the same SQL on different (minor?) versions/platforms. What is the point of the "arithmetical" changes with DECIMAL in 5.0.x then?

We should use machine-independent precise arithmetic for this internally, like Oracle does with its NUMBER data type (see the sketch below).

As the current behaviour is documented, it is a "feature". But it is not what customers/users expect. OK, this is not an S1 bug (for formal reasons, plus there is a workaround), but then it is at least a valid, and urgent, feature request!
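
Precision math does already give the expected answer when the string side is forced into DECIMAL, since per the documented comparison rules a DECIMAL compared with an integer is compared as DECIMAL (a sketch):

-- Expected to return 1: no DOUBLE conversion is involved.
select cast('123456789012345678' as decimal(30,0)) = 123456789012345678;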
[4 May 2007 6:23] Alexey Botchkov
Yes, it's a possible feature request.
I think it's reasonable to implement a specific 'string' - 'number' comparator that would check the string content against the given number type and use the appropriate conversion/comparison method.
It's a significant change, though, and rather a WL task than a bugfix, IMHO.
[13 Dec 2007 10:27] Sergei Golubchik
I am not sure this can be done at all. Of course it's possible to have a special string-int comparator, but it would mean that a string is converted to a number differently in different contexts: to a "best matching" type in comparisons, but always to double in other operations. It'll lead to odd and inconsistent results such as

  f_str = f_int    being true but
  f_str - f_int    being non-zero

We need consistent behaviour; otherwise it'll be really hard to explain and quite surprising in some cases.
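
Today the two operations at least agree, because both sides go through DOUBLE; a sketch of the difference the hypothetical comparator would introduce:

-- Currently both expressions use DOUBLE, so '=' returning 0 is at
-- least consistent with the arithmetic:
select '18015376320243459' = 18015376320243459;   -- 0 on affected builds
select '18015376320243459' - 18015376320243459;   -- also computed in DOUBLE
-- With a "best matching" comparator, the first would return 1 while
-- the second still used DOUBLE: the f_str/f_int inconsistency above.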
[13 Dec 2007 11:10] Martin Friebe
IMHO the whole issue only exists if the value represented by the string is within the bigint / unsigned-bigint range (or maybe the decimal precision range).

If the string converts to a number outside this range, or to a number with a fractional part, then the non-string value must be of a non-integer type as well (double or decimal).

Therefore it is possible to make the string-to-number conversion depend on the size of the string, rather than on the context:

This could be done by attempting to convert any string to a bigint first. If the conversion goes out of bounds, the bigint accumulated so far is converted to float, and the conversion continues from that value. (I would expect the bigint-to-float conversion to cost less than starting the conversion over.)

The first step could be converting to decimal instead, or converting to decimal could be an intermediate step. I am not sure whether that adds any benefit.
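
The core of the proposal can be checked by hand today: inside the bigint range, an integer-first conversion is exact (a sketch using the CAST workaround):

-- Exact integer comparison, no DOUBLE rounding; expected to return 1:
select cast('18015376320243459' as unsigned) = 18015376320243459;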

This would not only solve the comparison issue and be consistent, it would also lead to more exact string-to-float conversions.
Sergey explained in Bug #9840:

> I don't think they should. Basically
>   select    '18015376320243459' =18015376320243459;
> computes
>   (((((((((((((1.0*10+8.0)*10*10+1.0)*10+5.0)*10+3.0)*10+7.0)*10+6.0)*10+3.0)*10
>     +2.0)*10*10+2.0)*10+4.0)*10+3.0)*10+4.0)*10+5.0)*10+9.0 ==
>   (double)18015376320243459

A string outside the bigint range (which must be converted to float) currently starts introducing conversion errors while still within the bigint range; these errors get multiplied and carried through the rest of the conversion. IMHO, keeping the conversion in bigint as long as possible would reduce the error for some string-to-float conversions.
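
The quoted accumulation can be run as-is to check whether the digit-by-digit DOUBLE build-up matches a direct conversion of the literal (a sketch; whether it returns 0 or 1 depends on the platform, which is exactly the point):

-- '+ 0e0' forces the right-hand side into DOUBLE, mirroring the
-- (double)18015376320243459 in the quote above:
select
  (((((((((((((1.0*10+8.0)*10*10+1.0)*10+5.0)*10+3.0)*10+7.0)*10+6.0)*10+3.0)*10
    +2.0)*10*10+2.0)*10+4.0)*10+3.0)*10+4.0)*10+5.0)*10+9.0
  = 18015376320243459 + 0e0 as accumulated_equals_direct;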
[13 Mar 2009 9:58] Sveta Smirnova
There is a related bug: Bug #43622.