Bug #24759 WHERE IN(CONV(), ...) fails to match some BIGINT values
Submitted: 1 Dec 2006 20:02 Modified: 7 Dec 2006 16:28
Reporter: Jamie McCarthy Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.26 OS:Linux (CentOS 4.4 Linux)
Assigned to: CPU Architecture:Any

[1 Dec 2006 20:02] Jamie McCarthy
Description:
My program performs a SELECT on a table with a BIGINT primary key with a WHERE clause similar to: id IN (CONV('123ABC', 16, 10), CONV('456DEF', 16, 10)). When there is more than one value in the IN list, then, for some values, a match is not returned even though the BIGINT equivalent of the string returned by the CONV actually exists in the table. My expectation is that all values matched by the values in the IN clause would be returned.

There are several workarounds, including: using only one value in the IN clause at a time; using (id= OR id=) instead of IN; wrapping the CONV in a CAST into numeric form; and using the ordinary numeric value instead of having MySQL perform a CONV.

This problem may exist for numeric key types other than BIGINT, and it may exist when the value in question is not a key; I have not tested those cases.

How to repeat:
-- Set up a small table with three values.  The first and last are
-- correctly returned under all circumstances I can find.  The
-- middle is not returned with IN(CONV, ...) queries.

mysql> select version();
+---------------------+
| version()           |
+---------------------+
| 5.0.26-standard-log | 
+---------------------+
1 row in set (0.00 sec)

mysql> drop table if exists test1;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE test1 (
    -> id bigint(20) unsigned NOT NULL default '0',
    -> PRIMARY KEY (id)
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO test1 VALUES ('2332785242394147927'), ('2332785242394237927'), ('2332785242394247937');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

-- Show the contents of the table.

mysql> select id, conv(id, 10, 16) from test1;
+---------------------+------------------+
| id                  | conv(id, 10, 16) |
+---------------------+------------------+
| 2332785242394147927 | 205FB7D17E2D3C57 | 
| 2332785242394237927 | 205FB7D17E2E9BE7 | 
| 2332785242394247937 | 205FB7D17E2EC301 | 
+---------------------+------------------+
3 rows in set (0.00 sec)

-- Four ways of returning the middle value which do work.

mysql> select * from test1 where id=2332785242394237927;
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 | 
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id='2332785242394237927';
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 | 
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id=CONV('205FB7D17E2E9BE7', 16, 10);
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 | 
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id IN (CONV('205FB7D17E2E9BE7', 16, 10));
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 | 
+---------------------+
1 row in set (0.00 sec)

-- The IN(CONV, ...) syntax, which returns the first and last
-- value but not the middle one.

mysql> select * from test1 where id IN (CONV('205FB7D17E2D3C57', 16, 10), 1);
+---------------------+
| id                  |
+---------------------+
| 2332785242394147927 | 
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id IN (CONV('205FB7D17E2E9BE7', 16, 10), 1);
Empty set (0.00 sec)

mysql> select * from test1 where id IN (CONV('205FB7D17E2EC301', 16, 10), 1);
+---------------------+
| id                  |
+---------------------+
| 2332785242394247937 | 
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id IN (CONV('205FB7D17E2D3C57', 16, 10), CONV('205FB7D17E2E9BE7', 16, 10), CONV('205FB7D17E2EC301', 16, 10));
+---------------------+
| id                  |
+---------------------+
| 2332785242394147927 | 
| 2332785242394247937 | 
+---------------------+
2 rows in set (0.00 sec)

-- Workaround 1:  id=CONV OR id=CONV

mysql> select * from test1 where id=CONV('205FB7D17E2D3C57', 16, 10) OR id=CONV('205FB7D17E2E9BE7', 16, 10) OR id=CONV('205FB7D17E2EC301', 16, 10);
+---------------------+
| id                  |
+---------------------+
| 2332785242394147927 | 
| 2332785242394237927 | 
| 2332785242394247937 | 
+---------------------+
3 rows in set (0.00 sec)

-- Workaround 2:  id=CAST(CONV AS UNSIGNED)

mysql> select * from test1 where id IN (CAST(CONV('205FB7D17E2E9BE7', 16, 10) AS UNSIGNED), 1);
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 | 
+---------------------+
1 row in set (0.00 sec)

-- Workaround 3:  specify numeric value directly

mysql> select * from test1 where id IN (2332785242394147927, 2332785242394237927, 2332785242394247937);
+---------------------+
| id                  |
+---------------------+
| 2332785242394147927 | 
| 2332785242394237927 | 
| 2332785242394247937 | 
+---------------------+
3 rows in set (0.00 sec)

I regret that time did not permit me to test all possible values of
a BIGINT before submitting this bug, to try to determine a pattern
of success and failure.  I did test several dozen values surrounding
the example of real-world failure that we found, and for what it's
worth here are the results.

Here are the sample values I tested which failed with the given syntax:

205F5CDE6DB482F7 205FAEB92FBC22F7 205FB6E8A989B2F7 205FB7BA35B7DAF7
205FB7CF2A22DEF7 205FB7D14293F8F7 205FB7D17E2E9BE7 205FB7D17E2EBF0F
205FB7D17E2EC293 205FB7D17E2EC2ED 205FB7D17E2EC2F6 205FB7D17E2EC2F7
205FB7D17E2EC2F8 205FB7D17E2EC6DF 205FB7D17E2EEA07 205FB7D1B9C98CF7
205FB7D3D23AA6F7 205FB7E8C6A5AAF7 205FB8BA52D3D2F7 205FC0E9CCA162F7
206012C48EA902F7 20833EC3EDEFC2F7 20A6C5B65DB0C2F7

And here are the sample values I tested which succeeded:

200EC4C2D7270000 2010C5A2183D5A47 20324BB546E80000 20344C9487FE5A47
20346D67A9C58000 20357E40DB344000 205F6A4A738DC000 205FB7D17838E1F7
205FB7D17D962C77 205FB7D17E1F80B7 205FB7D17E2D3C57 205FB7D17E2EC301
205FB7D17E2EC35B 205FB7D17E304997 205FB7D17E3E0537 205FB7D17EC75977
205FB7D18424A3F7 205FC53D84080000

Suggested fix:
Unknown.
[1 Dec 2006 23:34] Jamie McCarthy
Forgot to include this info before...

The tests were run on an Opteron. We have not attempted to reproduce the problem on any 32-bit machines.

vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 270

# rpm -qa|grep -i 'mysql\|glibc'
glibc-2.3.4-2.25
MySQL-client-5.0.26-0.glibc23
glibc-common-2.3.4-2.25
MySQL-shared-5.0.26-0.glibc23
MySQL-devel-5.0.26-0.glibc23
glibc-kernheaders-2.4-9.1.98.EL
glibc-devel-2.3.4-2.25
glibc-2.3.4-2.25
glibc-headers-2.3.4-2.25
MySQL-server-5.0.26-0.glibc23
[3 Dec 2006 9:50] Valeriy Kravchuk
Thank you for a problem report. Sorry, but I was not able to repeat the behaviour described with latest 5.0.32-BK on Linux (32-bit):

mysql> create table test1 (id bigint unsigned not null default '0', primary key
(id)) engine=InnoDB;
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO test1 VALUES ('2332785242394147927'),
    -> ('2332785242394237927'), ('2332785242394247937');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select id, conv(id, 10, 16) from test1;
+---------------------+------------------+
| id                  | conv(id, 10, 16) |
+---------------------+------------------+
| 2332785242394147927 | 205FB7D17E2D3C57 |
| 2332785242394237927 | 205FB7D17E2E9BE7 |
| 2332785242394247937 | 205FB7D17E2EC301 |
+---------------------+------------------+
3 rows in set (0.01 sec)

mysql> select * from test1 where id=2332785242394237927;
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 |
+---------------------+
1 row in set (0.01 sec)

mysql> select * from test1 where id='2332785242394237927';
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 |
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id=CONV('205FB7D17E2E9BE7', 16, 10);
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 |
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id IN (CONV('205FB7D17E2E9BE7', 16, 10));
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 |
+---------------------+
1 row in set (0.00 sec)

mysql> select * from test1 where id IN (CONV('205FB7D17E2E9BE7', 16, 10), 1);
+---------------------+
| id                  |
+---------------------+
| 2332785242394237927 |
+---------------------+
1 row in set (0.01 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.0.32-debug |
+--------------+
1 row in set (0.00 sec)
[3 Dec 2006 18:04] Brian Aker
I just tried this on 5.1 BK on my 64bit AMD withe a debug build.

For Valerity's results I got this:
mysql> select * from test1 where id IN (CONV('205FB7D17E2E9BE7', 16, 10), 1);
Empty set (0.00 sec)

My results for Jamie's queries were the same. Someone with access to a 64bit machine needs to look at this.
[4 Dec 2006 16:28] Sebastian Goth
Hi there,

i have a similar problem without CONV().

create table test (token bigint(20));
insert into test values('3232933621568372736'),('5174632569486238062'),('5860192774298853742'),('8787532532089676142'),('8029914733239132526')

select token from test where token in('3232933621568372736','8787532532089676142','8029914733239132526','5174632569486238062','5860192774298853742');

This gives 3 results with following versions:
4.1.20 (source)
4.1.22 (source)
4.1.22 (mysql-standard-4.1.22-unknown-linux-gnu-x86_64-glibc23.tar.gz)
5.0.27 (source)
5.0.27-1 (debian etch amd64 binary)

And I get only 1 result with:
5.0.27 (mysql-standard-5.0.27-linux-x86_64-icc-glibc23.tar.gz)

The correct 5 results are shown if I write the query without quotes or with
"when token='3232933621568372736' or token='8787532532089676142'..."

This happens on:

vendor_id       : AuthenticAMD
cpu family      : 15
model           : 35
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+

with the debian stable amd64 port (unofficial) _and_ debian etch amd64.

Tested kernels: 
2.6.16-2-amd64-k8-smp from debian backports 
2.6.19 vanilla

In contrast my kubuntu 5.0.24a build on AMD Athlon 2400+ (32bit) does NOT have this bug.
[6 Dec 2006 11:05] Sergei Golubchik
This is expected.

1. CONV() function returns a string, not a number.
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html

2. When a number is compared to a string, both are converted to a floating point number (double).
http://dev.mysql.com/doc/refman/5.0/en/type-conversion.html

3. Converting of a BIGINT to a DOUBLE is lossy.
[6 Dec 2006 12:46] Sebastian Goth
Is it also expected that it behaves really inconsitent?

If 
... in('[bigint1]','[bigint2]','[bigint3]');

can't find [bigint1] because of conversion loss, shouldn't 

... in('[bigint1]');

return no result, either?

This is what i can't explain to myself. Ask for 5 quoted bigints and you get 3 for example. Ask for one of the missing bigints explicitly but also in quotes and you get a result.

Sebastian
[7 Dec 2006 16:28] Jamie McCarthy
I think this is my bad, I'd assumed that "CONV(constant, constant, constant)" would return a constant value when it does not. The description for "expr IN (val,val)" reads: "if all values are constants, they are evaluated according to the type of expr" so I was assuming the CONV would be evaluated as a BIGINT. Since it's not a constant, the type conversion rules Sergei pointed out apply, namely the conversion of the CONV's string and expr's BIGINT into floating-point.

So without the CAST this was never guaranteed to work and the question becomes why it ever did... my assumption is that "expr IN (singleval)" is special-cased, at which point I can imagine a bunch of reasons why that would work differently.

So at this point I'm inclined to agree this is not a bug.