Bug #36532 Sort order is wrong when a column contains "_" (underscores)
Submitted: 6 May 2008 13:12 Modified: 6 May 2008 21:28
Reporter: Philip Orleans Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: CPU Architecture:Any
Tags: reponse

[6 May 2008 13:12] Philip Orleans
Description:
I tested this problem with SQL Server 2005 using identical tables, and SQL Server gives the right answer. The bug can lead to losing a lot of money in my industry, telecommunications.
Suppose wi have  a table with two fields:
Code,firstmin
‘1_____’,0.010
‘121324’,0.005
the you do a select statement to select the row that is the cheapest and matches the most. This is the basis of telephony least cost routing. Suppose the phone number is 12132499999. The statement is like this:
"select code, firstmin from table where '12132499999' like concat(code,'%') order by code desc, firstmin asc set limit 1;"
What we want is the cheapest row after the match has been done on the phonenumber. Since then "_" is not a regular character, but one that means 'any single character", the two rows in the first column are worth exactly the same for sorting purposes, so the algrithm must pick up the row based on the second column, in ascending order, and instead it picks up the wrong one. In SQL Server 2005 the example works perfectly. I need to port my least-cost-routing algorithm and without this fix I would go bankrupt.

How to repeat:
If somebody wants to log into my windows box via remote desktop, I can show you a simple test like the one above, in Mysql 5.1 and Sql Server 2005.

Suggested fix:
the algorithm must be rewritten for cases where wildcards are used in one of the sorted fields. There is no other way to express business logic like this. In this case, one carrier offers to termiante calls to all number at one price, but another one offers to terminate a call to a matching area code to less. The sorting must pick the cheapest, otherwise how do we solve this?
[6 May 2008 13:37] Kristian Nielsen
ORDER BY length(code) DESC ?

    select code, firstmin from table where '12132499999' like concat(code,'%') order by length(code) desc, firstmin asc limit 1;
[6 May 2008 14:13] Philip Orleans
That would work, of course, but that is not an excuse for a wrong result. It gets even better. The database engine gets confused with any column that contains a value with an underscore. For example, I just did this:
insert clientrates(code,firstmin,id_rate,peakid,addmin,maxlength,roundto,surcharge,frequency,disabled,dayofweek,critical,mintime) values('1201207_',0.001,99,0,0.01,99,1,0.00,0,0,'*',0,0);
Query OK, 1 row affected (0.00 sec)

Then I type:

mysql> select code from clientrates_cache where code like '1201207_';
+----------+
| code     |
+----------+
| 12012070 |
| 12012071 |
| 12012072 |
| 12012073 |
| 12012074 |
| 12012075 |
| 12012076 |
| 12012077 |
| 12012078 |
| 12012079 |
+----------+
10 rows in set (0.00 sec)
My new value does not even show, and it should, because '1201207_' must match with itself.

Also, if I type this:
select code, firstmin from clientrates_cache where '12012075599' like concat(code,'%') order by length(code) desc, firstmin asc limit 100;

+----------+----------+
| code     | firstmin |
+----------+----------+
| 12012075 | 0.008300 |
| 1_______ | 0.009900 |
| 1        | 0.040000 |
+----------+----------+
3 rows in set (0.00 sec)

The right response would be the new value that I just inserted, because it is the cheapest.
[6 May 2008 15:24] Peter Laursen
@federico

I think you should post the CREATE STATEMENT for the table.  Data types may matter!

try like this:
* create table statement 8as returned by 'show create table')
* insert statement
* select statement

Also tell the client you are using. 

Make it complete so that it can simply be copied and execueted wihtout any need to guess/assume anything!

CREATE TABLE `tab1` (                 
          `id` int(11) DEFAULT NULL,          
          `t` varchar(20) DEFAULT NULL        
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into `tab1`(`id`,`t`) values ( NULL,'a_');

select * from tab1 where t like 'a_';

/in both 5.051b and 5.1.24 returns

    id  t     
------  ------
(NULL)  a_ 
*/

Peter
(not a mysql person)
[6 May 2008 15:54] Philip Orleans
I am not using any client. I just installed mysql 5.1 last night and I am logged directly in the database using the only interface provided. This is  windows 2003. I am MS Sql Server professional but need a Linux database.I can give somebody access to the database via internet, for there is nothing there but test data. Just contact me via email and let me know your IP address so I can open the firewall.

show create table clientrates_cache;
| clientrates_cache | CREATE TABLE `clientrates_cache` (
  `code` varchar(15) NOT NULL,
  `id_rate` int(11) NOT NULL,
  `peakid` int(11) NOT NULL,
  `firstmin` decimal(19,6) NOT NULL,
  `addmin` decimal(19,6) NOT NULL,
  `maxlength` int(11) NOT NULL,
  `roundto` int(11) NOT NULL,
  `surcharge` decimal(19,6) NOT NULL,
  `frequency` int(11) NOT NULL,
  `dayofweek` varchar(15) NOT NULL,
  `critical` tinyint(4) NOT NULL,
  `mintime` int(11) NOT NULL,
  `country` varchar(60) DEFAULT NULL,
  `fromx` smallint(6) NOT NULL,
  `tox` smallint(6) NOT NULL,
  PRIMARY KEY (`id_rate`,`code`,`peakid`,`dayofweek`)
) ENGINE=MEMORY DEFAULT CHARSET=latin1 |
1 row in set (0.02 sec)
[6 May 2008 16:17] Peter Laursen
OK, so you use the command line client! :-)

You still do not make it totally reproducable!

You post a CREATE statement for `clientrates_cache`, but an INSERT statement for `clientrates`.  The SELECT statement operates on `clientrates_cache`. Two different tables here .. 

Soon some MySQL person will come and post a link on 'how to report a bug' .. :-)

My advice: start *totally* from scratch

* create database statement
* create table statement(s)
* insert statement(s)
* select statement(s)
* what do you get?
* what do you expect?

To many words here and too little exact information that can be tested/reproduced!
[6 May 2008 16:39] Philip Orleans
My apologies. I had inserted on the wrong table.
The bug is show below. The two statements should bring identical results, and they do not. In this case, all I care is top 1 row (limit 1). In MS SQL server, they work identical. I bet in Sybase it would be the same:

select code, firstmin from clientrates_cache where '12012075599' like concat(code,'%') order by length(code) desc, firstmin asc limit 100;
+----------+----------+
| code     | firstmin |
+----------+----------+
| 1201207_ | 0.001000 |
| 12012075 | 0.008300 |
| 1_______ | 0.009900 |
| 1        | 0.040000 |
+----------+----------+
4 rows in set (0.01 sec)

select code, firstmin from clientrates_cache where '12012075599' like concat(code,'%') order by code desc, firstmin asc limit 100;
+----------+----------+
| code     | firstmin |
+----------+----------+
| 1_______ | 0.009900 |
| 1201207_ | 0.001000 |
| 12012075 | 0.008300 |
| 1        | 0.040000 |
+----------+----------+
4 rows in set (0.00 sec)

This the table:
clientrates_cache | CREATE TABLE `clientrates_cache` (
 `code` varchar(15) NOT NULL,
 `id_rate` int(11) NOT NULL,
 `peakid` int(11) NOT NULL,
 `firstmin` decimal(19,6) NOT NULL,
 `addmin` decimal(19,6) NOT NULL,
 `maxlength` int(11) NOT NULL,
 `roundto` int(11) NOT NULL,
 `surcharge` decimal(19,6) NOT NULL,
 `frequency` int(11) NOT NULL,
 `dayofweek` varchar(15) NOT NULL,
 `critical` tinyint(4) NOT NULL,
 `mintime` int(11) NOT NULL,
 `country` varchar(60) DEFAULT NULL,
 `fromx` smallint(6) NOT NULL,
 `tox` smallint(6) NOT NULL,
 PRIMARY KEY (`id_rate`,`code`,`peakid`,`dayofweek`)
 ENGINE=MEMORY DEFAULT CHARSET=latin1 |
[6 May 2008 20:22] Sveta Smirnova
Thank you for the report.

If compare '_' with any digit result would be '_' is greater than digit:

mysql> select strcmp('_', '9');
+------------------+
| strcmp('_', '9') |
+------------------+
|                1 | 
+------------------+
1 row in set (0.00 sec)

mysql> select ord('_');
+----------+
| ord('_') |
+----------+
|       95 | 
+----------+
1 row in set (0.00 sec)

mysql> select ord('9');
+----------+
| ord('9') |
+----------+
|       57 | 
+----------+
1 row in set (0.00 sec)

So sort order is correct and I close report as "Not a Bug".

Only one thing can be considered as a bug from this report is wrong results of query select code from clientrates_cache where code like '1201207_'; But I can not repeat this in mine environment as well. If you could create repeatable test case for this (show create table, inserts with sample values, select, accurate version of mysql) please open new report.
[6 May 2008 21:28] Philip Orleans
I beg to differ. I think that you are wrong and Mysql cannot behave different than the rest of the major databases. When sorting, you cannot consider '_' like a character in itself, but a wildcard. That is how Microsoft does it. For sorting purposes these two values are worth the same:
'________'
'12345678'
I can set up Sybase and probaly get the sama result, if you agree to reconsider.
[6 May 2008 21:37] Sveta Smirnova
Thank you for the feedback.

To reconsider is better to take in account SQL standard, not other databases.

Btw PostgreSQL behaves same as MySQL:

postgres=# create table t1 (f1 varchar(25));CREATE TABLE
postgres=# insert into t1 values('1201207_');
INSERT 0 1
postgres=# insert into t1 values('12012075');
INSERT 0 1
postgres=# insert into t1 values('1_______');
INSERT 0 1
postgres=# insert into t1 values('1');
INSERT 0 1
postgres=# select f1 from t1 order by f1 desc;
    f1    
----------
 1_______
 1201207_
 12012075
 1
(4 rows)