Bug #20536 md5() with GROUP BY and UCS2 return different results on myisam/innodb
Submitted: 19 Jun 2006 13:36 Modified: 25 Sep 2006 15:00
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0.23BK, 4.1 OS:Any (*)
Assigned to: Timothy Smith CPU Architecture:Any

[19 Jun 2006 13:36] Shane Bester
Description:
md5() with GROUP BY andUCS2 return different results on myisam/innodb.
The results should be the same no matter which engine is used. 
Surrounding md5 call with hex() makes the results identical.  unhex/hex can be used as a workaround.

See result from 5.0BK

mysql> create table t1(id bigint not null auto_increment pr
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO `t1` (`id`,`name`) VALUES
    -> (1,'test1'),
    -> (2,'test2'),
    -> (3,'test3'),
    -> (4,'test4'),
    -> (5,'test5');
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql>
mysql> select md5(name) as hash from t1 group by id;
+----------------------------------+
| hash                             |
+----------------------------------+
| f4b7ce8b45a20e3c4e84bef515d1525c |
| ????????????????                 |
| ????????????????                 |
| ????????????????                 |
| ????????????????                 |
+----------------------------------+
5 rows in set (0.03 sec)

mysql> alter table t1 engine=myisam;
Query OK, 5 rows affected (0.08 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select md5(name) as hash from t1 group by id;
+----------------------------------+
| hash                             |
+----------------------------------+
| f4b7ce8b45a20e3c4e84bef515d1525c |
| 7823e21140de44563e17ad930d5a4a31 |
| 189c2715677870a83a06c32f85c8398f |
| cc14ebacc96404db9d3cfe87a702613a |
| a3b6e94c168c9525a92cad9b42d1dc56 |
+----------------------------------+
5 rows in set (0.02 sec)

mysql>

How to repeat:
drop table if exists t1;
create table t1(id bigint not null auto_increment primary key, name varchar(255) character set ucs2 NOT NULL)engine=innodb;
INSERT INTO `t1` (`id`,`name`) VALUES 
(1,'test1'),
(2,'test2'),
(3,'test3'),
(4,'test4'),
(5,'test5');

select md5(name) as hash from t1 group by id;
alter table t1 engine=myisam;
select md5(name) as hash from t1 group by id;

Suggested fix:
not sure..
[27 Jun 2006 21:22] Timothy Smith
This bug is not MyISAM vs. InnoDB; the test case happens to depend on that.  For example, add ORDER BY name to the select statement, and the InnoDB query no longer displays the problem; remove the GROUP BY, and the problem shows up with MyISAM.

I'm not sure yet why it sometimes shows up, and sometimes doesn't.  The following patch fixes the problem, but there is more work required to understand what other functions may be affected by this same source problem.

===== item_strfunc.cc 1.271 vs edited =====
--- 1.271/sql/item_strfunc.cc   2006-06-26 16:50:50 -04:00
+++ edited/item_strfunc.cc      2006-06-27 16:58:35 -04:00
@@ -145,6 +145,7 @@
            digest[8], digest[9], digest[10], digest[11],
            digest[12], digest[13], digest[14], digest[15]);
     str->length((uint) 32);
+    str->set_charset(&my_charset_bin);
     return str;
   }
   null_value=1;

Need to investigate SHA1() and AES_ENCRYPT() as well.
[27 Jun 2006 21:33] Timothy Smith
With the InnoDB test table, the following expose the bug:

select md5(name) from t1;
select md5(name) from t1 order by id;
select md5(name) from t1 order by name;

These two fail on the InnoDB table, but produce correct results on the MyISAM table:

select md5(name) from t1 group by id order by id;
select md5(name) from t1 group by id;

And the following return correct results on both tables:

select md5(name) from t1 group by id order by name;
select md5(name) from t1 group by name;
select md5(name) from t1 group by name order by id;
[14 Jul 2006 2:51] Timothy Smith
The difference in the queries is that those which return correct results are 'Using temporary'.  There is a longstanding problem, where the character set information is thrown away, for values processed via a temporary table.  In this case, it threw away the bogus character set, and so the correct values (in the 'binary' character set) were returned.

The correct fix for this particular bug is to force the result to be 'binary', no matter what the character set of the argument.

The same bug shows up in at least SHA1(), PASSWORD(), OLD_PASSWORD, ENCRYPT(), MAKE_SET(), EXPORT_SET(), QUOTE().

I'm looking for the correct way to fix it in those.

create table bug20536 (id bigint not null auto_increment primary key, name varchar(255) character set ucs2 not null);
insert into `bug20536` (`id`,`name`) values (1,'test1'), (2,'test2');
select md5(name) from bug20536;
select sha1(name) from bug20536;
select password(name) from bug20536;
select old_password(name) from bug20536;
select encrypt(name) from bug20536;
select make_set(3, name, name) from bug20536;
select export_set(5, name, name) from bug20536;
select quote(name) from bug20536;
drop table bug20536;

Current results are:

mysql> select md5(name) from bug20536;
+----------------------------------+
| md5(name)                        |
+----------------------------------+
| f4b7ce8b45a20e3c4e84bef515d1525c | 
| ????????????????                 | 
+----------------------------------+
2 rows in set (0.00 sec)

mysql> select sha1(name) from bug20536;
+------------------------------------------+
| sha1(name)                               |
+------------------------------------------+
| e0b52f38deddb9f9e8d5336b153592794cb49baf | 
| ????????????????????                     | 
+------------------------------------------+
2 rows in set (0.00 sec)

mysql> select password(name) from bug20536;
+----------------------+
| password(name)       |
+----------------------+
| ???????????????????? | 
| ???????????????????? | 
+----------------------+
2 rows in set (0.00 sec)

mysql> select old_password(name) from bug20536;
+--------------------+
| old_password(name) |
+--------------------+
| ????????           | 
| ????????           | 
+--------------------+
2 rows in set (0.00 sec)

mysql> select encrypt(name) from bug20536;
+---------------+
| encrypt(name) |
+---------------+
| ??????        | 
| ??????        | 
+---------------+
2 rows in set (0.01 sec)

mysql> select make_set(3, name, name) from bug20536;
+-------------------------+
| make_set(3, name, name) |
+-------------------------+
| test1?????              | 
| test2?????              | 
+-------------------------+
2 rows in set (0.00 sec)

mysql> select export_set(5, name, name) from bug20536;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| export_set(5, name, name)                                                                                                                                                                                                                                                                                                                                       |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1??????test1????? | 
| test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2??????test2????? | 
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql> select quote(name) from bug20536;
+-------------+
| quote(name) |
+-------------+
| ????????    | 
| ????????    | 
+-------------+
2 rows in set (0.00 sec)

mysql> drop table bug20536;
Query OK, 0 rows affected (0.00 sec)
[9 Aug 2006 11:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10197

ChangeSet@1.2542, 2006-08-09 05:56:10-06:00, tsmith@siva.hindu.god +4 -0
  Bug #20536: md5() with GROUP BY and UCS2 return different results on myisam/innodb
  
  Make the encryption functions MD5(), SHA1() and ENCRYPT() return binary results.
      
  Make MAKE_SET() and EXPORT_SET() use the correct character set for their default separator strings.
[11 Aug 2006 23:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10324

ChangeSet@1.2535, 2006-08-11 17:09:19-06:00, tsmith@siva.hindu.god +4 -0
  Bug #20536: md5() with GROUP BY and UCS2 return different results on myisam/innodb
  
  Make the encryption functions MD5(), SHA1() and ENCRYPT() return binary results.
      
  Make MAKE_SET() and EXPORT_SET() use the correct character set for their default separator strings.
[30 Aug 2006 15:58] Timothy Smith
queued to -maint trees (4.1, 5.0, 5.1)
[31 Aug 2006 11:39] Magnus BlÄudd
Pushed to 5.0.25
[12 Sep 2006 20:03] Paul DuBois
Noted in 5.0.25 changelog.

Use of MD5() with a ucs2 character column combined with GROUP BY could return different results for MyISAM than for InnoDB.

Resetting report to NDI pending merge into other trees.
[13 Sep 2006 8:21] Timothy Smith
Pushed to 5.1.12.
[25 Sep 2006 15:00] Paul DuBois
Corrected the changeset comment:

The MD5(), SHA1(), and ENCRYPT() functions should return a binary
string, but the result sometimes was converted to the character set
of the argument. MAKE_SET() and EXPORT_SET() now use the correct
character set for their default separators, resulting in consisten
tresult strings which can be coerced according to normal character set
rules.

Noted this also in the 4.1.22, 5.1.12 changelogs.