Bug #27530 Incorrect calculation CRC32 value in GROUP BY requests
Submitted: 29 Mar 2007 17:17 Modified: 24 Apr 2007 17:19
Reporter: Philip Jayasinghe Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.37-log OS:Any (pc-linux-gnu x86_64)
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: 5.0.37, Contribution, crc32, GROUP, GROUP BY, overflow

[29 Mar 2007 17:17] Philip Jayasinghe
Description:
We have two tables - one for user data (id, names, etc - usr1), and second - quite large volume of messages from these users (these messages users can send each other with our Web site - msghist)

Time to time our administrator can check, if there large number of similar messages (i.e. some of users tries to spam with our messaging)

We do this with following request:

SELECT CRC32(CONCAT(author,msghist.body)) AS crc,author,name,count(*) AS cc,msghist.body FROM msghist LEFT JOIN usr1 ON id1=author GROUP BY (crc) having cc>10 ORDER BY cc DESC,author,part

Prior to verison 5.0.37 everything was OK. But with 5.0.37 we start to find large ammount of messages, with crc value 2147483647 (i.e. 7FFFFFFF).

So, first row of this request shows:

crc  |  author  |  name  |  cc  |   body 
2147483647  |  22  | Vesna | 139432 | ...some piece of text...

139432! But in real, there NO fields with this crc32 value:

Request like:
SELECT CRC32(CONCAT(author,msghist.body)) author,name,msghist.body FROM msghist LEFT JOIN usr1 ON id1=author WHERE CRC32(CONCAT(author,msghist.body))=2147483647

Will return empty result set.

How to repeat:
I think, it can be repeated on every SQL reqest, where CRC32 function is present, and where used GROUP BY on this CRC32 value

Suggested fix:
Upgrade to version 5.0.38 :)
[30 Mar 2007 9:26] Sveta Smirnova
Thank you for the report.

Please provide full output of statement SELECT CRC32(CONCAT(author,msghist.body)) AS crc,author,name,count(*) AS cc,msghist.body FROM msghist LEFT JOIN usr1 ON id1=author GROUP BY (crc) having cc>10 ORDER BY cc DESC,author,part where crc value is 2147483647

Also provide output of SHOW CREATE TABLE msghist, SHOW CREATE TABLE usr1
[30 Mar 2007 9:58] Martin Friebe
I would think, this may be related to Bug #27532 (under 5.0.40 bk)

It also can be reproduced with the table from Bug #27532:
select crc32(col), count(*) c  from bug_test group by 1;
+------------+---+
| crc32(col) | c |
+------------+---+
|  450215437 | 1 |
|  498629140 | 1 |
| 1790921346 | 1 |
| 1842515611 | 1 |
| 2147483647 | 6 |
+------------+---+
[30 Mar 2007 10:22] Alexander Y. Fomichev
As i can see this is because CREATE TEMPORARY TABLE don't honour an crc32 unsigned nature.
GROUP BY create temporary table while sorting so we have striped values in output.

mysql> CREATE TABLE `a` (`i` int(11) unsigned NOT NULL default '0') ENGINE=MyISAM;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TEMPORARY TABLE t SELECT i,CRC32(i) FROM a;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE t;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                               |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TEMPORARY TABLE `t` (
  `i` int(11) unsigned NOT NULL default '0',
  `CRC32(i)` int(10) NOT NULL default '0'
) ENGINE=MyISAM DEFAULT CHARSET=cp1251 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> INSERT INTO a VALUES (1,2,116421,116451);
mysql> SELECT i,CRC32(i) FROM a;
+--------+------------+
| i      | CRC32(i)   |
+--------+------------+
|      1 | 2212294583 |
|      2 |  450215437 |
| 116421 | 2821282535 |
| 116451 | 3882404896 |
+--------+------------+
4 rows in set (0.00 sec)

mysql> SELECT i,CRC32(i) FROM a GROUP BY i;
+--------+------------+
| i      | CRC32(i)   |
+--------+------------+
|      1 | 2147483647 |
|      2 |  450215437 |
| 116421 | 2147483647 |
| 116451 | 2147483647 |
+--------+------------+
4 rows in set (0.00 sec)
[30 Mar 2007 10:36] Sveta Smirnova
Thank you for the comments.

Verified using attached test case.
[30 Mar 2007 10:38] Sveta Smirnova
test case

Attachment: bug27530.test (application/octet-stream, text), 258 bytes.

[30 Mar 2007 11:30] Alexander Y. Fomichev
Huh, i'm sorry, first of all:
-INSERT INTO a VALUES (1,2,116421,116451);
+INSERT INTO a VALUES (1),(2),(116421),(116451);
( i should sometimes read a little before commit )
at the second part: it seems i was wrong as of this behaviour somwhere specific for 5.0.37, seems like it was here for a while.

On 5.0.24:

mysql> \u test
Database changed
mysql> DROP TABLE IF EXISTS a;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE `a` (`i` int(11) unsigned NOT NULL default '0') ENGINE=MyISAM;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO a VALUES (1),(2),(116421),(116451);
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> CREATE TEMPORARY TABLE t SELECT i,CRC32(i) FROM a;
Query OK, 4 rows affected, 3 warnings (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 3

/* Oops! */

mysql> SHOW CREATE TABLE t;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                               |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TEMPORARY TABLE `t` (
  `i` int(11) unsigned NOT NULL default '0',
  `CRC32(i)` int(10) NOT NULL default '0'
) ENGINE=MyISAM DEFAULT CHARSET=cp1251 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT i,CRC32(i) FROM a;
+--------+------------+
| i      | CRC32(i)   |
+--------+------------+
|      1 | 2212294583 |
|      2 |  450215437 |
| 116421 | 2821282535 |
| 116451 | 3882404896 |
+--------+------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM t;
+--------+------------+
| i      | CRC32(i)   |
+--------+------------+
|      1 | 2147483647 |
|      2 |  450215437 |
| 116421 | 2147483647 |
| 116451 | 2147483647 |
+--------+------------+
4 rows in set (0.00 sec)

And i still think it is bug, because SELECT * FROM t; and SELECT i,CRC32(i) FROM a; shows different results (should i put it to bugs.mysql as _another_ bug? ), but another one Oops!:

mysql> SELECT i,CRC32(i) AS crc FROM a GROUP BY crc;
+--------+------------+
| i      | crc        |
+--------+------------+
|      2 |  450215437 |
|      1 | 2212294583 |
| 116421 | 2821282535 |
| 116451 | 3882404896 |
+--------+------------+
4 rows in set (0.00 sec)

so my previous assumption was definitely wrong, sorry. Seems mysql use (has used in the past) some type of workaround here.
[30 Mar 2007 19:55] Martin Friebe
crc32 is indeed not set to unsigned. See patch, and additional tests

Attachment: crc_group.patch (text/x-patch), 7.87 KiB.

[30 Mar 2007 20:11] Martin Friebe
A workaround for the problem is to explicitly cast the result of crc32 as unsigned. This wauy it willnot be truncated by the "group by"

SELECT
 cast( CRC32(CONCAT(author,msghist.body)) as unsigned) AS crc,
 author,name,count(*) AS cc,
 msghist.body
FROM msghist LEFT JOIN usr1 ON id1=author GROUP BY (crc) having
cc>10 ORDER BY cc DESC,author,part

I have tested the workaround on the example from Bug #27532:
 select cast(crc32(col) as unsigned), count(*) c  from bug_test group by 1;
does work
[11 Apr 2007 11:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/24266

ChangeSet@1.2444, 2007-04-11 13:58:16+03:00, gkodinov@magare.gmz +3 -0
  Bug #27530:
  The function CRC32() returns unsigned integer.
  But the metadata (the unsigned flag) for the 
  function was set incorrectly.
  As a result type arithmetics based on the 
  function's metadata (like finding the concise
  type of an temporary table column to hold the result)
  returned incorrect results.
  Fixed by returning correct type information.
  
  This fix is based on code contributed by Martin Friebe
  (martin@hybyte.com) on 2007-03-30.
[21 Apr 2007 15:19] Bugs System
Pushed into 5.1.18-beta
[21 Apr 2007 15:20] Bugs System
Pushed into 5.0.42
[24 Apr 2007 17:19] Paul DuBois
Noted in 5.0.42, 5.1.18 changelogs.

The CRC32() function returns an unsigned integer, but the metadata
was signed, which could cause certain queries to return incorrect
results. (For example, queries that selected a CRC32() value and used
that value in the GROUP BY clause.)