Bug #39474 checksum table reports equal checksums when data is different
Submitted: 16 Sep 2008 11:50
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DDL Severity:S2 (Serious)
Version:5.x, 6.0.8 OS:Any
Assigned to: CPU Architecture:Any
Tags: checksum table
Triage: Triaged: D2 (Serious) / R2 (Low) / E3 (Medium)

[16 Sep 2008 11:50] Shane Bester
Description:
when an empty string is in one field, checksum table may report identical checksums for different data:

mysql> drop table if exists `t1`,`t2`;
Query OK, 0 rows affected (0.06 sec)

mysql> create table `t1`(`a` varchar(20),`b` varchar(20))engine=myisam;
Query OK, 0 rows affected (0.38 sec)

mysql> create table `t2`(`a` varchar(20),`b` varchar(20))engine=myisam;
Query OK, 0 rows affected (0.45 sec)

mysql> insert into `t1` values ('red','');
Query OK, 1 row affected (0.59 sec)

mysql> insert into `t2` values ('','red');
Query OK, 1 row affected (0.58 sec)

mysql> checksum table `t1`,`t2`;
+---------+------------+
| Table   | Checksum   |
+---------+------------+
| test.t1 | 3906269342 |
| test.t2 | 3906269342 |
+---------+------------+
2 rows in set (0.55 sec)

this affects folks using checksum table to quickly compare whether or
not two tables are the same.  as we see above, this is now too unreliable.

How to repeat:
drop table if exists `t1`,`t2`;
create table `t1`(`a` varchar(20),`b` varchar(20))engine=myisam;
create table `t2`(`a` varchar(20),`b` varchar(20))engine=myisam;
insert into `t1` values ('red','');
insert into `t2` values ('','red');
checksum table `t1`,`t2`;
[16 Sep 2008 12:22] Shane Bester
not only limited to empty strings.  when the row data is the same, but each column has different portions of the string, the checksums match too:

drop table if exists `t1`,`t2`;
create table `t1`(`a` varchar(20),`b` varchar(20))engine=myisam;
create table `t2`(`a` varchar(20),`b` varchar(20))engine=myisam;
insert into `t1` values ('re','dblue');
insert into `t2` values ('red','blue');
checksum table `t1`,`t2`;
[29 Sep 2008 9:36] Sergei Golubchik
We can only fix it in 6.0. A fix would change the values that CHECKSUM returns, and it cannot be done in 5.1 or earlier. 6.0 already has different CHECKSUM values, so it's not an issue there.
[23 Jun 2009 8:30] Guilhem Bichot
see also WL#5020
[5 Aug 2010 10:39] Shane Bester
i'm not the only one to notice this. see bug #55753 !
[28 Sep 2010 10:25] James Day
Perhaps the most common use of this is to try to find out whether a master and slave are consistent. That in turn makes it desirable to have a return value that can easily be inserted into a table and/or the replication stream. Then you can have a point in time comparison between the data on the master and slave without needing to block data changes on either server.

The value should be independent of the storage engine and metadata by default, since it's not prohibited to have differences in those things between servers. A more thorough metadata check would be nice sometimes a well.

CRC32 has too high a collision rate to be sensible when introducing a new method.
[11 Oct 2010 18:27] Shane Bester
http://kostja-osipov.livejournal.com/30504.html