Bug #81910 WRITE SET HASHES SHOULD HAVE A 64BITS OPTION TO REDUCE WRITE CONFLICTS
Submitted: 17 Jun 2016 16:35 Modified: 26 Aug 2016 13:23
Reporter: Pedro Gomes Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.14 OS:Any
Assigned to: CPU Architecture:Any

[17 Jun 2016 16:35] Pedro Gomes
Description:
For such uses as Group Replication, the server has write set
extraction algorithms that uniquely identify every update made on that
server according to the affected database, table and rows.

As of now we have

transaction-write-set-extraction=MURMUR32

This is a 32-bit algorithm, and our tests suggest that 32 bits is not
enough for large workloads.
During performance testing, conflicting transactions are regularly reported
by Group Replication even when we know for sure that there are no conflicts
in the data, for instance when different members write only to entirely
different databases and conflicts are still raised.

After ruling out other possible sources of conflicts, the explanation is
that we are hitting the limits of the 32-bit murmur hash used.

A 32-bit hash allows up to 4G distinct values, which seems a reasonably
large space in which to hash the in-transit row changes. However, taking
into consideration the birthday paradox
(https://en.wikipedia.org/wiki/Birthday_problem), the probability of two
different rows producing the same hash at the same time is in fact
significant, and the rate of conflicts in larger transactions with many
rows is quite visible.

In fact, keeping the probability of at least one collision below 50% with
32 bits requires a group of at most ~77,000 hashes, while with 64 bits that
group can grow to ~5,100,000,000 hashes.

There is also an issue with the hash feeding algorithm itself that can lead to duplicated values.
One example is database "dba" + table "extefield13ld" and database
"dba3extefield" + table "ld1", which both generate the same initial hash
string "dba3extefield13ld13".
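The collision arises because each name's length is appended directly after the name, so component boundaries are ambiguous. A minimal reproduction (Python sketch; the real code is C++, and the feeding format here is paraphrased from the example above, with the fixed-width fix being one illustrative option, not the actual patch):

```python
def feed_ambiguous(db, table):
    # Mimics the problematic scheme: each name is followed by its
    # decimal length, so "dba"+3 and "dba3extefield"+13 can collide.
    return f"{db}{len(db)}{table}{len(table)}"

def feed_unambiguous(db, table):
    # One possible fix (illustrative): length-prefix each component
    # with a fixed-width field so boundaries cannot shift.
    return f"{len(db):010d}{db}{len(table):010d}{table}"

a = feed_ambiguous("dba", "extefield13ld")
b = feed_ambiguous("dba3extefield", "ld1")
# Different (database, table) pairs, identical hash input:
assert a == b == "dba3extefield13ld13"

# With unambiguous framing the inputs differ:
assert feed_unambiguous("dba", "extefield13ld") != feed_unambiguous("dba3extefield", "ld1")
```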

How to repeat:
1. Execute sysbench prepare stage with 1M rows simultaneously on two GR
members in two different databases

Given that prepare transactions are large and that they cover 25% of the
hash space with autoinc values, more often than not the prepare will fail
with a conflict.

Suggested fix:
1. Besides the 32bit write-set hash infrastructure, add a 64bit one;
2. Add an alternative to the murmur32 hash (which has no 64-bit version in the server), for example the XXH64 hash from the LZ4 package included in the server.
(XXH is also much faster on 64bit systems)
[26 Aug 2016 13:23] David Moss
Posted by developer:
 
The new value has been added to the 5.7 docs. As this is related to GR, which hasn't been released yet, no changelog entry is being added.