MySQL Bugs: #74919: purge should remove intermediate rows between far snapshots

Bug #74919	purge should remove intermediate rows between far snapshots
Submitted:	18 Nov 2014 18:15	Modified:	25 Jun 2015 2:53
Reporter:	Domas Mituzas	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S3 (Non-critical)
Version:	all	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
InnoDBs MVCC allows repeatable-reads of snapshots. For that it keeps old rows. If there's a one year old transaction, InnoDB will keep all row versions that were created after that transaction. 

There're plenty reasons why that is a bad idea - space usage, algorithmic complexity of operations on top of multiversioned rows, etc. As there're no semantics that would allow to access intermediate row versions (that is, it is impossible for transaction to advance to arbitrary LSNs), an obvious solution is for purge to drop intermediate rows. 

In more practical settings that means one can do logical dumps and other types of consistent reads of larger MySQL instances that have high churn of rows. 

Workarounds like "you should be using replica for your logical dumps" don't always work because:
a) there may be no replica existing
b) replica can be unable to replicate because of MVCC overhead, leading back to a)

While we're at it, deferring purges of individual tables/schemas while keeping up with other schemas may be of very high practical use in multitenant environments. 

This is major pain point in various operational environments where customers cannot be supported on existing MySQL/InnoDB solution. 

How to repeat:
SET wait_timeout=86400*365;
BEGIN;
...

Suggested fix:
purge intermediate rows
introduce semantics that allow deferring purges on individual tables or schemas

ActiveMQ by default uses an 'interesting' approach to make sure that only a single active instance exists: 

* BEGIN TRANSACTION;

* SELECT ...some_row... FOR UPDATE;  -- this will block if another instance
                                     -- already exists and will eventually 
                                     -- get a lock timeout, after which a new
                                     -- transaction will try to lock it again
                                     -- ... and again ...

* UPDATE ...some_row...;             -- then sleep a bit, and repeat ... and repeat ...

With MySQL Cluster this will eventually run into bug #56929 after updating the same row ~32000 times within the same transaction, and bug #65037 as ndbd data nodes won't restart unless all ongoing transactions have terminated ...

With InnoDB the effects are not as severe, but bad enough, so a more intelligent 
visibility detection algorithm than just "newer than the oldest still active transaction" that can discard intermediate rows not visible to any transaction anymore would be of great help in this "neverending transaction" use case, too ...

(IMHO ActiveMQ should find a more clever way of coordinating different instances, but that's a different story)