Bug #74303 committed rows are visible before log sync when binlog not used
Submitted: 9 Oct 2014 17:56
Modified: 12 Oct 2014 17:51
Reporter: Mark Callaghan
Status: Verified
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.7.5
OS: Any
Assigned to: Assigned Account
CPU Architecture: Any

[9 Oct 2014 17:56] Mark Callaghan
Description:
This was reported in a PhD thesis and then discussed on a Postgres advocacy list:
http://www.postgresql.org/message-id/5435B1E4.7000900@aklaver.com

I think this is only an issue when the binlog is disabled, because the race occurs in the commit processing step of InnoDB. When the binlog is enabled, there is an InnoDB prepare step and a binlog write/flush step prior to the InnoDB commit step, so durability is guaranteed by the time the InnoDB commit step is reached.

Changes from a transaction can be visible to others before the redo log is forced to disk. I am not an expert in this part of InnoDB, but the problem is that innobase_commit calls innobase_commit_low and then trx_commit_complete_for_mysql. Changes from a transaction become visible to others once innobase_commit_low returns, which is before trx_commit_complete_for_mysql is called.

The call chain is innobase_commit_low -> trx_commit_for_mysql -> trx_commit -> trx_commit_low -> trx_commit_in_memory, and the commit becomes visible to new snapshots there. trx_commit_in_memory calls:
1) lock_release_trx_locks
2) UT_LIST_REMOVE(trx_sys->rw_trx_list, trx);
3) some other things

Is #2 what makes it visible to new snapshots?
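
To make the question concrete, here is a toy Python model of how a snapshot might be derived from the list of active read-write transactions. It is not InnoDB code: `ToyTrxSys`, `open_read_view`, and `is_visible` are hypothetical names, and the model assumes that a row version is visible to a snapshot when its writer is not in the active list copied at snapshot time. Under that assumption, removing the transaction from the list is exactly the step that exposes its changes.

```python
# Toy model of snapshot visibility. All names are hypothetical, not InnoDB APIs.

class ToyTrxSys:
    def __init__(self):
        self.rw_trx_list = set()  # ids of active read-write transactions
        self.next_trx_id = 1

    def begin(self):
        trx_id = self.next_trx_id
        self.next_trx_id += 1
        self.rw_trx_list.add(trx_id)
        return trx_id

    def commit_in_memory(self, trx_id):
        # Stand-in for step 2 above: drop the transaction from the active list.
        self.rw_trx_list.discard(trx_id)

    def open_read_view(self):
        # A snapshot records which writers were active and the next unused id.
        return frozenset(self.rw_trx_list), self.next_trx_id


def is_visible(row_trx_id, view):
    active, upper_bound = view
    return row_trx_id < upper_bound and row_trx_id not in active


trx_sys = ToyTrxSys()
writer = trx_sys.begin()
view_before = trx_sys.open_read_view()  # taken while the writer is still listed
trx_sys.commit_in_memory(writer)        # no log flush has happened in this model
view_after = trx_sys.open_read_view()

print(is_visible(writer, view_before))  # False
print(is_visible(writer, view_after))   # True
```

In this model nothing about the redo log is consulted when the second snapshot is opened, which is the gap described above.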

How to repeat:
1) Add a call to sleep for 10 seconds in trx_commit_complete_for_mysql, before the call to trx_flush_log_if_needed.
2) Do an auto-commit insert from one client.
3) Do an auto-commit select from that table in another client.

The client in step 3 will see the result almost immediately, long before the 10 seconds are up.
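
The interleaving in these steps can be imitated without a server. The following is a minimal Python sketch, not a test against MySQL: the two threads stand in for the two clients, the `release_flush` event stands in for the end of the injected sleep, and the `state` and `observed` dictionaries are hypothetical stand-ins for "row visible" and "redo log flushed".

```python
import threading

committed_in_memory = threading.Event()  # set once the row is visible
release_flush = threading.Event()        # stands in for the injected sleep ending
state = {"visible": False, "durable": False}
observed = {}


def writer():
    state["visible"] = True    # models innobase_commit_low returning
    committed_in_memory.set()
    release_flush.wait()       # models the delay added before the log flush
    state["durable"] = True    # models trx_flush_log_if_needed completing


def reader():
    committed_in_memory.wait()
    # The select runs while the writer is still waiting to flush.
    observed["saw_row"] = state["visible"]
    observed["row_was_durable"] = state["durable"]
    release_flush.set()


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(observed)  # {'saw_row': True, 'row_was_durable': False}
```

The events make the ordering deterministic, so the reader always observes the row before the simulated flush, which mirrors what the second client sees in the steps above.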

Suggested fix:
Don't know.

[4 Jan 2016 2:01] zhai weixiang
Is there any plan to fix this bug? I have clicked "Affects me" because we may turn off the binary log in the near future.

There are two potential ways to fix this bug:
1. Sync the redo log before removing the transaction id from the read-write transaction list. This requires another way to implement group commit of redo logs; otherwise we may encounter a performance regression.
2. Keep the trx_id in the global read-write transaction set and erase it only after the corresponding log has been written and synced to file.
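
As a rough illustration of the second option, here is a toy Python sketch, not a patch. The class `DeferredVisibilityTrxSys` and its methods are hypothetical, and it assumes a single log sync can cover every transaction that has committed in memory since the previous sync, so that group commit is kept.

```python
# Toy model of option 2. All names are hypothetical, not InnoDB APIs.

class DeferredVisibilityTrxSys:
    def __init__(self):
        self.active = set()         # ids hidden from new snapshots
        self.awaiting_sync = set()  # committed in memory, log not yet synced

    def begin(self, trx_id):
        self.active.add(trx_id)

    def commit_in_memory(self, trx_id):
        # The id stays in the active set; only note that it needs a sync.
        self.awaiting_sync.add(trx_id)

    def on_log_synced(self):
        # One sync releases every waiting transaction at once.
        done = self.awaiting_sync
        self.awaiting_sync = set()
        self.active -= done
        return done

    def visible_to_new_snapshot(self, trx_id):
        return trx_id not in self.active


trx_sys = DeferredVisibilityTrxSys()
for trx_id in (1, 2):
    trx_sys.begin(trx_id)
    trx_sys.commit_in_memory(trx_id)

before = [trx_sys.visible_to_new_snapshot(t) for t in (1, 2)]
released = trx_sys.on_log_synced()
after = [trx_sys.visible_to_new_snapshot(t) for t in (1, 2)]

print(before, sorted(released), after)  # [False, False] [1, 2] [True, True]
```

The sketch only covers visibility to new snapshots. It does not model lock release or any other work done in trx_commit_in_memory, which a real fix would also have to consider.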