MySQL Bugs: #76252: InnoDB ROW event handling too zelous for large transactions on table without PK

Bug #76252	InnoDB ROW event handling too zelous for large transactions on table without PK
Submitted:	11 Mar 2015 1:02
Reporter:	Andrii Nikitin	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.6.23	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
For tables without Unique Key InnoDB changes enormous number of pages when handling ROW events for big transactions. I consider such absence of scaling as a bug, because I don't see any reason why InnoDB would behave like that (assuming the table has secondary index with very good selectivity). In comparison, MyISAM handles identical events instantly, while InnoDB may take ~30 min to handle 10K rows.

In particular, following are numbers of modified pages as reported by increment in status variable "innodb_pages_written" on master and slave, which indicate very bad scaling of such scenario:

 1K rows | Master 37  | Slave 41
 3K rows | Master 54  | Slave 164
 5K rows | Master 72  | Slave 548
 7K rows | Master 93  | Slave 1294
10K rows | Master 123 | Slave 1853

How to repeat:
Following mtr test case will demonstrate the problem:
(it sleeps ~10 minutes to let slave catch up, so you may need to adjust sleeping time or number of involved rows for your environment):

--source include/master-slave.inc 
--source include/have_binlog_format_row.inc 

connection master; 

drop table if exists z;

create table z(
a0 varchar(32),
a1 varchar(32),
a2 varchar(32),
a3 varchar(32),
a4 varchar(32),
a5 varchar(32),
a6 varchar(32),
a7 varchar(32),
a8 varchar(32),
a9 varchar(32),
index(a0),
index(a1)
) engine = innodb, default character set utf8;

set @N=1;
insert into z select 0, @N:=@N+1, @N, @N, @N, @N, @N, @N, @N, @N from mysql.help_topic a, mysql.help_topic b limit 10000;

do sleep(30);
let $mw1= query_get_value(SHOW GLOBAL STATUS LIKE 'innodb_pages_written', Value, 1);

connection slave;
let $sw1= query_get_value(SHOW GLOBAL STATUS LIKE 'innodb_pages_written', Value, 1);

connection master;

start transaction;
set @N=1;
update z set a0 = @N:=(@N+1);
commit;

do sleep(600);
let $mw2= query_get_value(SHOW GLOBAL STATUS LIKE 'innodb_pages_written', Value, 1);
let $mw= `select $mw2-$mw1`;

connection slave;

let $sw2= query_get_value(SHOW GLOBAL STATUS LIKE 'innodb_pages_written', Value, 1);
let $sw= `select $sw2-$sw1`;

--echo Master wrote $mw = $mw2 - $mw1 pages
--echo Slave  wrote $sw = $sw2 - $sw1 pages
 
do sleep(5);
show slave status;
select max(a0) from z;

connection master;
drop table z;
do sleep(5);
--source include/rpl_end.inc 

Suggested fix:
InnoDB's handling of Update_row events should be as fast as processing similar UPDATE commands and competitive to processing identical Update_row events by MyISAM. Currently it is thousands times slower for some scenarios.

I believe problem may be related to something like in bug #67044 , because event processing is much faster when the table has no secondary index at all.

Posted by developer:
 
PROBLEM
-------------

Since the tables mentioned in the bug don't have any PK defined on them we are forced to create a
hidden DB_ROW_ID column for them .The problem is that we don't log this hidden column in bin log
so the DB_ROW_ID's generated for the table in slave is different than the master.Therefore while applying
row log event (updates)we cannot depend on the cluster index of the salve and we are forced to do full 
table scans for updates.  This is the reason for being slow.
    To fix this problem we have to expose the hidden column and write it in bin log for it to be replicated 
in slave too  which is too risky for GA versions. There are work logs WL7313 and WL7314 present
which plan to do it.

I have reproduced this problem in versions 5.5.44, 5.6.24, and 5.7.17.
My test was to create an InnoDB table without any index or key, insert many rows, and then issue a "delete" without a where condition.
In all three versions, the number of "InnoDB buffer pool read requests" during the delete on the master was proportional to the number of rows, but on the slave to the square of the number of rows.

With 200,000 rows, the master requested between 11 (5.5, 5.6) and 8 pages (5.7) per row deleted, but the slave requested 1,900 pages per row deleted!

AIUI, as there was no PK, InnoDB must have assigned the DB_ROW_ID in entry sequence, on both the master and the slave, and this should also be the order in the B-tree.
The "delete" on the master must have found the rows in that same sequence, deleted them, and therefore written the binlog entry in that sequence.
When processing these binlog entries, the slave should always find the (currently) first row as a match for that log entry, and so could delete it.
Why does it need to scan the complete table after finding a matching record?

If I created any index on the slave (not on the master) before the delete, the slave did use it.
If the selectivity was bad (129 distinct values in 200 k rows), the slave requested even more pages (2,800 .. 3,150 per row).
With better selectivity (10,000 distinct values), the request count fell to about 50 per row, with a unique index it was 19 .. 22, with a primary key it was about 9 page requests per row deleted.

My test setup uses additional mechanisms not available in "mysqltest", but I can provide the status counters if that is of any help.