MySQL Bugs: #84287: row inserts, statement updates, persistent stats lead to table scans+ lag slaves

Bug #84287	row inserts, statement updates, persistent stats lead to table scans+ lag slaves
Submitted:	20 Dec 2016 19:30
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S4 (Feature request)
Version:	5.7.17	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
If a master is configured with binlog_format=MIXED and there are
inserts done in ROW format,  along with UPDATEs done in statement format,
the slave ends up doing expensive table scans.

How to repeat:
setup master with --binlog_format=mixed.
setup slave with --skip-slave-start
configure replication..

start slave;

on master; setup the test tables.

--
set sql_mode="";
drop table if exists t;

create table t (
  k bigint(20) unsigned not null auto_increment,
  h varchar(99), j varchar(99), f varchar(99),
  e varchar(99), a varchar(99), b varchar(99),
  c varchar(99), d varchar(99), primary key (k),
  key(h),key(f),key(f,h)
) engine=innodb default charset=utf8 ;

drop table if exists r;
create table r (
  id bigint(20) unsigned not null,
  h varchar(99),  a varchar(99),  b varchar(99),
  c varchar(99),  d varchar(99),  e varchar(99),
  f varchar(99),  g varchar(99),  i varchar(99),
  j varchar(99),  k varchar(99),  l varchar(99),
  primary key (id),  key(h),key(i),key(i,h)
) engine=innodb default charset=utf8  ;

Now monitor the slave status carefully without touching any tables.
Do not run any DML/DDL queries on the slave as this could auto-calc table stats.

Run this on master:

mysqlslap.exe --query="insert ignore into t(h,f,j) values(floor(rand()*1000), floor(rand()*1000),floor(rand()*1000));insert ignore into r(id,h,i) values(floor(rand()*1000),floor(rand()*1000),floor(rand()*1000));update r,t set r.a=t.a,r.b=t.b,r.c=t.c,r.d=t.d,r.e=t.e,r.f=null,r.g=null where r.h=t.h and r.i=t.f and left(t.j,2)='11';"  --number-of-queries=10000 --concurrency=1  --create-schema=test  --iterations=10

slave should lag a lot.
slow query log on slave show be evidence of table scans.

Suggested fix:
workaround1:  stop slave; start slave;
workaround2:  analyze table t,r;
workaround3:  on slave, use --innodb_stats_persistent=0

if the query ran fast on master, we expect them to do the same on slave. the workarounds lead me to believe something odd happens here in the interaction with table stats/replication.

output of my slave status monitor during first part of a run. notice it gets further behind each second.

http://pastebin.com/raw/Dcsf4RYX

Monitoring of slave:

http://pastebin.com/raw/f5H2Vc5m

See loops 0 - 62 which is from 21:20:58 to 21:31:18. 
This is when the test was running on master (11 mins).

Slave took until 00:01:59 to catch up (2.5 hours).

Hi!

This turned out not to be a bug, since this was a matter of slave not updating InnoDB statistics. 

However, a new feature is designed that would remedy this problem. However, this is a new feature and would, therefore have another scheduling and another treatment.