MySQL Bugs: #53375: RBR + no PK => High load on slave (table scan/cpu) => slave failure

Bug #53375	RBR + no PK => High load on slave (table scan/cpu) => slave failure
Submitted:	3 May 2010 14:25	Modified:	15 Dec 2011 11:17
Reporter:	Shane Bester (Platinum Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S2 (Serious)
Version:	5.1.x	OS:	Any
Assigned to:	Luis Soares	CPU Architecture:	Any

Description:
if one runs DML on a table that has no indexes, a full table scan is done.
with RBR, the slave might need to scan the full table for *each* row changed.

consider this on the master with --binlog-format=row :
delete from t1 order by rand();

when t1 has N rows, and no primary/unique key,
the slave must read N² rows to process the delete.
for larger datasets this is unrealistic and will be doomed to fail.

on master:
mysql> delete from t1 order by rand();
Query OK, 78130 rows affected (2.61 sec)

on slave it takes 78130*78130 row reads to process (still running):
---TRANSACTION 0 1799, ACTIVE 731 sec, OS thread id 3672 fetching rows
mysql tables in use 1, locked 1
153 lock struct(s), heap size 30704, 78278 row lock(s), undo log entries 10045
MySQL thread id 5, query id 16 Reading event from the relay log

Number of rows inserted 78130, updated 0, deleted 10045, read 370899004
0.00 inserts/s, 0.00 updates/s, 16.20 deletes/s, 600754.85 reads/s

How to repeat:
#setup rbr master and slave.
#run on master:

drop table if exists t1;
create table t1(a int)engine=innodb;
insert t1 values (1),(2),(3),(4),(5);
set @a=5;
insert into t1 select (@a:=@a+1) from 
t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6,t1 t7;
delete from t1 order by rand();

#watch as slave hits 100% of 1 cpu core and reads billions or rows.

Suggested fix:
if nothing can be fixed or worked around on the server level, then
clearly document the effects of having unindexed tables and row based replication:
http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html

Thank you for the report.

Verified as described.

http://mysqlbugs.blogspot.com/2010/05/beware-of-rbr-and-tables-without.html

This bug needs some serious attention, twice now I have run into this issue where the fix was to force kill the server, run the statement manually and then skip the binary log event.  Worst part is that PROCESSLIST, etc provide absolutely NO obvious indication what is going on, for something that may take 12 hours, 3 days or even more.. this is fairly unacceptable.

This needs a 2-part fix, better status information about reading through a large event and making this process much more effecient.

Workaround: add a primary key to the table.

To diagnose:

Observe 78,278 row locks but only 10,045 undo log entries, so many more rows being scanned than changed. Also observer 16 row deletes per second but 600,754 row reads per second, same mismatch between counts suggesting unindexed accesses are happening.

You may also see "invalidating query cache entries (table)" as a symptom in the processlist. If you see that, check to see whether this is the possible root cause instead of giving full blame to only the query cache.

I was able to verify the problem in 5.1, 5.5 and next-mr

We have seen the same Problem with 5.1.50 on Solaris.
The "problem" breaks our replication for nearly 20 hours before i have to kill the whole instance and restore it with a master dump.

I don't understand why this bug isn't treated with more severity.

It completely hamstrings any large volume setup - in my case, my only decent primary key is a surrogate key - and that's untenable because of the locking and lost concurrency (even with lock_mode = 2). Even if I solved that, I'd have to use the surrogate in partitioning - which more or less defeats the purpose of partitioning by lopsiding the partitions.

Bug#58481 is a duplicate of this one.

FYI: NDB also uses a hidden primary key for tables without user defined 
primary keys. And NDB also really really would like to be able to use it in
this type of scenarios. But...with NDB we insert things into binlog using the injector. Our hidden primary key is normally 8 bytes, but can be 12 for certain partition types.

Jonas,

Does your hidden primary key uniquely identify a row in any cluster?

Sinisa,

Does the InnoDB hidden primary key identify a row on any server, even if the instance have been re-created from a backup?

mats,

no it not unique in "any cluster".
it's simply an auto_increment unsigned 64-bit.

/Jonas

Mats, there's nothing that makes the InnoDB internal key consistent between a master and a slave or before and after backup and restore. Row with internal ID 1 can have completely different end user data values on different servers, so it's useless for the purpose being considered here, unfortunately.

Nor is there any prohibition on a slave having a unique key, which will be promoted to PK, even if there is no unique key on the master. They can even have different PKs and there can be good application reasons for doing that. Though we could require at least a unique key on all slaves that matches a master's PK without it hurting unduly.

It is possible to recommend at least _a_ key (not necessarily unique) on the slave and have replication try key-based lookups to narrow down the number of rows that must examined. That combined with batch processing should cut the pain a lot because we can reasonably ask for at least some non-unique but at least reasonably selective key to use. But this is only recommend, not require. If people want no key, we should let them have no key and be slow.

Dan, Frank, would using a non-unique key on slaves help your cases by reducing the work to find the right row? Is there any sensible way for the server to know which of possibly many non-unique keys should be used?

Or would doing the work using statement-based replication help you? Or mixed?

James,

It is my opinion that switching to SBR has way too many trade offs (if even for one table) to call it an acceptable workaround. The main crux for this argument being that just about the only time you run into this bug is when you have tables with a massive amount of rows - which is exactly where you start paying heavy penalties for SBR (Locking).

As it stands, if you have a massive amount of rows, you will need to add a primary key to get around this issue and live with the downsides of a primary key.

To answer your question - yes, in my particular scenario, having a non-unique key would mitigate this problem substantially, and still allow me to live without a primary key and thus partition sensibly. 

As far as how the server should know which key to use - am I correct in assuming that it will use the optimizer to determine the index, and you are asking what would happen if the optimizer picked the wrong one?

Dan, yes, that any any other ideas you may have about how we can best improve the way we do this.

With row-based we have to find the row somehow and when many rows are touched that means many rows. With mixed or statement-based there's more locking on the master but less work on the slave. I doubt that there's a perfect answer so I'm asking for feedback from anyone who has any ideas about what we should do. Maybe someone will come up with a better potential solution than those we've come up with.

Batching looks promising, at least it would reduce the number of scans. But it would still be very painful if no key could be used. While using a key would be very painful if a high percentage of the rows in the table were being touched. So maybe some mixed solution that depends on the count of rows being touched might be best.

James,

As much as anyone would love to have the perfect optimizer picking the correct index every time, it would seem that at least for now, that whatever the optimizer picks is fine.

I say this because if the optimizer picks the wrong index, that's the optimizer module's problem, not the replication module's problem - it can always be fixed/addressed there.

It would seem this would have no effect on existing RBR setups because you'd use the primary if it existed. Worst case, you're now using an index on what was previously a full table scan.

BUG#47972 is a duplicate of this bug.

See:
http://binlogtorelaylog.blogspot.in/2012/08/Batch-operations-in-RBR.html

Similar testcase is in bug 76252 with following difference:
- applicable only to innodb
- the table has secondary index with very good selectivity (which differs from this testcase)

http://jfg-mysql.blogspot.co.za/2017/08/danger-no-pk-with-RBR-and-mariadb-protection.html

So the real solution is to force all tables to have a PK:

https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_sql_require_pr...

"Enabling this variable helps avoid performance problems in row-based replication that can occur when tables have no primary key. Suppose that a table has no primary key and an update or delete modifies multiple rows. On the replication source server, this operation can be performed using a single table scan but, when replicated using row-based replication, results in a table scan for each row to be modified on the replica. With a primary key, these table scans do not occur. "