Bug #53375 RBR + no PK => High load on slave (table scan/cpu) => slave failure
Submitted: 3 May 2010 14:25 Modified: 15 Dec 2011 11:17
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S2 (Serious)
Version:5.1.x OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Triage: Triaged: D2 (Serious) / R3 (Medium) / E4 (High)

[3 May 2010 14:25] Shane Bester
if one runs DML on a table that has no indexes, a full table scan is done.
with RBR, the slave might need to scan the full table for *each* row changed.

consider this on the master with --binlog-format=row :
delete from t1 order by rand();

when t1 has N rows, and no primary/unique key,
the slave must read N² rows to process the delete.
for larger datasets this is unrealistic and will be doomed to fail.

on master:
mysql> delete from t1 order by rand();
Query OK, 78130 rows affected (2.61 sec)

on slave it takes 78130*78130 row reads to process (still running):
---TRANSACTION 0 1799, ACTIVE 731 sec, OS thread id 3672 fetching rows
mysql tables in use 1, locked 1
153 lock struct(s), heap size 30704, 78278 row lock(s), undo log entries 10045
MySQL thread id 5, query id 16 Reading event from the relay log

Number of rows inserted 78130, updated 0, deleted 10045, read 370899004
0.00 inserts/s, 0.00 updates/s, 16.20 deletes/s, 600754.85 reads/s

How to repeat:
#setup rbr master and slave.
#run on master:

drop table if exists t1;
create table t1(a int)engine=innodb;
insert t1 values (1),(2),(3),(4),(5);
set @a=5;
insert into t1 select (@a:=@a+1) from 
t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6,t1 t7;
delete from t1 order by rand();

#watch as slave hits 100% of 1 cpu core and reads billions or rows.

Suggested fix:
if nothing can be fixed or worked around on the server level, then
clearly document the effects of having unindexed tables and row based replication:
[4 May 2010 8:43] Sveta Smirnova
Thank you for the report.

Verified as described.
[24 Jun 2010 10:06] Shane Bester
[1 Jul 2010 2:30] Trent Lloyd
This bug needs some serious attention, twice now I have run into this issue where the fix was to force kill the server, run the statement manually and then skip the binary log event.  Worst part is that PROCESSLIST, etc provide absolutely NO obvious indication what is going on, for something that may take 12 hours, 3 days or even more.. this is fairly unacceptable.

This needs a 2-part fix, better status information about reading through a large event and making this process much more effecient.
[14 Jul 2010 3:16] James Day
Workaround: add a primary key to the table.

To diagnose:

Observe 78,278 row locks but only 10,045 undo log entries, so many more rows being scanned than changed. Also observer 16 row deletes per second but 600,754 row reads per second, same mismatch between counts suggesting unindexed accesses are happening.

You may also see "invalidating query cache entries (table)" as a symptom in the processlist. If you see that, check to see whether this is the possible root cause instead of giving full blame to only the query cache.
[4 Aug 2010 10:57] Alfranio Correia
I was able to verify the problem in 5.1, 5.5 and next-mr
[15 Sep 2010 5:54] Frank Mueller
We have seen the same Problem with 5.1.50 on Solaris.
The "problem" breaks our replication for nearly 20 hours before i have to kill the whole instance and restore it with a master dump.
[12 Oct 2010 17:37] Dan V
I don't understand why this bug isn't treated with more severity.

It completely hamstrings any large volume setup - in my case, my only decent primary key is a surrogate key - and that's untenable because of the locking and lost concurrency (even with lock_mode = 2). Even if I solved that, I'd have to use the surrogate in partitioning - which more or less defeats the purpose of partitioning by lopsiding the partitions.
[28 Oct 2010 2:45] Roel Van de Paar
I have a customer with a 600Gb table. 

Adding an "otherwise usable (i.e. to improve query times)" PK is not really  an option for them since there are no short unique columns. 

A long composite key is also not an option because:

In InnoDB tables, having a long PRIMARY KEY wastes a lot of space. (See Section 13.6.10, “InnoDB Table and Index Structures”.) 
--- http://dev.mysql.com/doc/refman/5.1/en/create-table.html

In InnoDB, the records in nonclustered indexes (also called secondary indexes) contain the primary key columns for the row that are not in the secondary index. InnoDB uses this primary key value to search for the row in the clustered index. If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key. 
--- http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html

So, we can create a normal short/auto-increment PK, but this is more or less the same as having the internal/hidden InnoDB PK (which does not seem to be used properly for RBR replication purposes).

As mentioned before, possibly the internal/hidden InnoDB PK can be used to resolve this bug.

In any case, this is a serious shortcoming in RBR. Note the input from Trent and others above.
[25 Nov 2010 7:10] Giuseppe Maxia
Bug#58481 is a duplicate of this one.
[25 Nov 2010 7:31] Jonas Oreland
FYI: NDB also uses a hidden primary key for tables without user defined 
primary keys. And NDB also really really would like to be able to use it in
this type of scenarios. But...with NDB we insert things into binlog using the injector. Our hidden primary key is normally 8 bytes, but can be 12 for certain partition types.
[29 Nov 2010 13:10] Mats Kindahl

Does your hidden primary key uniquely identify a row in any cluster?


Does the InnoDB hidden primary key identify a row on any server, even if the instance have been re-created from a backup?
[29 Nov 2010 13:15] Jonas Oreland

no it not unique in "any cluster".
it's simply an auto_increment unsigned 64-bit.

[30 Nov 2010 21:38] James Day
Mats, there's nothing that makes the InnoDB internal key consistent between a master and a slave or before and after backup and restore. Row with internal ID 1 can have completely different end user data values on different servers, so it's useless for the purpose being considered here, unfortunately.

Nor is there any prohibition on a slave having a unique key, which will be promoted to PK, even if there is no unique key on the master. They can even have different PKs and there can be good application reasons for doing that. Though we could require at least a unique key on all slaves that matches a master's PK without it hurting unduly.

It is possible to recommend at least _a_ key (not necessarily unique) on the slave and have replication try key-based lookups to narrow down the number of rows that must examined. That combined with batch processing should cut the pain a lot because we can reasonably ask for at least some non-unique but at least reasonably selective key to use. But this is only recommend, not require. If people want no key, we should let them have no key and be slow.
[30 Nov 2010 21:46] James Day
Dan, Frank, would using a non-unique key on slaves help your cases by reducing the work to find the right row? Is there any sensible way for the server to know which of possibly many non-unique keys should be used?

Or would doing the work using statement-based replication help you? Or mixed?
[30 Nov 2010 22:18] Dan V

It is my opinion that switching to SBR has way too many trade offs (if even for one table) to call it an acceptable workaround. The main crux for this argument being that just about the only time you run into this bug is when you have tables with a massive amount of rows - which is exactly where you start paying heavy penalties for SBR (Locking).

As it stands, if you have a massive amount of rows, you will need to add a primary key to get around this issue and live with the downsides of a primary key.

To answer your question - yes, in my particular scenario, having a non-unique key would mitigate this problem substantially, and still allow me to live without a primary key and thus partition sensibly. 

As far as how the server should know which key to use - am I correct in assuming that it will use the optimizer to determine the index, and you are asking what would happen if the optimizer picked the wrong one?
[30 Nov 2010 23:11] James Day
Dan, yes, that any any other ideas you may have about how we can best improve the way we do this.

With row-based we have to find the row somehow and when many rows are touched that means many rows. With mixed or statement-based there's more locking on the master but less work on the slave. I doubt that there's a perfect answer so I'm asking for feedback from anyone who has any ideas about what we should do. Maybe someone will come up with a better potential solution than those we've come up with.

Batching looks promising, at least it would reduce the number of scans. But it would still be very painful if no key could be used. While using a key would be very painful if a high percentage of the rows in the table were being touched. So maybe some mixed solution that depends on the count of rows being touched might be best.
[1 Dec 2010 0:38] Dan V

As much as anyone would love to have the perfect optimizer picking the correct index every time, it would seem that at least for now, that whatever the optimizer picks is fine.

I say this because if the optimizer picks the wrong index, that's the optimizer module's problem, not the replication module's problem - it can always be fixed/addressed there.

It would seem this would have no effect on existing RBR setups because you'd use the primary if it existed. Worst case, you're now using an index on what was previously a full table scan.
[28 Jul 2011 9:26] Jon Stephens
BUG#47972 is a duplicate of this bug.
[15 Dec 2011 11:17] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

[15 Dec 2011 11:17] Jon Stephens
Documented bugfix as follows in the 5.5.19 and 5.6.4 changelogs:

        When a statement containing a large number of rows to be applied
        on a slave table that does not contain a primary key, a
        considerable amount of time can be needed to find and change all
        the rows that are to be changed. The current fix helps diagnose
        this issue by printing a message to the error log if the
        execution time for a given statement replicated using row-based
        replication takes more than 60 seconds. log_warnings must be
        greater than 1 for this message to be printed to the error log.

[15 Nov 2012 5:53] Shane Bester
[26 Mar 2015 12:35] Andrii Nikitin
Similar testcase is in bug 76252 with following difference:
- applicable only to innodb
- the table has secondary index with very good selectivity (which differs from this testcase)
[16 Aug 2017 6:20] Shane Bester
[8 Dec 2017 0:26] Shane Bester
"The current fix helps diagnose
        this issue by printing a message to the error log if the
        execution time for a given statement replicated using row-based
        replication takes more than 60 seconds."

Funny that I have literally never seen this error message.  What does it say?
[8 Dec 2017 8:41] Sven Sandberg
Posted by developer:
It only existed for a few commits, possibly around 5.5.18-5.5.20 or so although it is hard to find exactly what was released due to the complexity of the git history around this time.

The message was introduced in 595a007d4e570f935ffc396aeb62480c2f414741. It was gone a month later, in adc4668568173e182cfe43a852ebe2e0cd10d1cf (in fact this is a merge where the left branch has the message and the right branch does not have it).

I suppose it was either lost by mistake during some difficult merge, or the improvements implemented due to WL#5597 were considered sufficient to make the message pointless. FWIW the message was:

+      sql_print_information("The slave is applying a ROW event on behalf of a%s statement "
+                            "on table %s and is currently taking a considerable amount "
+                            "of time (%lu seconds). This is due to the fact that it is %s "
+                            "while looking up records to be processed. Consider adding a "
+                            "primary key (or unique key) to the table to improve "
+                            "performance.", evt_type, table_name, delta, scan_type);