MySQL Bugs: #89339: RBR slave index choosing logic is too simple

Bug #89339	RBR slave index choosing logic is too simple
Submitted:	22 Jan 2018 4:04	Modified:	16 Mar 2018 4:59
Reporter:	Fungo Wang (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S4 (Feature request)
Version:	5.6.39,5.7.21	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When applying row event on slave, the logic for picking up index to use is too simple. As we can see in `log_event.cc:search_key_in_table()`, it is based on **rules**, not considered data distribution at all:

1. PK
2. UK without NULL
3. other keys(include UK with NULL)
4. table_scan

When rule 1 and 2 can not be used (such as there are NO PK or UK), slave can easily choose a bad index and cause too many table/index scans.

I can understand that it's asking too much to dive into optimizer for each row applying, but maybe we can leverage some basic index statistics and work as a simplified optimizer.

HASH_SCAN is good improvement for such issue, but under situation when bad index is picked up, it is more a workaround than a solution touching the root cause, i.e. choosing the best index.

How to repeat:
check the logic in `log_event.cc:search_key_in_table()`

Alternatively, let the user define the index use by replication, similar
to the REPLICA IDENTITY functionality in PostgreSQL?

About this issue, MariaDB use index cardinality info to choose a better index, detail at https://mariadb.com/kb/en/library/row-based-replication-with-no-primary-key/

And AliSQL give index with auto_increment high priority over other normal indexes:
https://github.com/alibaba/AliSQL/commit/9d1a800eea8df864ea28f1018d895cfd271a3142

Both try to pick a better selective index based on statistics, beyond the **rule**. 
I think MySQL could do more about this issue.

Hello Fungo Wang,

Thank you for the report and feature request!

Regards,
Umesh