Bug #89339 RBR slave index choosing logic is too simple
Submitted: 22 Jan 2018 4:04 Modified: 16 Mar 2018 4:59
Reporter: Fungo Wang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S4 (Feature request)
Version:5.6.39,5.7.21 OS:Any
Assigned to: CPU Architecture:Any

[22 Jan 2018 4:04] Fungo Wang
Description:
When applying row event on slave, the logic for picking up index to use is too simple. As we can see in `log_event.cc:search_key_in_table()`, it is based on **rules**, not considered data distribution at all:

1. PK
2. UK without NULL
3. other keys(include UK with NULL)
4. table_scan

When rule 1 and 2 can not be used (such as there are NO PK or UK), slave can easily choose a bad index and cause too many table/index scans.

I can understand that it's asking too much to dive into optimizer for each row applying, but maybe we can leverage some basic index statistics and work as a simplified optimizer.

HASH_SCAN is good improvement for such issue, but under situation when bad index is picked up, it is more a workaround than a solution touching the root cause, i.e. choosing the best index.

How to repeat:
check the logic in `log_event.cc:search_key_in_table()`
[22 Jan 2018 8:33] Alexey Kopytov
Alternatively, let the user define the index use by replication, similar
to the REPLICA IDENTITY functionality in PostgreSQL?
[8 Mar 2018 9:55] Fungo Wang
About this issue, MariaDB use index cardinality info to choose a better index, detail at https://mariadb.com/kb/en/library/row-based-replication-with-no-primary-key/

And AliSQL give index with auto_increment high priority over other normal indexes:
https://github.com/alibaba/AliSQL/commit/9d1a800eea8df864ea28f1018d895cfd271a3142

Both try to pick a better selective index based on statistics, beyond the **rule**. 
I think MySQL could do more about this issue.
[16 Mar 2018 4:59] MySQL Verification Team
Hello Fungo Wang,

Thank you for the report and feature request!

Regards,
Umesh