Description:
When using a fulltext plugin for encoded data, e.g. compressed text or text contained within a document format like PDF or Word .doc, the typical use case is that the column data is always encoded while the search term passed into
MATCH ... AGAINST(search term)
is always plain text.
Right now there is no easy way to distinguish between the DML and the query context, so the plugin parser first has to check for every input whether it received plain or encoded/enclosed text, e.g. by probing for known "magic bytes".
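To illustrate the kind of input check meant here, a minimal sketch in plain C follows. The names input_kind, has_prefix and probe_input are made up for this example and are not part of the MySQL plugin API; the only facts used are the well-known file signatures ("%PDF-" for PDF, D0 CF 11 E0 A1 B1 1A E1 for OLE2 containers such as Word .doc, 1F 8B for gzip).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification result; not part of the MySQL plugin API. */
enum input_kind { INPUT_PLAIN, INPUT_PDF, INPUT_OLE2, INPUT_GZIP };

/* Returns 1 if the buffer starts with the given signature, else 0. */
static int has_prefix(const char *doc, size_t len,
                      const char *magic, size_t magic_len)
{
    return len >= magic_len && memcmp(doc, magic, magic_len) == 0;
}

/* Guess the input type from its leading "magic bytes".
   Anything that matches no known signature is treated as plain text. */
enum input_kind probe_input(const char *doc, size_t len)
{
    static const char pdf_magic[]  = { '%', 'P', 'D', 'F', '-' };
    static const char ole2_magic[] = { '\xD0', '\xCF', '\x11', '\xE0',
                                       '\xA1', '\xB1', '\x1A', '\xE1' };
    static const char gzip_magic[] = { '\x1F', '\x8B' };

    if (has_prefix(doc, len, pdf_magic, sizeof pdf_magic))
        return INPUT_PDF;
    if (has_prefix(doc, len, ole2_magic, sizeof ole2_magic))
        return INPUT_OLE2;
    if (has_prefix(doc, len, gzip_magic, sizeof gzip_magic))
        return INPUT_GZIP;
    return INPUT_PLAIN;
}
```

Note that a plain text search term that happens to start with "%PDF-" is classified as a PDF document by this check, which is exactly the kind of ambiguity that probing the data alone cannot rule out.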
In use cases where it is clear that DML input always needs special treatment, while search query input is always plain text that can be handed over to the default parser right away, knowing the actual parsing context would be nice for several reasons:
- less code than an input data check
- less error prone than an input data check
- faster than an input data check (although this might be negated by the extra effort to pass down the context flag)
- no ambiguities ever
I'm aware that the flag approach fails if the AGAINST() input is ever passed in encoded form or if not all column data is encoded/enclosed, but IMHO there are lots of use cases with a clear data/search input distinction (e.g. I can't imagine a PDF enclosed search term being used in AGAINST()) to justify such an extension (maybe with a "when and when not to use it" paragraph in the docs).
How to repeat:
.
Suggested fix:
Add a DML/AGAINST() context flag to st_mysql_ftparser_param.flags
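
A rough sketch of how a plugin could use such a flag, in plain C. Everything here is hypothetical: SKETCH_FTFLAGS_QUERY_CONTEXT is the flag being requested and does not exist today, its value is arbitrary, and sketch_ftparser_param is a cut-down stand-in for st_mysql_ftparser_param that only carries the members needed for the example.

```c
#include <stddef.h>

/* HYPOTHETICAL flag bit: set when the parser is invoked for AGAINST()
   input, clear when it is invoked for column data (DML / index build).
   Neither the name nor the value exists in the current plugin API. */
#define SKETCH_FTFLAGS_QUERY_CONTEXT 2

/* Stand-in for st_mysql_ftparser_param, reduced to what the sketch needs. */
struct sketch_ftparser_param {
    const char *doc;    /* text (or encoded data) to parse */
    size_t      length; /* length of doc in bytes */
    int         flags;  /* bit set, including the hypothetical context bit */
};

/* Which path the plugin's parse function would take. */
enum parse_path { PATH_DECODE_THEN_PARSE, PATH_DEFAULT_PARSER };

/* Decide based on the context flag alone, without inspecting the data. */
enum parse_path choose_parse_path(const struct sketch_ftparser_param *param)
{
    if (param->flags & SKETCH_FTFLAGS_QUERY_CONTEXT)
        return PATH_DEFAULT_PARSER;    /* search term: plain text */
    return PATH_DECODE_THEN_PARSE;     /* column data: decode first */
}
```

With this, a search term that happens to look like encoded data (e.g. one starting with "%PDF-") would still go straight to the default parser, and the decision would no longer depend on the data at all.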