Description:
Many times I am with customers who have multidimensional data and are interested into closeness-relationships.
Scenario 1: A social network has data on users: geographical x/y position, age, a list of interests, a list of bands and so on. The typical query is something like "find all users that are in a 20km range around me, are in an age range between x and y and have at least n intersts in common with me". This ends up being something like "select ..., count(interests) as ni from persons where x between ... and ... and y between ... and ... and age between ... and ... and interest in (...) and bands in ( ...) group by interests having ni > ...".
Essentially we want a subcube out of an n-dimensional hypercube.
Scenario 2: A mail provider wants to find all messages that may be false positives in a spam detection. These are messages that have a certain domainid, are from yesterday and have a spam score in a certain range. "select msgid from spam where domainid in ( ...) and ddate between now() - interval 1 day and now() and spamscore between ... and ..."
Again we cut a 3-dimensional subcube out of a larger cube.
Currently this is not well supported by MySQL: Btrees do not lend themselves to this kind of query very well, and we use only one index per query to speed this up. But this kind of query is relatively common with out users, and I think that this feature is rather urgent.
How to repeat:
This is a feature request.
Suggested fix:
a) create a proper RTREE index type that
- works on any number of dimensions, not just two as the current type does
- works on any compound of columns and is not limited to GIS types
or
b) find a way to use more than one BTREE index per table, e.g. use index 1 to build a temp table in memory, then lookup values from the temp to do lookups in index 2 on the same table to limit the result set further (maybe this can even be done without a materialisation of the temp).