Bug #16564 Efficient way to use IN() in prepared statements
Submitted: 17 Jan 2006 15:36 Modified: 26 Jan 2006 13:38
Reporter: Kevin Musker Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: Prepared statements Severity:S4 (Feature request)
Version: OS:Any
Assigned to: CPU Architecture:Any
Triage: Triaged: D5 (Feature request)

[17 Jan 2006 15:36] Kevin Musker
There doesn't seem to be a way to use an IN() clause in a prepared statement without adding a '?' for each value. This works fine for a one-off, but when using a prepared statement many times, it means adding lots of '?' characters and setting most of them to NULL (or some other suitable value).

How to repeat:
Not a bug - a feature request

Suggested fix:
Either allow multiple values to be assigned to a MYSQL_BIND object.
A function similar to mysql_stmt_send_long_data where the data buffer addresses an array of the corresponding data type (wouldn't work for text or blob types without an extra array containing the lengths of each string).
[17 Jan 2006 17:40] Valeriy Kravchuk
Thank you for a feature request. Can you name a RDBMS that supports this feature? And why not just to put that data into (temporary) table and join with it, if there are so many values in your IN clause?
[18 Jan 2006 10:55] Kevin Musker
Thanks for your response.

I'm not really familiar with any other RDBMS. I have around 3 years of MySQL experience, but very little outside of that.

A little background on out application:
We create objects in C++ from rows in a number of database tables. Up to the present, we have obtained the data from the database via mysql_real_query. These data are then parsed from cstrings into their proper types.
When the objects are constructed using prepared statements, rather than the query mechanism, we experience an improved throughput of between 15% and 20%. To clarify, these are small queries normally returning 1 (or a very small number (< 5) of) row(s); Therefore not having to parse data from cstrings offers a significant benefit.
Another benefit is that the prepared statements can be created upon initialisation of the app and reused throughout. This reduces latency (albeit maybe not by much), but also means that all queries get parsed in one place, so any SQL errors get caught early on.

Back to the request though :)
The values used to decide what data we select to create the objects are obtained from outside of mysql, hence the need for an IN() clause.

I guess there are two options:

The first is to do as you suggest and insert the values into a temporary table, and join to it to decide which rows are needed.
As I haven't tried this yet, I suspect that the 15-20% improvement from use prepared statements would be lost.

The other option is to create the prepared statements with a large number of placeholders in the IN() clause (e.g. IN(?,?,?...)). Executing the statement would then consist of setting a number of the MYSQL_BIND parameters to valid values, and the rest to NULL.
This isn't ideal as it doesn't support an arbitrary number of values. It also seems quite inefficient as the protocol would send all the used values.

This feature may not be used by many people, but I think it would be a significant addition so that the efficiency gains of prepared statements could be made available to a larger number of applications.

Your thoughts are much appreciated.
[23 Jan 2006 16:35] Valeriy Kravchuk
I can suggest you to try my approach, with some optimization like using prepared statements to insert into that temporary table etc. You can also prepare set of statements, with IN (?), IN (?, ?), IN (?, ?, ?) and so on, to not include NULLs or any other values... But my approach can give you additional benefits of efficient JOINs versus not always efficient IN optimization in MySQL.

I can mark this feature request as verified, but to implement it MySQL will need to implement some kind of ARRAY or LIST (COLLECTION) data types, and they will not be implemented in the nearest future for sure.
[25 Jan 2006 16:35] Kevin Musker
Thanks for you help Valeriy.

I have yet to try out your two suggestions, but when I do, I will post the results back here in case anyone might be interested.

Thanks again.
[26 Jan 2006 8:12] Valeriy Kravchuk
OK. Reopen this report when you'll have any useful results. I can surely mark this as verified feature request, but don't expect the feature to be implemented soon.
[26 Jan 2006 13:50] Andrey Hristov
Why don't you preprocess the query you send to the DBMS. Like find where IN(?) is and then just replace with the proper number of "?" separated with ?. To make it event faster you can build some static table of strings so for low number of arguments you copy from already prepared string. copy into a buffer everything before "IN(", then the already prepared strin then everything after the ")". For high number of arguments you may generate the IN() string.

Hope this helps