MySQL Bugs: #87213: PREFETCH Option Not Working with SSPS

Bug #87213	PREFETCH Option Not Working with SSPS
Submitted:	26 Jul 2017 20:29	Modified:	29 Aug 2017 4:33
Reporter:	Vibin Varghese	Email Updates:
Status:	In progress	Impact on me:	None
Category:	Connector / ODBC	Severity:	S2 (Serious)
Version:	5.3.9	OS:	Red Hat
Assigned to:		CPU Architecture:	Any
Tags:	cursor, MySQL, ODBC driver, Prefetch

Description:
Hello There,

I'm running into an issue with results from Prefetch feature of MySQL ODBC Connector.
This a COBOL implementation running embedded SQL hitting Amazon RDS(MySQL), using MySQL ODBC Connector.

The issue is PREFETCH feature of ODBC Connector doesn't seem to be happening when there is parameter marker in the cursor SQL.
For example, the below SQL works fine with Prefetch.
Select C1, C2.. Cn FROM T1 WHERE SEQ_KEY >= 0 

But, the this one eventually times out due to PREFETCH not happening.
MOVE 0 to VAR-SEQ-ID
Select C1, C2.. Cn FROM T1 WHERE SEQ_KEY >= :VAR-SEQ-ID

The COBOL vendor claims their calls to the ODBC connector remain the same in both cases.

Is this a known problem?

Any help in determining the issue would be greatly appreciated.

Environment Details
OS - RHEL 7, 64-bit
Database - Amazon Aurora (Aurora 1.11, InnoDB 1.2.10)
MySQL 5.6.10
ODBC - mysql-connector-odbc-5.3.8-1.el7.x86_64
 

Thanks,
Vibin

How to repeat:
SQL Cursors(retrieving large results sets) with parameter markers timing out.

Hi Vibin,

Thank you for your interest in MySQL software.
Unfortunately, I am not familiar with Cobol and therefore I need to ask a few more questions in order to determine the nature of the problem.

 1. You mentioned host variables, what exactly does that mean?
    Are they MySQL server variables? I do not see any of them being used here.
    Are they parameters for SQL prepared statements or something else?

 2. What cursor are we talking about here?
    Is that a server-side cursor? ODBC cursor? Cursor object in Cobol?

 3. The syntax like "Select C1, C2.. Cn FROM T1 WHERE SEQ_KEY >= 0" is not
    a valid SQL syntax. Is that just a figure meaning a sequence of columns
    C2, C3, C4,..., Cn or a real query?

 4. How do you define the number of rows to prefetch?
    Is it through the connection string, DSN or in some other way?

 5. What is the number of rows to prefetch?

 6. We need to examine the query that was actually sent to the server.
    In other words that should be the actual SQL query from the
    MySQL server general query log exactly as the server had received it.

The list of questions is long, but I would like you to answer all of them.
Thanks for your cooperation.

Hey Bogdan,

Thanks for your response.
Following are the answers to the questions you've raised.

1. You mentioned host variables, what exactly does that mean?
Are they MySQL server variables? I do not see any of them being used here.
Are they parameters for SQL prepared statements or something else?
Ans: These are not MySQL Server variables; indeed, these are parameters in the SQL prepared statements.

2. What cursor are we talking about here?
Is that a server-side cursor? ODBC cursor? Cursor object in Cobol?
Ans: These are cursor objects issued from Cobol.

3. The syntax like "Select C1, C2.. Cn FROM T1 WHERE SEQ_KEY >= 0" is not
a valid SQL syntax. Is that just a figure meaning a sequence of columns
C2, C3, C4,..., Cn or a real query?
Ans: It's just a figure meaning a sequence of columns. There is no issue with the syntax of the SQL. My apologies if the representation gave the wrong impression.

4. How do you define the number of rows to prefetch?
Is it through the connection string, DSN or in some other way?
Ans: The number of rows to be prefetched is provided with the DSN definitions in the odbc.ini file.

5. What is the number of rows to prefetch?
Ans: I've tried different range of values like 1, 25, 50 and 100. Only when there's a parameter marker in the SQL(cursor) the driver is unable to block-fetch data. If the param marker avoided by hard-coding in the intended value, this works like a charm.

6. We need to examine the query that was actually sent to the server.
In other words, that should be the actual SQL query from the
MySQL server general query log exactly as the server had received it.
Ans: The queries and general query log output attached as file "CUS_SCORES_Tables_Queries.txt"
When no param-marker is there for the query in the gen-qry-log, I see multiple instances with LIMIT as the prefetch values and varying offset to facilitate block-fetch.
Whereas with param-marker present, there only one instance of the query with the default place-holder(?) against the variable fields in the query.

Please feel free to let me know if you've got any more questions.

Thanks,
Vibin

Hi Vibin,

Thank you very much for giving the detailed answers to my questions.
I was able to repeat the problem using a C test case with ODBC Unit tests framework:

DECLARE_TEST(t_prefetch_bug)
{
    int nInData = 1;
    DECLARE_BASIC_HANDLES(henv1, hdbc1, hstmt1);

    is(OK == alloc_basic_handles_with_opt(
             &henv1, &hdbc1, &hstmt1, NULL,
             NULL, NULL, NULL, "PREFETCH=5"));

    ok_sql(hstmt, "DROP table IF EXISTS b_prefecth");
    ok_sql(hstmt, "CREATE table b_prefecth(i int)");

    ok_sql(hstmt, "insert into b_prefecth values(1),(2),(3),(4)," \
          "(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16)");

    ok_stmt(hstmt1, SQLPrepare(hstmt1,
      "select* from b_prefecth WHERE i > ?", SQL_NTS));
    ok_stmt(hstmt1, SQLBindParameter(hstmt1, 1, SQL_PARAM_INPUT,
      SQL_C_LONG, SQL_INTEGER, 0, 0, &nInData, 0, NULL));
    ok_stmt(hstmt1, SQLExecute(hstmt1));

    is_num(5, myrowcount(hstmt1));

    free_basic_handles(&henv1, &hdbc1, &hstmt1);

    ok_sql(hstmt, "DROP table IF EXISTS b_prefecth");

    return OK;
}

The result of running the test is:
FAILED
# Total rows fetched: 15
# 5 (5) != 15

Setting the bug status to VERIFIED

Hi Vibin,

The test case enables pre-fetching of 5 rows by setting PREFETCH=5 in the connection string. Then it uses the parametrized query "select* from b_prefecth WHERE i > ?" to get all rows with "id > 1" where ? = 1. There are 15 of rows that satisfy the criteria.

It is expected that the ODBC driver will compose a query like this:

select* from b_prefecth WHERE i > 1 LIMIT 0, 5;

Hence the test program is expected to receive 5 rows in the result set.
However, it receives all 15 rows, which is wrong. Looking into the MySQL server general query log I could see the query without LIMIT 0, 5.

We are currently working on fixing this bug. A temporary workaround, which might work for you is manually adding "LIMIT num_offset, num_rows" to your SELECT. It might also be not applicable to your application and therefore we are working on the solution for the driver itself.

Hey Bogdan,

Thanks a lot for that explanation.

I understand that the fact that you do not see the below query is pretty much a confirmation that prefetch is not working.
select* from b_prefecth WHERE i > 1 LIMIT 0, 5;

Although, just so I'm clear... 
Would the presence of prefetch in the conn str alter the result of the query? I was quite surprised that you were expecting 5 as the count of rows returned?

Appreciate that the work is already in progress.

Would you happen to have an idea as to when would be an updated version GA?

Thanks,
Vibin

Hi Vibin,

Yes, the presence of PREFETCH option in the connection string results in LIMIT clause to be appended to the query. That is why the expected number of rows was 5.

Unfortunately, I am not allowed to answer questions about the future releases.
Accordingly to the company policy I cannot publicly comment on the dates of the software products releases (even approximately) before the official announcement is made.

Hi Vibin,

Another workaround that should work well for you is using NO_SSPS=1; in the connection string or in the Setup dialog for DSN click "Details >>" then "Misc" tab then set a checkbox for "Prepare statements on the client".
It worked for me.

Please let me know if it has helped.

Hello Bogdan,
Thanks so much for getting back. 
Yes, I'm glad to let you know that this option has worked quite well. 

Would you be kind enough to let me know if this would have any potential untoward regression, especially given that this is not the default behavior of the driver? 

Any performance or other considerations, I need to factor in?

I'm a little curious - since my tests indicate that this option somehow influences the Optimizer with query plan evaluation. (in a positive way for my tests).
In a case where I thought I needed to provide a HINT(FORCE INDEX to avoid a FILESORT), the query seems to work well without the HINT under the influence of NO_SSPS option. 

I can send you the details if you so wish.  

As always, Thanks so much. 

Thanks,
Vibin

Hi Vibin,

Thank you for your feedback and for letting me know how the workaround worked for you.

The option NO_SSPS=1 prepares statements on the client side instead of the server side. This could impact the performance both ways depending on the usage scenario (though the performance variations are mostly minor).

Here is what happens when statements are prepared on the server:

 1. Client side sends a statement with a parameter.
   SELECT * FROM tab WHERE id = ?
 2. Client supplies parameter values separately ( ? = N )
 3. Client sends request to execute the query
 4. Server executes the query and sends the results back to the client

If the same query needs to be executed again with a different parameter the client starts again from step 2 by supplying the new parameter value. This saves time when executing sequences of similar queries that have only different parameter values.

Here is what happens when statements are prepared on the client:

 1. Client side sends a statement with a parameter (but driver delays sending of the query the query until step 3).
   SELECT * FROM tab WHERE id = ?
 2. Client supplies parameter values separately ( ? = N ) (parameter values are delayed until step 3)
 3. The driver replaces the parameter markers '?' by the actual values given on step 2.
 4. Client sends the whole query to the server
 5. Server executes the query and sends the results back to the client

Preparing queries on the client might be faster if the query is only executed once because sending packets to the server is done only once. Also, the optimizer might work better because all values are already in the query. However, if the client needs to execute a similar query with a different parameter the process will start from step 1 and therefore it is a less preferred option for such scenarios.

I hope this answers your question.
Please let me know if you have more questions.

Just a small detail: for step 4 describing the client side the server receives the query as "SELECT * FROM tab WHERE id=N"
The parameter marker '?' is replaced by an actual value.

Thanks for that detailed explanation Bogdan.
It confirms my understanding.

However, something that I'm still pondering over is...

There is an SQL in the application for which I have an Optimizer hint on. (FORCE INDEX to avoid a FILESORT for order-by in the query). Now with the NO_SSPS turned ON, the query seems to return instant results even without FORCE INDEX. 
Whereas with NO_SSPS turned OFF, and NO_CACHE turned ON, the query runs for (> 30 mins) and times out eventually for the reason that FILESORT is carried out. 

The explain reports and SQLs are all present in the attachement.

The explain I've done here are after with the actual values(and not parameter markers) - so t he explain should be reflective of the access plan under the influence of NO_SSPS.

Let me know if this behavior is expected. 

Thanks,
Vibin

Hi Vibin,

The parameter NO_SSPS has no influence on the server side.

When NO_SSPS is enabled the server receives the whole query without parameters.
For instance, something like this can be found in the Server general query log:

1233 Query   SELECT * FROM tab WHERE id = 1

If SSPS (Server-side prepared statements) are enabled in ODBC the server log will have this:

1233 Prepare   SELECT * FROM tab WHERE id = ?
1233 Execute   SELECT * FROM tab WHERE id = 1

There is really no execution plan for the Prepare stage, but Execute stage is also different from Query.

Hey Bogdan,

Just an update. 
I was not sure if there needs to be new case reported for this. 
I expect you'd advise if that's necessary when you update.

Here is the thing.

Like you suggested, we've been using the NO_SSPS option of the Connector as a workaround to the PREFETCH related bug that's currently under fix.

We've now hit a problem with 'SELECT FOR UPDATE' type cursor SQLs, where I hit 1064 error - 'Invalid Syntax'. 

-------------------------------------------------------------------------------------------------------------------------
Some ODBC Driver Logs
=====================

SQL = [ SELECT * FROM COP_CORRSP WHERE BATCH_NO > "548550" FOR UPDATE][length = 122 (SQL_NTS)]
[ODBC][30911][1502899791.473408][SQLExecDirect.c][515]
Exit:[SQL_ERROR]
DIAG [42000] [MySQL][ODBC 5.3(a) Driver][mysqld-5.6.10]You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LIMIT  0, 1000' at line 1
-------------------------------------------------------------------------------------------------------------------------

One of the below variants of the SQL seems to be a workaround to this problem. 
1. SELECT * FROM COP_CORRSP WHERE BATCH_NO > "548550" LIMIT 0,1 FOR UPDATE   ==> Limit before FOR UPDATE
2. (SELECT * FROM COP_CORRSP WHERE BATCH_NO > "548550" FOR UPDATE)           ==> Enclosing all of the SQL within braces.

And in both the cases I do not find PREFETCH working. 

Following are my questions.

1) Is this an expected behavior? 

2) For SELECTs with FOR UPDATE, does the PREFETCH actually work? My tests indicate it doesn't. 

3) If for such cases PREFETCH is supposed to work, I seem to have hit a bug. If not, why does the server add a LIMIT to facilitate PREFETCH?(which is when I hit the syntax error in the first place)

4) I'm assuming the driver or server to facilitate prefetch would not add a LIMIT if it's already present in the prepared SQL. Would that be correct? 

Thanks,
Vibin

Hi Vibin,

The problem with SELECT FOR UPDATE has to be reported as a separate bug.
Both workarounds you suggested are not going to work.

Now, answers to your questions:

1) This is not the expected behavior. This is a bug.

2) No, SELECT FOR UPDATE will fail with PREFETCH disregards of NO_SSPS setting

3) This is an ODBC driver bug. The server side has nothing to do with it.

4) Yes, the driver is supposed to detect the existing LIMIT clause and modify it accordingly to your PREFETCH setting. Looks like it failed to do so. Therefore, I consider it another bug.

There might be two ways of dealing with this problem:

1) Set the SQL_ATTR_MAX_ROWS attribute of ODBC statement using SQLSetStmtAttr() function. The advantage of using this function is that it only affects the particular statement and can be changed at any time unlike PREFETCH, which is set at the connection time.

More information on this ODBC API function can be found here:
https://docs.microsoft.com/en-us/sql/odbc/reference/syntax/sqlsetstmtattr-function

2) Before executing the SELECT FOR UPDATE query your code needs to set the maximum number of rows returned by the server by executing 
"set @@sql_select_limit=N"
After "set @@sql_select_limit=N" is executed all SELECT queries will produce no more than N rows even without LIMIT clause.
Here is a query to remove the limit ""set @@sql_select_limit=DEFAULT"

These two ways might be not applicable if your task is to retrieve all rows as chunks with N rows each chunk.

Thanks Bogdan.
I've raised another case (Bug #87457) as you suggested - understand someone is looking into it. 

I would appreciate if you can clarify a comment you made in the previous update here. 

"Both workarounds you suggested are not going to work."

You were alluding to the options, which I thought were potential ways to workaround the bug.

As a matter of fact, I see that the below approaches work for my application, and I'm progressing on that count. Do you think such an approach for now is going to be a problematic one? If so, can you provide the argument.?

1. SELECT * FROM COP_CORRSP WHERE BATCH_NO > "548550" LIMIT 0,1 FOR UPDATE   ==> Limit before FOR UPDATE
2. (SELECT * FROM COP_CORRSP WHERE BATCH_NO > "548550" FOR UPDATE)           ==> Enclosing all of the SQL within braces.

Appreciate your help as always.

Thanks,
Vibin

Hi Vibin,

Thanks for reporting a new bug.

Sorry for the confusion with the SELECT FOR UPDATE workarounds you provided.
I wanted to say that prefetch would not work with row locking SELECT queries.

The reason they would not work is that the ODBC driver detects "FOR UPDATE" and "IN SHARE MODE" words in your SELECT and does not add LIMIT that allows PREFETCH feature to work.

All my tests showed that SELECT FOR UPDATE queries do not do prefetch.
I am not even sure how it could cause any SQL syntax errors like in the bug 87457.
The blocking of LIMIT for prefetch feature was intended to prevent multiple overlapping row locking in the older versions of MySQL server. This should no longer be a problem with the newer servers. And the ODBC driver code has to be patched for that as well.