MySQL Bugs: #15426: SQL_CALC_FOUND

Bug #15426	SQL_CALC_FOUND_ROWS
Submitted:	2 Dec 2005 9:34	Modified:	25 Jun 2011 0:41
Reporter:	Domas Mituzas	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S4 (Feature request)
Version:	4.1,5.0	OS:	Linux (Linux)
Assigned to:	Geir Høydalsvik	CPU Architecture:	Any

Description:
SQL queries may have various limiting factors, and SQL_CALC_FOUND_ROWS returns only number of rows that would be supplied if LIMIT keyword was ommited. 

It would help more if SQL_CALC_FOUND_ROWS functionality would be extended to include:

0) rows returned before LIMIT (already included)
a) rows returned before HAVING (by group by)
b) rows returned before GROUP BY (by where)
c) rows returned before WHERE (by join operation)

Including these metrics should not cause additional computing overhead.

Such functionality would allow more profiling opportunities for clients as well as more flexibility in applications as well.

How to repeat:
-

Suggested fix:
-

Thank you for a feature request. Looks like a resonable feature to have.

It effectively kills opportunities for optimization. We can sometimes optimize GROUP BY away, and if we have to return "the number of rows before GROUP BY" this becomes impossible.

What about providing those values where possible and returning NULL in cases where it would be expensive to calculate them?

This is a functionality request to allow more profiling opportunities for clients. Apparently, if the row count is the important value for the customer, he/she should try to run the same query, but without the specified WHERE, GROUP BY and/or HAVING clauses.

The customer originally asked for row count before GROUP BY was applied, but support extended the request to add row count before WHERE and row count before HAVING. It is unknown what user requirement this is supposed to cover.

There are at least 3 posible implementation strategies for this request:

1. Tweak the optimizer to compute the wanted values as part of query execution. It is possible that this will cause some optimizations to no longer be applicable. Will customer accept reduced performance because of this?

2. Let the optimization be unaffected, instrument the executor, and report metrics from this instrumentation. If GROUP BY is optimized away, the number of rows before grouping might be reported as NULL.

We may calculate number of rows scanned and filtered per join-tab. We may report values from last join-tab in query, or sum values over all join-tabs.

3. Transforming the query to add COUNT(*) expressions and ROLLUP functions is another possible implementation strategy.

Other concerns:

ICP may be considered to be an issue for the storage engine, we might not care about the rows being filtered away in the storage engine.

About reporting: Should these row count numbers be reported through SQL_CALC_FOUND_ROWS or through some user variables (e.g. @@query_rows_in_result, @@query_rows_before_where,
@@query_rows_before_group_by, @@query_rows_before_having). It may be needed to execute a SHOW LAST_QUERY STATUS command before the variables are materialized.

(Notice that I changed the original @@select_rows_in_result to @@query_rows_in_result. Generally this may apply to (sub)queries that are part of INSERT, UPDATE and DELETE statements as well as SELECT.)