MySQL Bugs: #99684: incorrect determination of big test

Bug #99684	incorrect determination of big test
Submitted:	25 May 2020 12:19	Modified:	28 May 2020 13:22
Reporter:	andy zhang	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.6, 5.7, 8.0	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	big test, Optimizer

Description:
JOIN::make_join_plan() used double type best_read instead of best_rowcount to check if the query is a big select.

  if (!(thd->variables.option_bits & OPTION_BIG_SELECTS) &&
      best_read > (double)thd->variables.max_join_size &&.   <===== here is the problem line
      !thd->lex->is_explain()) { /* purecov: inspected */
    my_error(ER_TOO_BIG_SELECT, MYF(0));
    error = -1;
    DBUG_RETURN(1);
  }

How to repeat:
create table t4 (c1 int);

insert into t4 values (1), (2), (3);

set max_join_size=2;

select * from t4;   <=== expected error of too big select didn't occur 

Suggested fix:
Use best_rowcount instead.

Hi Mr. zhang,

Thank you for your bug report.

best_read is indeed a limit that has to be surpassed in order that a query is denoted as big SELECT.

Hence, why do you think that best_rowcount should be used. This variable, best_rowcount, does not take into the account the width of the columns and rows that should be read.

Hence, we would like to hear from you a full justification of why you consider your criterium a much better one.

Thanks in advance.

Sinisa,

Here is the what reference guide says on the variable max_join_size and sql_big_selects:

max_join_size:

Do not permit statements that probably need to examine more than max_join_size rows (for single- table statements) or row combinations (for multiple-table statements) or that are likely to do more than max_join_size disk seeks. 

sql_big_selects:

If set to OFF, MySQL aborts SELECT statements that are likely to take a very long time to execute (that is, statements for which the optimizer estimates that the number of examined rows exceeds the value of max_join_size). 

Thanks!

HI Mr. zhang,

Yes, the documentation seems to be wrong.

If you agree, I can verify this as a documentation bug.

That's ok to me.

This is now verified as a documentation enhancement bug.

There is still a problem. The code reads
  if (!(thd->variables.option_bits & OPTION_BIG_SELECTS) &&
      best_read > (double)thd->variables.max_join_size &&.   <===== here is the problem line
      !thd->lex->is_explain()) { /* purecov: inspected */
    my_error(ER_TOO_BIG_SELECT, MYF(0));
    error = -1;
    DBUG_RETURN(1);
  }
so it compares max_join_size to JOIN::best_read, which is an estimate of cost (not of rows), made by the optimizer.
Then the doc says:
https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_join_size
"Do not permit statements that probably need to examine more than max_join_size rows (for single-table statements) or row combinations (for multiple-table statements) or that are likely to do more than max_join_size disk seeks"

The part about disk seeks might be consistent with the fact that best_read is a cost, but the part about "more than max_join_size rows" remains false: best_read is not a count of rows, and max_join_size is never compared with a number of rows in code.

By the way, the error message only mentions rows, which is wrong:
ER_TOO_BIG_SELECT 42000
        eng "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay"

To give an example:
I make a 5-table join, all tables have 6 rows. The cost is 935 (as seen in the optimizer trace). When I set max_join_size to 940, it passes; to 930 it fails. It proves again that max_join_size is compared to the cost.
The number of rows is 6**5, roughly 7700 for this query, max_join_size is not compared to it.

HI Guilhem, mon ami,

Thank you very much for your comment.

We agree with your analysis , which is why we change the category of this report to "Optimiser". Simply, comparison is made between the different types of values.