Bug #36681 Insufficient plans for huge tables
Submitted: 13 May 2008 8:11 Modified: 3 Oct 2009 18:50
Reporter: Yasufumi Kinoshita Email Updates:
Status: No Feedback Impact on me:
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.54 OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[13 May 2008 8:11] Yasufumi Kinoshita
When we use huge tables bigger than main-memory,
sometimes MySQL chooses inefficient index or order of joining table
and the query spends very much time.

There seems to be the following factors.

Scanning based on secondary index is much more inefficient than other methods
when the necessary data blocks are not expected to be on memory (data cache, buffer pool).
(Because of increased possibility to read same data block several times by random access)
The cost of the off-memory scanning based on secondary index is not considered.
(ex. ha_innobase::read_time() )

If the range optimizer (opt_range.cc) decides not to use index, the number of estimated records
is discarded and the number of all records in the table instead in the after estimation.
So, the cost estimation of join after this table can be inaccurately high,
and the other insufficient plan can be chosen.

Even if we tune handler::read_time() to solve (1), there may be no effect without using the range optimizer.
Because handler::read_time() is not called from the other of the range optimizer.

The attached patch (Suggested fix) shows clear improvement on accuracy of plan for TPC-H queries.
The results of the improvement will be updated in these days.

How to repeat:
execute the each queries of TPC-H for larger SF(Scale factor) than main memory.

without any STRAIGHT_JOIN or [FORCE, IGNORE, USE] INDEX clauses.

ex. InnoDB

- index on lineitem (l_shipdate asc);
- index on orders (o_orderdate asc);

Suggested fix:
ex. mysql-5.0.54_fix_optimizer.patch

1. Tuning ha_innobase::read_time() (ha_innobase.cc)
      The cost of using secondary index is compensated.
      - if (table + index) > (buffer_pool/2)
      - if many rows are estimated

    -> For accurate choice of index

2. Reusing the number of estimated records at range optimizer (opt_range.cc, sql_select.cc)
      get_quick_record_count() returns the number of estimated records
      even if the index is not used.

    -> For accurate choice of first table

3. Using handler::read_time() at best_access_path() (sql_select.cc)

    -> For accurate order of join
[13 May 2008 8:12] Yasufumi Kinoshita
Suggested fix

Attachment: mysql-5.0.54_fix_optimizer.patch (text/x-patch), 11.32 KiB.

[14 May 2008 3:22] Yasufumi Kinoshita

Attachment: suggested_fix_results.pdf (application/pdf, text), 9.00 KiB.

[14 May 2008 16:25] Susanne Ebrecht
Many thanks for writing a bug report.
[20 May 2008 5:30] Yasufumi Kinoshita
The following patch seems to be good for using MyISAM with the suggested fix.

diff -ru mysql-5.0.54/sql/ha_myisam.cc mysql-5.0.54-configured/sql/ha_myisam.cc
--- mysql-5.0.54/sql/ha_myisam.cc       2007-12-23 04:39:20.000000000 +0900
+++ mysql-5.0.54-configured/sql/ha_myisam.cc    2008-05-15 17:46:46.000000000 +0900
@@ -652,6 +652,24 @@
   return mi_close(tmp);

+double ha_myisam::read_time(uint index, uint ranges, ha_rows rows)
+  double ret=0.0;
+  ret = handler::read_time(index, ranges, rows);
+  /* CHECK ME: these judgements may be too rough */
+  /* Big table makes the cost more expensive. */
+  if (dflt_key_cache_var.param_buff_size / 4 < index_file_length)
+  {
+    ret = ret * 10.0;
+    if (rows > 10000)
+      ret = ret * 10.0;
+  }
+  return(ret);
 int ha_myisam::write_row(byte * buf)
diff -ru mysql-5.0.54/sql/ha_myisam.h mysql-5.0.54-configured/sql/ha_myisam.h
--- mysql-5.0.54/sql/ha_myisam.h        2007-07-12 20:29:50.000000000 +0900
+++ mysql-5.0.54-configured/sql/ha_myisam.h     2008-05-15 17:41:59.000000000 +0900
@@ -62,6 +62,7 @@

   int open(const char *name, int mode, uint test_if_locked);
   int close(void);
+  double read_time(uint index, uint ranges, ha_rows rows);
   int write_row(byte * buf);
   int update_row(const byte * old_data, byte * new_data);
   int delete_row(const byte * buf);
[26 May 2008 2:11] Yasufumi Kinoshita
The following PDF file is new results of the both engine (InnoDB & MyISAM).

- applied the above patch for MyISAM.
- added index on part (p_brand, p_container) [for Query No. 17]
[26 May 2008 2:12] Yasufumi Kinoshita
new results (InnoDB & MyISAM)

Attachment: suggested_fix_results_new.pdf (application/pdf, text), 11.74 KiB.

[28 May 2008 6:27] Yasufumi Kinoshita
The same suggested fix for 5.1

Attachment: mysql-5.1.24_fix_optimizer.patch (text/x-patch), 7.15 KiB.

[28 May 2008 6:28] Yasufumi Kinoshita
5.1 results (InnoDB & MyISAM)

Attachment: suggested_fix_results_5.1.pdf (application/pdf, text), 11.73 KiB.

[3 Sep 2009 18:49] Liz Drachnik
Hello Yasufumi

In order for us to continue the process of reviewing your contribution to MySQL - We need
you to review and sign the Sun|MySQL contributor agreement (the "SCA")

The process is explained here: 

Getting a signed/approved SCA on file will help us facilitate your contribution-- this
one, and others in the future.

Thank you !
[3 Oct 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".