Bug #30423 InnoDBs treatment of NULL in index stats causes bad "rows examined" estimates
Submitted: 15 Aug 2007 1:30 Modified: 27 Aug 2007 11:28
Reporter: Lachlan Mulcahy
Status: Verified
Category:Server: InnoDB Severity:S4 (Feature request)
Version:5.0.46 OS:Any
Assigned to: Heikki Tuuri Target Version:
Tags: innodb index statistics performance optimizer
Triage: Triaged: D5 (Feature request)

[15 Aug 2007 1:30] Lachlan Mulcahy
Description:
InnoDB's treatment of NULLs in index statistics calculation can cause oversized "rows
examined" estimates in the query plan. 

This can make use of MAX_JOIN_SIZE as a "safety net" trigger many false positives
(queries that will *supposedly* examine a lot of rows, but do not really.)

How to repeat:
1) load attached optimizer_bug_0813.dump.gz
2) explain SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2
sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON
orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 |
foo2.orgs.org_id | 5460 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id
| 24014 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+

MySQL estimates the above will scan 16.7 billion rows though in reality it is closer to
200. 

3) Alter this to myisam:

mysql> alter table orgs2 engine=myisam;
Query OK, 128 rows affected (0.04 sec)
Records: 128 Duplicates: 0 Warnings: 0

mysql> alter table sa_opportunities2 engine=myisam;
Query OK, 11239 rows affected (0.09 sec)
Records: 11239 Duplicates: 0 Warnings: 0

mysql> alter table contacts2 engine=myisam;
Query OK, 47212 rows affected (0.24 sec)
Records: 47212 Duplicates: 0 Warnings: 0

mysql> desc SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2
sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON
orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 |
foo2.orgs.org_id | 1 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id
| 1 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+

Now the estimate is 128 rows.

4) We get similar when myisam_stats_method=nulls_ignored:

mysql> alter table contacts2 engine=myisam;Query OK, 47212 rows affected (0.09 sec)
Records: 47212 Duplicates: 0 Warnings: 0

mysql> alter table sa_opportunities2 engine=myisam;Query OK, 11239 rows affected (0.03
sec)
Records: 11239 Duplicates: 0 Warnings: 0

mysql> alter table orgs2 engine=myisam;Query OK, 128 rows affected (0.02 sec)
Records: 128 Duplicates: 0 Warnings: 0

mysql> desc SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2
sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON
orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 |
foo2.orgs.org_id | 1 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id
| 2 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+

nulls_ignored is the best method as it is most acurate.

mysql> select count(*) from sa_opportunities2 where org_id is not NULL;
+----------+
| count(*) |
+----------+
| 20 |
+----------+
1 row in set (0.01 sec)

mysql> select count(distinct(org_id)) from sa_opportunities2 where org_id is not NULL;
+-------------------------+
| count(distinct(org_id)) |
+-------------------------+
| 18 |
+-------------------------+
1 row in set (0.01 sec)

sa_opportunities2 join should be ~1

mysql> select count(*) from contacts2 where org_id is not NULL;
+----------+
| count(*) |
+----------+
| 224 |
+----------+
1 row in set (0.00 sec)

mysql> select count(distinct(org_id)) from contacts2 where org_id is not NULL;
+-------------------------+
| count(distinct(org_id)) |
+-------------------------+
| 118 |
+-------------------------+
1 row in set (0.00 sec)

Suggested fix:
Alter the way InnoDB treats NULLs in index statistic calculation -- perhaps configurable
similar to MyISAM.
[15 Aug 2007 1:34] Lachlan Mulcahy
Data for reproducing issue

Attachment: optimizer_bug_0813.dump.gz (application/x-gzip, text), 2.47 KiB.

[24 Aug 2007 15:27] Marko Mäkelä
In btr_estimate_number_of_different_key_vals(), depending on this suggested configuration
parameter, InnoDB should check if the record contains NULL columns. The comparison
function cmp_rec_rec_with_match() invoked by the estimation algorithm defines NULL as the
smallest possible value of a column.
[20 Sep 2007 10:02] Marko Mäkelä
We cannot change the query optimizer in stable or frozen versions of MySQL (5.0 and 5.1).
This bug will not be fixed before MySQL 5.2.