Description:
InnoDB's treatment of NULLs in index statistics calculation can cause oversized "rows
examined" estimates in the query plan.
This can make use of MAX_JOIN_SIZE as a "safety net" trigger many false positives
(queries that will *supposedly* examine a lot of rows, but do not really.)
How to repeat:
1) load attached optimizer_bug_0813.dump.gz
2) explain SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2
sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON
orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 |
foo2.orgs.org_id | 5460 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id
| 24014 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+
MySQL estimates the above will scan 16.7 billion rows though in reality it is closer to
200.
3) Alter this to myisam:
mysql> alter table orgs2 engine=myisam;
Query OK, 128 rows affected (0.04 sec)
Records: 128 Duplicates: 0 Warnings: 0
mysql> alter table sa_opportunities2 engine=myisam;
Query OK, 11239 rows affected (0.09 sec)
Records: 11239 Duplicates: 0 Warnings: 0
mysql> alter table contacts2 engine=myisam;
Query OK, 47212 rows affected (0.24 sec)
Records: 47212 Duplicates: 0 Warnings: 0
mysql> desc SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2
sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON
orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 |
foo2.orgs.org_id | 1 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id
| 1 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
Now the estimate is 128 rows.
4) We get similar when myisam_stats_method=nulls_ignored:
mysql> alter table contacts2 engine=myisam;Query OK, 47212 rows affected (0.09 sec)
Records: 47212 Duplicates: 0 Warnings: 0
mysql> alter table sa_opportunities2 engine=myisam;Query OK, 11239 rows affected (0.03
sec)
Records: 11239 Duplicates: 0 Warnings: 0
mysql> alter table orgs2 engine=myisam;Query OK, 128 rows affected (0.02 sec)
Records: 128 Duplicates: 0 Warnings: 0
mysql> desc SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2
sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON
orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 |
foo2.orgs.org_id | 1 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id
| 2 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
nulls_ignored is the best method as it is most acurate.
mysql> select count(*) from sa_opportunities2 where org_id is not NULL;
+----------+
| count(*) |
+----------+
| 20 |
+----------+
1 row in set (0.01 sec)
mysql> select count(distinct(org_id)) from sa_opportunities2 where org_id is not NULL;
+-------------------------+
| count(distinct(org_id)) |
+-------------------------+
| 18 |
+-------------------------+
1 row in set (0.01 sec)
sa_opportunities2 join should be ~1
mysql> select count(*) from contacts2 where org_id is not NULL;
+----------+
| count(*) |
+----------+
| 224 |
+----------+
1 row in set (0.00 sec)
mysql> select count(distinct(org_id)) from contacts2 where org_id is not NULL;
+-------------------------+
| count(distinct(org_id)) |
+-------------------------+
| 118 |
+-------------------------+
1 row in set (0.00 sec)
Suggested fix:
Alter the way InnoDB treats NULLs in index statistic calculation -- perhaps configurable
similar to MyISAM.