Bug #30423 InnoDBs treatment of NULL in index stats causes bad "rows examined" estimates
Submitted: 14 Aug 2007 23:30 Modified: 10 Jan 2012 1:46
Reporter: Lachlan Mulcahy Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S4 (Feature request)
Version:5.0.46 OS:Any
Assigned to: Jimmy Yang CPU Architecture:Any
Tags: innodb index statistics performance optimizer

[14 Aug 2007 23:30] Lachlan Mulcahy
Description:
InnoDB's treatment of NULLs in index statistics calculation can cause oversized "rows examined" estimates in the query plan. 

This can make use of MAX_JOIN_SIZE as a "safety net" trigger many false positives (queries that will *supposedly* examine a lot of rows, but do not really.)

How to repeat:
1) load attached optimizer_bug_0813.dump.gz
2) explain SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2 sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 | foo2.orgs.org_id | 5460 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id | 24014 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+-------+-------------+

MySQL estimates the above will scan 16.7 billion rows though in reality it is closer to 200. 

3) Alter this to myisam:

mysql> alter table orgs2 engine=myisam;
Query OK, 128 rows affected (0.04 sec)
Records: 128 Duplicates: 0 Warnings: 0

mysql> alter table sa_opportunities2 engine=myisam;
Query OK, 11239 rows affected (0.09 sec)
Records: 11239 Duplicates: 0 Warnings: 0

mysql> alter table contacts2 engine=myisam;
Query OK, 47212 rows affected (0.24 sec)
Records: 47212 Duplicates: 0 Warnings: 0

mysql> desc SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2 sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 | foo2.orgs.org_id | 1 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id | 1 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+

Now the estimate is 128 rows.

4) We get similar when myisam_stats_method=nulls_ignored:

mysql> alter table contacts2 engine=myisam;Query OK, 47212 rows affected (0.09 sec)
Records: 47212 Duplicates: 0 Warnings: 0

mysql> alter table sa_opportunities2 engine=myisam;Query OK, 11239 rows affected (0.03 sec)
Records: 11239 Duplicates: 0 Warnings: 0

mysql> alter table orgs2 engine=myisam;Query OK, 128 rows affected (0.02 sec)
Records: 128 Duplicates: 0 Warnings: 0

mysql> desc SELECT COUNT(*), 0 FROM orgs2 orgs LEFT JOIN sa_opportunities2 sa_opportunities ON orgs.org_id=sa_opportunities.org_id LEFT JOIN contacts2 contacts ON orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| 1 | SIMPLE | orgs | index | NULL | orgs$org_id | 4 | NULL | 128 | Using index |
| 1 | SIMPLE | sa_opportunities | ref | sa_opp$org_id | sa_opp$org_id | 5 | foo2.orgs.org_id | 1 | Using index |
| 1 | SIMPLE | contacts | ref | contacts$org_id | contacts$org_id | 5 | foo2.orgs.org_id | 2 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+

nulls_ignored is the best method as it is most acurate.

mysql> select count(*) from sa_opportunities2 where org_id is not NULL;
+----------+
| count(*) |
+----------+
| 20 |
+----------+
1 row in set (0.01 sec)

mysql> select count(distinct(org_id)) from sa_opportunities2 where org_id is not NULL;
+-------------------------+
| count(distinct(org_id)) |
+-------------------------+
| 18 |
+-------------------------+
1 row in set (0.01 sec)

sa_opportunities2 join should be ~1

mysql> select count(*) from contacts2 where org_id is not NULL;
+----------+
| count(*) |
+----------+
| 224 |
+----------+
1 row in set (0.00 sec)

mysql> select count(distinct(org_id)) from contacts2 where org_id is not NULL;
+-------------------------+
| count(distinct(org_id)) |
+-------------------------+
| 118 |
+-------------------------+
1 row in set (0.00 sec)

Suggested fix:
Alter the way InnoDB treats NULLs in index statistic calculation -- perhaps configurable similar to MyISAM.
[14 Aug 2007 23:34] Lachlan Mulcahy
Data for reproducing issue

Attachment: optimizer_bug_0813.dump.gz (application/x-gzip, text), 2.47 KiB.

[24 Aug 2007 13:27] Marko Mäkelä
In btr_estimate_number_of_different_key_vals(), depending on this suggested configuration parameter, InnoDB should check if the record contains NULL columns. The comparison function cmp_rec_rec_with_match() invoked by the estimation algorithm defines NULL as the smallest possible value of a column.
[20 Sep 2007 8:02] Marko Mäkelä
We cannot change the query optimizer in stable or frozen versions of MySQL (5.0 and 5.1). This bug will not be fixed before MySQL 5.2.
[23 Dec 2010 9:27] Jimmy Yang
Implemented a global system variable "innodb_stats_method" to allow user to collect statistics with "nulls_unequal" and "nulls_ignored" option (default is nulls_equal).

mysql> set global innodb_stats_method = default;
Query OK, 0 rows affected (0.00 sec)

mysql> select @@innodb_stats_method;
+-----------------------+
| @@innodb_stats_method |
+-----------------------+
| nulls_equal           |
+-----------------------+
1 row in set (0.00 sec)

mysql> set global innodb_stats_method = "nulls_igno";
Query OK, 0 rows affected (0.00 sec)

mysql> select @@innodb_stats_method;
+-----------------------+
| @@innodb_stats_method |
+-----------------------+
| nulls_ignored         |
+-----------------------+
1 row in set (0.00 sec)

mysql> analyze table contacts2;
+----------------+---------+----------+----------+
| Table          | Op      | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| test.contacts2 | analyze | status   | OK       |
+----------------+---------+----------+----------+
1 row in set (0.02 sec)

mysql> analyze table orgs2;
+------------+---------+----------+----------+
| Table      | Op      | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.orgs2 | analyze | status   | OK       |
+------------+---------+----------+----------+
1 row in set (0.00 sec)

mysql> explain SELECT COUNT(*), 0
    -> FROM orgs2 orgs
    -> LEFT JOIN sa_opportunities2 sa_opportunities
    -> ON orgs.org_id=sa_opportunities.org_id
    -> LEFT JOIN contacts2 contacts
    -> ON orgs.org_id=contacts.org_id ;
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
| id | select_type | table            | type  | possible_keys   | key             | key_len | ref              | rows | Extra       |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
|  1 | SIMPLE      | orgs             | index | NULL            | orgs$org_id     | 4       | NULL             |  128 | Using index |
|  1 | SIMPLE      | sa_opportunities | ref   | sa_opp$org_id   | sa_opp$org_id   | 5       | test.orgs.org_id |    1 | Using index |
|  1 | SIMPLE      | contacts         | ref   | contacts$org_id | contacts$org_id | 5       | test.orgs.org_id |    1 | Using index |
+----+-------------+------------------+-------+-----------------+-----------------+---------+------------------+------+-------------+
3 rows in set (0.01 sec)
[12 Jan 2011 18:05] Stephen Jackson
Jimmy, will this be back ported to 5.1.55?

We are suffering from this particular problem with NULLS, but we will not be moving to 5.5.x at least until the summer...

Cheers

//Steve Jackson
[12 Jan 2011 18:54] Calvin Sun
Stephen,

It will be in 5.1 release, just not sure whether it is .55 or .56.

Thanks,
Calvin
[17 Jan 2011 12:01] Jimmy Yang
Fix checked in mysql-5.1-innodb and ported to mysql-5.5-innodb and mysql-trunk-innodb on Jan 15th
[30 Jan 2011 16:58] Bugs System
Pushed into mysql-5.1 5.1.56 (revid:vasil.dimov@oracle.com-20110130164158-1q99a41kb2wvkw3a) (version source revid:vasil.dimov@oracle.com-20110130164158-1q99a41kb2wvkw3a) (merge vers: 5.1.56) (pib:24)
[30 Jan 2011 16:59] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:vasil.dimov@oracle.com-20110130165639-1pr3opz839b98q5j) (version source revid:vasil.dimov@oracle.com-20110130165522-m0o6al0pn5ig9kv3) (merge vers: 5.6.2) (pib:24)
[30 Jan 2011 17:00] Bugs System
Pushed into mysql-5.5 5.5.10 (revid:vasil.dimov@oracle.com-20110130165343-he9art47agq1a3gr) (version source revid:vasil.dimov@oracle.com-20110130165137-5lvzsq9j29j0hp1s) (merge vers: 5.5.10) (pib:24)
[11 Feb 2011 0:17] John Russell
Added to doc change log:

InnoDB could could return values for "rows examined" in the query
plan that were higher than expected. NULL values were treated in an
inconsistent way. The inaccurate statistics could trigger "false
positives" in combination with the MAX_JOIN_SIZE setting, because the
queries did not really examine as many rows as reported.
[1 Apr 2011 9:03] Stephen Jackson
Can someone tell me if this fix is implemented only in the built-in, plugin, or both?

We are planning to upgrade to 5.1.56 on the 7th (with plugin), so I would like to know this.

Cheers

//Steve
[1 Apr 2011 11:00] Vasil Dimov
Stephen, it has been fixed in both builtin and plugin:

------------------------------------------------------------
revision-id: jimmy.yang@oracle.com-20110114170228-788l136hkgpyskpu
committer: Jimmy Yang <jimmy.yang@oracle.com>
timestamp: Fri 2011-01-14 09:02:28 -0800
message:
  Fix Bug#30423 "InnoDBs treatment of NULL in index stats causes bad 
  "rows examined" estimates". This change implements "innodb_stats_method"
  with options of "nulls_equal", "nulls_unequal" and "null_ignored".
        
  rb://553 approved by Marko
added:
  mysql-test/suite/innodb/r/innodb_bug30423.result innodb_bug30423.resu-20101223090805-6oq0801a09mp3md3-1
  mysql-test/suite/innodb/t/innodb_bug30423.test innodb_bug30423.test-20101223090812-0s0fbzrp2njptcya-1
  mysql-test/suite/innodb_plugin/r/innodb_bug30423.result innodb_bug30423.resu-20110113143550-92zlt2w4zbrbe5xf-1
  mysql-test/suite/innodb_plugin/t/innodb_bug30423.test innodb_bug30423.test-20110113143555-zzlg82alzcvwq2gq-1
modified:
  storage/innobase/btr/btr0cur.c 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Fbtr%2Fbtr0cur.c
  storage/innobase/dict/dict0dict.c 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Fdict%2Fdict0dict.c
  storage/innobase/handler/ha_innodb.cc 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Fhandler%2Fha_innodb.cc
  storage/innobase/include/btr0cur.h 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Finclude%2Fbtr0cur.h
  storage/innobase/include/dict0mem.h 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Finclude%2Fdict0mem.h
  storage/innobase/include/rem0cmp.h 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Finclude%2Frem0cmp.h
  storage/innobase/include/rem0cmp.ic 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Finclude%2Frem0cmp.ic
  storage/innobase/include/srv0srv.h 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Finclude%2Fsrv0srv.h
  storage/innobase/rem/rem0cmp.c 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Frem%2Frem0cmp.c
  storage/innobase/srv/srv0srv.c 2@cee13dc7-1704-0410-992b-c9b4543f1246:trunk%2Fsrv%2Fsrv0srv.c
  storage/innodb_plugin/ChangeLog 2425@16c675df-0fcb-4bc9-8058-dcc011a37293:branches%2Fzip%2FChangeLog
  storage/innodb_plugin/btr/btr0cur.c 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Fbtr%2Fbtr0cur.c
  storage/innodb_plugin/dict/dict0dict.c 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Fdict%2Fdict0dict.c
  storage/innodb_plugin/handler/ha_innodb.cc 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Fhandler%2Fha_innodb.cc
  storage/innodb_plugin/include/btr0cur.h 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Finclude%2Fbtr0cur.h
  storage/innodb_plugin/include/dict0mem.h 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Finclude%2Fdict0mem.h
  storage/innodb_plugin/include/rem0cmp.h 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Finclude%2Frem0cmp.h
  storage/innodb_plugin/include/rem0cmp.ic 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Finclude%2Frem0cmp.ic
  storage/innodb_plugin/include/srv0srv.h 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Finclude%2Fsrv0srv.h
  storage/innodb_plugin/rem/rem0cmp.c 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Frem%2Frem0cmp.c
  storage/innodb_plugin/srv/srv0srv.c 2@16c675df-0fcb-4bc9-8058-dcc011a37293:trunk%2Fsrv%2Fsrv0srv.c
[10 Jun 2011 15:53] Joe Grasse
I am unable to locate the documentation or the change log entry for the new server variable that this added. Can someone point me to the correct place?
[11 Jun 2011 8:11] Stephen Jackson
Joe,

Unfortunately, I dont think they have updated the documentation to reflect this

The possible configurations are

innodb_stats_method= nulls_equal OR nulls_unequal OR nulls_ignored

If you have cardinality issues due to skewed datasets, I would also recommend using the innodb_stats_sample_pages directive to up the number of index dives (note: this is only available in innodb_plugin... I use 64 - its also a good idea to change innodb_stats_on_metadata to 0 when using larger index dives)

You should find yourself having MUCH more stable query plans, and monitoring responsiveness  if you use all three of these...

Anyway. Request to devs. Please add the innodb_stats_method directive to the documentation....
[3 Aug 2011 7:00] John Russell
Reopening with status 'Documenting' to take care of the new status variable in the docs.
[10 Jan 2012 1:46] John Russell
The information was added to the documentation some time ago:

MySQL 5.5: http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_stats_method

MySQL 5.1: http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_stats_method