Bug #52549 Low performance due to incorrect cardinality
Submitted: 2 Apr 2010 14:37 Modified: 3 Apr 2010 12:35
Reporter: Geoffrey de Kleijn Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1-telco-7.0 OS:Linux (Debian 5.0)
Assigned to: CPU Architecture:Any
Tags: cardinality, cluster, mysql-5.1.41, ndb-7.0.13, performance

[2 Apr 2010 14:37] Geoffrey de Kleijn
Description:
We have recently migrated a database with MyISAM tables to MySQL Cluster. During a number tests, performance was much lower than expected. We tracked this down to the query planner making poor decisions, because the cardinality for all indices except PRIMARY's is incorrect.

All indices have either the same cardinality as the number of rows, or NULL. Table structure and indices are attached. 

When using sub-selects, the query planner makes poor discisions:

mysql> SELECT ObjectID,   (SELECT sysBestandsnaam FROM tblBOGObjectMedia WHERE ObjectID = tblBOGObject.ObjectID AND MediaGroep = 'beeld' AND sysVerwijderd='0' AND sysBestandAanwezig = '1' AND HoofdFoto = '1') as MediaLocationPath FROM tblBOGObject WHERE  Postcode BETWEEN '1300AA' AND '1369ZZ'     AND  sysTransactiePlaatsgevonden = '0' ORDER BY Straatnaam LIMIT 0,5;
+----------+----------------------------+
| ObjectID | MediaLocationPath          |
+----------+----------------------------+
|    94002 | 009/40/02/0094002.0001.jpg |
|    97124 | 009/71/24/0097124.0001.jpg |
|    98677 | 009/86/77/0098677.1000.jpg |
|    72302 | 007/23/02/0072302.0001.jpg |
|    98299 | NULL                       |
+----------+----------------------------+
5 rows in set (42.91 sec)

mysql> EXPLAIN SELECT ObjectID,   (SELECT sysBestandsnaam FROM tblBOGObjectMedia WHERE ObjectID = tblBOGObject.ObjectID AND MediaGroep = 'beeld' AND sysVerwijderd='0' AND sysBestandAanwezig = '1' AND HoofdFoto = '1') as MediaLocationPath FROM tblBOGObject WHERE  Postcode BETWEEN '1300AA' AND '1369ZZ'     AND  sysTransactiePlaatsgevonden = '0' ORDER BY Straatnaam LIMIT 0,5;
+----+--------------------+-------------------+------+--------------------------------------------------------------------+-----------------------------------+---------+-------+------+---------------------------------------------------+
| id | select_type        | table             | type | possible_keys                                                      | key                               | key_len | ref   | rows | Extra                                             |
+----+--------------------+-------------------+------+--------------------------------------------------------------------+-----------------------------------+---------+-------+------+---------------------------------------------------+
|  1 | PRIMARY            | tblBOGObject      | ref  | sysTransactiePlaatsgevonden_index,Postcode_index                   | sysTransactiePlaatsgevonden_index | 2       | const |   10 | Using where with pushed condition; Using filesort |
|  2 | DEPENDENT SUBQUERY | tblBOGObjectMedia | ref  | PRIMARY,sysBestandAanwezig_index,HoofdFoto_index,setPriority_index | sysBestandAanwezig_index          | 1       | const |   10 | Using where                                       |
+----+--------------------+-------------------+------+--------------------------------------------------------------------+-----------------------------------+---------+-------+------+---------------------------------------------------+
2 rows in set (0.00 sec)

The 'sysBestandAanwezig_index' which is picked by the planner is based on a ENUM with 2 values. Using that particular index makes it very slow (almost 43 seconds).

We have to use Index hinting to achieve better performance:

mysql> SELECT ObjectID, (SELECT sysBestandsnaam FROM tblBOGObjectMedia USE INDEX (PRIMARY) WHERE ObjectID = tblBOGObject.ObjectID AND MediaGroep = 'beeld' AND sysVerwijderd='0' AND sysBestandAanwezig = '1' AND HoofdFoto = '1') as  MediaLocationPath FROM tblBOGObject WHERE  Postcode BETWEEN '1300AA' AND '1369ZZ' AND  sysTransactiePlaatsgevonden = '0' ORDER BY Straatnaam LIMIT 0,5;
+----------+----------------------------+
| ObjectID | MediaLocationPath          |
+----------+----------------------------+
|    97124 | 009/71/24/0097124.0001.jpg |
|    94002 | 009/40/02/0094002.0001.jpg |
|    98299 | NULL                       |
|    72302 | 007/23/02/0072302.0001.jpg |
|    98677 | 009/86/77/0098677.1000.jpg |
+----------+----------------------------+
5 rows in set (0.25 sec)

We've tried ALTER TABLE OPTIMIZE PARTITIONS; ANALYZE TABLE; OPTIMIZE TABLE etc., but can't get rid of these nasty cardinalities.

The MyISAM tables have proper cardinality, and the same query is much faster than MySQL Cluster on a much more low-end platform. (will attach log files later).

How to repeat:
- Create NDBCLUSTER TABLE(-S)
- Import data using LOAD DATA INFILE
- run SHOW INDEXES FROM TABLE ...
- EXPLAIN some sub-query and watch the query planner make bad discisions

Suggested fix:
- Make the cardinality of combined primary the same for all member columns (instead of NULLs)
- Calculate correct cardinality for other indices
[2 Apr 2010 14:38] Geoffrey de Kleijn
NDB table structure

Attachment: mysql-bug-52549-NDB-structure.txt (text/plain), 7.30 KiB.

[2 Apr 2010 14:38] Geoffrey de Kleijn
MyISAM table info and queries

Attachment: mysql-bug-52549-MyISAM-queries.txt (text/plain), 13.25 KiB.

[3 Apr 2010 12:35] Sveta Smirnova
Thank you for the report.

This is duplicate of bug #44760