MySQL Bugs: #42948: MySQL Optimizer does not properly use Indexes on Views with "Group By"

Bug #42948	MySQL Optimizer does not properly use Indexes on Views with "Group By"
Submitted:	17 Feb 2009 23:02	Modified:	3 Mar 2009 18:56
Reporter:	John Kounis	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.0.67-0ubuntu6, 6.0.11	OS:	Linux (Ubuntu 8.10)
Assigned to:		CPU Architecture:	Any
Tags:	GROUP BY, Optimizer

Description:
If I have a key on a field like SubscriberID and I perform a query such as :

SELECT * FROM qryTEST WHERE SubscriberID=12345;

then it is very fast, since it uses the "SubscriberID" key. However, if I create a view such as:

CREATE VIEW qryTEST AS
SELECT SubscriberID, SUM(Amount) as Total FROM tblTEST GROUP BY SubscriberID;

then it does a table scan, which is very slow.

How to repeat:
Create a test table as follows:

mysql> CREATE TABLE IF NOT EXISTS `tblTEST` (
    ->   `SubscriberID` int(11) NOT NULL,
    ->   `TransactionID` int(11) NOT NULL,
    ->   `Amount` decimal(19,4) NOT NULL,
    ->    PRIMARY KEY  (`TransactionID`),
    ->    KEY (`SubscriberID`));
Query OK, 0 rows affected (0.03 sec)

Insert a bunch of test data (I have a transaction table with over 1 million rows in it:

mysql>
mysql> INSERT INTO tblTEST
    -> SELECT SubscriberID, TransactionID, Amount
    -> FROM tblTransaction;
Query OK, 1245064 rows affected (12.24 sec)
Records: 1245064  Duplicates: 0  Warnings: 0

Create a view:

mysql>
mysql> CREATE VIEW qryTEST AS
    -> SELECT SubscriberID, SUM(Amount) as Total FROM tblTEST GROUP BY SubscriberID;
Query OK, 0 rows affected (0.01 sec)

Test the use of keys:
mysql>
mysql> EXPLAIN SELECT SUM(Amount) FROM tblTEST WHERE SubscriberID=12345;
+----+-------------+---------+------+---------------+--------------+---------+-------+------+-------+
| id | select_type | table   | type | possible_keys | key          | key_len | ref   | rows | Extra |
+----+-------------+---------+------+---------------+--------------+---------+-------+------+-------+
|  1 | SIMPLE      | tblTEST | ref  | SubscriberID  | SubscriberID | 4       | const |   25 |       |
+----+-------------+---------+------+---------------+--------------+---------+-------+------+-------+
1 row in set (0.00 sec)

mysql>
mysql> EXPLAIN SELECT * FROM qryTEST WHERE SubscriberID=12345;
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows    | Extra                           |
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL |   53228 | Using where                     |
|  2 | DERIVED     | tblTEST    | ALL  | NULL          | NULL | NULL    | NULL | 1245064 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
2 rows in set (1.52 sec)

Changed severity from S3 to S5 (Performance)

Formally, this is not a bug. View is created as follows:

mysql> show create view qryTEST\G
*************************** 1. row ***************************
                View: qrytest
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `qrytest` AS select `tbltest`.`SubscriberID` AS `SubscriberID`,sum(`tbltest`.`Amount`) AS `Total` from `tbltest` group by `tbltest`.`SubscriberID`
character_set_client: latin1
collation_connection: latin1_swedish_ci
1 row in set (0,00 sec)

and as sum() is used server is forced to use emporary table to materialize the view, as explained at http://dev.mysql.com/doc/refman/5.1/en/view-algorithms.html:

mysql> EXPLAIN SELECT * FROM qryTEST WHERE SubscriberID=12345\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 402400
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: tbltest
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 524288
        Extra: Using temporary; Using filesort
2 rows in set (20,09 sec)

It would be nice to remove this limitation one day and push condition down to the SELECT defining view, so this is a reasonable feature request.

Thank you for the clarification. I read the link you provided and it's clear why this is happening.

If it would affect the priority of the feature request, perhaps I could give a real-world example that I'm sure is common for database applications:

I have many views to provide information about client transactions that use either aggregate functions or limit functions. A typical monthly invoice run looks something like this:

SELECT CustomerID, <and a bunch of other stuff>
FROM Customers
INNER JOIN qryThisMonthsBalance ON Customer.ID = qryThisMonthsBalance.CustomerID
INNER JOIN qryLastMonthsBalance ON Customer.ID = qryLastMonthsBalance.CustomerID
INNER JOIN qryLastPayment on Customer.ID = qryLastPayment.CustomerID
INNER JOIN qryLastInvoiceSent on Customer.ID = qryLastInvoiceSent.CustomerID
... and so on....

The performance of a query like the one above became unacceptable after converting from MSSQL to MySQL, since each of the views on which I was doing a join was doing a full table scan of 1.2 million records instead of using the "CustomerID" index.

To restore normal performance, I had to write stored procedures to create temporary tables instead of using views as in the example above.