MySQL Bugs: #734: MySQL query optimisation and UNIQUE keys

Bug #734	MySQL query optimisation and UNIQUE keys
Submitted:	26 Jun 2003 7:43	Modified:	29 Aug 2003 12:31
Reporter:	Richard Seaman	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	3.23.45 / 3.23.54	OS:	Solaris (Solaris 8)
Assigned to:		CPU Architecture:	Any

Description:
MySQL query optimisation and UNIQUE keys

How to repeat:
Author        -  Richard Seaman 26/06/03
Mysql Version -  Tested on 3.23.45 and 3.23.54
Platform      -  Solaris 8

The portion of the database concerned is - 

CREATE TABLE legs (
  sc_id           int(11) NOT NULL default '0',
  st_id_tx        int(11) NOT NULL default '0',
  st_id_rx        int(11) NOT NULL default '0',
  leg             int(11) NOT NULL default '0'
) TYPE=MyISAM;

#
# Indices to create on table 'legs'
#
create index legs_sc_id_idx on legs(sc_id);
create index legs_st_id_tx_idx on legs(st_id_tx);
create index legs_st_id_rx_idx on legs(st_id_rx);

CREATE TABLE billrec (
  br_id       int(11) NOT NULL auto_increment,        
  bs_id_tx    int(11) NOT NULL default '0',           
  bs_id_rx    int(11) NOT NULL default '0',           
  st_id       int(11) NOT NULL default '0',           
  leg         int(11) NOT NULL default '0',          
  billdate    int(11) NOT NULL default '0',           
  time        int(11) NOT NULL default '0',
  duration    int(11) NOT NULL default '0',    
  type        tinyint(4) NOT NULL default '0',        
  direction   tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (br_id),
  UNIQUE KEY bs_id_tx (bs_id_tx,bs_id_rx,billdate,time,duration,type,direction),
  KEY billrec_st_id_idx (st_id)
) TYPE=MyISAM;

create table billsubsc (
	bs_id	int4		auto_increment	PRIMARY KEY,
	s_id	int4,
	voice	char(30)	NOT NULL,
	UNIQUE 	(voice)
        );

create index billsubsc_voice_idx on billsubsc(voice);

Performing the following query seems to take an excessive amount of time... especially when I have to do it a few hundred
times in the same web page!!!

mysql> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
    -> FROM legs
    -> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
    -> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
    -> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
    -> WHERE
    -> legs.sc_id=29640 AND billrec.br_id>0
    -> AND legs.leg = billrec.leg ;
+-----------+----------+
| BillRecID | BillTime |
+-----------+----------+
|     42857 |  2160000 |
|     39112 |  2206300 |
+-----------+----------+
2 rows in set (1.65 sec)

Doing an explain on the query gives...

mysql> explain
    -> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
    -> FROM legs
    -> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
    -> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
    -> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
    -> WHERE
    -> legs.sc_id=29640 AND billrec.br_id>0
    -> AND legs.leg = billrec.leg ;
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+------+------------+
| table      | type   | possible_keys                                                 | key            | key_len | ref              | rows | Extra      |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+------+------------+
| legs       | ref    | legs_sc_id_idx,legs_st_id_tx_idx,legs_st_id_rx_idx,legs_s_idx | legs_sc_id_idx |       4 | const            |    1 | where used |
| billsubsca | ALL    | PRIMARY                                                       | NULL           |    NULL | NULL             |   29 |            |
| billrec    | ref    | PRIMARY,bs_id_tx_idx,billrec_st_id_idx                        | bs_id_tx_idx   |       4 | billsubsca.bs_id |  368 | where used |
| billsubscb | eq_ref | PRIMARY                                                       | PRIMARY        |       4 | billrec.bs_id_rx |    1 |            |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+------+------------+
4 rows in set (0.01 sec)

My understanding of the EXPLAIN output is the join to billsubsca has the possible use of the PRIMARY key which seems sensible, but hasn't
actually used it!!! Unlike the join to billsubscb which also has the same PRIMARY key and has used it. Plus the join to billrec
has used bs_id_tx_idx which is a unique key and looks something like this...

create UNIQUE index  bs_id_tx_idx on billrec(bs_id_tx,bs_id_rx,billdate,time,duration,type,direction);

I am of the opinion that the purupose of the UNIQUE KEY is to ensure duplication does not occur in the process of an INSERT or an UPDATE. Why therefore
has MySql decided to use it as part of the SELECT query.

By removing this UNIQUE KEY from the billrec table the query is infinitley quicker...

mysql> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
    -> FROM legs
    -> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
    -> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
    -> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
    -> WHERE
    -> legs.sc_id=29640 AND billrec.br_id>0
    -> AND legs.leg = billrec.leg ;
+-----------+----------+
| BillRecID | BillTime |
+-----------+----------+
|     39112 |  2206300 |
|     42857 |  2160000 |
+-----------+----------+
2 rows in set (0.01 sec)

and doing an explain gives...

mysql> explain
    -> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
    -> FROM legs
    -> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
    -> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
    -> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
    -> WHERE
    -> legs.sc_id=29640 AND billrec.br_id>0
    -> AND legs.leg = billrec.leg ;
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+-------+----------------------------------------------+
| table      | type   | possible_keys                                                 | key            | key_len | ref              | rows  | Extra                                        |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+-------+----------------------------------------------+
| legs       | ref    | legs_sc_id_idx,legs_st_id_tx_idx,legs_st_id_rx_idx,legs_s_idx | legs_sc_id_idx |       4 | const            |     1 | where used                                   |
| billrec    | ALL    | billrec_st_id_idx                                             | NULL           |    NULL | NULL             | 36887 | range checked for each record (index map: 8) |
| billsubscb | eq_ref | PRIMARY                                                       | PRIMARY        |       4 | billrec.bs_id_rx |     1 |                                              |
| billsubsca | eq_ref | PRIMARY                                                       | PRIMARY        |       4 | billrec.bs_id_tx |     1 |                                              |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+-------+----------------------------------------------+
4 rows in set (0.00 sec)

Now both joins to the billsubsc table have used the PRIMARY key but surprisingly a full table scan of billrec has now occured and MySql hasn't bothered looking
at the key on billrec.st_id (KEY billrec_st_id_idx (st_id)) which I have tried to utilise by joining legs to billrec through the st_id. 

I would have thought the second query would be slower than first on account of the full table scan but apparantley not (although having said
that the first query despite having less rows to query joins through the non-indexed column billrec.bs_id_tx). This is extremely worrying
as we do have systems with 1/2 a million plus records in the billrec table. I doubt a full table scan of that would have the same result, in terms
of speed.

On describing the billrec table I can see that the UNIQUE KEY bs_id_tx_idx seems have an asociation with the bs_id_tx column in billrec...

mysql> desc billrec;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| br_id       | int(11)     |      | PRI | NULL    | auto_increment |
| bs_id_tx    | int(11)     |      | MUL | 0       |                | *** UNIQUE KEY bs_id_tx_idx ???
| bs_id_rx    | int(11)     |      |     | 0       |                |
| st_id       | int(11)     |      | MUL | 0       |                |
| leg         | int(11)     |      |     | 0       |                |
| billdate    | int(11)     |      | MUL | 0       |                |
| time        | int(11)     |      |     | 0       |                |
| duration    | int(11)     |      |     | 0       |                |
| type        | tinyint(4)  |      | MUL | 0       |                |
| direction   | tinyint(4)  |      |     | 0       |                |
+-------------+-------------+------+-----+---------+----------------+

By rearranging the column ordering of the UNIQUE KEY I was able to get the same result as if removing it completely however (e.g. infinitley quicker query execution).

BEFORE unique key
-----------------
create UNIQUE index  bs_id_tx_idx on billrec(bs_id_tx,bs_id_rx,billdate,time,duration,type,direction);

AFTER unique key
-------------
create UNIQUE index  bs_id_tx_idx on billrec(type,direction,bs_id_tx,bs_id_rx,billdate,time,duration);

mysql> desc billrec;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| br_id       | int(11)     |      | PRI | NULL    | auto_increment |
| bs_id_tx    | int(11)     |      |     | 0       |                |
| bs_id_rx    | int(11)     |      |     | 0       |                |
| st_id       | int(11)     |      | MUL | 0       |                |
| leg         | int(11)     |      |     | 0       |                |
| billdate    | int(11)     |      | MUL | 0       |                |
| time        | int(11)     |      |     | 0       |                |
| duration    | int(11)     |      |     | 0       |                |
| type        | tinyint(4)  |      | MUL | 0       |                |*** UNIQUE KEY bs_id_tx_idx ???
| direction   | tinyint(4)  |      |     | 0       |                |
+-------------+-------------+------+-----+---------+----------------+

Now I am slightly unsure as to which way to go to move this forward. Speed is of vital importance. The first query is no good because its slow
and is using the UNIQUE key incorrectly in my view, but the second query is worrying because of the full table scan.

I have tested on linux and solaris platforms with 3.23.45 and 3.23.54 and the problem is apparent on both but the speed is only really a factor on the Solaris box.

Any explanation/resolution for this problem would be most appreciated.

Besides ensuring that no duplicates occur, UNIQUE acts just like 
a regular index.

MySQL optimizer tries to find the best possible index to be used
in the query. Most of the times it succeeds, but there are cases
where there is a better option available. There have been more
JOIN cases optimized in 4.0, so you might want to try upgrading
to the latest 4.0 (which as of now is 4.0.14) to see if that solves
your problems.

If not, or if you don't want to upgrade, you can use the USE INDEX
clause in your query to manually force MySQL to use another 
index that is chosen by the optimizer.

The index to be used is chosen based on many different criteria,
including (but not limited to) value distribution in the table, so
unfortunately we're not able to try to reproduce your problem 
without the exact same data that you have.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".