Description:
MySQL query optimisation and UNIQUE keys
How to repeat:
Author - Richard Seaman 26/06/03
Mysql Version - Tested on 3.23.45 and 3.23.54
Platform - Solaris 8
The portion of the database concerned is -
CREATE TABLE legs (
sc_id int(11) NOT NULL default '0',
st_id_tx int(11) NOT NULL default '0',
st_id_rx int(11) NOT NULL default '0',
leg int(11) NOT NULL default '0'
) TYPE=MyISAM;
#
# Indices to create on table 'legs'
#
create index legs_sc_id_idx on legs(sc_id);
create index legs_st_id_tx_idx on legs(st_id_tx);
create index legs_st_id_rx_idx on legs(st_id_rx);
CREATE TABLE billrec (
br_id int(11) NOT NULL auto_increment,
bs_id_tx int(11) NOT NULL default '0',
bs_id_rx int(11) NOT NULL default '0',
st_id int(11) NOT NULL default '0',
leg int(11) NOT NULL default '0',
billdate int(11) NOT NULL default '0',
time int(11) NOT NULL default '0',
duration int(11) NOT NULL default '0',
type tinyint(4) NOT NULL default '0',
direction tinyint(4) NOT NULL default '0',
PRIMARY KEY (br_id),
UNIQUE KEY bs_id_tx (bs_id_tx,bs_id_rx,billdate,time,duration,type,direction),
KEY billrec_st_id_idx (st_id)
) TYPE=MyISAM;
create table billsubsc (
bs_id int4 auto_increment PRIMARY KEY,
s_id int4,
voice char(30) NOT NULL,
UNIQUE (voice)
);
create index billsubsc_voice_idx on billsubsc(voice);
Performing the following query seems to take an excessive amount of time... especially when I have to do it a few hundred
times in the same web page!!!
mysql> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
-> FROM legs
-> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
-> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
-> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
-> WHERE
-> legs.sc_id=29640 AND billrec.br_id>0
-> AND legs.leg = billrec.leg ;
+-----------+----------+
| BillRecID | BillTime |
+-----------+----------+
| 42857 | 2160000 |
| 39112 | 2206300 |
+-----------+----------+
2 rows in set (1.65 sec)
Doing an explain on the query gives...
mysql> explain
-> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
-> FROM legs
-> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
-> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
-> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
-> WHERE
-> legs.sc_id=29640 AND billrec.br_id>0
-> AND legs.leg = billrec.leg ;
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+------+------------+
| legs | ref | legs_sc_id_idx,legs_st_id_tx_idx,legs_st_id_rx_idx,legs_s_idx | legs_sc_id_idx | 4 | const | 1 | where used |
| billsubsca | ALL | PRIMARY | NULL | NULL | NULL | 29 | |
| billrec | ref | PRIMARY,bs_id_tx_idx,billrec_st_id_idx | bs_id_tx_idx | 4 | billsubsca.bs_id | 368 | where used |
| billsubscb | eq_ref | PRIMARY | PRIMARY | 4 | billrec.bs_id_rx | 1 | |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+------+------------+
4 rows in set (0.01 sec)
My understanding of the EXPLAIN output is the join to billsubsca has the possible use of the PRIMARY key which seems sensible, but hasn't
actually used it!!! Unlike the join to billsubscb which also has the same PRIMARY key and has used it. Plus the join to billrec
has used bs_id_tx_idx which is a unique key and looks something like this...
create UNIQUE index bs_id_tx_idx on billrec(bs_id_tx,bs_id_rx,billdate,time,duration,type,direction);
I am of the opinion that the purupose of the UNIQUE KEY is to ensure duplication does not occur in the process of an INSERT or an UPDATE. Why therefore
has MySql decided to use it as part of the SELECT query.
By removing this UNIQUE KEY from the billrec table the query is infinitley quicker...
mysql> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
-> FROM legs
-> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
-> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
-> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
-> WHERE
-> legs.sc_id=29640 AND billrec.br_id>0
-> AND legs.leg = billrec.leg ;
+-----------+----------+
| BillRecID | BillTime |
+-----------+----------+
| 39112 | 2206300 |
| 42857 | 2160000 |
+-----------+----------+
2 rows in set (0.01 sec)
and doing an explain gives...
mysql> explain
-> SELECT billrec.br_id as BillRecID,billrec.time as BillTime
-> FROM legs
-> inner join billrec on ( legs.st_id_tx=billrec.st_id or legs.st_id_rx=billrec.st_id )
-> inner join billsubsc as billsubscb on billsubscb.bs_id=billrec.bs_id_rx
-> inner join billsubsc as billsubsca on billsubsca.bs_id=billrec.bs_id_tx
-> WHERE
-> legs.sc_id=29640 AND billrec.br_id>0
-> AND legs.leg = billrec.leg ;
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+-------+----------------------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+-------+----------------------------------------------+
| legs | ref | legs_sc_id_idx,legs_st_id_tx_idx,legs_st_id_rx_idx,legs_s_idx | legs_sc_id_idx | 4 | const | 1 | where used |
| billrec | ALL | billrec_st_id_idx | NULL | NULL | NULL | 36887 | range checked for each record (index map: 8) |
| billsubscb | eq_ref | PRIMARY | PRIMARY | 4 | billrec.bs_id_rx | 1 | |
| billsubsca | eq_ref | PRIMARY | PRIMARY | 4 | billrec.bs_id_tx | 1 | |
+------------+--------+---------------------------------------------------------------+----------------+---------+------------------+-------+----------------------------------------------+
4 rows in set (0.00 sec)
Now both joins to the billsubsc table have used the PRIMARY key but surprisingly a full table scan of billrec has now occured and MySql hasn't bothered looking
at the key on billrec.st_id (KEY billrec_st_id_idx (st_id)) which I have tried to utilise by joining legs to billrec through the st_id.
I would have thought the second query would be slower than first on account of the full table scan but apparantley not (although having said
that the first query despite having less rows to query joins through the non-indexed column billrec.bs_id_tx). This is extremely worrying
as we do have systems with 1/2 a million plus records in the billrec table. I doubt a full table scan of that would have the same result, in terms
of speed.
On describing the billrec table I can see that the UNIQUE KEY bs_id_tx_idx seems have an asociation with the bs_id_tx column in billrec...
mysql> desc billrec;
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| br_id | int(11) | | PRI | NULL | auto_increment |
| bs_id_tx | int(11) | | MUL | 0 | | *** UNIQUE KEY bs_id_tx_idx ???
| bs_id_rx | int(11) | | | 0 | |
| st_id | int(11) | | MUL | 0 | |
| leg | int(11) | | | 0 | |
| billdate | int(11) | | MUL | 0 | |
| time | int(11) | | | 0 | |
| duration | int(11) | | | 0 | |
| type | tinyint(4) | | MUL | 0 | |
| direction | tinyint(4) | | | 0 | |
+-------------+-------------+------+-----+---------+----------------+
By rearranging the column ordering of the UNIQUE KEY I was able to get the same result as if removing it completely however (e.g. infinitley quicker query execution).
BEFORE unique key
-----------------
create UNIQUE index bs_id_tx_idx on billrec(bs_id_tx,bs_id_rx,billdate,time,duration,type,direction);
AFTER unique key
-------------
create UNIQUE index bs_id_tx_idx on billrec(type,direction,bs_id_tx,bs_id_rx,billdate,time,duration);
mysql> desc billrec;
+-------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+----------------+
| br_id | int(11) | | PRI | NULL | auto_increment |
| bs_id_tx | int(11) | | | 0 | |
| bs_id_rx | int(11) | | | 0 | |
| st_id | int(11) | | MUL | 0 | |
| leg | int(11) | | | 0 | |
| billdate | int(11) | | MUL | 0 | |
| time | int(11) | | | 0 | |
| duration | int(11) | | | 0 | |
| type | tinyint(4) | | MUL | 0 | |*** UNIQUE KEY bs_id_tx_idx ???
| direction | tinyint(4) | | | 0 | |
+-------------+-------------+------+-----+---------+----------------+
Now I am slightly unsure as to which way to go to move this forward. Speed is of vital importance. The first query is no good because its slow
and is using the UNIQUE key incorrectly in my view, but the second query is worrying because of the full table scan.
I have tested on linux and solaris platforms with 3.23.45 and 3.23.54 and the problem is apparent on both but the speed is only really a factor on the Solaris box.
Any explanation/resolution for this problem would be most appreciated.