Bug #8113 | Select with between optimization | ||
---|---|---|---|
Submitted: | 24 Jan 2005 15:43 | Modified: | 18 Aug 2009 10:09 |
Reporter: | Emmanuel van der Meulen | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.0, 5.1, azalea bzr | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[24 Jan 2005 15:43]
Emmanuel van der Meulen
[14 Aug 2009 10:56]
Sveta Smirnova
Thank you for the report. Please provide repeatable test case: CREATE TABLE for every table used, exact query you use. Also please check newer versions 5.0.84 and 5.1.37 as a lot of improvement in optimizer were made.
[14 Aug 2009 11:20]
Emmanuel van der Meulen
Thank you Sveta Smirnova, >Please provide repeatable test case: CREATE TABLE for every table used The tables are as simple as in the original examples. No need for more. >exact query you use. The query is as simple as in the original example. No need for more. >Also please check newer versions 5.0.84 and 5.1.37 as a lot of improvement in optimizer were made. Can't say for sure, haven't used these versions but to reproduce this on your side would be pretty easy and straight forward. NB1. Once the issue I describe is understood--or as I like saying, once you get it--it would be easy to reproduce. NB2. If you do not have large volumes the performance won't be noticeable. NB3. This is a simple issue of the between algorithm which shouldn't be difficult to reproduce and to confirm that an improvement is or isn't possible. Kind regards Emmanuel
[14 Aug 2009 11:45]
Sveta Smirnova
Thank you for the feedback. But in the example provided: ---- table 'a' property 'num' num --- 1000 2000 3000 table 'b' properties 'fromNum' and 'toNum' fromNum toNum row 1. 0001 1500 row 2. 1501 2000 row 3. 2001 4000 --- I can guess table 'a' contains single field and 3 rows and table 'b' contains 2 fields and 3 rows. Also I can guess all fields are of INT type. But > select... where 'num' between 'fromNum' and 'toNum'. Here I need WHERE clause, because it is not clear how you join these 2 tables. Please provide.
[14 Aug 2009 12:47]
Emmanuel van der Meulen
Thank you Sveta Smirnova, > Here I need WHERE clause, because it is not clear how you join these 2 tables. Please provide. No join. The requirement: - num is an ipAddress. - fromNum and toNum are ipAddress range blocks - find in which ipAddress range block num existed (once found obtain required properties from 'b' - as I recall the info required from 'b' was country) The between would have been a great way to do it. But processing took too long so I used 'limit 1'. This was after working for days to determine why the query was so slow. Eventually used: WHERE toNum >= num limit 1 Would have liked to use: WHERE num between fromNum and toNum Does this help? Emmanuel
[14 Aug 2009 13:08]
Sveta Smirnova
Thank you for the feedback. Please provide output of SHOW CREATE TABLE a, SHOW CREATE TABLE b and full SELECT statement.
[14 Aug 2009 13:14]
Sveta Smirnova
Test case showing everything works fast in my environment: mysql> create temporary table a(num int); Query OK, 0 rows affected (0.07 sec) mysql> insert into a values(1000),(2000),(3000); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> create temporary table b(fromNum int, toNum int); Query OK, 0 rows affected (0.10 sec) mysql> insert into b values(0001, 1500),(1501, 2000),(2001, 4000); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> select * from a, b where 'num' between 'fromNum' and 'toNum'; +------+---------+-------+ | num | fromNum | toNum | +------+---------+-------+ | 1000 | 1 | 1500 | | 2000 | 1 | 1500 | | 3000 | 1 | 1500 | | 1000 | 1501 | 2000 | | 2000 | 1501 | 2000 | | 3000 | 1501 | 2000 | | 1000 | 2001 | 4000 | | 2000 | 2001 | 4000 | | 3000 | 2001 | 4000 | +------+---------+-------+ 9 rows in set (0.00 sec)
[14 Aug 2009 13:50]
Emmanuel van der Meulen
The moment table 'b' has say 500,000 rows, the query will drag its feet. If I recall it was around 4 seconds. Also 'b' didn't have duplicates. ipAddress ranges cannot be duplicated. Also toNum and fromNum do not overlap. ipAddress ranges cannot overlap. Therefore, if the query realized the table consists of non-duplicates and non-overlapping ranges--thus the columns are distinct and the ranges are distinct, then it does not need to traverse the full table. Thereby shortening the query time.
[14 Aug 2009 14:17]
Sveta Smirnova
Thank you for the feedback. MySQL did a lot of changes in optimizer code. This problem can be solved already. But to find what is wrong or correct in your case we need repeatable test case including CREATE TABLE definition. This is why I ask this information. You can add this privately as hidden comment if you don't want other people to see field names.
[14 Aug 2009 17:43]
Emmanuel van der Meulen
Thank you Sveta Smirnova, I feel if you 'get' what I'm saying it's real easy to reproduce. Below is further assistance to reproduce the slow query. Here is the query--in the WHERE I originally used BETWEEN then the same query runs 4 secs--as it now stands it's much faster: (here is a note I added in my Java program: with high volume; 'limit' over 'between' reduces query time from 4 secs to .01 sec:) SELECT IPRange.ipRangeId, IPRange.addedOn, IPRange.ipFrom, IPRange.ipTo, IPRange.ipRangeIP2CityLink, IP2City.ip2CityId, IP2City.addedOn, IP2City.ipRangeIP2CityLink, IP2City.countryShort, IP2City.regionId, IP2City.cityId, City.cityId, City.addedOn, City.countryShort, City.regionShort, City.cityLong, Region.regionId, Region.addedOn, Region.countryShort, Region.regionShort, Region.regionLong, locality.Country.countryId, locality.Country.addedOn, locality.Country.countryShort, locality.Country.countryLong FROM IPRange LEFT JOIN IP2City ON IPRange.ipRangeIP2CityLink = IP2City.ipRangeIP2CityLink LEFT JOIN locality.Country ON IP2City.countryShort = locality.Country.countryShort LEFT JOIN Region ON IP2City.regionId = Region.regionId LEFT JOIN City ON IP2City.cityId = City.cityId WHERE ipTo >= 2057920169 limit 1 Here is the link to the particular database backup/restore scripts (zipped); https://rcpt.yousendit.com/726269967/1ac248f7cb7e579d6e52c5b767cd1a7a Kind regards Emmanuel
[16 Aug 2009 12:26]
Sveta Smirnova
Thank you for the feedback. I have similar results with both BETWEEN and original queries with current 5.0 version in my environment. Explain shows same result as well. Please see file attached with results. Please try current version 5.0.85 in your environment and if problem still exists provide output of SELECT queries with time and EXPLAIN like I did.
[16 Aug 2009 12:26]
Sveta Smirnova
results of queries in viersion 5.0.85
Attachment: bug8113.txt (text/plain), 9.87 KiB.
[16 Aug 2009 12:34]
Emmanuel van der Meulen
Thank you, Sveta Smirnova, Okay, but wait, please run the between without the 'limit 1'. Then you'll see the query is 4 seconds. By adding limit 1 to between, you're achieving the same as the >= query with limit 1 and that's not what I'm asking for. I feel the SQL engine should be clever enough without the limit 1. Kind regards Emmanuel
[18 Aug 2009 9:27]
Sveta Smirnova
Thank you for the feedback. I finally got your point: you want query which uses BETWEEN condition which selects single row works as fast as with LIMIT 1. Feature request verified as described, although this is repeatable only with ~2000000 rows with tables from your database. If I cut a table to 1000000 rows there is no performance issue. Also FORCE INDEX doesn't make query any faster. Also adding [unique] index on both columns doesn't make query any faster as well. So feature request should sound like "implement some smart algorithm for such cases". How to repeat. 1. Setup table using following PHP script: <?php $db = new mysqli('127.0.0.1', 'root', '', 'test', 3360); $db->query("CREATE TABLE IF NOT EXISTS `bug8113` ( `ipFrom` bigint(20) NOT NULL DEFAULT '0', `ipTo` bigint(20) NOT NULL DEFAULT '0', UNIQUE KEY `ipFrom` (`ipFrom`), UNIQUE KEY `ipTo` (`ipTo`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1"); $j = 0; for ($i = 0; $i < 4000000; $i ++) { $db->query('insert into bug8113 values(' . ($j + 1) . ', ' . ($j += 100) . ')'); } ?> 2. From mysql client: mysql> explain select * from bug8113 where 205792019 between ipFrom and ipTo ; +----+-------------+---------+------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+------+---------------+------+---------+------+---------+-------------+ | 1 | SIMPLE | bug8113 | ALL | ipFrom,ipTo | NULL | NULL | NULL | 4000000 | Using where | +----+-------------+---------+------+---------------+------+---------+------+---------+-------------+ 1 row in set (0.00 sec) mysql> explain select * from bug8113 where 205792019 between ipFrom and ipTo limit 1; +----+-------------+---------+-------+---------------+------+---------+------+---------+------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+-------+---------------+------+---------+------+---------+------------------------+ | 1 | SIMPLE | bug8113 | range | ipFrom,ipTo | ipTo | 8 | NULL | 1940783 | Using where; Using MRR | +----+-------------+---------+-------+---------------+------+---------+------+---------+------------------------+ 1 row in set (0.01 sec) mysql> select * from bug8113 where 205792019 between ipFrom and ipTo ; +-----------+-----------+ | ipFrom | ipTo | +-----------+-----------+ | 205792001 | 205792100 | +-----------+-----------+ 1 row in set (3.74 sec) mysql> select * from bug8113 where 205792019 between ipFrom and ipTo limit 1; +-----------+-----------+ | ipFrom | ipTo | +-----------+-----------+ | 205792001 | 205792100 | +-----------+-----------+ 1 row in set (0.17 sec) Performance degradation is noticeable, altough it is easy to guess there would be single row in result set, because both indexes are unique.
[18 Aug 2009 10:09]
Emmanuel van der Meulen
Sveta Smirnova, thank you for your patience and perseverance. Kind regards Emmanuel
[2 Aug 2011 20:17]
Sergei Golubchik
How "there would be single row in result set" follows from "both indexes are unique"? create table t1 (a int, b int, unique(a), unique(b)); insert t1 (1,100),(2,101),(3,102); select * from t1 where 10 between a and b;
[3 Aug 2011 22:37]
Sveta Smirnova
Sergei, I believe in time when I wrote this comment I was thinking about feature original reporter asks for: if dataset is such what row is single, run fast. Assume we have a query with condition WHERE x BETWEEN idx1 AND idx2 I'd search for values in idx1 which are nearer to x, then do same for idx2. Now we have idx1.min, idx1.max, idx2.min, idx2.max, so we can get min = min(idx1.min, idx2.min) and max=max(idx1.max,idx2.max), then get query like WHERE idx1 BETWEEN min AND max AND idx2 BETWEEN min AND max
[30 May 2014 15:28]
Ruslan Sivak
I am running into this issue as well. I was able to mitigate it by using a STRAIGHT_JOIN. CREATE TABLE `t1` ( `num` int(10) unsigned NOT NULL, PRIMARY KEY (`num`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=latin1 CREATE TABLE `t2` ( `numLow` int(10) unsigned NOT NULL, `numHigh` int(10) unsigned NOT NULL, PRIMARY KEY (`numHigh`,`numLow`) USING BTREE, UNIQUE KEY `unique` (`numLow`,`numHigh`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=latin1 MIN_ROWS=9000000 PACK_KEYS=1 ROW_FORMAT=FIXED explain SELECT count(*) FROM t2 INNER JOIN t1 ON t1.num BETWEEN t2.numLow AND t2.numHigh; # id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra '1', 'SIMPLE', 't1', 'index', 'PRIMARY', 'PRIMARY', '4', NULL, '50015', 'Using index' '1', 'SIMPLE', 't2', 'ALL', 'PRIMARY,unique', NULL, NULL, NULL, '9597177', 'Range checked for each record (index map: 0x3)' explain SELECT count(*) FROM t2 STRAIGHT_JOIN t1 ON t1.num BETWEEN t2.numLow AND t2.numHigh; +------+-------------+-------+-------+----------------+---------+---------+------+---------+------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+----------------+---------+---------+------+---------+------------------------------------------------+ | 1 | SIMPLE | t2 | index | PRIMARY,unique | PRIMARY | 8 | NULL | 9597177 | Using index | | 1 | SIMPLE | t1 | ALL | PRIMARY | NULL | NULL | NULL | 50015 | Range checked for each record (index map: 0x1) | +------+-------------+-------+-------+----------------+---------+---------+------+---------+------------------------------------------------+ 2 rows in set (0.00 sec) The STRAIGHT_JOIN takes about 100s. The inner join never completes. Postgres was able to do the same query in under 12 seconds. Here is the postgres query plan: "Aggregate (cost=1771305342.61..1771305342.62 rows=1 width=0)" " -> Nested Loop (cost=0.29..1635274519.70 rows=54412329164 width=0)" " -> Seq Scan on t2 (cost=0.00..148096.97 rows=9613297 width=16)" " -> Index Only Scan using t1_pkey on t1 (cost=0.29..113.49 rows=5660 width=8)" " Index Cond: ((num >= t2.numlow) AND (num <= t2.numhigh))"
[11 Nov 2017 21:11]
Federico Razzoli
Seems to be solve in (or before) 8.0.3: mysql> explain select * from bug8113 where 205792019 between ipFrom and ipTo \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: bug8113 partitions: NULL type: range possible_keys: ipFrom,ipTo key: ipTo key_len: 8 ref: NULL rows: 7 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec) mysql> explain select * from bug8113 where 205792019 between ipFrom and ipTo limit 1\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: bug8113 partitions: NULL type: range possible_keys: ipFrom,ipTo key: ipTo key_len: 8 ref: NULL rows: 7 filtered: 100.00 Extra: Using where 1 row in set, 1 warning (0.00 sec)
[13 Nov 2017 10:40]
Knut Anders Hatlen
Posted by developer: I'm still seeing the old plan when I run Sveta's repro with 4 million rows on head of trunk: mysql> explain select * from bug8113 where 205792019 between ipFrom and ipTo\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: bug8113 partitions: NULL type: ALL possible_keys: ipFrom,ipTo key: NULL key_len: NULL ref: NULL rows: 4000000 filtered: 24.98 Extra: Using where 1 row in set, 1 warning (0,00 sec)