Bug #26232 | review of bug #20944 / proof of concept attached | ||
---|---|---|---|
Submitted: | 9 Feb 2007 17:28 | Modified: | 15 Feb 2007 18:36 |
Reporter: | Martin Friebe (Gold Quality Contributor) (OCA) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 4.1.22, 5.0.34, 5.0.36-BK | OS: | Linux (Linux) |
Assigned to: | CPU Architecture: | Any | |
Tags: | Contribution, Optimizer, qc, range |
[9 Feb 2007 17:28]
Martin Friebe
[10 Feb 2007 23:23]
Martin Friebe
From my very limited search through the source code, it would appear to me that this can be amended in mi_range.c. mi_records_in_range seems to be called for each value in the IN list (or each pair, if multi column key, with 2 IN) It calls _mi_record_pos twice (for start_pos and end_pos), if both results are equal (same entry in the index?) then mi_records_in_range returns 1 row. However, I believe that if: 1) the min_key and max_key are equal 2) both results from _mi_record_pos are equlal 3) both keylookups returned, but did not actually find a hit. Then there a 0 rows and this could be returned. - The 1st condition, does probably not need to be checked (or only an assert for debug) - For the 3rd condition a way needs to be found to return the additional info. One place that indicates a non-hit seems here: if (pos == HA_OFFSET_ERROR) DBUG_RETURN(0.5); But there may be more places to check, for non-hits? - I have tested this, and it appears to lower the effect: - the narrow conditon stil sometimes returns more, but less than without the above condition - also 5.0 still expects more than 4.1 I also seem to be able to pass all tests (as far as I pass them without this), except where I expected the behaviour to change. --- Maybe on failure the 2nd call to _mi_record_pos can even be ommitted? (I havent tested this), but if both (min_key and max_key are equal, and the 1st call fails to hit an entry in the index, so should the 2nd?
[11 Feb 2007 0:28]
Martin Friebe
a proof of concept patch / sorry for the crude implementation / not complete
Attachment: range.patch (text/x-patch), 6.07 KiB.
[11 Feb 2007 0:35]
Martin Friebe
I have attached a patch, wich demonstrates how to get better results from mysql, on IN range queries Unfortunatly I had to comment out one line if (flag > 0 && ! nod_flag) { // *non_key_hit = (*non_key_hit) +1; offset= 1.0; } With this line, the IN/IN range on a multicolumn index, actually does get better results than the query on only the left part. Yet, it fails the "merge" test. And also cause issues on the help test. (I hope those issues can be fixed.) Also, there will most likely be a better place on the existing object structure, to push the additional info around. I do see the extra variable as a hack purely made for proof of concept only.
[11 Feb 2007 20:37]
Martin Friebe
2nd try / works with all but one test: mysqldump
Attachment: range.patch (text/x-patch), 6.77 KiB.
[12 Feb 2007 11:26]
Valeriy Kravchuk
Thank you for a bug report. Verified just as described with 5.0.36-BK on Linux: ... mysql> select count(*) from i1 where a=317; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from i1 where a=316; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from i1 where b=1222; +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec) mysql> explain select * from i1 where a in (317); +----+-------------+-------+------+---------------+------+---------+-------+---- --+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | row s | Extra | +----+-------------+-------+------+---------------+------+---------+-------+---- --+-------------+ | 1 | SIMPLE | i1 | ref | a | a | 4 | const | 1 | Using index | +----+-------------+-------+------+---------------+------+---------+-------+---- --+-------------+ 1 row in set (0.00 sec) mysql> explain select * from i1 where a in (317) and b in (1222); +----+-------------+-------+------+---------------+------+---------+------------ -+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------------ -+------+-------------+ | 1 | SIMPLE | i1 | ref | a | a | 8 | const,const | 1 | Using index | +----+-------------+-------+------+---------------+------+---------+------------ -+------+-------------+ 1 row in set (0.00 sec) mysql> explain select * from i1 where a in (317) and b in (1222, 1226); +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | row s | Extra | +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ | 1 | SIMPLE | i1 | range | a | a | 8 | NULL | 2 | Using where; Using index | +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ 1 row in set (0.00 sec) mysql> explain select * from i1 where a in (317) and b in (1222, 1226, 1228); +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | row s | Extra | +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ | 1 | SIMPLE | i1 | range | a | a | 8 | NULL | 3 | Using where; Using index | +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ 1 row in set (0.01 sec) mysql> explain select * from i1 where a in (316, 317) and b in (1222, 1226, 122 8); +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | row s | Extra | +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ | 1 | SIMPLE | i1 | range | a | a | 8 | NULL | 6 | Using where; Using index | +----+-------------+-------+-------+---------------+------+---------+------+---- --+--------------------------+ 1 row in set (0.00 sec) mysql> select version(); +-----------+ | version() | +-----------+ | 5.0.36 | +-----------+ 1 row in set (0.00 sec)
[12 Feb 2007 18:18]
Martin Friebe
Just a thought. It may also be needed to check for min_key->flag == HA_READ_KEY_EXACT && max_key->flag == HA_READ_AFTER_KEY I don't know which other combinations may appear, and if the assumption 0 rows will be true for all of them. Hiting the first or last or even the single entry in the keyfile, may be of special interest, depending on the flags?
[13 Feb 2007 23:38]
Martin Friebe
passes all tests (except expected explain changes) + minor perfomance tweak
Attachment: range1.patch (text/x-patch), 8.02 KiB.
[15 Feb 2007 18:36]
Martin Friebe
One more thing, I found around this code. I *believe* there is a potential for overestimate in a very rare situation: (Not making that a bug of its own, as I am not 100% sure) mi_range.c: double _mi_search_pos if bin_search returned 0 (exact hit), SEARCH_FIND is true and this is a node, then the previous sub-node is searched, as it can contain equal entries. If this sub-node does not contain a match, then offset should be 1.0, so we would return the position of the exact match. But offset willbe whatever the call to _mi_search_pos for the sub-node returned. Normally this offset will still be very close to 1.0 and the estimate will be 0 or 1 off. But if we had a large amount of records, and the index was not optimized for a long time, maybe even "delete quick" was used, we could have had a subnode with just 1 or 2 entries. In this case offset would have been returned as "(x+0.5)/(x+1)" (0.5 is the offset for HA_OFFSET_ERROR / x the keynr of the last node) 1.5/2 = 0.75 2.5/3 = 0.84 Multiplying this difference with 10,000 rows, means an extra estimate of 2500 or 1600 rows IF this is true, There are several possible fixes: if(HA_OFFSET_ERROR) return 1.0; // this is if it can't be triggered otherwise in the block for "if (flag)" after "if(flag > 0 && !nod_flag)" if(flag<0 && afterkey) offset =1.0 or with the patch applied, check for non_key_hit, after the call to _mi_search_pos
[20 Sep 2007 15:03]
Chad MILLER
Martin says: The same thing can be experienced with innodb tables. Heap tables (using btree index) do not overestimate in this way.