Bug #26624 high mem usage (crash) in range optimizer (depends on order of fields in where)
Submitted: 25 Feb 2007 21:25    Modified: 24 Apr 2007 1:39
Reporter: Martin Friebe (Gold Quality Contributor) (OCA)
Status: Closed
Category: MySQL Server: Optimizer    Severity: S2 (Serious)
Version: 5.0.38, 4.1.23    OS: Any (*)
Assigned to: Sergey Petrunya    CPU Architecture: Any
Tags: crash, Memory, Optimizer, qc, range

[25 Feb 2007 21:25] Martin Friebe
Description:
The "How to repeat" section contains two queries.

Running the first query allocates a huge amount of memory (and also leads to a crash).

Running the second works with little memory.
(It takes forever, but that long query time is the same issue as bug #20932.)

(So there are two bugs; this one is about the unnecessary amount of memory the server tries to allocate.)

Both queries perform the same task; the only difference is the order of the fields in the WHERE clause. There is no functional difference.

The first query should not use more memory than the second.

(I will file a separate bug about the "out of memory" condition not being reported as an error but instead leading to a crash.)

How to repeat:
drop table if exists xl1;
create table xl1 (
c1 char(10), c2 char(10), c3 char(10), c4 char(10),
c5 char(10), c6 char(10), c7 char(10), c8 char(10),
c9 char(10), c10 char(10),c11 char(10), c12 char(10),
c13 char(10),c14 char(10), c15  char(10), c16  char(10),
index( c1, c2, c3, c4, c5, c6, c7, c8, c9, c10,
 c11, c12,c13,c14,c15,c16 )
);

insert into xl1 (c1) select '1';
insert into xl1 (c1) select '1';
insert into xl1 (c1) select '1';
insert into xl1 (c1) select '1';

select * from xl1 where
 c1 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c2 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c3 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c4 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c5 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c6 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c7 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c8 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c9 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c10 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c11 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c12 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c13 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c14 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c15 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c16 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
;

select * from xl1 where
 c16 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c15 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c14 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c13 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c12 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c11 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c10 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c9 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c8 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c7 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c6 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c5 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c4 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c3 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c2 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
 and c1 in ("abcdefgh", "123456789", "qwertyuio", "asddfgh")
;

Suggested fix:
The problem is in sql/opt_range.cc in function key_and (around line 4700):
  key1->use_count--;
  if (key1->use_count > 0)
    if (!(key1= key1->clone_tree()))
      return 0;				// OOM
  return and_all_keys(key1,key2,clone_flag);

in combination with the function and_all_keys:
   if (next->next_key_part)
    {
      SEL_ARG *tmp=key_and(next->next_key_part,key2,clone_flag);
  ...

In the second query, key1 is built in descending key-part order and never has a next_key_part, so keys are never cloned.

In the first query, when the data for "c3" is appended, key1 already has a next_key_part (c1 => c2). key_and is called to combine c2 and c3, but all 4 values for c1 point at c2, so c2 has a use_count of 4 and is cloned.
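
As an illustration of the order of magnitude (a back-of-the-envelope sketch only, not code from opt_range.cc): if every combination of values of the earlier key parts ends up owning its own cloned SEL_ARG list for the next key part, the object count grows roughly like the sum of 4^i for i = 1..16.

#include <cstdio>

int main() {
  // 4 IN-list values per key part, 16 key parts.  Worst case: every
  // combination of values of the earlier key parts gets its own copy
  // of the SEL_ARG list for the following key part.
  unsigned long long combinations = 1, total = 0;
  for (int part = 1; part <= 16; ++part) {
    combinations *= 4;       // distinct value combinations up to this part
    total += combinations;   // one SEL_ARG per combination at this level
    printf("after key part %2d: ~%llu SEL_ARG objects\n", part, total);
  }
  return 0;
}

This reaches roughly 5.7 billion objects at key part 16, which is in the right ballpark to explain the out-of-memory errors reported below, while the second query only ever builds 16 lists of 4 values.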
[26 Feb 2007 0:26] MySQL Verification Team
Thank you for the bug report. Below is the error message printed by the 5.0 MySQL Windows server:

C:\build\5.0\bin>mysqld-nt --standalone --console --basedir=c:\build\5.0 --datadir=c:\build\5.0\data
070225 21:18:04  InnoDB: Started; log sequence number 0 135412
070225 21:18:05 [Note] mysqld-nt: ready for connections.
Version: '5.0.36'  socket: ''  port: 3306  Source distribution
070225 21:20:46 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
070225 21:20:48 [ERROR] Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
[27 Feb 2007 22:43] Martin Friebe
Two things:

1) the effect of
static SEL_ARG *key_and(SEL_ARG *key1, SEL_ARG *key2, uint clone_flag)
{
....
   if (key1->use_count > 0)
     if (!(key1= key1->clone_tree()))
  	return 0;				// OOM
  
seems poorly tested. The only test that fails (if the three lines above are removed) is an EXPLAIN in null_key. (And I am not even sure that test was intended to test the above code.)

2) I have written some SQL that will need the clone:
create table t1 (a char(20), b char(20), c char(20), index (a,b,c));

select * from t1 where
( (a = "1111" and c = "3311" ) or
  (a = "1122" )
) and b = "2211";

The tree is built in the following steps:
# after first line
(a="1111") -> (c="3311")

# after 2nd line
(a="1111") -> (c="3311")
(a="1122")

If the condition b="2211" is inserted into the above tree, then for the first list entry, b->next_key_part will point to c:

(a="1111") -> (b="2211") -> (c="3311")
...

This modified b must NOT be put into (a="1122")->next_keypart, as that would incorrectly narrow down the condition.
(a="1111") -> (b="2211") -> (c="3311")
(a="1122") -> (b="2211") -> (c="3311")  WRONG
instead of
(a="1111") -> (b="2211")   -> (c="3311")
(a="1122") -> clone:(b="2211")

There is a better way to decide on cloning (or to reduce cloning).

If, in the above example, the entry for "b" has been cloned, we could remember that we now have a special clone of "b" for any entry where a points at (c="3311"), and we could re-use this clone.

If all values of a point at (c="3311"), then we are left with the unused original.

--- Solution 1
So, if we encounter the possible need to clone, we should scan forward (within the list for the current key part) to check whether any other key has a different next_key_part.

If all entries have the same next_key_part => no need to clone

If some entries differ =>
- clone
- apply it to all entries with the same next_key_part
  (they may be interrupted by entries with a different next_key_part)
- continue from the position where the need to clone was encountered

Now some of the entries in the list for the current key part will have been modified and some not. This looks like a problem, but it can be solved.

Add a clone_id field to the SEL_ARG class, and use the address of the original to populate the clone_id.

While we keep scanning, we can make the following test

key->next_key_part && key->next_key_part->clone_id == key_that_was_cloned

and check if this entry is still to be processed.
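
A rough sketch of Solution 1 over a simplified entry list (the types and names below are illustrative stand-ins, not the real SEL_ARG interface): only when the forward scan finds a later entry with a different next_key_part do we actually have to clone; otherwise the shared subtree can be modified in place.

// Simplified stand-in for the SEL_ARG list of one key part.
struct Entry {
  Entry *next;              // next value of the same key part
  Entry *next_key_part;     // subtree for the following key parts
  const Entry *clone_id;    // address of the original this entry was cloned from
};

// Solution 1: scan forward before cloning.  If every remaining entry points
// at the same next_key_part subtree, that subtree can be modified in place.
bool needs_clone(const Entry *from) {
  for (const Entry *e = from->next; e != nullptr; e = e->next)
    if (e->next_key_part != from->next_key_part)
      return true;          // mixed subtrees => a clone is required
  return false;             // a single shared subtree => modify in place
}

The clone_id field corresponds to the test quoted above: while the scan continues, an entry whose next_key_part->clone_id equals the original that was cloned has already been handled.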

---
--- Solution 2
The above needs a lot of forward scans.
If we accept that the original may be left over unused at the end, we can:

- if we need to clone => clone
- put a reference to the clone on the object that next_key_part pointed to

When we come to the next list entry, check whether the object referred to by next_key_part has a reference to a clone that we can use.

At the end of the AND, perform a second loop and remove all the references to a usable clone. (Or use a clone_id, and ensure no clones are accepted that could have been created earlier.)
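
An equally rough sketch of Solution 2 (ref_to_clone is a hypothetical field added only for this illustration, and and_clone stands in for "clone the subtree and merge key2 into it", roughly what key_and does): the clone is remembered on the subtree it replaced, so every later list entry that still points at that subtree reuses it, and each distinct subtree is cloned at most once per AND.

struct Entry {
  Entry *next;              // next value of the same key part
  Entry *next_key_part;     // shared subtree for the following key parts
  Entry *ref_to_clone;      // hypothetical: clone produced for this subtree
};

void and_list_with_key2(Entry *list, Entry *key2,
                        Entry *(*and_clone)(Entry *subtree, Entry *key2)) {
  for (Entry *e = list; e != nullptr; e = e->next) {
    Entry *subtree = e->next_key_part;
    if (subtree == nullptr)
      continue;                                // nothing below this entry yet
    if (subtree->ref_to_clone == nullptr)      // first entry using this subtree
      subtree->ref_to_clone = and_clone(subtree, key2);
    e->next_key_part = subtree->ref_to_clone;  // later entries reuse the clone
  }
  // A second pass over the original subtrees would now clear ref_to_clone
  // (or, as suggested above, a clone_id comparison makes leftovers from
  // earlier ANDs harmless); that cleanup is omitted in this sketch.
}

Compared to Solution 1 this avoids the forward scans, at the cost of possibly leaving the unused original behind, as noted above.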

---

Hope this is detailed enough to make sense.
[28 Feb 2007 0:24] Martin Friebe
It will be a little bit more tricky if we have more key parts
(and Solution 1 (forward scans) will not do).

part1 $ part2 $ p3 $ part4 $ part5
11 -> $ 21    $    $ --     $ 51

12    $       $

13 -> $ 21    $    $ 41
14 -^ $       $    $ 42

15 -> $ 21    $    $        $ 51

In order to AND this with key2:
 p3=33 and p6=60

The search will start in the p1 list. (We may need to swap, depending on which key has the lowest part.)

It will: p1=11 and p3=33
- p1 has a next-key-part
- p1->part < p3->part

get/check p2(21) = p1->next_key_part
- p2->part < p3->part
get/check p5(51) = p2->next_key_part
- p5->part !< p3->part

need to insert between p2(21) and p5(51)
- clone p3 (unfortunately we can't use the original, but we will limit cloning)
- clone:p3(33)->clone_id = & p3(33)
  # this will allow us to identify the clone as valid 
  # for the current and / we can ignore older left overs
- clone:p3(33)->next_keypart = p5(51)
- p2(21)->next_keypart = clone:p3(33)
- p5(51)->ref_by_clone = clone:p3(33)
  # if we get to p1(15)->p2(21)->p5(51) we can reuse clone:p3(33)
  # p1(15)->p2(21)->next_key_part = clone:p3(33)

For p1(12) we do not clone; we just assign p3 as it is. p3->next_keypart will be left empty (or, in this case, point to p6(60)).
(This is safe, since we populate from the *inner* AND of the "where" clause to the outer; any further key will affect all list entries.)

For p1(13)->p2(21) we need a new clone.
# note that this is a different clone from the one created for p1(11)->p2(21)

For p1(14) we can reuse the clone from the previous row.

For p1(15) we reuse the very first clone.

---------------------------

While doing the above, after inserting p3(33) (not using a previous clone) in any row, we then need to look at p6(60) and the key that clone:p3(33) is now pointing to.

If we reuse clone:p3(33), this is already done.

-------------------------
IMHO
cloning does not need to clone the contents of ->next_keypart

In the example above, there is never a need to clone p6(60). We only need to clone if we assign to p6(60)->next_keypart.

-------------------------
IMHO
If the original p3(33) had not been used (if p1(12) did not exist), it could be destroyed and the memory freed.
[28 Mar 2007 16:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23178

ChangeSet@1.2627, 2007-03-28 20:16:01+04:00, sergefp@mysql.com +3 -0
  BUG#26624: high mem usage (crash) in range optimizer
  - Added PARAM::alloced_sel_args where we count the # of SEL_ARGs
    created by SEL_ARG tree cloning operations.
  - Made the range analyzer to shortcut and not do any more cloning 
    if we've already created MAX_SEL_ARGS SEL_ARG objects in cloning.
  - Added comments about space complexity of SEL_ARG-graph 
    representation.
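
A minimal sketch of the mechanism the commit message describes (the names alloced_sel_args and MAX_SEL_ARGS are taken from the commit message; everything else below is a simplified illustration, not the actual opt_range.cc code): cloning charges a per-query counter, and once the counter exceeds the limit the range analyzer stops cloning and falls back to "no range access" for the rest of the condition.

struct RangeParam {                    // stand-in for the optimizer's PARAM
  unsigned alloced_sel_args;           // SEL_ARGs created by cloning so far
  static const unsigned MAX_SEL_ARGS = 16000;  // illustrative limit only
};

struct SelArg;                         // opaque stand-in for SEL_ARG

// Clone a tree while counting what was allocated; once the budget is spent,
// refuse to clone, so a pathological WHERE clause cannot exhaust memory.
// 'clone' stands in for SEL_ARG::clone_tree() and reports the node count.
SelArg *clone_within_budget(RangeParam *param, SelArg *tree,
                            SelArg *(*clone)(SelArg *, unsigned *made)) {
  if (param->alloced_sel_args > RangeParam::MAX_SEL_ARGS)
    return nullptr;                    // caller treats this as "no range access"
  unsigned made = 0;
  SelArg *copy = clone(tree, &made);
  param->alloced_sel_args += made;
  return copy;
}

The pushbuild fixes quoted below reduce the limit further and also skip the expensive test_use_count() check once the limit has been exceeded.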
[29 Mar 2007 6:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23240

ChangeSet@1.2629, 2007-03-29 10:27:58+04:00, sergefp@mysql.com +1 -0
  BUG#26624: high mem usage (crash) in range optimizer
   - Post-review fixes
[29 Mar 2007 6:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23241

ChangeSet@1.2422, 2007-03-29 10:35:28+04:00, sergefp@mysql.com +4 -0
  Merge of BUG#26624 and BUG#26625
  MERGE: 1.1616.2877.73
[29 Mar 2007 10:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23273

ChangeSet@1.2509, 2007-03-29 14:54:08+04:00, sergefp@mysql.com +2 -0
  BUG#26624, merge to 5.1: make partition pruning module account for the 
  changes made by the bug fix.
[30 Mar 2007 20:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23439

ChangeSet@1.2630, 2007-03-31 00:29:18+04:00, sergefp@mysql.com +3 -0
  BUG#26624: high mem usage (crash) in range optimizer
  Pushbuild fixes: 
   - Make MAX_SEL_ARGS smaller (even 16K records_in_range() calls is 
     more than it makes sense to do in typical cases)
   - Don't call sel_arg->test_use_count() if we've already allocated 
     more than MAX_SEL_ARGs elements. The test will succeed but will take
     too much time for the test suite (and not provide much value).
[30 Mar 2007 20:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23440

ChangeSet@1.2425, 2007-03-31 00:48:31+04:00, sergefp@mysql.com +3 -0
  BUG#26624, pushbuild fixes: Merge to 5.0
  MERGE: 1.1616.2877.74
[31 Mar 2007 8:39] Bugs System
Pushed into 5.1.18-beta
[31 Mar 2007 8:44] Bugs System
Pushed into 5.0.40
[31 Mar 2007 8:54] Bugs System
Pushed into 4.1.23
[18 Apr 2007 23:35] Sergey Petrunya
The changes made in this fix are:

The range optimizer could consume a combinatorial amount of memory for a certain
class of WHERE clauses.
(I cannot provide a precise definition of this class, but
  - the clause must have a certain structure
  - the index must have >2 key parts
  - the clause must have a sufficient number of predicates
    ('t.keypart $CMP$ const'))

Now the range optimizer ignores those kinds of WHERE clauses. This removes
the excessive memory usage and, most likely, will make queries run faster
(because range analysis of such a WHERE clause is not worth doing even if we
had no memory limitations - it would take longer than the query execution itself).
[24 Apr 2007 1:39] Paul DuBois
Noted in 4.1.23, 5.0.40, 5.1.18 changelogs.