Bug #75599 ScanOperations allocate too much memory for their receive buffers
Submitted: 23 Jan 2015 11:49 Modified: 13 Mar 2015 13:53
Reporter: Ole John Aske
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 7.1.34 OS: Any
Assigned to: CPU Architecture: Any

[23 Jan 2015 11:49] Ole John Aske
Description:
A scan operation, whether a plain single-table scan or a
'query scan' used by a pushed join, stores the result set
in a buffer. The maximum size of this buffer is calculated
and preallocated before the scan operation is started.

This buffer may consume a considerable amount of memory; in
some cases we have observed a 2 GB buffer footprint in a test
executing 100 parallel scans. This was for a tiny 2-node,
non-mt config (2 fragments), and the memory consumption will scale
linearly with more fragments.
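
As a rough back-of-envelope illustration of that scaling (all figures
below are hypothetical and chosen only to show how quickly the product
grows; the report does not state the per-row size used in the test):

  #include <cstdint>
  #include <cstdio>

  int main() {
      // Hypothetical figures; the report gives only the scan count and
      // the 2-fragment configuration.
      const std::uint64_t parallelScans = 100;   // concurrent scans in the test
      const std::uint64_t fragments     = 2;     // tiny 2-node, non-mt config
      const std::uint64_t batchRows     = 256;   // BatchSize (default 64 in 7.1,
                                                 // raised to 256 in 7.2)
      const std::uint64_t maxRowBytes   = 14000; // assumed worst-case unpacked
                                                 // row, VARCHARs at max size

      // Each scan preallocates a receive buffer per fragment, sized for
      // a full batch of worst-case rows.
      const std::uint64_t total =
          parallelScans * fragments * batchRows * maxRowBytes;
      std::printf("preallocated: %.2f GB\n",
                  total / (1024.0 * 1024.0 * 1024.0));
      return 0;
  }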

There are several root causes for this problem:

1. Result rows are 'unpacked' to full NdbRecord format before they
   are stored in the buffer. If only some of the table columns are
   selected from a table, there will be lots of empty (wasted) space
   in the buffer.
2. Due to the 'unpacked' buffer format, varchar/varbinary columns have
   to be allocated at the max size defined for the columns.
3. 'BatchByteSize' and 'MaxScanBatchSize' are not taken into consideration
   as limiting factors when calculating max buffer size.
4. As buffer size is scaled by 'BatchSize', the problem became worse
   with 7.2, where the default was raised from 64 to 256 (a client-side
   mitigation is sketched after this list).
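
Until a fix is in place, one client-side mitigation consistent with
cause 4 is to cap the per-fragment batch explicitly through the public
NdbScanOperation::ScanOptions interface. A minimal sketch, assuming an
open NdbTransaction and an NdbRecord prepared elsewhere (the value 16
is arbitrary):

  #include <NdbApi.hpp>

  // Sketch: capping the batch size shrinks the preallocated receive
  // buffer proportionally, at the cost of more round-trips.
  NdbScanOperation* startSmallBatchScan(NdbTransaction* trans,
                                        const NdbRecord* record)
  {
      NdbScanOperation::ScanOptions options;
      options.optionsPresent = NdbScanOperation::ScanOptions::SO_BATCH;
      options.batch = 16;  // rows per fragment per batch (hypothetical)

      return trans->scanTable(record, NdbOperation::LM_Read,
                              /* result_mask */ 0,
                              &options, sizeof(options));
  }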

There have been several bug reports from customers complaining about
different kinds of 'API memory leak'. Several of these are believed to
be due to the problem described in this bug report.

How to repeat:
There are several AutoTests crashing randomly without any
trace of the root cause found in the logs. Most of these are
believed to be due to OOM.

- testScan -n ScanRead100 -l 100 T1 D1 D2
- testScan -n ScanRead40 -l 100 T1 D1 D2
- testScan -n ScanRead40RandomTable -l 100 T1
- testScan -n ScanRead488 -l 10 T6 D1 D2
- testScan -n ScanRead488O -l 10 T6 D1 D2
- testScan -n ScanRead488T -l 10 T6 D1 D2 
- testScan -n ScanRead488_Mixed -l 10 T6 D1 D2 
- testScan -n TupScanRead100 -l 100 T1 D1 D2

Suggested fix:

Store scan result rows in the buffer in a 'packed' format, and unpack
a row only when it is navigated to (made the 'current row').

Take BatchByteSize into consideration as a limiting factor
when setting up buffer memory.
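
As an illustration of the suggested scheme (a self-contained sketch,
not the actual NDB implementation): rows are kept in their packed,
length-prefixed wire format, and a single scratch buffer holds the one
row that is currently unpacked.

  #include <cstdint>
  #include <cstring>
  #include <vector>

  class PackedResultBuffer {
  public:
      explicit PackedResultBuffer(std::size_t unpackedRowSize)
          : scratch_(unpackedRowSize), next_(0) {}

      // Append a row in its packed wire format.
      void append(const std::uint8_t* packed, std::uint32_t len) {
          const std::size_t pos = buf_.size();
          buf_.resize(pos + sizeof(len) + len);
          std::memcpy(&buf_[pos], &len, sizeof(len));
          std::memcpy(&buf_[pos + sizeof(len)], packed, len);
      }

      // Navigate to the next row: only now is it unpacked, into the
      // single reusable scratch buffer (the 'current row').
      const std::uint8_t* nextRow() {
          if (next_ >= buf_.size()) return 0;
          std::uint32_t len;
          std::memcpy(&len, &buf_[next_], sizeof(len));
          unpack(&buf_[next_ + sizeof(len)], len, &scratch_[0]);
          next_ += sizeof(len) + len;
          return &scratch_[0];
      }

  private:
      // Placeholder for the real packed -> NdbRecord expansion.
      static void unpack(const std::uint8_t* src, std::uint32_t len,
                         std::uint8_t* dst) {
          std::memcpy(dst, src, len);  // real code would expand the format
      }

      std::vector<std::uint8_t> buf_;      // packed rows, back to back
      std::vector<std::uint8_t> scratch_;  // one unpacked row at a time
      std::size_t next_;                   // offset of the next packed row
  };

With this layout the buffer holds only the bytes actually received,
rather than batch-size times the worst-case unpacked row.
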
[13 Mar 2015 13:53] Jon Stephens
Documented fix as follows in the NDB 7.1.35, 7.2.20, 7.3.9, and 7.4.5 changelogs:

    A scan operation, whether it is a single table scan or a query
    scan used by a pushed join, stores the result set in a buffer.
    The maximum size of this buffer is calculated and preallocated
    before the scan operation is started. This buffer may consume a
    considerable amount of memory; in some cases we have observed a
    2 GB buffer footprint in tests that executed 100 parallel scans
    with 2 single-threaded (ndbd) data nodes. Memory consumption was
    found to scale linearly with additional fragments.

    A number of root causes were discovered that led to this
    problem:

      - Result rows were unpacked to full NdbRecord format before they
    were stored in the buffer. If only some but not all columns of a
    table were selected, the buffer contained empty space
    (essentially wasted).

      - Due to the buffer format being unpacked, VARCHAR and VARBINARY
    columns had to be allocated for the maximum size defined for
    such columns.

      - BatchByteSize and MaxScanBatchSize values were not taken into
    consideration as limiting factors when calculating the maximum
    buffer size.

    These issues became more evident in NDB 7.2 and later MySQL
    Cluster release series. This was due to the fact that buffer size
    is scaled by BatchSize, and that the default value for this
    parameter was increased fourfold (from 64 to 256) beginning with
    MySQL Cluster NDB 7.2.1.

    This fix causes result rows to be buffered using the packed
    format instead of the unpacked format; a buffered scan result
    row is now not unpacked until it becomes the current row. In
    addition, BatchByteSize and MaxScanBatchSize are now used as
    limiting factors when calculating the required buffer size.

    Also as part of this fix, refactoring has been done to separate
    the handling of buffered (packed) result sets from that of
    unbuffered result sets, and to remove code that had been unused
    since NDB 7.0 or earlier. The NdbRecord class declaration has
    also been cleaned up by removing a number of unused or redundant
    member variables.

Closed.