Bug #116272  Total FTS index cache is not limited by innodb_ft_total_cache_size
Submitted: 1 Oct 2024 1:08    Modified: 4 Oct 2024 7:08
Reporter: Pranay Motupalli    Status: Verified
Category: MySQL Server: FULLTEXT search    Severity: S3 (Non-critical)
Version: 8.0, 9.0    OS: Any
Assigned to:    CPU Architecture: Any
Tags: fts, memory consumption

[1 Oct 2024 1:08] Pranay Motupalli
Description:
The total full-text search (FTS) index cache is not properly limited by the innodb_ft_total_cache_size configuration. This variable is meant to set a global memory limit for all full-text search indexes across the system.

The issue is that the fts_is_sync_needed function, which checks if the total FTS cache size has exceeded the global limit and triggers a forced sync, is not being called regularly. As a result, the FTS cache can continue to grow unchecked, consuming large amounts of memory even beyond the 640 MB default limit set by innodb_ft_total_cache_size.

The issue can be reproduced when there are numerous tables, each with a full-text search index (without any of them hitting the per-table limit, innodb_ft_cache_size), because the FTS cache for all these indexes is aggregated without any enforcement of the global limit. This can quickly consume a significant portion of the available system memory.

Expected Behavior: The total FTS cache size should be limited by the innodb_ft_total_cache_size configuration parameter.

How to repeat:
Scenarios and Code Analysis:

1. The fts_is_sync_needed function is only called when the background optimize thread's message queue is empty. It performs the following steps:

    * Acquire the dict_sys mutex.
    * Open each table in the optimize thread's table list and access its FTS index cache.
    * Calculate the sum of individual table cache sizes to determine the total FTS cache size.
    * Check if the total FTS cache size exceeds the configured limit.
    * If the limit is exceeded, set the fts_need_sync variable to True.
    * Release the dict_sys mutex.

This function is responsible for checking whether the total FTS cache size exceeds the configured limit so that the fts_need_sync flag can be set. However, when the background FTS optimize thread is busy, its message queue is never empty, the check is skipped, and the cache can continue growing unchecked.

/* Timeout ? */
if (msg == nullptr) {
  if (fts_is_sync_needed(tables)) {
    fts_need_sync = true;
  }

  continue;
}

https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/fts/fts0opt.cc#L2883
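
For reference, here is a simplified sketch of the logic those steps describe (abbreviated from fts0opt.cc, not the verbatim upstream code; helper names may differ between versions):

static bool fts_is_sync_needed(const ib_vector_t *tables) {
  ulint total_memory = 0;

  /* Step 1: take the dict_sys mutex to serialize against DDL. */
  dict_sys_mutex_enter();

  /* Steps 2-3: open each table registered with the optimize thread and
     sum the per-table FTS cache sizes. */
  for (ulint i = 0; i < ib_vector_size(tables); ++i) {
    const fts_slot_t *slot =
        static_cast<const fts_slot_t *>(ib_vector_get_const(tables, i));

    dict_table_t *table =
        dict_table_open_on_id(slot->table_id, true, DICT_TABLE_OP_NORMAL);

    if (table != nullptr && table->fts != nullptr &&
        table->fts->cache != nullptr) {
      total_memory += table->fts->cache->total_size;
    }

    if (table != nullptr) {
      dict_table_close(table, true, false);
    }
  }

  /* Step 6: release the mutex. */
  dict_sys_mutex_exit();

  /* Steps 4-5: report whether the global limit (innodb_ft_total_cache_size,
     i.e. fts_max_total_cache_size) is exceeded; the caller then sets
     fts_need_sync, as shown in the snippet above. */
  return total_memory > fts_max_total_cache_size;
}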

2. After an engine restart, when the FTS cache is rebuilt via the fts_init_recover_doc function, neither the total cache size nor the fts_need_sync flag is checked. This allows new documents to be inserted into the cache, potentially exceeding the configured limit.

https://github.com/mysql/mysql-server/blob/trunk/storage/innobase/fts/fts0fts.cc#L6123

Here's a scenario where the cache can grow beyond the limit:

    * Create 3000 tables and insert a few rows into each, staying under the per-table cache limit (innodb_ft_cache_size).
    * Restart the engine, which empties the in-memory FTS caches.
    * After the restart, create another set of 3000 tables and insert data as before, filling the cache again.
    * Perform a select on each of the first 3000 tables; recovery reloads their caches, roughly doubling the total cache size.

With 6000 tables each allowed up to the default 8 MB per-table cache, the aggregate can in principle reach tens of gigabytes against the 640 MB global limit.

---------

Repro (Insert-Only Scenario):

mysql> show global variables like '%version%';
+--------------------------+------------------------------+
| Variable_name            | Value                        |
+--------------------------+------------------------------+
| admin_tls_version        | TLSv1.2,TLSv1.3              |
| innodb_version           | 8.0.39                       |
| protocol_version         | 10                           |
| replica_type_conversions |                              |
| slave_type_conversions   |                              |
| tls_version              | TLSv1.2,TLSv1.3              |
| version                  | 8.0.39                       |
| version_comment          | MySQL Community Server - GPL |
| version_compile_machine  | x86_64                       |
| version_compile_os       | Linux                        |
| version_compile_zlib     | 1.2.13                       |
+--------------------------+------------------------------+
11 rows in set (0.01 sec)

mysql> show global variables like '%ft%';
+---------------------------------+----------------+
| Variable_name                   | Value          |
+---------------------------------+----------------+
| ft_boolean_syntax               | + -><()~*:""&| |
| ft_max_word_len                 | 84             |
| ft_min_word_len                 | 4              |
| ft_query_expansion_limit        | 20             |
| ft_stopword_file                | (built-in)     |
| innodb_ft_aux_table             |                |
| innodb_ft_cache_size            | 8000000        |
| innodb_ft_enable_diag_print     | OFF            |
| innodb_ft_enable_stopword       | ON             |
| innodb_ft_max_token_size        | 84             |
| innodb_ft_min_token_size        | 3              |
| innodb_ft_num_word_optimize     | 2000           |
| innodb_ft_result_cache_limit    | 2000000000     |
| innodb_ft_server_stopword_table |                |
| innodb_ft_sort_pll_degree       | 2              |
| innodb_ft_total_cache_size      | 640000000      |
| innodb_ft_user_stopword_table   |                |
+---------------------------------+----------------+
17 rows in set (0.01 sec)

Repro Python Code (Insert-Only Scenario):

import sys
import mysql.connector

# Get the database hosts from command-line arguments
if len(sys.argv) < 2:
    print("Usage: python script.py <host>")
    sys.exit(1)
db_host = sys.argv[1]

DB_USER = "xxxxx"
DB_PASS = "xxxxx"
DB_NAME = "xxxxxxxxxxx"

# Connect to the MySQL server (writer)
conn = mysql.connector.connect(
    host=db_host,
    user=DB_USER,
    password=DB_PASS,
    database=DB_NAME
)

# Create a cursor object
cursor = conn.cursor()

# Create database and table, insert initial data

cursor.execute("CREATE DATABASE IF NOT EXISTS test")
cursor.execute("""
    CREATE TABLE IF NOT EXISTS test.opening_lines (
        id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
        opening_line TEXT(500),
        author VARCHAR(200),
        title VARCHAR(200),
        FULLTEXT idx (opening_line)
    ) ENGINE=InnoDB;
""")

initial_data = [
    ("Call me Ishmael.", "Herman Melville", "Moby-Dick"),
    ("A screaming comes across the sky.", "Thomas Pynchon", "Gravity's Rainbow"),
    ("I am an invisible man.", "Ralph Ellison", "Invisible Man"),
    ("Where now? Who now? When now?", "Samuel Beckett", "The Unnamable"),
    ("It was love at first sight.", "Joseph Heller", "Catch-22"),
    ("All this happened, more or less.", "Kurt Vonnegut", "Slaughterhouse-Five"),
    ("Mrs. Dalloway said she would buy the flowers herself.", "Virginia Woolf", "Mrs. Dalloway"),
    ("It was a pleasure to burn.", "Ray Bradbury", "Fahrenheit 451")
]

insert_query = "INSERT INTO test.opening_lines (opening_line, author, title) VALUES (%s, %s, %s)"
cursor.executemany(insert_query, initial_data)

# Double the row count 11 times (8 initial rows -> 16,384 rows)

for _ in range(11):
    cursor.execute("INSERT INTO test.opening_lines(opening_line, author, title) SELECT opening_line, author, title FROM test.opening_lines")
    conn.commit()

# Create 3000 tables
for i in range(1, 3001):
    table_name = f"test.opening_lines_{i}"
    create_table_query = f"""
        CREATE TABLE {table_name} (
            id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
            opening_line TEXT(500),
            author VARCHAR(200),
            title VARCHAR(200),
            FULLTEXT idx (opening_line)
        ) ENGINE=InnoDB;
    """
    cursor.execute(create_table_query)

# Insert 10000 rows into each table
for i in range(1, 3001):
    table_name = f"test.opening_lines_{i}"
    insert_query = f"""
        INSERT INTO {table_name} (opening_line, author, title)
        SELECT opening_line, 'Anonymous', title
        FROM test.opening_lines
        LIMIT 10000;
    """
    cursor.execute(insert_query)
    conn.commit()

# Close the cursor and connection
cursor.close()
conn.close()

----

1. Install and start the latest MySQL Community Server.
2. Create the test database in it.
3. Run the script above.
4. Monitor memory usage with the query below:

select * from sys.memory_global_by_current_bytes where event_name like '%fts%';


Attached memory usage graph, tracked using the above query during insertion. As it shows, the FTS cache memory allocated exceeded the default 640 MB limit.

Suggested fix:
1. Track the total FTS cache size in a global variable, instead of the current logic of taking the dict_sys latch and summing the individual table cache sizes on demand.

2. Use this tracked variable for a cheap check on every iteration of the fts_optimize_thread loop, removing the dependency on the message queue being empty. The limit would then be checked and fts_need_sync set each time the loop runs (see the sketch after the patch below).

3. Check the total cache size or the fts_need_sync flag in the fts_init_recover_doc path and add the table to the background queue so the FTS optimize thread syncs it lazily. The cache can temporarily exceed the limit without affecting the performance of the foreground query, but the background thread eventually syncs and thereby bounds the total FTS cache size to the configured value. For example:

 static bool fts_init_recover_doc(void *row, /*!< in: sel_node_t* */
 ...
     cache->next_doc_id = doc_id + 1;
   }

+  if (fts_need_sync) {
+    fts_optimize_request_sync_table(cache->sync->table);
+  }
+
   return true;
 }
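
A minimal sketch of what fixes #1 and #2 could look like; the names fts_total_cache_used, fts_cache_charge, and fts_cache_discharge below are hypothetical, for illustration only, not existing server code:

#include <atomic>

/* Stand-ins so the sketch is self-contained; in the server these would be
   the InnoDB ulint typedef and the existing fts_max_total_cache_size global
   behind innodb_ft_total_cache_size. */
using ulint = unsigned long;
static ulint fts_max_total_cache_size = 640000000;

/* Fix #1: one atomic counter, maintained incrementally, instead of summing
   every table's cache size under the dict_sys mutex on demand. */
static std::atomic<ulint> fts_total_cache_used{0};

/* Called wherever a table's cache->total_size grows, i.e. when a document
   is added to that table's FTS cache. */
inline void fts_cache_charge(ulint bytes) {
  fts_total_cache_used.fetch_add(bytes, std::memory_order_relaxed);
}

/* Called when a table's cache is synced to disk and emptied. */
inline void fts_cache_discharge(ulint bytes) {
  fts_total_cache_used.fetch_sub(bytes, std::memory_order_relaxed);
}

/* Fix #2: an O(1) check cheap enough to run on every iteration of the
   fts_optimize_thread loop, whether or not the message queue is empty. */
inline bool fts_total_cache_exceeded() {
  return fts_total_cache_used.load(std::memory_order_relaxed) >
         fts_max_total_cache_size;
}

With such a counter in place, the fts_init_recover_doc path in fix #3 could also consult fts_total_cache_exceeded() directly instead of waiting for the periodic check.
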
[1 Oct 2024 1:12] Pranay Motupalli
Memory Usage during insert test

Attachment: Pasted Graphic copy.jpg (image/jpeg, text), 337.29 KiB.

[4 Oct 2024 7:08] MySQL Verification Team
Hello Pranay,

Thank you for the report and feedback.

regards,
Umesh