Bug #120575 Range optimizer chooses bad range plan for large IN lists and execution is very slow and memory usage spikes.
Submitted: 30 May 6:57
Reporter: hel le Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.41+ OS:Any
Assigned to: CPU Architecture:Any
Tags: Memory, Optimizer, performance, range_optimizer_max_mem_size, range-optimizer

[30 May 6:57] hel le
Description:
                                                                                                                                                                                                                                                 MySQL optimizer produces a range scan execution plan with significant row overestimation when querying a table with a composite index i_primary_unit_div_emp_order and large IN-list conditions (8598 division_id values and 2323 unit_id values). The actual matching rows are much smaller ( approximately 6623). Additionally, executing this query causes significant memory usage increase of 20GB+.                                                                                                                                                                                                                             

This can be identify issues with MySQL's range optimizer when dealing with queries that have large IN-list values combined with index conditions. 

How to repeat:
// setup 1 : load data

DROP TABLE IF EXISTS t2;
CREATE TABLE `t2` (
  `rid` varchar(36) NOT NULL,
  `tenant_code` varchar(36) DEFAULT NULL,
  `emp_id` varchar(36) DEFAULT NULL,
  `emp_no` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `unit_id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `division_id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `position_id` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
  `is_primary` int(11) DEFAULT NULL,
  `biz_tenant` varchar(36) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '~',
  `del_flag` int NOT NULL DEFAULT '0',
  PRIMARY KEY (`rid`) USING BTREE,
  KEY `idx_biz_tenant` (`biz_tenant`) USING BTREE,
  KEY `biz_tenant_empid_index` (`biz_tenant`,`emp_id`),
  KEY `biz_tenant_divisionid_index` (`biz_tenant`,`division_id`),
  KEY `i_index_biz_tenant_unitid` (`biz_tenant`,`unit_id`),
  KEY `i_as_unit_empno` (`unit_id`,`emp_no`),
  KEY `i_primary_unit_div_emp_order` (`biz_tenant`,`is_primary`,`unit_id` DESC,`division_id` DESC,`emp_no` DESC)
) ENGINE=InnoDB;

SET @row1=0;
INSERT INTO t2 (rid, tenant_code, emp_id, emp_no, unit_id, division_id, position_id, is_primary, biz_tenant, del_flag)
SELECT UUID(),'ua1uxpnc',UUID(),CONCAT('EMP_',LPAD(@row1:=@row1+1,10,'0')),
       CONCAT('U_',LPAD(FLOOR(RAND()*522),6,'0')),
       CONCAT('D_',LPAD(FLOOR(RAND()*1662),8,'0')),
       UUID(),1,'ua1uxpnc',0
FROM information_schema.columns c1, information_schema.columns c2 LIMIT 6623;

SET @row2=0;
INSERT INTO t2 (rid, tenant_code, emp_id, emp_no, unit_id, division_id, position_id, is_primary, biz_tenant, del_flag)
SELECT UUID(),'wxwqxjkv',UUID(),CONCAT('EMP_',LPAD(@row2:=@row2+1+6623,10,'0')),
       CONCAT('U_',LPAD(522+FLOOR(RAND()*12),6,'0')),
       CONCAT('D_',LPAD(1662+FLOOR(RAND()*7993),8,'0')),
       UUID(),1,'wxwqxjkv',0
FROM information_schema.columns c1, information_schema.columns c2, information_schema.columns c3 LIMIT 275588;

SET @row3=0;
INSERT INTO t2 (rid, tenant_code, emp_id, emp_no, unit_id, division_id, position_id, is_primary, biz_tenant, del_flag)
SELECT UUID(),'yc8o1f81',UUID(),CONCAT('EMP_',LPAD(@row3:=@row3+1+282211,10,'0')),
       CONCAT('U_',LPAD(534,6,'0')),
       CONCAT('D_',LPAD(9655+FLOOR(RAND()*5998),8,'0')),
       UUID(),1,'yc8o1f81',0
FROM information_schema.columns c1, information_schema.columns c2, information_schema.columns c3 LIMIT 85767;

SET @row4=0;
INSERT INTO t2 (rid, tenant_code, emp_id, emp_no, unit_id, division_id, position_id, is_primary, biz_tenant, del_flag)
SELECT UUID(),'ze6duwya',UUID(),CONCAT('EMP_',LPAD(@row4:=@row4+1+367978,10,'0')),
       CONCAT('U_',LPAD(535+FLOOR(RAND()*44),6,'0')),
       CONCAT('D_',LPAD(15653+FLOOR(RAND()*156),8,'0')),
       UUID(),1,'ze6duwya',0
FROM information_schema.columns c1, information_schema.columns c2 LIMIT 13130;

SET @row5=0;
INSERT INTO t2 (rid, tenant_code, emp_id, emp_no, unit_id, division_id, position_id, is_primary, biz_tenant, del_flag)
SELECT UUID(),'tenant_fill1',UUID(),CONCAT('EMP_',LPAD(@row5:=@row5+1+381108,10,'0')),
       CONCAT('U_',LPAD(579+FLOOR(RAND()*1000),6,'0')),
       CONCAT('D_',LPAD(FLOOR(RAND()*8598),8,'0')),
       UUID(),1,'tenant_fill1',0
FROM information_schema.columns c1, information_schema.columns c2 LIMIT 10000;

SET @row6=0;
INSERT INTO t2 (rid, tenant_code, emp_id, emp_no, unit_id, division_id, position_id, is_primary, biz_tenant, del_flag)
SELECT UUID(),'tenant_fill2',UUID(),CONCAT('EMP_',LPAD(@row6:=@row6+1+391108,10,'0')),
       CONCAT('U_',LPAD(1579+FLOOR(RAND()*744),6,'0')),
       CONCAT('D_',LPAD(FLOOR(RAND()*8598),8,'0')),
       UUID(),1,'tenant_fill2',0
FROM information_schema.columns c1 LIMIT 744;

ANALYZE TABLE t2;
OPTIMIZE TABLE t2;

// step 2, create select query and execute query

#!/bin/bash
# =====================================================
# Generate range scan query SQL script
# =====================================================

MYSQL_CMD="${MYSQL_CMD:-mysql -uroot -pxxx .. }"
DB_NAME="${DB_NAME:-test}"
OUTPUT_FILE="${1:-/tmp/test_query.sql}"
DIV_IN_NUM="${2:-15800}"    # division_id IN count
UNIT_IN_NUM="${3:-2323}"    # unit_id IN count

echo "=============================================="
echo "Generate range scan query SQL (local version)"
echo "=============================================="
echo "Params: DIV_IN=$DIV_IN_NUM, UNIT_IN=$UNIT_IN_NUM"
echo "Output: $OUTPUT_FILE"
echo ""

# Get emp_id for NOT IN (from ua1uxpnc tenant)
EMP_ID=$($MYSQL_CMD $DB_NAME -N -e "SELECT emp_id FROM t2 WHERE biz_tenant='ua1uxpnc' AND is_primary=1 LIMIT 1;" 2>/dev/null)
echo "EMP_ID for NOT IN: $EMP_ID"

# Extract division_id from full table
echo "Extracting division_id ($DIV_IN_NUM)..."
DIVISION_IDS=$($MYSQL_CMD $DB_NAME -N -e "SELECT DISTINCT division_id FROM t2 ORDER BY division_id LIMIT $DIV_IN_NUM;" 2>/dev/null)

# Extract unit_id from full table
echo "Extracting unit_id ($UNIT_IN_NUM)..."
UNIT_IDS=$($MYSQL_CMD $DB_NAME -N -e "SELECT DISTINCT unit_id FROM t2 ORDER BY unit_id LIMIT $UNIT_IN_NUM;" 2>/dev/null)
  
DIVISION_COUNT=$(echo "$DIVISION_IDS" | grep -c .)
UNIT_COUNT=$(echo "$UNIT_IDS" | grep -c .)
echo "Actual count: division_id=$DIVISION_COUNT, unit_id=$UNIT_COUNT"

# Build IN list string
echo "Building IN lists..."
DIVISION_IN=$(echo "$DIVISION_IDS" | awk '{printf "'"'"'%s'"'"', ", $0}' | sed 's/, $//')
UNIT_IN=$(echo "$UNIT_IDS" | awk '{printf "'"'"'%s'"'"', ", $0}' | sed 's/, $//')

# Generate SQL file
echo "Generating SQL file..."
cat > "$OUTPUT_FILE" << EOF
explain SELECT T0.emp_id AS \`staffId\`, T0.emp_no AS \`staffCode\`, T0.unit_id AS \`orgId\`, T0.division_id AS \`deptId\`
FROM \`test\`.t2 T0
WHERE T0.emp_id NOT IN ('$EMP_ID')
        AND T0.division_id IN ($DIVISION_IN)
        AND T0.unit_id IN ($UNIT_IN)
        AND T0.is_primary = 1
        AND T0.tenant_code = 'ua1uxpnc'
        AND T0.del_flag = 0
        AND T0.biz_tenant = 'ua1uxpnc'
ORDER BY T0.unit_id DESC, T0.division_id DESC, T0.emp_no DESC;

SELECT T0.emp_id AS \`staffId\`, T0.emp_no AS \`staffCode\`, T0.unit_id AS \`orgId\`, T0.division_id AS \`deptId\`
FROM \`test\`.t2 T0
WHERE T0.emp_id NOT IN ('$EMP_ID')
        AND T0.division_id IN ($DIVISION_IN)
        AND T0.unit_id IN ($UNIT_IN)
        AND T0.is_primary = 1
        AND T0.tenant_code = 'ua1uxpnc'
        AND T0.del_flag = 0
        AND T0.biz_tenant = 'ua1uxpnc'
ORDER BY T0.unit_id DESC, T0.division_id DESC, T0.emp_no DESC;
EOF

echo ""
echo "=============================================="
echo "Generation completed"
echo "=============================================="
echo ""
echo "Verifying execution plan..."
$MYSQL_CMD $DB_NAME < "$OUTPUT_FILE" 2>/dev/null | head -5

// explain result

id      select_type     table   partitions      type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  T0      NULL    range   idx_biz_tenant,biz_tenant_empid_index,biz_tenant_divisionid_index,i_index_biz_tenant_unitid,i_as_unit_empno,i_primary_unit_div_emp_order        i_primary_unit_div_emp_order    445     NULL    97644000        0.90    Using index condition; Using where

// performance_schema

mysql> SELECT EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED/1024/1024/1024 AS G FROM performance_schema.memory_summary_global_by_event_name WHERE CURRENT_NUMBER_OF_BYTES_USED > 0 ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC LIMIT 20;
+-----------------------------------------------------------------------------+-----------------+
| EVENT_NAME                                                                  | G               |
+-----------------------------------------------------------------------------+-----------------+
| memory/sql/QUICK_RANGE_SELECT::alloc                                        | 37.493601231836 |
| memory/sql/Quick_ranges                                                     |  0.250000000000 |
| memory/innodb/buf_buf_pool                                                  |  0.127929687500 |
| memory/innodb/ut0new                                                        |  0.062576899305 |
| memory/performance_schema/events_statements_summary_by_digest               |  0.038743019104 |
| memory/performance_schema/events_errors_summary_by_thread_by_error          |  0.035051345825 |
| memory/innodb/ut0link_buf                                                   |  0.023437544703 |
| memory/performance_schema/events_errors_summary_by_user_by_error            |  0.017525672913 |
| memory/performance_schema/events_errors_summary_by_account_by_error         |  0.017525672913 |
| memory/performance_schema/events_errors_summary_by_host_by_error            |  0.017525672913 |
| memory/performance_schema/events_statements_history_long                    |  0.013560056686 |
| memory/performance_schema/events_statements_summary_by_thread_by_event_name |  0.009563446045 |
| memory/performance_schema/events_statements_summary_by_digest.digest_text   |  0.009536743164 |
| memory/performance_schema/events_statements_history_long.sql_text           |  0.009536743164 |
| memory/performance_schema/events_statements_history_long.digest_text        |  0.009536743164 |
| memory/performance_schema/table_handles                                     |  0.008850097656 |
| memory/mysys/KEY_CACHE                                                      |  0.007814511657 |
| memory/performance_schema/memory_summary_by_thread_by_event_name            |  0.007724761963 |
| memory/sql/thd::main_mem_root                                               |  0.007239428349 |
| memory/innodb/sync0arr                                                      |  0.006208233535 |
+-----------------------------------------------------------------------------+-----------------+
20 rows in set (0.00 sec)

// right explain result

1. ALTER TABLE t2 alter index i_primary_unit_div_emp_order invisible;
2. execute query;

id      select_type     table   partitions      type    possible_keys   key     key_len ref     rows    filtered        Extra
1       SIMPLE  T0      NULL    ref     idx_biz_tenant,biz_tenant_empid_index,biz_tenant_divisionid_index,i_index_biz_tenant_unitid,i_as_unit_empno     idx_biz_tenant  146     const   12628   0.04    Using where; Using filesort

It can be seen that the optimizer chose the wrong index and the wrong scan method, resulting in excessive memory consumption and slow execution.

Suggested fix:
No